When to Trust the Cheap Check: Weak and Strong Verification for Reasoning
Framework distinguishing weak verification (self-consistency, proxy rewards) from strong verification (human inspection) in LLM reasoning loops.
Reverso foundation model for efficient zero-shot time series forecasting across diverse domains.
FAMOSE framework using the ReAct paradigm with LLM agents for autonomous feature engineering on tabular data.
Deep learning system using YOLOx for automated e-waste classification and material sorting.
Black-box adversarial attacks on large vision-language models using fine-grained detail targeting to overcome gradient-free optimization challenges.
Framework for multi-round human-AI collaboration using counterfactual harm and complementarity principles to ensure conversational AI reliably improves decision quality.
Margin-aware reward modeling framework with self-refinement for RLHF/RLAIF alignment pipelines, reducing reliance on human preference data through augmentation.
Efficient KV cache prefetching for LLM inference using GPU-native media ASICs, addressing bandwidth limitations in remote cache reuse scenarios.
Speech to Speech Synthesis Network for voice style transfer and impersonation combining speech recognition and synthesis.
MobCache framework using LLMs for scalable large-scale human mobility simulation with caching optimization.
Study showing AI safety datasets overrely on obvious triggering cues and fail to reflect realistic adversarial attacks.
SEMAS self-evolving multi-agent system for industrial IoT predictive maintenance with real-time anomaly detection.
Empirical study of adversarial code comments manipulating LLM vulnerability detection across Python, JavaScript, and Java.
Lightweight federated learning with attention mechanism for tomato disease recognition on edge devices.
Large-scale deanonymization attack using LLM agents with internet access to re-identify pseudonymous online profiles.
Using reference-guided LLM-evaluators as soft verifiers to improve LLM alignment in non-verifiable domains.
Comparison of simple baselines against code evolution techniques across mathematical bounds, agentic scaffolds, and ML competitions.
Hybrid-Gym environment for training coding agents on diverse software engineering tasks beyond single GitHub issues.
Double Machine Learning framework to estimate causal impact of football formations on match outcomes.
NeST selective neuron tuning approach for parameter-efficient LLM safety alignment without full fine-tuning overhead.
MALLVi multi-agent framework combining LLMs and vision for closed-loop robotic manipulation with environmental feedback.
LLM-WikiRace benchmark for evaluating long-term planning, reasoning, and knowledge navigation in language models over Wikipedia.
Multi-objective optimization and quantum hybridization of Allegro interatomic potential model for molecular property prediction.
Transformer architecture applied to longitudinal cohort data modeling with attention mechanisms for temporal dependencies.
Dynamic joint assortment and pricing optimization with decision-dependent customer arrivals using bandit algorithms.
DeepContext framework for stateful monitoring of multi-turn LLM conversations to detect adversarial intent drift and attempts to bypass safety guardrails.
BrainRVQ: EEG foundation model using dual-domain residual quantization and hierarchical autoregression for brain signal reconstruction.
LLM4Cov: offline agent learning framework for hardware verification testbench generation using execution-aware LLM agents without online reinforcement learning.
Phantom: automated agent hijacking attack via structural template injection, bypassing LLM safety measures with higher success rates and transferability.
Greedy multi-path verification algorithm accelerating speculative decoding by optimizing token acceptance in draft models.
Hybrid quantum-classical system combining variational quantum circuits, QUBO optimization, and post-quantum cryptography for finance.
PRIMO model quantifying predictive importance of modalities in multimodal LLMs when data is incomplete or asynchronous.
Neural ranking system for personalized exercise recommendation addressing long-tailed engagement distribution.
Cross-lingual text classification methods for analyzing multilingual social media discourse.
Analysis of Thompson Sampling performance in Bayesian reinforcement learning under model misspecification.
Comparison of deep reinforcement learning versus mean-variance optimization for portfolio allocation.
Streamlined spectral algorithm for community detection in stochastic block models with reduced preprocessing steps.
Theoretical analysis of when and why graph neural networks succeed in semi-supervised node regression tasks.
Domain generalization approach leveraging unlabeled data in anti-causal settings where outcomes cause observed features.
Diffusion-based method for generating high-dimensional samples from moment constraints with maximum entropy guarantees.
Privacy-preserving mechanism for verifying that LLM inference providers run the correct model rather than substituting a weaker one.
Framework for detecting temporal contamination in LLM backtesting by identifying post-cutoff knowledge leakage during training.
Quantum generative modeling approach using fixed entangling unitaries with optimized single-qubit rotations.
Analysis of representation collapse in Transformer-based neural machine translation models using angular dispersion metrics.
AI agent for medical diagnosis that uses LLMs to ask follow-up questions and reason over differential diagnoses iteratively.
Survey of open datasets used in learning analytics, educational data mining, and AI in education research.
Spectral method for discovering novel categories in unlabeled data using cross-modal representation learning.
Taxonomy for uncertainty quantification in long-form LLM outputs to detect hallucinations, addressing limitations of existing short-form methods.
Study of position and label biases in LLM multiple-choice question answering via synthetic benchmark evaluation.
Learning-based robotic camera system for autonomous cinematic motion control using visuomotor control.