Revision or Re-Solving? Decomposing Second-Pass Gains in Multi-LLM Pipelines
Controlled decomposition of multi-LLM revision pipelines to separate gains into re-solving, scaffold, and content components across benchmarks.
Study on popularity bias in recommender systems and alignment with user preferences for popular vs niche content.
Automated framework to evaluate and harden LLM system instructions against encoding-based attacks to prevent credential and policy leakage.
Study of adversarial attacks targeting AI-driven radio access network slicing systems and recovery mechanisms.
Security framework to detect and prevent vulnerabilities in AI-generated code through systematic verification of code safety gates.
Training-free detection method for partial audio deepfakes using speech foundation models without frame-level annotations.
Analysis of how LLMs use induction heads to track and retrieve information from context, revealing serial-recall patterns in in-context learning.
Algorithm for approximating Pareto frontiers in stochastic multi-objective optimization problems under uncertainty.
Study on how students' trust in AI assistants affects their reliance and critical evaluation of AI-generated output in educational settings.
Parameter-efficient adapter framework for adapting CLIP vision-language models to monocular depth estimation with minimal supervision.
PaperRecon evaluation framework for assessing quality and hallucination risks in AI-generated research papers from coding agents.
Generative approach for hyperspectral unmixing in remote sensing. Domain-specific to satellite imagery, not AI/LLM focused.
Brainstacks modular architecture for continual multi-domain LLM fine-tuning using MoE-LoRA stacks composing frozen adapters for domain expertise.
AdaLoRA-QAT framework for chest X-ray segmentation using low-rank adaptation and quantization-aware training. Medical imaging domain, not AI/LLM focused.
ORCA framework for test-time calibration of LLM reasoning using conformal prediction, improving efficiency of sampling-based scaling methods.
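ORCA's exact procedure isn't detailed here, but split conformal prediction itself is a standard recipe. A minimal, generic sketch of calibrating a confidence threshold on held-out scores (function name, scores, and the nonconformity choice are illustrative, not ORCA's actual design):

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal prediction: given nonconformity scores on a
    held-out calibration set, return the threshold that gives
    ~(1 - alpha) coverage on exchangeable test points."""
    n = len(cal_scores)
    # Finite-sample corrected quantile level.
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

# Illustrative: nonconformity = 1 - model confidence on calibration answers.
cal_scores = np.array([0.05, 0.10, 0.20, 0.30, 0.15, 0.25, 0.08, 0.12])
tau = conformal_threshold(cal_scores, alpha=0.25)
# At test time, accept sampled answers whose score <= tau and stop sampling
# early once one passes -- the usual source of efficiency gains in
# sampling-based test-time scaling.
```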
Multiscreen architecture introducing explicit query-key relevance rejection mechanism in attention, improving LLM discrimination of irrelevant information.
ROS 2 middleware integration for Florence-2 vision-language model in robotics systems, enabling local inference for robotic perception.
ORBIT dataset with 20K reasoning-intensive queries for training search agents combining LMs and web search, using verifiable generation methodology.
Agentic evolutionary framework for scientific algorithm discovery combining LLM-guided search with structured theory and code co-evolution.
Benchmark for evaluating LLM agents on long-term planning over a one-year startup simulation spanning hundreds of turns, testing strategic coherence under uncertainty.

Mathematical framework analyzing AI weather prediction pipelines, emphasizing training methodology and data diversity over architecture choices.
Spatio-temporal dynamics reconstruction from sparse observations using shallow recurrent decoders. Domain-specific to complex systems, not AI/ML focused.
Method for unsupervised code correctness evaluation using LLMs through code comprehension before auditing, eliminating need for reference implementations.
Survey of agentic RAG systems combining LLMs with real-time retrieval to address static training data limitations and improve contextual accuracy.
Research on fine-tuning LLMs as agentic systems to handle exceptions and improve decision-making in complex real-world contexts.
Study on mitigating reasoning biases in LLMs through activation steering at inference time to improve logical validity discrimination.
Research evaluating LLM reasoning capabilities on real-world site selection tasks, testing if models like o1 and DeepSeek-R1 generalize beyond math/code domains.
Benchmark and framework for training hierarchical multi-agent LLM systems with master-coordinator and specialized sub-agents for e-commerce applications.
Approach using LLMs to automate formulation of dynamic programming models for operations research, addressing stochastic transitions and data scarcity.
Retrieval-of-Thought method that reuses reasoning steps across problems via thought graphs to improve inference efficiency and reduce latency/cost.
Research on self-replication risks in LLM agents driven by objective misalignment, moving from theoretical concern to practical reality assessment.
Genesis: framework evolving attack strategies for red-teaming LLM web agents using behavioral pattern learning.
EHRStruct: benchmark framework evaluating LLM performance on structured electronic health record tasks with standardized metrics.
Alphacast: agentic reasoning framework for time series forecasting using iterative multi-step reasoning with domain knowledge integration.
DR-LoRA: parameter-efficient fine-tuning method for MoE LLMs using dynamic rank allocation based on expert specialization.
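DR-LoRA's dynamic rank-allocation mechanism isn't specified here, but the LoRA building block it extends is standard: a frozen weight plus a trainable low-rank update, where the rank `r` is the budget a dynamic scheme would vary per expert. A minimal NumPy sketch (dimensions and scaling are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 8  # layer dims and LoRA rank (illustrative sizes)

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01   # trainable down-projection
B = np.zeros((r, k))                 # trainable up-projection, zero-init

def lora_forward(x, scale=1.0):
    # Frozen path plus low-rank update; only A and B receive gradients.
    # With B zero-initialized, the layer starts identical to the base model.
    return x @ W + scale * (x @ A @ B)
```

A dynamic-rank scheme would, roughly, grow or shrink `r` per expert according to how specialized that expert is, rather than fixing one rank for the whole MoE model.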
ReasonMa: semantic-guided watermarking technique for reasoning LLMs that preserves logical coherence while protecting model intellectual property.
Lexpop framework using deep RL to train finite-state controllers for solving POMDPs robustly.
Survey on meta-learning and meta-reinforcement learning enabling rapid adaptation to novel tasks with minimal data.
Method improving heterogeneous agent collective accuracy using calibration and selective abstention in voting systems.
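The paper's own aggregation rule isn't given here, but confidence-weighted voting with abstention is a common baseline for this setting. A generic sketch (function name and threshold are illustrative assumptions, not the paper's method):

```python
from collections import defaultdict

def weighted_vote(answers, confidences, abstain_below=0.5):
    """Confidence-weighted majority vote over heterogeneous agents.
    Agents whose calibrated confidence falls below `abstain_below`
    abstain rather than dilute the vote."""
    tally = defaultdict(float)
    for ans, conf in zip(answers, confidences):
        if conf >= abstain_below:
            tally[ans] += conf
    if not tally:
        return None  # every agent abstained
    return max(tally, key=tally.get)

# Three agents disagree; the low-confidence one abstains.
print(weighted_vote(["A", "B", "A"], [0.9, 0.8, 0.4]))  # -> A
```

Calibration matters here because the weights are only meaningful if each agent's reported confidence tracks its actual accuracy.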
Analysis of LLM-based agents' capability to generate propaganda and rhetorical manipulation, with detection of techniques like loaded language and appeals to fear.
AI-assisted formalization of Vlasov-Maxwell-Landau system equilibrium in Lean 4 using DeepThink reasoning and Claude Code agent for automated theorem proving.
Attribution method for multi-agent systems that identifies responsible agents without execution logs by analyzing final text only, addressing privacy-constrained scenarios.
Training-free uncertainty quantification framework for combining multiple vision-language models through semantic-consistent opinion pooling to reduce hallucinations.
Foundation multimodal model for electromagnetic domain covering perception, recognition, and decision-making using LLM capabilities adapted for domain-specific applications.
Compiler for analyzing and visualizing structured agent traces including nested tool calls, reasoning blocks, and sub-agent invocations for better agentic system understanding.
Decision-theoretic framework (Triadic Cognitive Architecture) for tool-using agents that bounds information-acquisition costs and tool usage to prevent systematic failures.
Self-supervised learning method for RL agents that models agent and environment separately to improve sample efficiency without requiring supervisory signals.
Demonstrates hard-label extraction of deep neural networks via side-channel attacks using divide-and-conquer strategy for DNN intellectual property theft.
Addresses accuracy loss in distracted driver classification across camera conditions using feature disentanglement and contrastive learning for robustness.
Project management framework using generative AI agents to address team composition gaps by matching sociologically identified personality patterns and roles.