Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers
Analysis of structural misalignment in Transformers between residual connections and causal masking in next-token prediction.
Theoretical framework for meta-learning defining practical universality and distinguishing algorithm-implicit learning approaches.
Evaluation of reasoning-oriented LLMs on machine translation showing explicit reasoning degrades translation quality.
Multi-agent LLM system for comedy writing with community discussion feedback stored as social memory affecting output quality.
Object tracking model using joint-embedding predictive architecture with occlusion handling and model adaptation.
Geometric analysis of hallucinations in small LLMs through embedding space clustering in multi-step and agentic settings.
Analysis of cybercriminal discussions about AI adoption from cyber threat intelligence forum data.
Framework for referring image segmentation using visual attention mechanisms to exploit context for fine-grained object segmentation.
Vision model study on scanpath metrics and center bias in hard-attention models using gaze tracking datasets.
Atomix: runtime providing transactional semantics for LLM agent tool calls with epoch tagging and safe rollback mechanisms for reliable agentic workflows.
Theoretical analysis of temperature scaling properties for controlling uncertainty in probabilistic models and LLM stochasticity.
Goldilocks RL uses adaptive curriculum learning to optimize task difficulty and improve sample efficiency in reasoning model training.
Theoretical analysis of RLVR training dynamics explaining how outcome-based rewards enable long-horizon reasoning in transformers.
CT-Bench multimodal dataset with 20,335 lesions from CT studies for training AI models on lesion understanding and report generation.
Neural network framework for exploring shape functionals and Blaschke-Santaló diagrams in convex geometry optimization.
Neural process-based method for selecting specialized models as tools in agentic healthcare systems for multi-task clinical queries.
BFS-PO RL algorithm optimizes inference efficiency in large reasoning models by reducing overthinking and computational costs.
BHyGNN+ unsupervised representation learning approach for heterophilic hypergraph neural networks.
AnchorWeave method for maintaining spatial consistency in long-horizon camera-controllable video generation using local spatial memories.
PhyScensis uses LLM agents with physics reasoning to generate realistic 3D scene arrangements for robotic simulation data collection.
ThermEval benchmark for evaluating vision-language models on thermal imagery for applications like surveillance and autonomous driving.
Spectral convolution techniques for geometric deep learning on non-Euclidean data structures like graphs and manifolds.
Cold-start personalization method using structured world models and RL to infer user preferences with limited interaction budget.
Research on diffusion models using canonicalization to handle symmetries in molecular graph generation tasks.
PAPerBench benchmark studies how context length in LLMs affects privacy leakage and personalization quality across large-scale evaluation.
Study of weak game-playing neural networks under fixed-scale quantization, proving representational barriers to mastering impartial games.
Framework for learning enriched trajectory representations enabling AI agents to make better decisions across different domains and tasks.
RV-Syn: data synthesis method for generating high-quality mathematical reasoning data using structured function libraries for LLM training.
Decompositional study analyzing which factors impede LLM performance on counterfactual reasoning tasks and the generalization of reasoning capabilities.
Benchmark evaluating persuasion capabilities of frontier LLMs on harmful topics, assessing model propensity for harmful persuasion attempts.
CoT compression framework using step entropy metrics to reduce redundancy in LLM chain-of-thought reasoning and inference costs.
Using LLMs as oracles for ontology alignment with human-in-the-loop approaches to improve mapping quality for large ontologies.
Analysis of planning capabilities in decoder-only language models, examining horizon and branch awareness in transformer architectures.
GuidedSampling: inference-time algorithm steering LLMs to generate diverse candidate solutions, improving performance on complex tasks.
SAFER method for risk-constrained sampling in LLMs to ensure trustworthy outputs in risk-sensitive applications like question answering.
OmniVideoBench: evaluation benchmark for multimodal LLMs on audio-visual understanding tasks with comprehensive synergistic reasoning assessment.
ParaCook benchmark for evaluating time-efficient collaborative planning in multi-agent systems using LLMs for long-horizon reasoning.
Agentic framework using LLMs to solve complex vehicle routing problems with autonomous decision-making and improved solution feasibility.
AlphaOPT uses LLMs with self-improving experience libraries to automate optimization problem formulation from natural language into mathematical models and solver code.
HCLA: human-centered multi-agent system for anomaly detection in digital asset transactions using conversational workflow.
Dataforge: LLM-powered agentic platform for autonomous data engineering including cleaning, normalization, and feature engineering.
AgenticSciML: multi-agent system with 10+ specialized agents for automated design of scientific machine learning architectures.
ARCTraj: dataset of human reasoning trajectories on abstract visual reasoning tasks with temporal action sequences.
Method for measuring representativeness of scenario datasets for autonomous vehicle testing and safety assurance.
Three-stage framework for synthesizing and selecting long chain-of-thought training data for multimodal large reasoning models.
Recontextualization technique reduces specification gaming in language models without modifying training signals.
LLM agent system that extracts causal feedback fuzzy cognitive maps from text with adaptive structure modification.
Qualitative evaluation of diverse behavior planning approaches across stories, cities, and game domains.
Perspective on explainable AI combined with causal reasoning for extracting insights from foundation models.
Analysis of regulatory gaps in frontier AI deployment, focusing on internal company uses versus external deployment.