Modeling Human Behavior in a Strategic Network Game with Complex Group Dynamics
Compares methods for learning behavioral models in a strategic network game to understand human network dynamics.
PLAICraft is a large-scale, multi-modal, time-aligned dataset of Minecraft interactions for training embodied AI agents with vision, speech, and action.
VerifyBench benchmarks reference-based reward systems used in reinforcement learning training of reasoning models like o1 and DeepSeek-R1.
WINA is a training-free sparse activation method for accelerating LLM inference by selectively activating neurons based on weights.
XENON is an LLM-based agent that algorithmically corrects flawed knowledge through experience-based learning for long-horizon planning in Minecraft.
FreqPolicy accelerates flow-based visuomotor policies for robotic manipulation using frequency consistency for real-time inference.
DiffusionBlocks enables block-wise neural network training via diffusion interpretation to reduce memory bottlenecks in transformers.
Studies optimal ordering of chain-of-thought reasoning steps in transformers for arithmetic and multi-step reasoning tasks.
Theoretical analysis of graph transformers' expressive power using logic, covering both real-number and floating-point settings.
Model-agnostic dynamic feature selection method with uncertainty quantification for budget-constrained decision-making scenarios.
MedReasoner uses reinforcement learning to ground clinical reasoning to pixel-level regions in medical images via multimodal LLMs.
Pinet introduces an output layer using orthogonal projections to enforce convex constraints in neural networks during training and inference.
FairTabGen uses LLMs to generate high-quality synthetic tabular healthcare data with fairness constraints from limited samples.
COGITAO is a benchmark framework for studying compositionality and generalization in visual reasoning tasks, inspired by ARC-AGI.
Empirical study of pre-trained model reuse and integration in open-source projects, defining Software Dependencies 2.0.
PolicyPad: a system supporting collaborative policy design for LLMs in high-stakes domains through rapid prototyping and iteration.
FeatBench evaluates LLM code generation for realistic repository-level feature implementation with minimal data leakage.
Training re-evaluation curves: a diagnostic that characterizes batch retention across LLM training, enabling better data curriculum design.
Analysis of expert routing patterns in multilingual Mixture-of-Experts LLMs revealing language-specific dynamics across layers.
StarEmbed benchmark for evaluating time series foundation models on irregular astronomical observations of variable stars.
Technique for losslessly reducing LLM vocabulary size to make auto-regressive text generation more efficient without any loss in performance.
Dynamic Gaussian Splatting method incorporating uncertainty for monocular 4D scene reconstruction under occlusion.
Computational model explaining interaction between semantic and episodic memory for learning and recall in cognitive science.
CreativityPrism framework for holistic, scalable evaluation of LLM creativity across diverse scenarios without heavy human involvement.
Cluster-PFN uses Transformers for unsupervised Bayesian clustering with uncertainty quantification, handling missing values.
Q3R: a regularizer enabling parameter-efficient low-rank training and pre-training of large deep learning models.
Method for enforcing instruction hierarchy in LLMs to handle competing directives from multiple sources for reliable decision-making.
LOCA framework enabling AI agents to solve Olympiad-level physics problems via logical chain decomposition and verification.
Language-Guided Invariance Probing benchmark evaluating vision-language model robustness to paraphrases and semantic changes on 40k images.
Bayesian optimization approach for beam alignment in intelligent indoor wireless environments under mobility constraints.
Geometric analysis of Mixture-of-Experts architectures using Jacobian-PCA spectral methods to understand routing and function geometry.
StableQAT framework for stable quantization-aware training of large models at ultra-low bitwidths for efficient deployment.
Graph transformer architecture with cardinality-preserving attention for molecular property prediction in drug discovery.
Study of the privacy risks posed by vision-language models that infer sensitive locations from photos with street-level precision.
Protean Compiler: a framework using machine learning to optimize compiler phase ordering, addressing a long-standing optimization problem with an agile, fine-grained approach.
Vision-language models for autonomous vehicle safety assessment and planning, integrating VLM representations into perception, prediction, and planning pipelines.
Analysis of whether LLM self-referential language reflects internal computation or confabulation, using a Pull Methodology that tracks vocabulary-activation correspondence.
Virtual platform for controlled experimentation in social media environments to study communication and opinion formation dynamics with multiple simultaneous participants.
Vision-Language-Action models improved through test-time verification scaling to reduce the intention-action gap in robot instruction following, offering an alternative to scaling policy learning.
Framework for designing generative social robots using LLMs for educational tutoring, addressing hallucinations, overreliance, and privacy risks in responsible AI deployment.
Statistical model capturing multi-scale structure of natural language relating entropy rate to semantic chunking in LLMs.
Active learning method for medical imaging that explains selection decisions based on clinically meaningful features.
Framework arming NL2SQL agents with database-specific tribal knowledge to improve translation accuracy on real-world databases.
Case study analyzing the emergence of social dynamics in AI agent societies through Moltbook's open-ended multi-agent environment.
Geometric analysis showing hallucinations in small LLMs exhibit looser clustering than genuine responses in embedding space.
Indic-TunedLens: interpretability framework for multilingual LLMs in Indian languages with shared affine transformations.
Weight-space detection method for backdoor attacks in LoRA adapters without requiring execution or knowing trigger patterns.
Method for closing the distribution gap in adversarial training of LLMs to improve robustness against simple in-distribution exploits.
Cross-domain orchestration framework for managing federated AI-as-a-Service deployments with network-compute integration.
AI-Paging system for runtime selection and execution of AIaaS model instances via network-based intent matching.