Discovering Differences in Strategic Behavior Between Humans and LLMs
LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation
Found-RL: foundation model-enhanced reinforcement learning for autonomous driving
MERIT Feedback Elicits Better Bargaining in LLM Negotiators
Abstraction Generation for Generalized Planning with Pretrained Large Language Models
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
Neuro-symbolic Action Masking for Deep Reinforcement Learning
To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks
OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization
Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation
Integrating Generative AI-enhanced Cognitive Systems in Higher Education: From Stakeholder Perceptions to a Conceptual Framework considering the EU AI Act
See, Plan, Snap: Evaluating Multimodal GUI Agents in Scratch
SynergyKGC: Reconciling Topological Heterogeneity in Knowledge Graph Completion via Topology-Aware Synergy
Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics
Can LLMs Cook Jamaican Couscous? A Study of Cultural Novelty in Recipe Generation
CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion
GameDevBench: Evaluating Agentic Capabilities Through Game Development
FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight
Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke
A Practical Guide to Agentic AI Transition in Organizations
"Humans welcome to observe": A First Look at the Agent Social Network Moltbook
The Anatomy of the Moltbook Social Graph
TokaMark: A Comprehensive Benchmark for MAST Tokamak Plasma Models
AgentTrace: A Structured Logging Framework for Agent System Observability
Reverse-Engineering Model Editing on Language Models
Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation
Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement
Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible
Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study
When LLMs get significantly worse: A statistical approach to detect model degradations
Silence Routing: When Not Speaking Improves Collective Judgment
On the Use of a Large Language Model to Support the Conduction of a Systematic Mapping Study: A Brief Report from a Practitioner's View
Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks
Exploring Semantic Labeling Strategies for Third-Party Cybersecurity Risk Assessment Questionnaires
PEST: Physics-Enhanced Swin Transformer for 3D Turbulence Simulation
PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models
MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift
NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers
AD$^2$: Analysis and Detection of Adversarial Threats in Visual Perception for End-to-End Autonomous Driving Systems
Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment
Beyond SMILES: Evaluating Agentic Systems for Drug Discovery
Anatomy-Preserving Latent Diffusion for Generation of Brain Segmentation Masks with Ischemic Infarct
EVA: Towards a universal model of the immune system
EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems
Cosmo3DFlow: Wavelet Flow Matching for Spatial-to-Spectral Compression in Reconstructing the Early Universe
Towards Autonomous Mathematics Research
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models
Versor: A Geometric Sequence Architecture
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
Quantum Integrated Sensing and Computation with Indefinite Causal Order