SleepLM: Natural-Language Intelligence for Human Sleep
SleepLM is a foundation model family enabling natural language interpretation and interaction with human sleep polysomnography data.
MMKG-RDS is a framework for synthesizing training data using multimodal knowledge graphs to improve domain model reasoning capabilities.
Conceptual paper questioning the definition and feasibility of AGI, arguing for AI specialization over general capabilities.
PseudoAct introduces pseudocode-based planning for LLM agents to reduce token consumption and improve stability on long-horizon multi-tool tasks.
ODAR-Expert presents an adaptive routing framework for optimizing accuracy-efficiency tradeoffs in LLM reasoning via active inference instead of uniform sampling.
Method for hierarchical failure attribution in multi-agent LLM systems using causal graphs to improve observability and debugging of complex agent interactions.
ProductResearch proposes a multi-agent framework using trajectory distillation to train LLM-based agents for complex e-commerce product research tasks.
Auton framework addresses architectural mismatch between stochastic LLM outputs and deterministic backend systems for agentic AI deployment.
MERaLiON2-Omni: 10B multilingual MLLM for Southeast Asia addressing perception-logic tradeoffs in omni-perception tasks.
Domain generalization method leveraging reasoning chains in MLLMs to improve robustness under domain shift.
EMO-R3 applies reflective reinforcement learning to improve emotional reasoning capabilities in multimodal LLMs.
RUMAD: reinforcement learning approach to multi-agent debate that adapts topology to task complexity while maintaining debate neutrality.
RF-Agent uses LLM-based tree search to automatically design reward functions for control tasks with improved historical feedback utilization.
Pessimistic auxiliary policy approach for offline reinforcement learning to mitigate overestimation from out-of-distribution actions.
Scenario-context rollout reinforcement learning for portfolio rebalancing under market regime shifts and distribution changes.
CIRCLE: six-stage framework for evaluating AI systems under real-world conditions and user variability beyond model-centric metrics.
Turing test evaluation of 9 state-of-the-art speech-to-speech systems with human judgments on conversational naturalness.
Bi-level RL-heuristic optimization for winter road maintenance routing on UK strategic and local road networks.
Position paper on the Artificial Agency Program, proposing resource-bounded, curiosity-driven agents as embedded systems within human-tool extended systems.
Fine-grained off-policy guidance improves exploration in reinforcement learning from verifiable rewards for complex reasoning in large language models.
LemmaBench: live, updatable benchmark evaluating LLMs on research-level mathematics by extracting lemmas from arXiv papers.
Deep learning approach to flexible job shop scheduling with buffer and material constraints for production optimization.
Method for uncertainty quantification in multimodal LLMs using semantic volume metrics to identify unreliable outputs.
Minimal agentic baseline for automated theorem proving that enables systematic comparison across AI-based prover architectures with iterative refinement and library search.
DARE-bench introduces a benchmark for evaluating LLMs on multi-step data science tasks with a focus on instruction adherence and process fidelity.
QD-MAPPER uses Quality Diversity and Neural Cellular Automata to automatically generate diverse maps for evaluating multi-agent path finding algorithms.
Social network analysis of Moltbook, an AI-native platform, reveals that rapid stratification and hierarchical structures emerge within 12 days across 15K+ agent accounts.
Demonstrates that agentic, tool-augmented LLMs achieve RAG-level performance using keyword search, without vector databases.
TTE-v2: hybrid multimodal retrieval framework extending reasoning-driven bi-encoder architectures with improved performance.
Discriminative framework for semantic chunking of ultra-long documents improving topic segmentation and retrieval.
Domain-partitioned hybrid RAG for legal document reasoning across Indian statutes, codes and precedents.
SPRIG democratizes GraphRAG with a CPU-only, linear-time pipeline using NER co-occurrence graphs and personalized PageRank (PPR) for multi-hop QA.
Higress-RAG optimizes enterprise RAG with dual hybrid retrieval, adaptive routing and CRAG to reduce hallucination.
Examines responsible AI dashboard design for early-stage HealthTech companies balancing ethics and innovation constraints.
Hello-Chat: end-to-end audio language model for realistic social interactions with emotional resonance.
Task-Lens profiles existing speech datasets for low-resource Indian languages to enable cross-task NLP research.
Vul2Safe framework uses token-level RL rewards and LLM self-reflection for secure LLM code generation.
Analyzes frequency tuning for quantum machine learning with trainable encodings to reduce circuit depth.
Brain-OF: omnifunctional foundation model jointly pretrained on fMRI, EEG and MEG brain imaging modalities.
DesignSense: dataset and reward model framework for graphic layout generation using human preference learning.
Develops a unified theory that treats human supervision as an information bottleneck, explaining error floors in LLM training from human feedback.
SALIENT uses frequency-aware paired diffusion for detecting rare lesions in CT scans with class imbalance challenges.
Optimization framework for edge directions and weights in guidance graphs for lifelong multi-agent path finding.
TaCarla introduces a comprehensive benchmark dataset for autonomous driving perception and planning tasks.
FedDAG addresses federated learning heterogeneity by clustering clients using combined data and gradient similarity metrics.
SegReg proposes latent-space regularization for U-Net medical image segmentation models to improve generalization through structured embeddings.
ANTShapes: simulator for generating neuromorphic vision datasets addressing limited DVS data availability for anomaly detection.
Study showing divergence between human and LLM behavior on probabilistic inference tasks requiring non-deterministic reasoning.
Rudder: LLM agent-based prefetching steering for distributed GNN training to optimize irregular communication patterns.
HMKGN: hierarchical multi-scale graph network for whole-slide image analysis and cancer survival prediction.