HEARTS: Benchmarking LLM Reasoning on Health Time Series
HEARTS benchmark for evaluating LLM reasoning on health time series across multiple physiological modalities and temporal dependencies.
Symbolic ML approach for failure detection in chemical processes, emphasizing interpretability and safety over neural methods; includes ethylene oxidation case study.
AI-powered platform using LLM personas to teach deliberative democratic skills and consensus-finding through simulated discussion scenarios.
ML competition for agricultural vision focusing on data-centric approaches and model generalization under distribution shifts in real field conditions.
Evaluation framework for tabular foundation models using proper scoring rules to assess full predictive distributions, not just point estimates.
Fine-tuning method for Vision Transformers using concept guidance to reduce spurious correlations and improve robustness to distribution shifts.
Clinical feasibility study of LLM-based conversational diagnostic AI (AMIE) in real primary care workflows with safety evaluation.
Study of emotion as latent factor affecting LLM reasoning and attention mechanisms, rather than just a prediction target.
Motion forecasting for autonomous vehicles handling open-world scenarios with imperfect perception and evolving object taxonomies.
Vision-language-action model for autonomous driving using perception-planning distillation to improve visual encoding and trajectory planning stability.
Research on how LLMs handle compositional language tasks (adjective-noun relationships), comparing external performance with internal model representations.
Multi-modal network for UAV traffic scene understanding, with an accompanying benchmark; computer vision for autonomous systems rather than core AI/ML research.
Research comparing LLM performance in healthcare triage across evaluation formats. Shows evaluation methodology significantly affects model assessment outcomes.
KEPo: research on knowledge graph poisoning attacks against GraphRAG systems. Analyzes vulnerabilities when LLMs rely on external databases.
RoboClaw: Agentic framework unifying data collection, policy learning, and deployment for long-horizon robotic manipulation using Vision-Language-Action systems.
Controlled experiments with small transformers trained on contradictory corpora, showing that language models prefer correct answers because correct data compresses better, not because the models track truth.
MobileKernelBench benchmark evaluating LLM capability to generate efficient computational kernels for mobile devices, with systematic investigation of code generation limits.
LoV3D: Vision-language model pipeline for longitudinal brain MRI analysis that grounds neurological disease progression reasoning in regional volume measurements.
Theoretical neural architecture integrating perception, memory, prediction, and control as unified computational framework inspired by neuroscience evidence.
POMDP-based approach for optimizing when to update task completion time announcements in project management, balancing accuracy and stakeholder trust.
Research on model stitching technique for Vision Foundation Models, testing representational compatibility across models with different training objectives and data sources.
Method for incorporating text into time-series forecasting by bridging the modality gap between qualitative text and quantitative forecasting signals through semantic space alignment.
MetaKE addresses knowledge editing in LLMs using bi-level optimization to fix specific facts without degrading general capabilities, identifying semantic-execution misalignment issues.
Empirical study of model collapse in large language models trained recursively on synthetic data.
Dual-stream voice anonymization attacker using spectral and self-supervised learning features for privacy evaluation.
Attention-based anomaly detector for multivariate time series using predictable query dynamics.
Self-evolution framework using Minimum Bayes Risk decoding for error span detection in machine translation without human annotations.
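The MBR selection step named in this entry can be sketched generically. This is not the paper's pipeline: the candidate translations and the token-overlap utility below are illustrative stand-ins for real MT hypotheses and a real metric such as chrF or COMET.

```python
def mbr_select(candidates, utility):
    """Minimum Bayes Risk: pick the candidate with the highest
    expected utility against all other candidates (pseudo-references)."""
    best_idx, best_score = 0, float("-inf")
    for i, hyp in enumerate(candidates):
        score = sum(utility(hyp, ref) for j, ref in enumerate(candidates) if j != i)
        score /= max(len(candidates) - 1, 1)
        if score > best_score:
            best_idx, best_score = i, score
    return candidates[best_idx]


def token_f1(hyp, ref):
    """Toy utility: F1 over token sets, a stand-in for a real MT metric."""
    h, r = set(hyp.split()), set(ref.split())
    common = len(h & r)
    if common == 0:
        return 0.0
    p, rec = common / len(h), common / len(r)
    return 2 * p * rec / (p + rec)


cands = ["the cat sat on the mat", "the cat sits on a mat", "a dog ran away"]
print(mbr_select(cands, token_f1))  # the candidate most similar to the others wins
```

The outlier candidate scores poorly against its peers, which is the same consensus signal the entry's self-evolution framework exploits in place of human annotations.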
Large-scale open-source software engineering environment for training AI agents with executable, verifiable tasks and dynamic feedback.
Method using preference optimization to generate physically plausible humanoid motion from text descriptions.
Critical analysis of graph transformer architectures for predicting outcomes from longitudinal electronic health records.
Method for continual fine-tuning of pre-trained models on sequential tasks with parameter-free task retrieval and no forgetting.
Clustering algorithm for longitudinal data analyzing time-dependent variables across individuals with shared temporal characteristics.
Code agent framework with structured memory enabling adaptive learning from project evolution and past successful reasoning trajectories.
World model architecture using spherical kernel operators to handle shifting data distributions in latent space transitions.
Federated framework combining lightweight LLMs with personal knowledge graphs for privacy-preserving personalized recommendations.
Neuro-symbolic architecture combining self-supervised learning with verifiable logic rules to mitigate spurious correlations and shortcut learning.
Multimodal foundation model for EEG-text alignment robust to channel heterogeneity for brain signal analysis applications.
Deep learning approach for microclimate prediction from geospatial imagery incorporating spatial relationships in temperature modeling.
Self-distillation method reducing computational cost of chain-of-thought reasoning by training models to generate correct predictions from truncated reasoning.
Zero-shot LLM approach for surgical duration prediction combining retrieval augmentation with Bayesian averaging, avoiding the need for fine-tuning.
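The Bayesian averaging step in this entry can be illustrated in miniature. The duration estimates and evidence scores below are invented for illustration; the paper's actual weighting scheme is not specified here.

```python
import math

def bayesian_average(predictions, log_evidences):
    """Combine point predictions, weighting each source by its
    (log) evidence via a numerically stable softmax."""
    m = max(log_evidences)
    weights = [math.exp(l - m) for l in log_evidences]  # shift by max for stability
    total = sum(weights)
    return sum(w / total * p for w, p in zip(weights, predictions))


# Three hypothetical surgical-duration estimates (minutes), e.g. from
# different retrieved case cohorts, with made-up log-evidence scores.
estimate = bayesian_average([90.0, 120.0, 150.0], [-1.0, -0.5, -2.0])
print(round(estimate, 1))
```

The result is pulled toward the best-supported estimate while still averaging over the alternatives, which is the point of model averaging over a single argmax choice.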
Tree-based continual learning framework for non-stationary data distributions with constrained computational resources in time series applications.
Sparse autoencoders as a foundation for learned sparse retrieval, decomposing LLM representations into interpretable latent features for efficient document retrieval.
Hybrid reinforcement learning framework for dynamic vehicle routing with emission constraints and demand acceptance optimization.
Physics-informed autoencoder architecture for recovering material parameters and temporal evolution from video in continuum mechanics.
Multi-model inference optimization reusing identical KV caches across models to reduce memory consumption in agentic AI systems.
Federated learning method for LoRA fine-tuning of LLMs addressing statistical and functional heterogeneity across model layers.
Generative model approach for aircraft design using simulation-based inference with diffusion models and hierarchical probabilistic methods.
Benchmark quantifying LLM robustness by measuring model sensitivity to prompt variations, typos, and paraphrases in real-world conditions.
Framework reframing LLMs as code generators for interpretable decision-making in high-stakes scenarios, improving reproducibility over black-box approaches.
KV cache optimization technique for multi-agent LLM systems that reuses decoding caches to reduce memory usage and latency in collaborative AI tasks.
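The cache-reuse idea behind the last two entries can be sketched abstractly. This toy stands in for a real inference engine: the "KV entries" are placeholder strings, and the prefix-matching policy is an assumption, not either paper's mechanism.

```python
class PrefixKVCache:
    """Toy model of KV-cache sharing: prompts with a common prefix
    reuse the cached entries instead of recomputing them."""

    def __init__(self):
        self.store = {}  # token-prefix tuple -> list of per-token "KV entries"

    def get_or_compute(self, tokens):
        reused, kv = 0, []
        # Find the longest already-cached prefix of this prompt.
        for cut in range(len(tokens), 0, -1):
            cached = self.store.get(tuple(tokens[:cut]))
            if cached is not None:
                kv, reused = list(cached), cut
                break
        # "Compute" KV entries only for the uncached suffix.
        kv += [f"kv({t})" for t in tokens[reused:]]
        # Cache every prefix so later prompts can branch off anywhere.
        for i in range(1, len(tokens) + 1):
            self.store[tuple(tokens[:i])] = kv[:i]
        return kv, reused


cache = PrefixKVCache()
shared = ["system", "tool-spec"]
_, r1 = cache.get_or_compute(shared + ["agent-A", "task"])
_, r2 = cache.get_or_compute(shared + ["agent-B", "task"])
print(r1, r2)  # the second agent recomputes only its own suffix
```

In a multi-agent system where every agent shares a long system prompt and tool specification, this prefix reuse is where the memory and latency savings come from.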