Beyond Accuracy: Risk-Sensitive Evaluation of Hallucinated Medical Advice
Risk-sensitive evaluation framework for LLM hallucinations in medical advice, assessing clinical harm severity beyond factual correctness.
Automated black-box pipeline that detects unverbalized biases in LLM chain-of-thought reasoning through task-specific evaluation, without relying on predefined bias categories.
RooflineBench: Roofline model-based benchmarking framework for characterizing performance of Small Language Models on edge hardware.
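For reference, the classic roofline bound that RooflineBench's name refers to caps a kernel's attainable throughput at the lower of two values: the hardware's peak compute, or the kernel's arithmetic intensity multiplied by memory bandwidth. The sketch below covers only that textbook model and is not RooflineBench's code. The function names and the device numbers (2 TFLOP/s peak, 50 GB/s bandwidth) are hypothetical.

```python
def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity x memory bandwidth).

    peak_flops: peak compute in FLOP/s
    bandwidth:  memory bandwidth in bytes/s
    intensity:  arithmetic intensity of the kernel in FLOP/byte
    """
    return min(peak_flops, intensity * bandwidth)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_flops / bandwidth


def classify(peak_flops: float, bandwidth: float, intensity: float) -> str:
    """Label a kernel by which roof limits it."""
    if intensity < ridge_point(peak_flops, bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical edge device: 2 TFLOP/s peak compute, 50 GB/s memory bandwidth.
PEAK, BW = 2e12, 50e9
for name, ai in [("low-intensity kernel", 2.0), ("high-intensity kernel", 100.0)]:
    print(name, classify(PEAK, BW, ai), attainable_flops(PEAK, BW, ai) / 1e9, "GFLOP/s")
```

Below the ridge point a kernel is limited by memory bandwidth, not compute. In that regime, raising its arithmetic intensity helps more than adding peak FLOP/s.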
Agentic system for respiratory disease diagnosis using multimodal sound generation and active adversarial curriculum learning.
Framework extending Item Response Theory to measure AI model propensities and behavioral tendencies beyond capability metrics.
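As background, standard two-parameter logistic (2PL) IRT models the probability of a positive response as a sigmoid of the gap between a latent trait theta and an item difficulty b, scaled by a discrimination a. The sketch below shows only that textbook model, plus a grid-search maximum-likelihood estimate of theta with the item parameters held fixed. Treating a response of 1 as "the model exhibits the behavior" (a propensity rather than an ability) is an illustrative assumption, not the paper's formulation. All names and item parameters are hypothetical.

```python
import math
from typing import Sequence


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(y = 1 | theta) = sigmoid(a * (theta - b))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta: float, responses: Sequence[int],
                   discriminations: Sequence[float], difficulties: Sequence[float]) -> float:
    """Bernoulli log-likelihood of a binary response pattern at trait level theta."""
    total = 0.0
    for y, a, b in zip(responses, discriminations, difficulties):
        p = p_endorse(theta, a, b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses: Sequence[int], discriminations: Sequence[float],
                   difficulties: Sequence[float], lo: float = -4.0, hi: float = 4.0,
                   steps: int = 801) -> float:
    """Grid-search maximum-likelihood estimate of theta (item parameters fixed)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, discriminations, difficulties))


# Hypothetical three-item probe: positive on the two easier items, negative on the hardest.
theta_hat = estimate_theta([1, 1, 0], [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0])
print(round(theta_hat, 2))
```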
Agentic system for hierarchical urban geospatial modification using multimodal models to handle dependency-aware city planning changes.
Heterogeneous graph transformer framework for predicting high-potential small and medium-sized enterprises (SMEs) from public business data.
Analysis showing that test-time training with KV binding can be expressed as a learned linear attention mechanism.
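The equivalence can be illustrated with a minimal sketch under three assumptions: an identity feature map, no normalization, and a hypothetical inner loss -v^T W k. One gradient-descent step on that loss with learning rate 1 is the outer-product write W <- W + v k^T, which binds each key to its value. Reading the memory with each query then reproduces unnormalized causal linear attention exactly. The function names and inputs are hypothetical, and this is not the paper's construction.

```python
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention(qs: List[Vec], ks: List[Vec], vs: List[Vec]) -> List[Vec]:
    """Unnormalized causal linear attention: o_t = sum_{i <= t} (q_t . k_i) v_i."""
    outputs = []
    for t, q in enumerate(qs):
        o = [0.0] * len(vs[0])
        for i in range(t + 1):
            w = dot(q, ks[i])
            o = [oj + w * vj for oj, vj in zip(o, vs[i])]
        outputs.append(o)
    return outputs


def kv_binding_memory(qs: List[Vec], ks: List[Vec], vs: List[Vec]) -> List[Vec]:
    """Linear fast-weight memory: write W <- W + v_t k_t^T, then read o_t = W q_t."""
    d_k, d_v = len(ks[0]), len(vs[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # d_v x d_k weight matrix, updated at test time
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for r in range(d_v):
            for c in range(d_k):
                W[r][c] += v[r] * k[c]  # gradient step on the loss -v^T W k (lr = 1)
        outputs.append([dot(row, q) for row in W])
    return outputs


qs = [[1.0, 0.0], [0.5, -1.0], [2.0, 1.0]]
ks = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
vs = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0], [2.0, 0.0, -2.0]]

out_attn = causal_linear_attention(qs, ks, vs)
out_mem = kv_binding_memory(qs, ks, vs)
print(out_attn)
print(out_mem)
```

Both functions compute W_t q_t = sum_{i <= t} v_i (k_i . q_t). The attention form recomputes that sum over the prefix at every step. The memory form carries it in a fixed-size state.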
Federated learning aggregation method using gradient-based weighting to address client drift and data heterogeneity.
Safety filtering framework for flow-based generative models providing formal guarantees that generated samples satisfy hard constraints.
Training approach for collaborative AI agents using strategic risk aversion to improve generalization when paired with new partners.
Method for reconstructing video content from brain fMRI activity using hierarchical semantic guidance.
Adversarial dataset and training method to improve multimodal LLM robustness when handling visually complex scenes.
Framework for systematically mapping unsafe regions of LLMs using the MAP-Elites quality-diversity algorithm to characterize failure modes.
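For orientation, here is a minimal MAP-Elites loop. An archive keeps the best-scoring solution found in each cell of a discretized behavior space. New candidates come from mutating randomly chosen elites. The toy 2-D genome, the grid descriptor, and the fitness function are hypothetical stand-ins. In the setting described above, genomes would presumably be prompts, descriptors would capture attributes of the input, and fitness would be an unsafety score, but none of those details are taken from the paper.

```python
import random
from typing import Callable, Dict, Tuple

Genome = Tuple[float, float]
Cell = Tuple[int, int]


def descriptor_cell(genome: Genome, bins: int) -> Cell:
    """Map a genome in [0, 1]^2 to a grid cell (the value 1.0 falls in the last bin)."""
    return (min(int(genome[0] * bins), bins - 1), min(int(genome[1] * bins), bins - 1))


def map_elites(
    fitness: Callable[[Genome], float],
    bins: int = 5,
    n_init: int = 20,
    n_iters: int = 500,
    sigma: float = 0.1,
    seed: int = 0,
) -> Dict[Cell, Tuple[float, Genome]]:
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[float, Genome]] = {}

    def try_insert(genome: Genome) -> None:
        # Keep the candidate if its cell is empty or it beats the current elite there.
        cell = descriptor_cell(genome, bins)
        score = fitness(genome)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, genome)

    # Seed the archive with uniformly random genomes.
    for _ in range(n_init):
        try_insert((rng.random(), rng.random()))

    # Main loop: pick a random elite, apply Gaussian mutation, clip back into [0, 1].
    for _ in range(n_iters):
        _, parent = archive[rng.choice(list(archive))]
        child = (
            min(1.0, max(0.0, parent[0] + rng.gauss(0.0, sigma))),
            min(1.0, max(0.0, parent[1] + rng.gauss(0.0, sigma))),
        )
        try_insert(child)
    return archive


if __name__ == "__main__":
    archive = map_elites(lambda g: -((g[0] - 0.5) ** 2 + (g[1] - 0.5) ** 2))
    print(f"{len(archive)} of 25 cells filled")
```

The archive, not a single optimum, is the output. That is what makes the method suited to charting where failures occur rather than just finding the worst one.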
Improved FSDP implementation supporting structure-aware training and non-element-wise optimizers for large-scale model training.
Document analysis system for semi-structured documents with tables, charts, and hierarchical content for question-answering tasks.
Analysis of why diffusion language models converge to autoregressive decoding instead of truly parallel generation despite theoretical advantages.
Defense method for LLMs against toxic outputs using representation erasure-based preference optimization, more robust than DPO/NPO approaches.
Machine unlearning technique for generative recommendation systems using LLMs to remove sensitive user attributes from model parameters.
Quantum machine learning models using angle encoding, with an analysis of circuit-depth scaling for fixed- and trainable-frequency approaches.
Brain-OF is the first omnifunctional foundation model jointly pretrained on fMRI, EEG, and MEG brain imaging data.
EvoX framework combines LLM-driven optimization with evolutionary search for automated discovery of programs, prompts, and algorithms.
Information-theoretic analysis of human supervision as a bottleneck, explaining persistent LLM errors that stem from annotation noise and subjectivity.
Framework using LLM guidance to annotate concepts for interpretable Concept Bottleneck Models with uncertainty awareness.
FedDAG improves federated learning under data heterogeneity by combining data and gradient similarity for client clustering.
Theoretical work proving neural operators can discover functional clusters in infinite-dimensional spaces.
Active learning approach for querying values of subadditive set functions with applications to combinatorial optimization.
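For concreteness, a set function f is subadditive if f(A ∪ B) <= f(A) + f(B) for all subsets A and B. Coverage functions are a standard example. The brute-force checker below uses hypothetical names and toy data. It queries f on every subset, which is feasible only for tiny ground sets. That cost is what makes learning such functions from a limited number of queries worthwhile. The checker is not the paper's active learning method.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Hashable, Iterable, List


def all_subsets(ground: Iterable[Hashable]) -> List[FrozenSet[Hashable]]:
    """Every subset of the ground set, including the empty set."""
    items = list(ground)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def is_subadditive(f: Callable[[FrozenSet[Hashable]], float], ground: Iterable[Hashable]) -> bool:
    """Brute-force check of f(A | B) <= f(A) + f(B) over all pairs of subsets."""
    subsets = all_subsets(ground)
    return all(f(a | b) <= f(a) + f(b) for a in subsets for b in subsets)


# Hypothetical coverage function: each element covers some points; f(S) = size of the union.
COVERS = {"x": {1, 2}, "y": {2, 3}, "z": {4}}


def coverage(s: FrozenSet[Hashable]) -> int:
    return len(set().union(*(COVERS[e] for e in s)))


print(is_subadditive(coverage, COVERS))            # coverage functions are subadditive
print(is_subadditive(lambda s: len(s) ** 2, "ab"))  # |S|^2 is not: f({a,b}) = 4 > 1 + 1
```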
Rudder uses LLM agents to steer prefetching optimization in distributed GNN training for adaptive performance.
Analysis of learning dynamics when multiple platforms compete for users, showing convergence to poor models under certain conditions.
Flowette uses flow matching with graph neural networks to generate graphs with recurring subgraph motifs.
SDMixer proposes a sparse dual-stream architecture for multivariate time series forecasting using frequency-domain and temporal analysis.
Systematic ablation study of initialization and normalization strategies for GNNs in blockchain fraud detection.
Benchmark evaluating multimodal fusion of EHR and chest X-rays for clinical decision support under missingness and fairness constraints.
BTTackler diagnoses training problems in deep learning to guide efficient hyperparameter optimization beyond accuracy-based methods.
Theoretical analysis of single-loop stochastic bilevel optimization convergence for meta-learning and hyperparameter optimization.
FlexGuard proposes continuous risk scoring for LLM content moderation that adapts to varying strictness levels across platforms and time.
FedRot-LoRA addresses rotational misalignment in federated LoRA fine-tuning of LLMs, improving communication-efficient training on decentralized data.
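A small sketch shows the misalignment problem. A LoRA update ΔW = B A is unchanged if B is multiplied by an orthogonal matrix R and A by R^T. Two clients can therefore hold different factors that encode the same update, and averaging B and A separately no longer averages the updates. Below, both clients encode the identity update, yet naive factor averaging yields only half of it. The matrices are hypothetical toy values, with square 2×2 factors for readability. The sketch demonstrates the issue only and does not implement FedRot-LoRA's remedy.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(row) for row in zip(*x)]


def mean(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise average of two matrices (two-client FedAvg with equal weights)."""
    return [[(a + b) / 2 for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


I2 = [[1.0, 0.0], [0.0, 1.0]]
R = [[0.0, -1.0], [1.0, 0.0]]  # 90-degree rotation, an orthogonal matrix

# Client 1 holds (B, A); client 2 holds the rotated pair (B R, R^T A).
B1, A1 = I2, I2
B2, A2 = matmul(I2, R), matmul(transpose(R), I2)

delta_1 = matmul(B1, A1)  # Delta W on client 1
delta_2 = matmul(B2, A2)  # identical Delta W on client 2, since R R^T = I

avg_of_updates = mean(delta_1, delta_2)                # the intended aggregate
update_of_avgs = matmul(mean(B1, B2), mean(A1, A2))    # naive factor averaging

print(avg_of_updates)
print(update_of_avgs)
```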
Diffusion-based method for time series anomaly detection using selective denoising instead of conditional reconstruction strategies.
MoST is a contrastive learning method for learning disentangled, mode-specific representations in multi-mode tensor time series.
Geometric analysis of transformer training trajectories revealing low-dimensional drift direction and transverse oscillatory dynamics.
BDGxRL uses Diffusion Schrödinger Bridge to address dynamics gaps in cross-domain reinforcement learning without target reward supervision.
OPTIAGENT uses an LLM-based agentic framework with physics-driven optimization for automated optical design and lens system configuration.
MAGE is a multi-scale autoregressive generation framework for offline RL that addresses long-horizon, sparse-reward tasks via hierarchical decomposition.
Provable identifiability framework for nonlinear multi-view canonical correlation analysis via subspace identification.
Learning-based pathfinding using neural networks to approximate informed heuristics for grid-based search across different map topologies.
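Grid search of this kind is typically A*, with the heuristic as a pluggable function. The sketch below uses a hand-written Manhattan-distance heuristic as a stand-in for a learned model. `astar` and `manhattan` are hypothetical names. A trained network's prediction could be passed as `h` with the same signature.

```python
import heapq
from typing import Callable, List, Optional, Tuple

Pos = Tuple[int, int]


def manhattan(a: Pos, b: Pos) -> float:
    """Hand-written heuristic; a learned model would replace this function."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar(grid: List[str], start: Pos, goal: Pos,
          h: Callable[[Pos, Pos], float] = manhattan) -> Optional[int]:
    """Shortest 4-connected path cost on a grid where '#' is blocked; None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}                       # cheapest known cost to each cell
    frontier = [(h(start, goal), 0, start)]  # entries are (f = g + h, g, position)
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry superseded by a cheaper path
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost + h((nr, nc), goal), new_cost, (nr, nc)))
    return None


grid = ["....",
        ".##.",
        "...."]
print(astar(grid, (0, 0), (2, 3)))
```

Manhattan distance is admissible and consistent on a 4-connected unit-cost grid, so the returned cost is optimal. A learned heuristic that can overestimate loses that guarantee.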
MPU is a framework for privacy-preserving knowledge unlearning in LLMs that requires sharing neither server parameters nor client forget sets.
Actor-critic pretraining approach for PPO that leverages expert data to reduce environment interactions required for RL training.
Theoretical investigation of offline reinforcement learning with general function approximation and parametric policies beyond state-wise methods.
Q-learning approach for learning safe policies from expert demonstrations with unknown constraints in constrained MDPs.
FedNSAM addresses sharpness-aware minimization in federated learning under high data heterogeneity, ensuring both local and global model flatness.
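For reference, one step of plain, centralized sharpness-aware minimization (SAM) first perturbs the weights toward the locally worst-case direction, eps = rho g / ||g||. It then descends using the gradient taken at the perturbed point. The sketch below applies this to a hypothetical toy quadratic. It is not FedNSAM's federated variant, and the names and hyperparameters are illustrative.

```python
import math
from typing import Callable, List

Vec = List[float]


def sam_step(w: Vec, grad_fn: Callable[[Vec], Vec], lr: float = 0.1, rho: float = 0.05) -> Vec:
    """One SAM step.

    1. Ascent: eps = rho * g / ||g||, the first-order worst case within an L2 ball of radius rho.
    2. Descent: update w using the gradient evaluated at w + eps.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g))
    eps = [0.0] * len(w) if norm == 0.0 else [rho * x / norm for x in g]
    g_sharp = grad_fn([wi + ei for wi, ei in zip(w, eps)])
    return [wi - lr * gi for wi, gi in zip(w, g_sharp)]


# Hypothetical toy loss L(w) = 0.5 * sum(c_i * w_i^2), whose gradient is c_i * w_i.
CURV = [2.0, 1.0]


def quad_grad(w: Vec) -> Vec:
    return [c * wi for c, wi in zip(CURV, w)]


# Plain gradient descent would move w[0] from 1.0 to 0.8 here; SAM uses the steeper perturbed gradient.
print(sam_step([1.0, 0.0], quad_grad, lr=0.1, rho=0.5))
```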