Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
The Sat2Sound framework predicts soundscape distributions from satellite images using vision-language models for geospatial audio understanding.
SpatialScore: comprehensive benchmark for evaluating spatial intelligence of multimodal LLMs with data-driven and agent-based assessment approaches.
GoT-R1: reinforcement learning framework enhancing multimodal LLM reasoning for complex visual generation with precise spatial relationships and attributes.
Fine-tuning approach for LLMs to predict diverse user behaviors, addressing overfitting to frequent behaviors while capturing the long-tailed behavior distribution.
World models for interactive video generation with action conditioning and autoregressive decoding to support planning and future prediction.
Progressive multimodal network for quantifying fish feeding intensity in aquaculture using sensor fusion and conflict resolution between modalities.
Framework using LLMs for few-shot code generation to create safety-critical driving scenarios in CARLA simulator for autonomous driving evaluation.
Mathematical analysis of coarse-grained arithmetic applied to the St. Petersburg paradox in decision theory.
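The classical setup behind this paradox is easy to state in code. In the untruncated St. Petersburg game the payout is 2^k if the first head appears on toss k (probability 2^-k), so the expected value diverges; capping the game at n tosses (one simple form of coarse-graining, which may differ from the paper's) yields a finite expectation of n + 1:

```python
# Illustration of the classical St. Petersburg game, truncated at n tosses.
# Payout is 2^k if the first head appears on toss k (probability 2^-k);
# if no head appears within n tosses, the capped payout 2^n is paid.
def truncated_expected_value(n: int) -> float:
    """Expected payout of the game stopped after n tosses (equals n + 1)."""
    ev = sum((0.5 ** k) * (2 ** k) for k in range(1, n + 1))  # each term is 1
    ev += (0.5 ** n) * (2 ** n)  # residual probability mass paid at the cap
    return ev
```

Each of the n terms contributes exactly 1, which is why the uncapped series diverges while any finite cap gives a modest expected value.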
LLM-based autonomous agent for power system voltage control, using experience-driven learning to generate dispatch strategies in distribution networks.
Data Mixing Agent: LLM-based method to automatically re-weight training data domains during continual pre-training, preventing catastrophic forgetting.
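The summary describes re-weighting data domains during continual pre-training. A minimal sketch of one standard mixing policy, multiplicative weights driven by per-domain validation loss, gives the flavor (the actual agent uses an LLM to set weights, and this function is only an assumed illustration):

```python
import math

# Generic sketch, not the Data Mixing Agent itself: upweight domains whose
# recent validation loss is high, via exponentiated-gradient-style updates,
# then renormalize so the mixture weights sum to 1.
def reweight_domains(weights, losses, lr=0.5):
    scaled = [w * math.exp(lr * l) for w, l in zip(weights, losses)]
    z = sum(scaled)
    return [s / z for s in scaled]
```

Equal losses leave the mixture unchanged; a domain with a higher loss receives a larger share of the next training batch.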
PRIX: efficient end-to-end autonomous driving model planning from raw camera pixels without LiDAR, reducing model size and computational requirements.
MDM-OC: framework for scalable, reversible model composition enabling continual learning without task interference or catastrophic forgetting.
Genetic programming approach for symbolic distillation of neural networks, using teacher-student smoothness alignment to improve explainable AI model accuracy.
Protocol for reliable evaluation of low-precision retrieval systems, addressing spurious ties and variability in relevance scoring with reduced numerical precision.
AdvDINO: domain-adversarial self-supervised learning framework for spatial proteomics to handle batch effects in biomedical imaging.
Analysis of LLM use in newsmaking across 40,000+ articles using AI-text detectors, showing increased GenAI adoption in local and college media.
COXNet: cross-layer fusion network for detecting tiny objects in multimodal RGB-thermal imagery for surveillance and autonomous navigation.
FedKLPR: federated learning approach for person re-identification with KL-guided pruning to reduce communication overhead and handle non-IID data.
Proximal SFT: supervised fine-tuning method using trust-region constraints to prevent capability deterioration when adapting foundation models to new tasks.
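The core idea named here, constraining fine-tuning so the model stays close to its reference, can be sketched with a KL-penalized loss on a single next-token distribution. This is a generic proximal objective under assumed notation (p is the fine-tuned distribution, p_ref the frozen reference), not necessarily the paper's exact formulation:

```python
import math

# Generic proximal fine-tuning sketch: cross-entropy task loss plus a KL
# penalty anchoring the fine-tuned distribution p to the frozen reference
# p_ref, which limits drift away from the base model's capabilities.
def proximal_sft_loss(p, p_ref, target_idx, beta=0.1):
    ce = -math.log(p[target_idx])  # standard SFT cross-entropy term
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, p_ref))
    return ce + beta * kl
```

When p equals p_ref the KL term vanishes and the loss reduces to plain cross-entropy; the further p moves from the reference, the larger the penalty.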
FlexiFlow: lifetime-aware design framework for integrated computation in disposable products using flexible electronics with kHz speeds.
LLM-based synthetic training reduces maritime-domain model costs by 261x, using LLMs as teachers to train small language models.
Diffusion model for audio-driven facial animation using keyframe augmentation and speech feature decomposition.
FS-DFM enables fast long-text generation using few-step diffusion language models with parallel position generation.
StyleBench evaluates trade-offs between structured reasoning styles and efficiency/robustness in LLM inference.
Position paper analyzing measurement gaps in reinforcement learning with verifiable rewards for LLMs on structured tasks.
SecureVibeBench evaluates code generation security of LLM-powered code agents against realistic vulnerability scenarios.
Consistency models as plug-and-play priors for solving inverse problems with reduced neural function evaluations.
T-BiGAN framework combining Transformers and BiGAN for unsupervised anomaly detection in power grid synchrophasor data.
Hybrid deep learning system for EEG-based brain-computer interface wheelchair control using motor imagery.
Mathematical framework interpreting Transformers as discretizations of integro-differential equations.
Semantic segmentation combining light field and LiDAR modalities for autonomous driving scene understanding.
LLM-based system for generating standards-aligned math word problems customized to student interests and ability levels.
Protein language models for fitness prediction interpreted as inverse reinforcement learning on evolutionary sequences.
HiPRAG uses hierarchical process rewards to improve agentic RAG efficiency, reducing over-search and under-search behaviors.
Unified framework analyzing sequence models (Transformers, SSMs, gated RNNs) through coefficient dynamics lens.
Survey of inductive reasoning in LLMs, covering particular-to-general thinking patterns and knowledge generalization capabilities.
RAGen framework for generating domain-specific question-answer pairs to adapt RAG systems to specialized applications.
Risk-sensitive abstention in bandit algorithms for high-stakes AI where errors are irreparable without expert guidance.
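The abstention pattern this summary names can be illustrated with a UCB bandit that flags high-uncertainty rounds for expert deferral instead of acting autonomously. This is a generic sketch under assumed Bernoulli arms, not the paper's algorithm:

```python
import math
import random

# Generic sketch: a UCB bandit over Bernoulli arms that defers to an
# expert whenever the chosen arm's confidence width exceeds a threshold
# (the high-stakes case where an autonomous error would be irreparable).
def ucb_with_abstention(arm_probs, horizon, width_threshold=0.5, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    sums = [0.0] * len(arm_probs)
    deferrals = 0
    for t in range(1, horizon + 1):
        if 0 in counts:  # play each arm once before applying UCB
            a = counts.index(0)
        else:
            scores = [sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i])
                      for i in range(len(arm_probs))]
            a = max(range(len(arm_probs)), key=scores.__getitem__)
            if math.sqrt(2.0 * math.log(t) / counts[a]) > width_threshold:
                deferrals += 1  # abstain: the expert acts, and the agent
                                # still observes the outcome and learns
        reward = 1.0 if rng.random() < arm_probs[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return deferrals, counts
```

Lowering `width_threshold` makes the agent more conservative, trading more expert queries for fewer autonomous decisions under uncertainty.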
Post-processing methods for MRI brain image inpainting to handle lesions and tumors in medical imaging analysis.
Multi-hop reasoning over knowledge graphs using multi-view RAG with LLMs, addressing Transformer attention specialization patterns.
Optimization proxies trained to minimize optimality gaps while providing worst-case guarantees for large-scale batch economic dispatch problems.
SimBench provides first standardized benchmark for evaluating how faithfully LLMs simulate human behaviors across diverse tasks and metrics.
AtlasKV enables RAG systems to integrate billion-scale knowledge graphs efficiently in limited VRAM by avoiding expensive external retrieval modules.
Proposes DistDF for time-series forecasting using Wasserstein alignment to handle autocorrelated label sequences better than standard approaches.
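The distance underlying the alignment named here has a simple closed form in one dimension: for equal-length empirical samples, the Wasserstein-1 distance is the mean absolute difference of the sorted values. This only illustrates the metric itself, not DistDF's training objective:

```python
# 1-D Wasserstein-1 distance between two equal-length empirical samples:
# sort both samples and average the absolute differences of matched values.
def wasserstein_1d(xs, ys):
    assert len(xs) == len(ys), "sketch assumes equal sample sizes"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

Because the distance compares sorted values rather than positionally matched ones, permuting a sequence leaves it unchanged, which is what makes it a distributional rather than pointwise criterion.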
Method to automatically extract and explain what features human feedback data encodes when training language models, addressing unpredictability in RLHF approaches.
Analysis of multilingual reasoning gaps in reasoning language models, showing deficits stem from language understanding failures in low-resource languages.
Method for interpreting LLM reasoning by resampling multiple chain-of-thought branches to measure causal influence and underlying computation.
LLM-guided decompilation framework using context to improve re-executability of decompiled binaries for security analysis.
Multimodal diffusion approach for robot learning from expert trajectories, modeling interactions between observations, actions, and rewards.
SynthAgent: Framework for web agent adaptation using synthetic data generation with quality filtering to handle hallucinations and trajectory noise.