Federated learning approach for person re-identification that addresses statistical heterogeneity and communication efficiency in privacy-preserving surveillance systems.
Addresses mode collapse in reinforcement learning fine-tuning by introducing polychromic objectives that preserve policy diversity and enable better exploration.
Proposes end-to-end integration of data-driven learning and existing knowledge for predicting transcriptional responses to genetic perturbations in biological systems.
Evaluates whether large vision-language models can effectively guide blind and low-vision individuals, addressing how to measure real-world utility beyond standard metrics.
TempoControl enables fine-grained temporal control in text-to-video generative models, allowing users to specify when visual elements appear in the generated sequence without retraining.
Mathematical analysis of incoherence in goal-conditioned autoregressive models fine-tuned with reinforcement learning.
Multi-agent reasoning framework for interpreting gene clusters in antimicrobial resistance studies using transcriptomic data.
Fair division method for indivisible payoffs in coalitional games using Shapley value.
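A minimal sketch of the exact Shapley value for a small characteristic-function game (the paper's indivisible-payoff allocation rule is not reproduced; the toy game and names below are illustrative):

```python
from itertools import permutations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley value: average each player's marginal contribution
    over all orderings; v maps frozensets of players to coalition payoff."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Toy 3-player game: any coalition of two or more players earns 1.
v = lambda S: 1.0 if len(S) >= 2 else 0.0
print(shapley_values(["a", "b", "c"], v))  # symmetric players -> 1/3 each
```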
Conformal prediction framework for assessing correctness of LLM outputs with user-defined tolerance levels.
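For orientation, a minimal split-conformal sketch under an assumed setup: nonconformity scores from a calibration set with known correctness labels yield a quantile threshold that meets the user-chosen tolerance alpha; the scoring rule and names are illustrative rather than the paper's.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: a fresh score from the same distribution
    falls at or below this threshold with probability >= 1 - alpha."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

rng = np.random.default_rng(0)
cal_scores = rng.uniform(size=500)        # e.g. 1 - model's correctness score
tau = conformal_threshold(cal_scores, alpha=0.1)

new_score = 0.42
print(tau, new_score <= tau)              # accept as "likely correct" or abstain
```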
Benchmarking framework using embeddings to detect gender bias in LLMs used for educational feedback on student essays.
Multimodal framework for myocardial scar segmentation combining ECG signals with cardiac MRI imaging.
DuoTok source-aware dual-track tokenizer preserving high-fidelity reconstruction, predictability, and cross-track correspondence for music language models.
Study showing structured prompts significantly improve LLM evaluation accuracy and reduce prompt-dependent variance in benchmark frameworks like HELM.
OmniFusion modular approach for simultaneous multilingual multimodal translation combining speech recognition and translation in open-source LLM pipelines.
Lumos framework for formally certifying language model system behaviors using imperative probabilistic programming with graph-based prompt generation.
GPERT framework for event-based 3D Gaussian splatting balancing accuracy and temporal resolution using geometric-photometric event camera data.
Study demonstrating evasive injection techniques that bypass ML-based prompt injection detectors in retrieval-augmented LLM systems.
Analysis showing steering vectors in LLMs are fundamentally non-identifiable with large equivalence classes, questioning interpretability of activation steering methods.
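A toy illustration of the equivalence-class idea under a linear-readout assumption (not the paper's construction): steering vectors that differ by a null-space direction of the downstream map shift activations differently yet produce identical outputs.

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 16)            # linear readout from a 16-dim hidden state
h = torch.randn(16)               # hidden state to be steered
v = torch.randn(16)               # one candidate steering vector

# Any direction in W's null space can be added to v without changing behavior.
_, _, Vh = torch.linalg.svd(W)    # full SVD: rows 8..15 of Vh span the null space
null_dir = Vh[-1]
v_equiv = v + 3.0 * null_dir      # a different vector, same behavioral effect

print(torch.allclose(W @ (h + v), W @ (h + v_equiv), atol=1e-4))  # True
```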
FIRE reinitialization method balancing stability-plasticity tradeoff in continual learning for deep neural networks through Frobenius-isometry constraints.
Empirical evaluation of LLM-generated ACSL formal specification annotations for C programs, assessing automatic verification without human assistance.
CoCoDiff training-free style transfer framework using diffusion models and correspondence consistency for fine-grained region-wise semantic preservation.
Empirical evaluation of GPTutor LLM tutoring system comparing embedded proof-review feedback versus chatbot support for discrete mathematics learning.
TaCarla comprehensive benchmark dataset for end-to-end autonomous driving, providing perception and planning information for autonomous vehicle research.
SWE-CI benchmark evaluating LLM-powered agents on repository-level codebase maintenance via continuous integration and multi-step feature iterations.
RoboClaw agentic framework unifying data collection, policy learning, and deployment for long-horizon robotic tasks with vision-language-action systems.
CHIMERA-Bench standardized benchmark dataset for epitope-specific antibody design enabling fair comparison of computational design methods.
OPERA framework for data pruning in dense retrieval models that improves both efficiency and effectiveness of domain-specific finetuning through heterogeneous pair selection.
Self-attention CycleGAN method for harmonizing multi-site MRI data using tri-planar context to address scanner-induced distribution shifts.
Survey of 6,793 Mexican high school students examining how different motivational profiles relate to generative AI tool usage in math and writing.
Demonstrates LLM-based AI agents autonomously executing high energy physics analysis pipelines including event selection, background estimation, and statistical testing.
KidGym benchmark dataset based on children's intelligence tests to evaluate multimodal LLMs on visual reasoning tasks.
Framework using LLMs to automate reward design for multi-agent reinforcement learning by synthesizing executable reward programs.
Experiential Reflective Learning framework enabling LLM agents to self-improve by leveraging past interactions and adapting to specialized environments.
Mechanistic interpretability analysis of how LLMs verbalize confidence scores versus actual accuracy using linear probes and activation steering.
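A hedged sketch of the linear-probe side of such an analysis; random stand-ins replace real LLM activations and verbalized confidences, and the setup is generic rather than the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 64))                      # per-answer hidden activations
correct = (acts[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)
verbal_conf = np.clip(0.6 + 0.1 * rng.normal(size=1000), 0, 1)  # model-stated confidence

probe = LogisticRegression(max_iter=1000).fit(acts[:700], correct[:700])
probe_conf = probe.predict_proba(acts[700:])[:, 1]

# Compare how well each signal tracks actual correctness on held-out answers.
print("probe  gap:", np.abs(probe_conf - correct[700:]).mean())
print("verbal gap:", np.abs(verbal_conf[700:] - correct[700:]).mean())
```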
Neuro-symbolic approach combining neural networks with domain knowledge for process anomaly detection in event logs.
Vision2Web: Hierarchical benchmark for evaluating AI agents on website development tasks from UI-to-code to full-stack implementation.
CarbonEdge: Carbon-aware deep learning inference framework for edge computing optimizing environmental impact alongside latency and throughput.
CDH-Bench: Benchmark evaluating vision-language models' commonsense-driven hallucinations when visual evidence conflicts with common sense.
LG-HCC proposes geometry-aware compression for 3D Gaussian Splatting to reduce storage overhead while maintaining rendering quality.
HISA improves efficiency of sparse attention mechanisms by optimizing hierarchical indexing to reduce bottlenecks in token-level key selection for LLMs.
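A generic top-k key-selection sparse-attention sketch for reference; HISA's hierarchical index, which is the paper's contribution, is not reproduced here.

```python
import torch

torch.manual_seed(0)
q = torch.randn(1, 128, 64)          # (batch, queries, dim)
k = torch.randn(1, 1024, 64)         # (batch, keys, dim)
v = torch.randn(1, 1024, 64)
topk = 32

scores = q @ k.transpose(-1, -2) / 64 ** 0.5           # (1, 128, 1024)
idx = scores.topk(topk, dim=-1).indices                # selected keys per query
sparse = torch.full_like(scores, float("-inf")).scatter(-1, idx, scores.gather(-1, idx))
out = torch.softmax(sparse, dim=-1) @ v                # attend only to selected keys
print(out.shape)                                       # (1, 128, 64)
```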
MemFactory: unified inference and training framework for agent memory integration with RL optimization of memory operations in LLM agents.
FigAgent: multi-agent framework for automatic method illustration generation in AI papers via drawing middleware orchestration.
Reduced density matrix method from quantum chemistry for predicting and interpreting phase transitions during deep learning model training.
Optimizer-aware gradient-based online data selection framework for sequential LLM fine-tuning with step-dependent sample utility estimation.
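A toy version of gradient-based data selection under plain-SGD assumptions (the paper's optimizer-aware, step-dependent utility estimate is not reproduced): score each candidate by how well its gradient aligns with a held-out-batch gradient, then keep the top-k; all names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
loss_fn = nn.MSELoss()

def flat_grad(x, y):
    """Flattened gradient of the loss on (x, y) w.r.t. model parameters."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

xs, ys = torch.randn(32, 8), torch.randn(32, 1)          # candidate pool
val_x, val_y = torch.randn(16, 8), torch.randn(16, 1)    # held-out batch
ref = flat_grad(val_x, val_y)

scores = torch.tensor([flat_grad(xs[i:i+1], ys[i:i+1]) @ ref for i in range(32)])
topk = scores.topk(8).indices    # examples whose updates best align with the val gradient
print(topk)
```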
Personalized federated fine-tuning approach for language models on distributed heterogeneous task datasets with improved generalization.
Evolution strategies for deep RL pretraining, offering a derivative-free, computationally efficient alternative to standard deep reinforcement learning.
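A minimal evolution-strategies loop on a stand-in objective, showing the derivative-free update; in the RL setting the score would be an episode return, and the hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(10)                          # policy parameters
score = lambda p: -np.sum((p - 1.0) ** 2)     # stand-in for an episode return

sigma, lr, pop = 0.1, 0.01, 50
for _ in range(300):
    eps = rng.normal(size=(pop, theta.size))              # parameter perturbations
    returns = np.array([score(theta + sigma * e) for e in eps])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    theta += lr / (pop * sigma) * eps.T @ adv              # derivative-free gradient estimate
print(score(theta))                            # climbs toward 0 from -10
```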
Continual learning framework for resource-constrained agents using stochastic bridge diffusion process for temporal memory management.
Perspective on sustainability challenges in AI-driven molecular and materials discovery across QM data, training, and automation pipelines.
Empirical evidence that classifier-based safety gates fail for self-improving AI systems across multiple model architectures.
Symbolic mixture-of-experts model for predicting cross-location hurricane evacuation behavior with population-level adaptation.