When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance
Fast image and video editing using diffusion model guidance without costly vector-Jacobian product computations.
Study on relative voice impression estimation predicting perceptual differences between utterances using paralinguistic features.
Learning dense 3D feature fields for articulated object manipulation generalizing across diverse objects using part-aware representations.
CORPGEN: multi-horizon task environment benchmark for evaluating autonomous agents on concurrent long-horizon tasks with dependencies and reprioritization.
Sali-Cache: dual-signal KV-cache optimization framework for efficient long-form video understanding in vision-language models.
Hybrid TGN-SEAL model combining temporal graph networks with SEAL for link prediction in dynamic sparse networks.
Federated ensemble learning approach with progressive personalization addressing statistical heterogeneity across distributed clients.
Study on pinching antenna systems for energy-efficient over-the-air federated learning in wireless networks.
GRAIL: imitation learning approach for goal recognition alignment enabling accurate identification of agent goals from behavior for AI alignment.
AD-Bench: real-world benchmark for evaluating LLM agents on complex advertising and marketing analytics tasks requiring multi-round reasoning.
STATe-of-Thoughts: interpretable inference-time compute method using structured action templates for improved diversity and explainability in LLM reasoning.
SM-EM algorithm for ML optimization reformulating EM iterations as weighted least squares with learnable scaling analogous to Adam optimizer components.
MILD framework for proactive failure prediction in intent-based networks using Mixture-of-Experts with disambiguation module for multi-intent systems.
FMMD: Multimodal peer review dataset from F1000Research for training AI systems for automated scholarly paper evaluation.
Floe: Federated learning framework combining cloud LLMs with edge small language models for low-latency, privacy-preserving real-time inference.
Analysis of LLM benchmark saturation problem: frontier models exhaust new benchmarks quickly, threatening ability to measure AI progress.
Coalition formation model for selfish agents with possibly overlapping coalitions under partial information using offline learning.
Airbnb's extreme classification system for efficient retrieval and audience expansion in two-sided marketplace search.
AdaptManip: Reinforcement learning framework for humanoid robots to autonomously perform navigation, object lifting, and delivery without demonstrations.
InnoEval: Framework for evaluating AI research ideas using LLMs with knowledge-grounded, multi-perspective reasoning and collective deliberation.
LRD-MPC uses low-rank decomposition to improve efficiency of secure multi-party computation for machine learning inference.
PITA dataset with 23M propositional logic statements examining how reasoning traces support LLM reasoning and length generalization limits.
CAIRO framework decouples regression into scale-invariant rank ordering and scale learning stages to handle outliers and heavy-tailed noise.
Frontier AI Risk Management Framework analyzing risks from rapidly advancing AI models and agentic AI systems, version 1.5 technical report.
SWA: Game-theoretic framework for multi-agent LLM systems balancing individual alignment with collective stability through modified inference-time decisions.
Frequentist regret analysis of Gaussian Process Thompson Sampling for sequential decision-making over continuous action spaces.
Two-stage deep reinforcement learning approach for training quadruped robots to climb U-shaped stairs autonomously.
Studies log-concave sampling from constrained and composite distributions using proximal samplers and epigraph transformations.
Uncertainty-aware multimodal segmentation framework combining radiological images and clinical text for medical imaging with cross-modal fusion.
COOL-MC framework for formally verifying and explaining reinforcement learning policies for sepsis treatment using model checking.
Evaluates mathematical reasoning capabilities of LLMs in Sinhala and Tamil languages versus English-like translation representations.
TWISTED-RL framework for robotic knot-tying using hierarchical reinforcement learning agents without human demonstrations.
MATEO: multimodal benchmark for evaluating LVLMs on temporal reasoning and planning with directed acyclic graph task execution orders.
LongAudio-RAG: hybrid framework combining audio-language models with retrieval-augmented generation for multi-hour audio question answering.
VariViT: Vision Transformer architecture supporting variable image sizes without fixed-size patches, addressing medical imaging challenges.
Tabular foundation models applied to association rule mining, outperforming classical and neural approaches especially in low-data regimes.
Quantum reservoir computing evaluation on neutral-atom Rydberg processor for biomarker-based clinical outcome prediction from limited medical datasets.
Graph neural network approach for emergency evacuation planning, formulating Bus Evacuation Orienteering Problem as NP-hard optimization.
Kernel-based optimization framework for finding optimal measurement operators in quantum reservoir computers and quantum extreme learning machines.
Analysis of prefill attack vulnerability in open-weight LLMs, exposing systematic security risks in models relying on internal safeguards.
Evolutionary System Prompt Learning (E-SPL) method for jointly improving LLM contexts and weights through reinforcement learning iterations.
Theoretical study on quantum classifier performance under locality-constrained measurements and noise, analyzing information accessibility.
LLMStructBench: benchmark for evaluating LLMs on structured data extraction and JSON generation from natural language across 22 models and 5 prompting strategies.
Reduced-order modeling framework combining finite element methods and extreme learning networks for parameter-dependent PDEs.
Generic object tracking method using joint-embedding predictive architecture for adaptation and occlusion handling in unseen scenarios.
Research on behavioral self-awareness in LLMs fine-tuned with incorrect data, examining how misalignment emerges and shifts with realignment.
Self-supervised learning approach for speech quality assessment with multi-rate audio using spectral augmentation to predict mean-opinion-score.
Dataset paper on knowledge graph refinement algorithms with schema-level information and neurosymbolic techniques for ontological reasoning.
Study on pre-trained protein sequence embeddings for machine-learning-based protein design, addressing challenges with sparse mutation datasets in bioengineering.
RF-GPT extends LLMs and multimodal models to natively support radio-frequency signals for wireless systems, bridging gap between LLM-based telecom approaches and RF signal processing.