Aeon: High-Performance Neuro-Symbolic Memory Management for Long-Horizon LLM Agents
Aeon: neuro-symbolic memory management system for long-horizon LLM agents addressing context window and attention cost limitations.
SpikeScore: hallucination detection method for LLMs with improved cross-domain generalization.
ScholarGym: benchmark for evaluating LLM capabilities in information-gathering stage of deep research systems.
Learning decentralized LLM collaboration using multi-agent reinforcement learning without centralized execution protocols.
Study on persuasion propagation: how belief-level intervention affects downstream behavior in LLM agents executing long-horizon tasks.
ROMA: recursive framework for long-horizon multi-agent tasks using task decomposition and structured aggregation to handle context limits and execution complexity.
PATHWAYS: benchmark of 250 web agent tasks evaluating the ability of closed and open models to discover and use hidden contextual information.
OmniVideo-R1: framework for improving audio-visual reasoning in multimodal models using query intention and modality attention.
AIRS-Bench: benchmark of 20 ML research tasks for evaluating AI agent capabilities across language modeling, mathematics, bioinformatics, and time series forecasting.
LQA framework for deploying vision-language models on edge devices using quantization and gradient-free test-time adaptation.
Analyzes regime leakage in AI safety evaluation, where situationally aware agents exploit differences between evaluation and deployment.
Tests whether GPT-4o possesses Theory of Mind (ToM) via causal model evaluation, finding that it lacks core ToM representations.
Lightweight, explainable deep learning system for three-lead ECG classification for cardiovascular disease detection.
TKN network for real-time video prediction using transformer-based keypoint detection with reduced computation and memory.
Analyzes fundamental limits of offline policy selection in reinforcement learning through sample efficiency perspective.
Sparse MeZO improves memory-efficient zeroth-order LLM fine-tuning by using sparse parameter updates during training.
Identifies attention collapse in the deeper layers of LLMs and introduces the Inheritune method to create smaller, more efficient models.
Proposes gradient-based methods for data-driven inverse optimization of mixed integer linear programs.
Evaluates ROS-Causal, a causal inference implementation for human-robot spatial interaction in real-world scenarios.
Survey of the synergy between Foundation Models (FMs) and Federated Learning, covering FMs adapted for distributed learning scenarios.
Framework for resource-efficient edge-based fine-tuning of personal LLMs using collaborative edge computing while preserving privacy.
Proposes differentially private customization service for LLMs that enables domain-specific fine-tuning without uploading user data.
SAFE framework automates formal proof generation for Rust code using LLMs via self-evolution to overcome proof data scarcity.
Introduces VCDF, a method-agnostic framework for time series causal discovery that improves robustness across temporal subsets.
Proposes a one-line PyTorch modification to momentum-based optimizers, creating cautious optimizers (C-AdamW, C-Lion) for improved transformer pretraining.
Evaluates and improves counting abilities of large vision-language models across multiple visual datasets and benchmarks.
Introduces ChemBFN model using Bayesian flow networks for generating novel molecules outside training distribution for drug design.
Proposes CV-DD, a committee voting approach for dataset distillation to create compact representative datasets for efficient model training.
Technique for learning neural network layer width during training without manual hyperparameter tuning or architecture search.
Method improving Direct Preference Optimization through margin-maximization data selection to address parameter shrinkage from noisy annotations.
RapidPen: autonomous penetration testing framework using LLM agents to discover and exploit vulnerabilities starting from an IP address.
Deep reinforcement learning framework for autonomous multi-UAV coordination in GNSS-denied search and rescue.
Application of quantum machine learning to stock price prediction using contextual quantum neural networks.
Survey of multimodal generative models for capturing world dynamics across 2D, 3D, video, and 4D representations.
RMOD: inference-time algorithm that aligns LLMs to multiple objectives via robust decoding formulated as a maximin game.
ML research on distilling Graph Neural Networks into MLPs for link prediction using heuristic teacher methods.
Agent-based simulation formalizing AI-human collaboration by modeling distinct optimization and satisficing decision heuristics.
SecRepoBench: benchmark evaluating code agents and LLMs on secure code completion across real-world C/C++ repositories covering 15 CWEs.
Sparse Latent Factor Forecaster model for multi-horizon commodity futures prediction with iterative inference.
Benchmark for retrieval-augmented generation in chemistry domain with curated evaluation datasets and domain-specific corpora.
Benchmarking study on CPU-intensive stream data processing performance in edge computing systems.
Caprese: distillation method for efficient LLM inference that preserves math reasoning capabilities while reducing computational demands.
Federated learning research addressing optimization challenges from heterogeneous client communication and computational capabilities.
ML research on preventing negative transfer in transfer learning through residual feature integration.
Computer vision research on separating reflection and transmission layers in single images using transformer models.
Survey of LLM-based techniques for software quality assurance, covering requirement analysis, code review, and test generation automation.
MoESD: demonstrates that speculative decoding can accelerate sparse Mixture of Experts (MoE) LLM inference without accuracy loss.
Multi-objective neural network optimization for soft robotic Fin-Ray fingers balancing rigidity and delicate handling in grasping tasks.
Characterization of KV cache behavior in large-scale LLM serving with analysis of cache eviction policies and workload-dependent optimization.
HALT method for post-training LLMs to abstain on tasks outside their capability, reducing hallucination through capability-aligned fine-tuning.