An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse
Empirical study of catastrophic performance degradation in merged task-specialist LLMs, analyzing representation and task-specific feature conflicts.
Goal-conditioned system that allocates attention using endogenous priority functions based on three epistemic gaps: ignorance, surprise, and staleness.
GenePlan framework using LLM-assisted evolutionary algorithms to generate domain-dependent PDDL planners that minimize plan length across problem instances.
LLM-based approach for generating personalized fake news debunking messages using Big Five personality trait alignment.
Context engineering discipline for designing agent decision environments in multi-step autonomous systems, extending beyond prompt engineering to full informational management.
PRECEPT framework for test-time adaptation in LLM agents with structured rule retrieval, conflict-aware memory, and adversarial knowledge detection capabilities.
Benchmark evaluating LLMs' ability to generate interactive HTML-based MiniApps with visual interfaces and customized interaction logic beyond static text.
Unified multimodal parsing framework with hierarchical taxonomy for documents, images, and audio-visual streams using progressive parsing paradigm.
Benchmark using esoteric programming languages to evaluate genuine reasoning versus memorization in LLMs, deterring benchmark gaming by choosing languages that are economically irrational to target.
Safety benchmark for multimodal LLMs focusing on consequence-driven safety for autonomous and embodied agents, introducing OOD-MMSafe with 455 curated query-image pairs.
Training-free data selection method for vision-language models that identifies samples requiring genuine cross-modal reasoning rather than linguistic shortcuts.
Self-evolving multi-agent framework with dynamic cognition and elastic memory orchestration for adaptive agents in non-stationary environments.
Theoretical analysis of chain-of-thought necessity in LLMs through opaque serial depth, formalizing computation constraints in Transformers.
Continual learning approach using local classifier alignment on pre-trained models to mitigate catastrophic forgetting in changing environments.
Policy-parameterized prompt framework for controlling LLM multi-agent dialogue behavior using lightweight state-action policies instead of ad hoc prompts.
Unified benchmarking framework for multimodal medical multi-agent systems addressing architectural fragmentation and standardized evaluation.
Framework for integrating domain knowledge and diagnostic reasoning into pathology multimodal LLMs with cognition-aligned memory mechanisms.
Formal analysis of when confidence-based abstention improves ranked decision systems through rank-alignment and inversion zone conditions.
Study investigating how reasoning affects deceptive behavior in LLMs using moral trade-off datasets, finding that reasoning increases honesty, unlike in humans.
Research on verifying math questions in LLM training, focusing on question validity rather than just correct reasoning paths for mathematical reasoning tasks.
Overflow-Aware Scaling and Macro Block Scaling techniques for MXFP4 quantization to reduce accuracy loss in LLM inference.
Design Conductor: autonomous agent using frontier LLMs to build a complete Linux-capable RISC-V CPU (VerCore) end-to-end in 12 hours.
CktEvo: repository-level RTL code benchmark for evaluating LLM performance on iterative hardware design evolution tasks.
SiliconMind-V1: multi-agent LLM framework with debug-reasoning workflows for Verilog code generation without external verification tools.
ALADIN: design-space inference analysis framework for mixed-precision quantized neural networks on embedded AI accelerators.
Experimental study of collective pathology in multi-agent LLM systems, investigating alignment constraints as a source of iatrogenic harm.
ARKV: adaptive KV cache management framework for ultra-long context LLM inference with dynamic memory budget constraints.
Cross-platform study of measurement-free ancilla recycling via blind reset on superconducting and trapped-ion quantum processors.
Systematic review and performance evaluation of federated learning techniques for edge computing environments with privacy and efficiency focus.
Auralink SDC: edge-deployed autonomous AI agents for managing EV charging infrastructure with improved fault detection and latency.
Sensitivity-based pruning and quantization framework for compressing reservoir computing models with hardware efficiency trade-offs.
Review of FPGA-based AI accelerator architectural design and performance for deep learning tasks including NLP and autonomous decision-making.
Compressed PagedAttention method that adds token-wise KV cache eviction to reduce memory bottlenecks in high-concurrency LLM reasoning.
Layer-wise sensitivity analysis of NVFP4 and MXFP4 quantization formats for LLM inference on advanced hardware architectures.
State space models with permutation equivariance for multivariate time series modeling without artificial variable ordering.
HCAPO framework integrating hindsight credit assignment to improve long-horizon LLM agent performance on multi-step tasks with sparse rewards.
Turn: compiled actor-based programming language with static schema typing for building autonomous agentic software that delegates inference to LLMs.
Transformer model for electronic dance music structure segmentation using energy, rhythm, and timbre analysis instead of lyrical/harmonic similarity.
Framework for creating structured safety arguments for frontier AI systems, adapting aerospace/automotive safety case methodologies.
Multi-level meta-reinforcement learning approach using skill-based curriculum for hierarchical sequential decision making and MDP compression.
Framework automating superconducting qubit experiment design and control sequences using LLMs.
TDAD methodology for developing tool-using agents via behavioral specifications and automated testing, addressing production compliance.
LLM-based framework for scalable task planning in heterogeneous multi-robot systems using natural language.
Study examining relationship between retrieval quality and information coverage in RAG systems for report generation.
Fish Audio S2: open-source text-to-speech system with instruction-following control via natural language descriptions.
GenGNN framework for graph generation achieving comparable performance to transformers with 2-5x faster inference.
Feature selection method for hybrid information systems using fuzzy rough set theory for big data applications.
Adversarial attack method using diffusion models to deceive deep learning-based network intrusion detection systems.
Theoretical framework for selective prediction with risk control combining multiple concentration inequalities and betting-based confidence sequences.
Federated learning technique optimizing client selection under non-IID data distribution for collaborative model training.