Dynamic System Instructions and Tool Exposure for Efficient Agentic LLMs
Instruction-Tool Retrieval (ITR): RAG variant that dynamically retrieves minimal system prompts and necessary tool subsets per step for efficient agentic LLMs.
Multi-agent computer-use framework with intent-aligned plan memory to stabilize long-horizon execution and reduce error accumulation.
Framework evaluating reasoning faithfulness in large reasoning models through counterfactual intervention on stance consistency and causal influence.
Multi-agent RL method retaining multiple high-value actions via sub-value functions to adapt to shifting value functions.
Predictive Batch Scheduling uses a lightweight online predictor to prioritize high-loss samples, accelerating LLM training convergence.
Empirical study analyzing pull request characteristics from five AI coding agents and human reviewer responses using AIDev dataset.
6G wireless systems architecture using intent-driven autonomous agents for multi-dimensional objectives and evolving requirements.
Human-AI collaborative framework for constructing benchmark datasets to standardize ESG/sustainability rating methodologies.
Owen-value based method extending SHAP for hierarchical feature attribution in vision tasks with spatial/semantic dependencies.
Knowledge graphs capturing educational concept dependencies and prerequisites for personalized learning at scale.
Philosophical examination of generative AI's epistemic character and implications for knowledge production in science, education, and institutions.
Parallel algorithm for decomposing hard CircuitSAT instances using specialized constraints and hardness estimations.
JEPA-DNA: pre-training framework for genomic foundation models using joint-embedding predictive architecture to capture functional genomic context.
Texo: minimalist 20M parameter formula recognition model achieving state-of-the-art performance with 80% size reduction through distillation and transfer learning.
Framework for constructing symbolic causal world models online by integrating continuous model learning with meta-interpretive learning in agent decision loops.
Methodological experiment using AI agents in collaborative research workflows for humanities and social sciences, analyzing Claude.ai usage data from Taiwan.
Framework for predicting consistent individual-specific human behavior in high-stakes environments by combining LLMs with psychological trait modeling.
Mechanistic interpretability study using linear probing and Bloom's Taxonomy to analyze cognitive complexity in LLM internal neural representations.
Framework for detecting and quantifying temporal data contamination in LLM backtesting to validate whether models leak post-cutoff training knowledge.
Web Verbs framework providing typed abstractions for reliable task composition on the agentic web, enabling LLM-based web agents to work beyond low-level primitives.
Case study training 1.36B-parameter scientific language model from raw arXiv LaTeX sources, documenting end-to-end process for domain-specialized LM development.
MedClarify: LLM-based AI agent for medical diagnosis that iteratively asks follow-up questions to resolve diagnostic uncertainty through differential reasoning.
Method for disentangling task vectors in foundation models using Kronecker-factored approximate curvature without external task data.
Graph-based visual inference approach for complex image retrieval queries involving relationships, compositions, and precise constraints.
Contrastive Variational AutoEncoder for predicting NSCLC patient survival using multi-modal biomedical data with missing modalities.
Privacy-by-Design framework for LLM-based applications targeting children, addressing implementation gaps in privacy regulation compliance.
WarpRec framework bridging gap between research and production recommender systems with backend-agnostic architecture, 50+ algorithms, and 40 metrics.
Benchmarking framework for optimizing AI models on ARM Cortex embedded processors, measuring energy efficiency, accuracy, and resource utilization.
Application of LLMs to the telecom domain using dynamic knowledge graphs and retrieval-augmented generation to reduce hallucinations and improve accuracy.
Evaluation framework for chain-of-thought reasoning quality using reusability and verifiability metrics in multi-agent IR pipelines.
KLong open-source LLM agent framework for extremely long-horizon tasks using trajectory-splitting SFT and progressive RL training.
ODESteer unified ODE-based framework for LLM alignment via activation steering with multi-step guidance.
Federated learning ensemble combining Swin Transformer and CNN for lung disease diagnosis from medical imaging.
AI Gamestore platform for evaluating machine general intelligence using open-ended human games and dynamic benchmarks.
MolHIT hierarchical discrete diffusion model for molecular graph generation improving chemical validity for drug discovery.
AutoNumerics multi-agent framework autonomously designs, implements, and verifies numerical PDE solvers using AI.
CLEF HIPE-2026 evaluation lab for person-place relation extraction from historical multilingual texts.
Taxonomy and empirical study of GPU-accelerated graph-based approximate nearest neighbor search algorithms for large-scale applications.
APEX-SQL agentic framework for text-to-SQL that dynamically explores database schemas to resolve semantic ambiguity in complex enterprise environments.
Robustness evaluation of Mamba state-space models on medical imaging benchmarks under adversarial perturbations and corruptions.
Systematic evaluation showing that AI safety datasets over-rely on triggering cues and fail to reflect real-world adversarial attacks.
Production C++ implementation of deterministic semantic state substrate using graph engine architecture for inference systems.
Study quantifying stability of transformer attention heads across model instances to assess whether mechanistic interpretability circuits are universal.
Empirical study of comment-based adversarial attacks against LLM code vulnerability detection across Python, JavaScript, and Java.
DeepVision-103K dataset with 103K diverse mathematical problems for training multimodal LLMs using reinforcement learning with verifiable rewards.
PETS framework for efficient test-time scaling via principled trajectory allocation to improve LLM self-consistency under budget constraints.
Geometric analysis of transformer optimization dynamics showing grokking emerges in low-dimensional subspaces during modular arithmetic training.
LiveClin benchmark for evaluating medical LLMs using contemporary clinical case reports updated biannually to prevent data contamination.
Ontology standard for precision fermentation data in biofoundries to improve interoperability across platforms.
Attention mechanism for Wi-Fi indoor localization that weights router information appropriately during signal aggregation.