Just Use XML: Revisiting Joint Translation and Label Projection
Framework for joint machine translation and cross-lingual label projection using XML tags; addresses degraded translation quality in prior combined approaches.
Security analysis of compound AI systems combining LLMs, software tools, and databases; identifies vulnerabilities arising from the traditional software stack and hardware layers.
Slow-Fast Inference: training-free decoding acceleration leveraging stable attention support patterns within semantic spans during autoregressive generation.
Training-free visual generation via weighted h-transform sampling for coarse-guided synthesis using pretrained diffusion models.
Mathematical proof that chemical reaction networks without hidden layers outperform spiking neural networks on certain classification tasks.
Taxonomy of structured operators beyond convolution for learning-based image processing capturing low-rank decompositions and adaptive representations.
LoV3D: vision-language pipeline for cognitive prognosis from longitudinal brain MRI via regional volume assessments and grounded reasoning.
Multi-label temporal convolutional framework for transcription factor binding site prediction analyzing TF interactions in gene regulation.
LLM-based neural architecture search with feedback memory: closed-loop pipeline for iterative CNN design on consumer GPU without fine-tuning.
LMP2: browser-based self-audit tool for inspecting LLM associations with individuals, with user study findings on privacy and model behavior.
Minimax deep deterministic policy gradient: RL algorithm for stable performance under external disturbances using fractional objectives.
SommBench: multilingual benchmark assessing LLM capabilities in sommelier expertise, evaluating cultural knowledge beyond linguistic encoding.
CRAFT: tendon-driven anthropomorphic robotic hand with hybrid hard-soft compliance for contact-rich manipulation tasks.
Agent-assisted code generation for translating complex RL environments into high-performance implementations with iterative repair and verification.
FlashMotion: few-step trajectory-controllable video generation using distillation to reduce multi-step denoising overhead.
IsoCompute Playbook: scaling laws for optimal compute allocation in LLM reinforcement learning post-training across rollouts, batches, and update steps.
GlyphBanana: agentic workflow for precise text and formula rendering in generative models, addressing out-of-distribution instruction-following challenges.
Theoretical analysis of catastrophic forgetting in continual post-training of generative models, formalizing mass forgetting and component drift mechanisms.
BehaviorVLM: vision-language framework for animal pose estimation and behavioral understanding, using a finetuning-free approach that requires no human annotation.
MADQA: benchmark with 2,250 questions over 800 PDFs evaluating whether multimodal agents use strategic reasoning or stochastic search in document-intensive workflows.
Proof-Carrying Materials: falsifiable safety certificates for machine-learned interatomic potentials in materials screening via adversarial falsification.
RDNet: adaptive salient object detection network for remote sensing images using dynamic convolution kernels.
LLM-driven system for advancing interdisciplinary scientific research through exploration and reasoning rather than rapid experiment design.
Neural Thickets: shows that task-specific expert solutions exist near pretrained weights and can be discovered through structured optimization.
Perplexity's security analysis and recommendations for frontier AI agents based on operating general-purpose agentic systems at scale.
Method to accelerate neural network verification by reusing learned conflicts across related queries instead of solving each independently.
SciMDR benchmark and synthesize-and-reground framework for scientific multimodal document reasoning datasets balancing scale, faithfulness, and realism.
Domain-independent dynamic programming: paradigm that decouples modeling from solving in combinatorial optimization, using a problem-agnostic approach.
Inference-time approach for aligning diffusion models with multiple conflicting objectives and varying user preferences without retraining.
Goal-Oriented Graphs framework enhancing LLM procedural reasoning in interactive environments like Minecraft through improved knowledge retrieval.
Multi-agent system orchestrating collaborative design review, where agents analyze graphics holistically using a novel exemplar selection approach.
Evaluation framework for ICD medical coding using LLM-guided learning and systematic assessment of model rationales in healthcare.
Study on whether next-token prediction yields usable world models, introducing STRIPS Transformer for symbolic planning from action traces.
Open-source CodeEvolve framework combining LLMs with evolutionary algorithms for algorithmic solution synthesis and optimization.
Jr. AI Scientist: autonomous system that mimics a novice researcher's workflow for AI-driven scientific discovery, with risk assessment capabilities.
Machine learning technique using hyperbolic kernels for hierarchical data representation with improved capacity through kernel modulation.
Mobile-Agent-RAG: system combining multi-agent coordination with retrieval-augmented generation for long-horizon mobile UI automation tasks.
Agentic XAI approach using LLM agents to translate technical explanations into accessible narratives for improving trust in AI predictions.
Agentic Learning Ecosystem (ALE) infrastructure for end-to-end agent development, enabling LLMs to operate in real-world environments with iterative refinement.
Multi-agent reinforcement learning system exploring width scaling for broad information seeking tasks, addressing organizational capability bottlenecks.
Temporal knowledge graph forecasting method using entity state tuning to model structural and temporal dependencies without episodic amnesia.
Benchmark and execution environment evaluating AI agents on end-to-end research tasks using containerized ICML/ICLR/ACL paper repositories with 39 sub-tasks.
Research on chain-of-thought reasoning failures in LLMs when scaling compute budgets, proposing limited reasoning space as explanation for performance collapse.
Neural network representations of music and brain activity for EEG-based music identification using acoustic and expectation signals.
Reinforcement learning method for LLM-based agents using retrospective feedback to enable continual adaptation and experiential learning.
Open-source framework for locally-hosted LLM-based agents that autonomously operate computing environments and orchestrate workflows.
Agentic framework for multi-step reasoning over complex tabular data with hierarchical headers using closed-loop decision-making.
EvalAct: Method for retrieval-augmented agents using self-evaluated process rewards to optimize multi-step reasoning via explicit quality assessment actions.
Omni Parsing: Framework for multimodal parsing across documents, images, and audio-visual content, with a unified taxonomy and hierarchical levels.
CUAAudit: Meta-evaluation framework for vision-language models as auditors of autonomous desktop computer-use agents.