Techniques for improving data efficiency in LLM RL fine-tuning using difficulty-targeted online selection and rollout replay.
RuleReasoner method combining RL with domain-aware sampling for robust rule-based reasoning across varying rule formats and complexity.
SVD-based quantization method for compressing delta parameters from LLM fine-tuning with analysis of underlying compression mechanisms.
Survey of foundation models for autonomous driving focusing on scenario generation and analysis for simulation-based testing.
HYPER foundation model for inductive link prediction with knowledge hypergraphs, generalizing to novel entities and relation types.
NeuronSeek framework using symbolic regression to discover and construct neural networks with optimized task-driven neurons.
DeepRare multi-agent system using LLMs with traceable reasoning for differential diagnosis of rare diseases through agentic workflow.
Vision Transformer architecture with a shared encoder and multiple decoders for climate model downscaling, as an efficient alternative to regional climate models.
Study on optimal ordering of chain-of-thought reasoning steps in Transformers for mathematical tasks, showing that step order significantly affects reasoning difficulty.
SPATIA multimodal generative model for analyzing spatial transcriptomics data combining cell images, gene expression, and spatial context.
Weighted policy optimization method for improving reasoning in diffusion-based LLMs through RL without requiring exact likelihood computations.
Machine learning approach for predicting low-altitude network coverage using disentangled representation learning on base station operational parameters.
DeepRobot system using LLMs for robotic task planning with a verbal RL feedback loop to align models with real-world robot embodiment and constraints.
Virne benchmark framework for evaluating deep RL methods on network resource allocation in Network Function Virtualization infrastructure.
GEPA uses genetic algorithms and Pareto optimization for prompt evolution as an alternative to RL fine-tuning of LLMs, achieving better performance with fewer rollouts.
FGBench dataset and benchmark for evaluating LLM reasoning on molecular property prediction at functional group level with structure-aware interpretability.
Framework integrating Security Chaos Engineering with Breach Attack Simulation platforms for testing organizational cyber defenses.
Research investigating unintended deception in LLMs on benign prompts without explicit hidden objectives, revealing trustworthiness risks in reasoning tasks.
DeepLight deep learning architecture for lightning prediction with Hazy Loss function to account for prediction uncertainty.
Agricultural segmentation framework using hierarchical DINOv2 models for robust plant species and damage detection across devices, seasons, and sensors.
Diagnostic framework for evaluating synthetic dialogue generation for contact centers using structured supervision on call attributes.
Framework extending CybORG environment for training RL agents in autonomous cyber operations with production-worthy simulation accuracy.
Interpretability study that "brain-scans" LLMs to identify the economic concepts guiding their financial forecasts and map the concepts' relative importance without reducing performance.
Adaptive Resampling-based Training method dynamically adjusts training data distribution based on per-class learning difficulty for imbalanced classification.
Integration of LLMs with RL for autonomous cyber operations, using LLM pre-trained knowledge to augment agent decision-making and reduce exploration cost.
RL-based market-making strategy modeling limit order book dynamics as a stochastic control problem for algorithmic trading.
Lightweight satellite hyperspectral image segmentation using curriculum multi-task self-supervision for onboard processing.
Theoretical analysis of oracle complexity for finding Pareto stationary points in smooth multiobjective optimization problems.
DiffusionNFT introduces online reinforcement learning for diffusion models using the forward process, addressing limitations in post-training optimization of diffusion models.
Interpretability research tracking feature evolution during language model pre-training using sparse dictionary learning (crosscoders) to understand capability emergence.
Research platform for exploring human-agent collaboration using LLM agents, applying principles from human-mediated computer collaboration to human-LLM partnerships.
AECBench benchmark evaluates LLM robustness and reliability in Architecture, Engineering, and Construction domain with hierarchical knowledge evaluation.
Research showing that activation steering, a technique for controlling LLM behavior, systematically breaks model alignment safeguards and makes models comply with harmful requests.
ArXiv paper proposing a differentially private two-stage gradient descent algorithm for instrumental variable regression with privacy-utility tradeoffs.
ArXiv paper proposing ReliabilityRAG, a provably robust defense against prompt injection and retrieval corpus attacks on RAG-based web search systems.
ArXiv paper introducing LLM DNA, a method for tracing evolutionary relationships between models via functional representations without task-specific constraints.
ArXiv paper proposing VoiceBridge, a one-step latent bridge model for general speech restoration from diverse distortions at 48 kHz fullband quality.
ArXiv paper introducing BiasFreeBench, a standardized benchmark for evaluating and comparing bias mitigation methods in LLM responses with consistent metrics.
ArXiv paper introducing the EAPrivacy benchmark for measuring physical-world privacy awareness of LLM-powered embodied agents in procedurally generated scenarios.
ArXiv paper studying optimal placement of PDE diffusion layers in hybrid transformer architectures to add local geometric priors along the sequence axis.
ArXiv paper proposing RACE Attention, a strictly linear-time attention mechanism enabling long-sequence training beyond quadratic softmax attention limitations.
ArXiv paper introducing SECA, a method for eliciting LLM hallucinations using semantically equivalent and coherent adversarial attacks to test reliability.
ArXiv paper addressing data-driven inverse optimization for mixed-integer linear programs by learning both constraints and objective functions from observed decisions.
ArXiv paper comparing context management strategies for end-to-end spoken dialogue state tracking using Speech-LLMs on the SpokenWOZ corpus.
ArXiv paper providing theoretical analysis of sample complexity in discrete-state diffusion models for text, sequences, and combinatorial structures.
ArXiv paper examining benchmarking challenges for Time Series Foundation Models, addressing test set integrity issues as training corpora grow large.
ArXiv paper introducing a framework for tracing and steering algorithmic primitives underlying LLM multi-step reasoning by linking reasoning traces to internal activations.
ArXiv paper providing the first theoretical analysis of low-bit quantization effects on learning performance in high-dimensional linear regression settings.
ArXiv paper proposing a top-down semantic refinement technique for improving image captioning quality in Vision-Language Models through multi-step generation.
ArXiv paper identifying critical correctness violations in existing batch speculative decoding implementations and proposing fixes to ensure output equivalence with standard autoregressive generation.