Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics
Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs
Distributional Computational Graphs: Error Bounds
A Feature Extraction Pipeline for Enhancing Lightweight Neural Networks in sEMG-based Joint Torque Estimation
Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging
Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations
When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
Efficient reduction of stellar contamination and noise in planetary transmission spectra using neural networks
End-to-End Semantic ID Generation for Generative Advertisement Recommendation
Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation
GeoAgent: RL-based model for geolocation reasoning using fine-grained geographic characteristics and GeoSeek dataset with annotated chain-of-thought.
LLM-driven framework for automated recommender system design using directional feedback instead of scalar metrics to guide architecture evolution.
Technical exploration of LLM control methods beyond prompting, covering activation-level steering techniques. Advances understanding of model behavior.
Exercise recommendation system with equipment constraints and personalization; specialized ML application.
Data engineering patterns for aggregating and normalizing fitness data from multiple sources; practical pipeline work.
Examines model drift through a governance perspective, distinguishing behavioral drift from statistical drift in ML systems.
DAG pipeline management tool. Minimal information; not AI-specific or clearly relevant.
Analysis of RL fine-tuned VLMs showing vulnerability to textual perturbations and weak visual grounding despite improved visual reasoning benchmarks.
Performance optimization of llama.cpp on specialized hardware achieving 8.8x speedup. Relevant for efficient LLM deployment strategies.
Frankenstein-style analysis framework isolating specific visual reasoning improvements from RL versus supervised fine-tuning in vision-language models.
Novel generative framework for creating vector sketches that transform semantically through progressive stroke addition, addressing dual-constraint optimization challenges.
T3D proposes a trajectory self-distillation framework that enables fast parallel token decoding in diffusion LLMs with fewer refinement steps while maintaining generation quality.
DeepGen 1.0 is a lightweight 5B unified model for image generation and editing using Stacked Channel Bridging, achieving performance competitive with larger models at reduced deployment cost.
ExStrucTiny benchmark evaluates VLM performance on structured information extraction from diverse enterprise documents with variable, flexible schemas.
Sci-CoE framework enables LLMs to self-improve on scientific reasoning through co-evolution as both solver and verifier with geometric consensus mechanisms.
dVoting: fast voting technique for diffusion LLMs that enables parallel test-time scaling for improved reasoning performance.
Theoretical and empirical analysis of on-policy distillation as dense KL-constrained RL, proposing generalized reward extrapolation.
P-GenRM enables personalized LLM alignment through scenario-specific reward models with test-time user-based scaling, addressing generalization to new users with limited feedback.
StateLM: foundation model framework that gives LLMs agency to manage their own context and memory via database operations.
GigaBrain-0.5M: VLA model trained via world model-based reinforcement learning for improved multi-step action prediction.
DeepSight is a unified toolkit for LLM/MLLM safety covering workflow, evaluation, diagnosis, and alignment with integrated explainability and risk scenario grounding capabilities.
LawThinker: autonomous legal research agent using an Explore-Verify-Memorize strategy with intermediate step verification in dynamic environments.
Composition-RL optimizes RLVR training by composing verifiable prompts to balance hard and easy examples, mitigating ineffective data and enabling better prompt dataset expansion.
Opinion piece on chatbot limitations and technology evolution. Commentary without technical substance or concrete insights.
Gaia2 benchmark for evaluating LLM agents in realistic, asynchronous, dynamic environments with temporal constraints and collaboration.
TADA: activation steering technique for audio diffusion models to control semantic musical concepts through shared attention layers.
Adaptive framework for intelligent AI agent delegation across decomposed sub-tasks with dynamic adaptation to environmental changes and failure handling.
Region-to-Image Distillation: reduces latency in multimodal LLMs' fine-grained perception by distilling zooming behavior into inference-time efficiency.
Detection method for identifying RLVR training data contamination via structural convergence signatures in reasoning trajectories, addressing benchmark contamination concerns.
Light4D: training-free framework for 4D video relighting under extreme viewpoints using diffusion models with temporal consistency.
MiniCPM-SALA: hybrid sparse-linear attention architecture for efficient long-context LLM processing in a 9B-parameter model.
Code2Worlds: extends coding LLMs to 4D world generation with dynamic physics simulation by addressing multi-scale context and semantic-physical execution gaps.
Framework enabling LLMs to perform in-context exploration through length-incentivized RL, allowing models to generate, verify, and refine multiple reasoning hypotheses within continuous context.
Large-scale study on adapting general-purpose VLMs to e-commerce attribute understanding while preserving generalizability across multi-image noisy product data.
Diffusion language models tailored for CUDA kernel code generation, addressing data scarcity and specialization challenges through parallel token generation approach.
ThinkRouter: framework that routes reasoning between latent and discrete spaces based on model confidence dynamics to improve efficiency.
ScalSelect: training-free data selection method for efficient visual instruction tuning of vision-language models.