Boosting Large Language Models with Mask Fine-Tuning
Mask Fine-Tuning (MFT): a novel LLM fine-tuning method that improves performance by selectively masking model components instead of updating weights.
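The masking idea can be illustrated with a minimal, dependency-free sketch: a binary mask decides which components of a toy layer stack execute, while the layer parameters themselves are never updated. The function and layer names here are illustrative assumptions, not the paper's actual API.

```python
# Toy stack of "layers" as plain functions; the mask selects which run.
# No weights change -- only the execution pattern does.
def masked_forward(x, layers, mask):
    """Run x through layers, skipping any layer whose mask bit is 0."""
    for layer, keep in zip(layers, mask):
        if keep:
            x = layer(x)
    return x

layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(masked_forward(5, layers, [1, 1, 1]))  # all layers: ((5+1)*2)-3 = 9
print(masked_forward(5, layers, [1, 0, 1]))  # middle layer masked: (5+1)-3 = 3
```

An all-ones mask recovers the original model, so the search is over masks rather than over weight values.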
MegaScale-Data addresses computational challenges in training large foundation models from multiple data sources by optimizing dataloader distribution across parallel ranks.
Credit-assignment method (QLLM) for multi-agent RL that eliminates predefined mixing networks via improved, interpretable value decomposition.
Nemotron-CrossThink extends RL-based self-learning from math reasoning to broader domains using verifiable reward structures and diverse tasks.
PCCL library for performant collective communication in distributed AI training on GPU supercomputers, addressing NCCL limitations.
Typology and analysis of synthetic datasets for clinical dialogue processing, addressing privacy and data governance challenges in healthcare NLP.
Aitomia platform combining LLM-based agents and chatbots to assist with atomistic and quantum chemical simulations setup and analysis.
Analysis of transduction and induction as complementary reasoning paradigms in programming-by-example and few-shot learning contexts.
VideoSafetyEval benchmark with 11.4k video-query pairs across 19 risk categories for evaluating Video LLM safety and developing defenses.
Method for improving LLM reasoning without expensive RL or high-quality demonstrations using weak supervision and incentive signals.
Inference-time alignment method for LLMs that searches in continuous response space using reward models for improved exploration.
SVD-based compression method (ERC-SVD) for efficient LLM deployment with error control and low-rank approximation techniques.
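A hedged sketch of the error-control step such SVD-based compression relies on: given a weight matrix's singular values, keep the smallest rank whose truncated approximation stays within a relative Frobenius-norm error tolerance. The function name and interface are illustrative assumptions, not ERC-SVD's actual algorithm.

```python
def select_rank(singular_values, rel_err=0.1):
    """Smallest rank r whose truncation error stays within rel_err * ||W||_F."""
    s = sorted(singular_values, reverse=True)
    total = sum(v * v for v in s)        # ||W||_F^2 (sum of squared singular values)
    budget = (rel_err ** 2) * total      # allowed squared truncation error
    tail = 0.0
    # Walk from the smallest singular value upward, dropping as many as
    # fit inside the error budget; stop at the first one that does not.
    for r in range(len(s), 0, -1):
        tail += s[r - 1] ** 2
        if tail > budget:
            return r
    return 0  # tolerance so loose that everything can be dropped

print(select_rank([10.0, 5.0, 0.5, 0.1], rel_err=0.1))  # -> 2
```

With a sharply decaying spectrum, most of the norm lives in the first few singular values, so a small rank already meets the error target.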
Analysis of implicit regularization in overparametrized deep neural networks and improved out-of-distribution generalization via variational methods.
Graph-search path planning algorithm (UPP) that dynamically balances safety and optimality for autonomous robot navigation.
Gesture recognition system (DiG-Net) enabling long-range dynamic hand-gesture recognition for human-robot interaction in assistive robotics.
Dynamic benchmark framework (NetArena) for evaluating AI agents in network automation with production-level complexity and reduced contamination risk.
Adaptive multi-objective reinforcement learning method for balancing exploration and skill diversity in skill-based RL pretraining.
Benchmark for evaluating multimodal LLM-based front-end code generation with modern development frameworks and evaluation metrics.
Curriculum learning approach scheduling tasks from easy to hard to improve LLM reasoning via reinforcement learning, inspired by DeepSeek-R1.
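The easy-to-hard scheduling idea can be sketched in a few lines, assuming each task carries a scalar difficulty score: sort tasks by difficulty and serve them in fixed-size stages, so training sees easy problems before hard ones. The names below are hypothetical, not from the paper.

```python
# Illustrative curriculum scheduler: stages go from easiest to hardest.
def curriculum_stages(tasks, stage_size):
    """Yield fixed-size lists of tasks ordered by ascending difficulty."""
    ordered = sorted(tasks, key=lambda t: t["difficulty"])
    for i in range(0, len(ordered), stage_size):
        yield ordered[i:i + stage_size]

tasks = [
    {"name": "add", "difficulty": 0.1},
    {"name": "proof", "difficulty": 0.9},
    {"name": "algebra", "difficulty": 0.5},
    {"name": "count", "difficulty": 0.2},
]
for stage in curriculum_stages(tasks, stage_size=2):
    print([t["name"] for t in stage])
# stage 1: ['add', 'count']; stage 2: ['algebra', 'proof']
```

In an RL setting the same schedule would gate which prompts enter the rollout buffer at each training phase.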
BIS Reasoning 1.0: Japanese benchmark with 1K+ syllogistic problems evaluating belief bias and inconsistent reasoning in LLMs.
Video-guided post-ASR correction for TV series speech recognition handling multiple speakers and domain-specific terminology.
AVA-Bench: systematic evaluation benchmark for vision foundation models addressing blind spots in VQA evaluation protocols.
Lightweight intrusion detection system for early APT detection using a novel feature-selection method (not AI/ML focused).
TRACED: unsupervised environment design using regret approximation for co-learning to improve deep RL agent generalization.
Rationale-Enhanced Decoding improves chain-of-thought prompting in vision-language models by optimizing intermediate reasoning generation.
Lumos-1: LLM-based autoregressive video generation using discrete diffusion with efficient architecture avoiding external encoders.
Content-style decomposition in visual autoregressive models enabling recontextualization and stylization for creative image synthesis.
SOAR: self-improving method integrating language models into evolutionary program synthesis for challenging tasks like ARC-AGI.
FingerTip 20K: benchmark for proactive mobile LLM agents with 20K tasks, evaluating multimodal agents using contextual data without explicit instructions.
Neural Combinatorial Optimization solver for min-max heterogeneous vehicle routing with multiple vehicles using novel decoding approach.
Method extending monocular depth estimators from perspective to fisheye cameras using calibration tokens for covariate shift alignment.
EvolvR: self-evolving method for story evaluation using LLM-as-judge with pairwise reasoning to improve generation guidance.
Novel benchmarking system evaluating LLM-based agent capabilities for single-cell omics data analysis, assessing planning and code generation.
Systematic study of post-training quantization methods for diffusion LLMs to enable edge device deployment, comparing compression techniques.
UTRL: reinforcement learning framework training LLMs to generate high-quality unit tests automatically, addressing test generation challenges.
Research evaluating Law-Following AI framework for embedding legal compliance in advanced AI agents, analyzing legal personhood constructs and technical feasibility.
Trajectory-based paradigm for efficient 3D point cloud tracking in robotics and autonomous systems.
Reinforcement learning approach for radiology report generation using FactScore-based rewards with reduced data requirements.
Framework evaluating robustness of Vision-Language-Action models under real-world physical variations for robotic tasks.
Slovak parliamentary speech corpus with 66M words and fine-tuned ASR models for low-resource language recognition.
Data-efficient ASR personalization using phoneme-level uncertainty scoring and variational inference. Guides fine-tuning for non-normative speech recognition.
Variational low-rank adaptation method for personalizing speech recognition on impaired speech using foundation models. Addresses acoustic variability and data scarcity.
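A hedged sketch of the low-rank adapter update (LoRA-style) that variational low-rank adaptation builds on: the frozen base weight W is perturbed by a rank-r product A @ B, which has far fewer parameters than W itself. Tiny pure-Python matrices keep the example dependency-free; all shapes and names are illustrative assumptions.

```python
def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    """Elementwise matrix sum."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0], [0.0]]             # 2x1 adapter factor (rank r = 1)
B = [[0.0, 2.0]]               # 1x2 adapter factor
W_eff = add(W, matmul(A, B))   # effective weight: W + A @ B
print(W_eff)  # [[1.0, 2.0], [0.0, 1.0]]
```

Only A and B are trained per speaker; the variational twist places a distribution over them to cope with acoustic variability and scarce data.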
Autoregressive method for autonomous driving combining HD map construction with persistent traffic rule awareness across extended driving sequences.
Matched-compute study evaluating synthetic data interventions for in-context learning in language models. Tests mechanism-targeted pretraining effects.
Method for reducing LLM agent inference costs through trajectory reduction. Addresses token cost efficiency in multi-turn agent systems for software engineering.
Continuous-time reinforcement learning theory with deterministic policy gradients for continuous state and action spaces.
Deep reinforcement learning optimization method using eigenspectrum analysis and condition numbers to improve sample efficiency and critic network performance.
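The condition-number diagnostic this entry alludes to is simple to state: for a symmetric positive-definite curvature matrix, the condition number is the ratio of the largest to smallest eigenvalue magnitude, and large values signal ill-conditioned critic optimization. A minimal sketch, with eigenvalues supplied directly to stay dependency-free:

```python
def condition_number(eigenvalues):
    """Ratio of largest to smallest eigenvalue magnitude."""
    mags = [abs(v) for v in eigenvalues]
    lo = min(mags)
    if lo == 0:
        return float("inf")  # singular matrix: worst possible conditioning
    return max(mags) / lo

print(condition_number([100.0, 4.0, 0.5]))  # 200.0
```

A conditioning-aware method would monitor this ratio for the critic's curvature spectrum and intervene (e.g., via preconditioning or regularization) when it blows up.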
Technique reducing LLM reasoning model overthinking through decoupled rewards and curriculum scheduling. Addresses excessive token generation without performance gain.
Variational flow matching approach for vector-quantized image generation combining categorical supervision with continuous transport dynamics.
Diffusion model method for diverse text-to-image generation via contrastive noise optimization. Addresses mode collapse in text-guided image synthesis.