Reinforcement learning approach combining vision-language models with neuroscience-inspired reward signals for safe autonomous driving without manual reward engineering.
Evaluation framework (VCoT-Bench) measuring LLM reasoning ability for Rust program verification through intermediate verification steps, not just pass/fail outcomes.
Uncertainty quantification method for Vision-Language-Action robotic models that detects safety-critical moments during continuous control rather than averaging uncertainty signals.
PowerFlow: principled RLIF framework for unsupervised LLM capability elicitation via distribution matching instead of heuristic rewards.
Privacy-preserving LLM agent planning via abstractions that prevent exposure of local environment data to cloud services.
Game theory: evolutionarily stable Stackelberg equilibrium, a solution concept combining evolutionary stability with leader-follower dynamics.
Neural potential with SO(3) equivariance for molecular systems with long-range electrostatic interactions.
Token-level Adaptive Routing: inference-time alignment method that steers frozen LLMs toward structured reasoning without post-training.
Economics study analyzing spillover effects of AI washing in corporate sustainability claims via semantic analysis.
Automated hyperparameter optimization framework for sparse attention mechanisms using Bayesian optimization and multi-fidelity search.
Benchmark evaluating large vision-language models on rare skin disease diagnosis with long-context reasoning.
Economics paper analyzing corporate AI washing claims and their impact on farmers' fintech adoption using CHFS data.
Synthetic data augmentation using generative models for semantic segmentation balancing reliability and diversity.
RL-based adaptive decoder for LLMs that learns task-specific generation policies at test time for improved output quality.
Sample-efficient reinforcement learning with verifiable rewards for improving LLM reasoning with Bayesian reward estimation.
Research task: automatically extracting and querying structured databases from open web sources for analytical questions.
Hypergraph neural network for medication recommendations leveraging patient relationships and clinical history.
ML research on prostate cancer detection using Vision Transformers on small 162-image dataset with transfer learning.
WASD framework identifies critical neurons as sufficient conditions for explaining and controlling LLM behavior with natural language directives.
Evaluation of vision-language models on inferring human engagement from gameplay video across multiple prompting strategies and games.
Video compression method using diffusion models with sparse information transmission to improve perceptual quality at ultra-low bitrates.
Handbook formalizing AI architectures for motor insurance, covering perception, multimodal reasoning, and production infrastructure for risk assessment.
CAFlow framework applies adaptive-depth flow matching for efficient histopathology image super-resolution with reduced computational costs.
Mechanistic study of how large vision-language models implement counting behavior, combining synthetic benchmarks with interpretability analysis.
ICE-Guard framework detects spurious feature reliance in LLMs for high-stakes decisions through intervention consistency testing on demographic, authority, and framing biases.
Method for scaling vision-language-action robot learning using generative 3D worlds to address the sim-to-real gap.
SCISSR: Scribble-based interactive framework for surgical scene segmentation using SAM-style prompting.
CoDA explores adversarial attacks on medical vision-language models and proposes token-space repair methods.
HiMu hierarchical frame selection method for long video question answering with vision-language models.
Study showing Transformers learn robust in-context regression under distributional uncertainty without restrictive assumptions.
SpecForge: Open-source production framework for training draft models used in speculative decoding to reduce LLM inference latency.
ICE framework evaluates LLM explanation faithfulness using statistical intervention testing with randomization baselines.
Systematic analysis and improvements to Elastic Weight Consolidation for continual learning to better estimate weight importance.
Benchmark comparing PETNN, KAN, and classical deep learning models on myMNIST Burmese handwritten digit recognition dataset.
AutORAN uses LLMs for natural language programming to simplify xApp development in Open Radio Access Networks.
LSE framework trains LLMs to self-improve during inference by iteratively refining context based on problem feedback.
OpenT2M: Million-scale open-source dataset with 2800+ hours of motion data for text-to-motion generation in animation and robotics.
REST algorithm for zero-shot object-goal navigation using receding horizon planning and Steiner trees for generating subgoal candidates in unknown environments.
Anderson-Darling leakage assessment method for detecting side-channel leakage in neural networks, improving on TVLA's mean-based approach.
Benchmarking framework for PDF table extraction using LLM-based semantic evaluation on synthetically generated PDFs with LaTeX ground truth.
SSL framework for medical ultrasound image segmentation using contrastive learning with multiscale switching to handle limited labeled data and imaging artifacts.
Mathematical framework distinguishing cognitive amplification from cognitive delegation in human-AI systems for measuring AI impact on human reasoning.
HISR framework improving multi-turn agentic reinforcement learning through hindsight information modulation and segmental process rewards for complex long-horizon tasks.
Neuro-symbolic sim2real image translation framework using structured ontology-guided diffusion for zero-shot domain transfer without labeled real data.
CausalRM method for learning reward models from observational user feedback (clicks, upvotes) as a scalable alternative to controlled RLHF annotation.
Study measuring confirmation bias in LLM-based security code review systems and its exploitability in software supply-chain attacks.
Weakly supervised method for generating natural language explanations in chest X-ray classification without explicit explanation annotations.
Ablation study of Group Relative Policy Optimization components for LLM reasoning training, questioning necessity of complex loss functions.
ClawTrap MITM-based red-teaming framework for evaluating security robustness of autonomous web agents like OpenClaw against network-layer threats.
AutoPipe framework for automated configuration of LLM post-training pipelines combining supervised fine-tuning and reinforcement learning under budget constraints.