Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction
Memory-efficient LLM training via truncated SVD factorization of weight matrices on consumer hardware.
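To make the memory-saving idea concrete, here is a minimal sketch of factorizing a single weight matrix with truncated SVD and comparing parameter counts. The shapes, rank, and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical linear-layer weight matrix (shapes chosen for illustration).
m, n, k = 1024, 1024, 64  # target rank k << min(m, n)
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))

# Truncated SVD: keep only the top-k singular triplets.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# The layer would store the factors (U_k, s_k, Vt_k) instead of W itself.
params_full = m * n
params_factored = m * k + k + k * n
print(params_factored / params_full)  # ~0.125 for these shapes
```

At rank k = 64 the factored form stores roughly an eighth of the original parameters, which is the kind of compression that makes training feasible on consumer hardware.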
Transformer-based framework for predicting immunotherapy response using biomarkers in small medical datasets.
Token pruning framework for vision-language models using an attention dual-form perspective, without retraining.
Security analysis of backdoor attacks on language models using continuous latent reasoning without token output.
LLM pretraining at exascale using Aurora supercomputer with Mula-1B model and Optimus training library.
End-to-end autonomous driving model using 3D geometry instead of language descriptions for planning.
Bayesian inference framework for multi-dimensional emotion understanding accounting for dependencies among emotions.
Language agents with learnable adaptation policies that optimize test-time learning instead of using fixed hand-crafted policies.
Mixture-of-Experts architecture for actor-level stance detection in geopolitical text classification.
PixelPrune: adaptive visual token reduction for vision-language models using predictive coding for document and GUI tasks.
Dataset and analysis of autonomous coding agents' contributions in real-world projects, examining code quality and team dynamics over time.
Training-free canonical correlation analysis method for improving efficiency of pretrained image encoder representations.
DANCEMATCH framework for motion-based dance retrieval using quantized structure-preserving representations.
WARP: method for repairing adversarial vulnerabilities in transformer NLP models with provable inner-layer repair guarantees.
Reinforcement learning with flow-based policies and distributional RL for trajectory optimization in multi-solution problems.
Dignified Peer framework addressing evasive and sycophantic behavior in aligned LLMs through anti-sycophancy and empathy.
MyPhoneBench: evaluation framework measuring privacy-compliant behavior in mobile phone-use agents during task execution.
Multimodal pipeline analyzing state-funded news coverage of Israel-Hamas war on YouTube Shorts.
Egocentric world simulator generating interaction videos with persistent 3D scene state updates for embodied AI.
Query-conditioned keyframe sampling approach for long-form video understanding with multimodal LLMs using evidential reasoning.
OrgAgent: hierarchical multi-agent framework organizing LLM-based agents into governance, execution, and compliance layers for complex reasoning.
Transfer learning algorithms for nonparametric Bayesian network structure learning under limited data.
Method for fast probing of LLM downstream performance during training using metrics correlated with performance beyond training loss.
Controlled decomposition of multi-LLM revision pipelines to separate gains into re-solving, scaffold, and content components across benchmarks.
Study on popularity bias in recommender systems and alignment with user preferences for popular vs niche content.
Automated framework to evaluate and harden LLM system instructions against encoding-based attacks to prevent credential and policy leakage.
Study of adversarial attacks targeting AI-driven radio access network slicing systems and recovery mechanisms.
Security framework to detect and prevent vulnerabilities in AI-generated code through systematic verification of code safety gates.
Training-free detection method for partial audio deepfakes using speech foundation models without frame-level annotations.
Analysis of how LLMs use induction heads to track and retrieve information from context, revealing serial-recall patterns in in-context learning.
Algorithm for approximating Pareto frontiers in stochastic multi-objective optimization problems under uncertainty.
Study on how students' trust in AI assistants affects their reliance and critical evaluation of AI-generated output in educational settings.
Parameter-efficient adapter framework for adapting CLIP vision-language models to monocular depth estimation with minimal supervision.
PaperRecon evaluation framework for assessing quality and hallucination risks in AI-generated research papers from coding agents.
Generative approach for hyperspectral unmixing in remote sensing. Domain-specific to satellite imagery, not AI/LLM focused.
Brainstacks modular architecture for continual multi-domain LLM fine-tuning using MoE-LoRA stacks composing frozen adapters for domain expertise.
AdaLoRA-QAT framework for chest X-ray segmentation using low-rank adaptation and quantization-aware training. Medical imaging domain, not AI/LLM focused.
ORCA framework for test-time calibration of LLM reasoning using conformal prediction, improving efficiency of sampling-based scaling methods.
Multiscreen architecture introducing an explicit query-key relevance rejection mechanism in attention, improving LLM discrimination of irrelevant information.
ROS 2 middleware integration for Florence-2 vision-language model in robotics systems, enabling local inference for robotic perception.
ORBIT dataset with 20K reasoning-intensive queries for training search agents combining LMs and web search, using verifiable generation methodology.
Agentic evolutionary framework for scientific algorithm discovery combining LLM-guided search with structured theory and code co-evolution.
Benchmark for evaluating LLM agents on long-term planning over one-year startup simulation with hundreds of turns, testing strategic coherence under uncertainty.
Mathematical framework analyzing AI weather prediction pipelines, emphasizing training methodology and data diversity over architecture choices.
Spatio-temporal dynamics reconstruction from sparse observations using shallow recurrent decoders. Domain-specific to complex systems, not AI/ML focused.
Method for unsupervised code correctness evaluation using LLMs through code comprehension before auditing, eliminating need for reference implementations.
Survey of agentic RAG systems combining LLMs with real-time retrieval to address static training data limitations and improve contextual accuracy.
Research on fine-tuning LLMs as agentic systems to handle exceptions and improve decision-making in complex real-world contexts.
Study on mitigating reasoning biases in LLMs through activation steering at inference time to improve logical validity discrimination.
Research evaluating LLM reasoning capabilities on real-world site selection tasks, testing if models like o1 and DeepSeek-R1 generalize beyond math/code domains.