The Chicken and Egg Dilemma: Co-optimizing Data and Model Configurations for LLMs
Method for jointly optimizing data mixture and model architecture configurations during LLM training to avoid suboptimal individual choices.
Diffusion-guided pretraining approach for brain graph foundation models using semantic-aware augmentation strategies.
Automated black-box pipeline for detecting unverbalized biases in LLM reasoning traces and chain-of-thought explanations.
Universal diffusion-based framework for converting low-resolution weather forecasts to probabilistic high-resolution predictions without model fine-tuning.
Analysis showing that logit distance bounds representational similarity in discriminative models, including autoregressive language models.
Large-scale intracranial EEG dataset and benchmarks for epilepsy research and seizure localization using data-driven approaches.
HPMixer model for long-term multivariate time series forecasting using hierarchical patching to capture periodic patterns and residuals.
ART-based topological clustering algorithm that eliminates need for manual parameter tuning and supports continual learning.
Online learning theory framework for contextual brokerage between traders with sequential asset trading decisions.
Theoretical analysis of linear-to-nonlinear transition in random feature models under spiked covariance and input-label correlation.
Framework for embodied AI agents to infer user goals from open-ended dialog using LLMs for efficient task accomplishment.
Knowledge distillation pipeline to compress the DUSt3R foundation model for efficient 3D reconstruction and visual localization.
Perceiver architecture for auto-regressive language modeling reducing attention complexity from quadratic to semi-linear.
Model-based data filtering framework for multilingual LLM pretraining that identifies diverse, high-quality training samples.
Combining Self-Organizing Maps with Vision Transformers to improve performance on smaller datasets through explicit inductive biases.
Learning user-specialized reward models for reinforcement learning from human feedback to capture individual preference disagreement.
Certified defense against backdoor attacks in deep neural networks using sample-specific smoothing noise.
Framework for extracting features from unstructured data with neural networks while accounting for measurement bias in economic analysis.
ReplaceMe: training-free depth pruning method replacing transformer blocks with linear operations for efficient model compression.
Method for inferring entropy production in high-dimensional stochastic systems using nonequilibrium maximum entropy principle.
LLM fingerprinting via semantically conditioned watermarks that survive finetuning and quantization without being easily detected.
∞-THOR framework for long-horizon embodied AI tasks with Needle(s) in Embodied Haystack benchmark for testing long-context reasoning in agents.
Using persona-driven prompting to simulate European Parliament voting behavior with LLMs, addressing political bias in model responses.
Review of nonlinear model order reduction methods for creating computationally efficient dynamical system models in process engineering.
SPECS: method for faster test-time scaling in LLMs through speculative drafts, balancing reasoning accuracy with user-facing latency.
Bongard Problems benchmark using real-world images to test abstract visual reasoning and fine-grained concept identification in models.
Proposes inference-time search algorithm that guides diffusion model sampling with side information for improved image reconstruction in inverse problems.
LayerSync regularizes diffusion models using their own intermediate layer representations to improve generation quality and training efficiency.
Uses LLMs for automated assessment of critical thinking skills in educational contexts, addressing evaluation of evidence and claim reliability.
Benchmarks inference latency of 190 Vision Transformers on mobile devices compared to CNNs, analyzing architectural factors affecting performance.
Extends wireless foundation models to accept multiple input modalities for improved task performance and adaptation across varying conditions.
Applies transformer-based deep learning for joint source-channel coding of non-uniformly distributed HARQ-ACK bits in wireless communications.
Introduces Block-Recurrent Hypothesis explaining Vision Transformer depth as block-recurrent computational flow for mechanistic interpretation.
Proposes world model approach for offline multi-agent RL using local-to-global puzzle solving to overcome conservative policies and improve generalization.
Framework for treating representation reliability as a first-class property in machine learning, beyond traditional predictive uncertainty quantification.
Proposes implementing optical neural networks using linear optical resources and phase-shift encoding for neuromorphic machine learning hardware.
SpikeScore method for detecting LLM hallucinations that generalizes across domains, addressing the gap in cross-domain hallucination detection for real-world deployment.
Theoretical analysis showing fixed-budget and fixed-confidence best-arm identification in K-armed bandits have equivalent optimal sample complexities up to logarithmic factors.
Mathematical study of homology in ample groupoids using Moore complexes and continuous étale homomorphisms, with Mayer-Vietoris sequences.
Proposes exploration-exploitation optimization for dataset distillation to compress large datasets into synthetic versions while maintaining model performance.
Addresses temporal leakage in clinical NLP models for discharge planning, proposing methods to prevent overconfident predictions from deployment artifacts.
Conjugate learning theory framework characterizes trainability and generalization of deep neural networks using convex duality and mini-batch SGD analysis.
CoreCraft is a high-fidelity enterprise RL simulation environment with 2,500+ entities and 23 tools for training generalizable AI agents in customer support scenarios.
STING benchmark measures how LLM agents can be misused over multiple turns and across languages to assist with illegal tasks, testing multi-step harmful goal execution.
RoboGene uses an agentic framework to automatically generate diverse robotic manipulation tasks for training vision-language-action models, addressing data scarcity in robot learning.
Google announces AI Impact Summit 2026 partnerships and investments for broad AI adoption; marketing content with limited technical depth.
Berean Labs' open-source autonomous AI penetration testing tool for detecting client-side vulnerabilities, exposed secrets, and web app misconfigurations.
SQL-tap: transparent SQL proxy with new browser-based Web UI for real-time query inspection, EXPLAIN, filtering, and analysis.
California legislative bill on 3D printer firearm prevention technology, not AI-related.
User asks about AI video editing tools availability. Low-effort question without substantive content or research.