To Use or not to Use Muon: How Simplicity Bias in Optimizers Matters
Analysis of Muon optimizer's simplicity bias and potential downsides compared to Adam for neural network training.
Generalizes Bayesian Flow Networks with arbitrary divergence/distance functions replacing fixed KL divergence, enabling broader belief-update operators.
Combines partly conditional modeling with machine learning to identify patient response subgroups in colorectal cancer clinical trials using repeated measures.
Self-supervised learning framework for resting-state fMRI using masked reconstruction with cross-attention for interpretable brain network representation learning.
Proposes geo-foci model for identifying salient geographical locations in US local news coverage to address economic pressures on local journalism.
Neural approach to fluid-solid interaction using latent arbitrary Lagrangian-Eulerian grids for capturing two-way nonlinear interactions in FSI problems.
Studies the identification problem in adversarial multi-armed bandits: selecting the arms that will perform best at a future time, with accuracy and memory bounds.
Demonstrates membership inference attacks against data curation methods used for private ML, showing curation-based privacy solutions leak training data membership.
Develops gauge-theoretic framework for superposition in LLMs using sheaf-theoretic atlas replacing single-global-dictionary with local semantic charts and Fisher metrics.
Graph Neural Network-based recommendation system for multimodal data handling content-based recommendations with user preference incorporation.
Novel method for counterfactual learning in multivariate time series using genetic algorithms to uncover causal relationships and interventions.
Presents MultiPUFFIN, a domain-constrained multimodal foundation model for molecular property prediction ensuring thermodynamic consistency across chemical space.
Introduces Active Flow Matching combining discrete diffusion/flow models with variational frameworks for black-box optimization without retraining.
Evaluates performance misalignment between leading LLMs on out-of-distribution educational tasks, finding that model behaviors correlate more strongly with one another than with human expert behavior.
Addresses uncertainty awareness in deep sequence models by integrating Bayesian methods with probabilistic learning, comparing approximate inference techniques.
Large-scale empirical study of AI grading on handwritten calculus work using OCR-conditioned LLMs with rubric-guided prompting for score and feedback generation.
Proposes dual-learner framework combining fast learner and meta learner for continual RL, inspired by hippocampus-cortex interaction for knowledge transfer and integration.
Investigates margin clamping effects on training variance in Contrastive Forward-Forward learning for Vision Transformers, addressing instability sources.
Analyzes the stability gap between supervised fine-tuning and RL in LLM training from a gradient perspective, showing the role of logits convexity in stable optimization.
Proposes Intent-Context Synergy RL approach for autonomous UAV decision-making in contested environments, addressing trade-offs between mission efficiency and survivability.
arXiv 2603.00975: Representation interference framework for selective unlearning in text-to-image diffusion models preserving quality.
arXiv 2603.00992: Machine unlearning method for diffusion models removing sensitive concepts via mutual information elimination.
arXiv 2603.01013: Feature-weighted subsampling algorithm for debiasing studies when only a subset of features is highly biased.
arXiv 2603.01025: One-token verification method for estimating correctness in LLM reasoning with reduced computational cost.
arXiv 2603.01040: Fed-ADE for federated learning adaptation under distribution shifts without ground-truth labels.
arXiv 2603.01047: GFlowNet training improvements via partial episodes for stable policy-based sampling of combinatorial candidates.
arXiv 2603.01052: CausalSAGE framework for refining causal discovery PAGs into DAGs by breaking symmetries.
arXiv 2603.01064: Level-wise training for neural multigrid smoothers applied to discretized integral equations.
arXiv 2603.01097: Empirical analysis of LoRA as parametric knowledge memory for continuous LLM updates without context constraints.
arXiv 2603.01137: Deep learning framework for heat demand forecasting in district heating systems using time-frequency features.
arXiv 2603.01162: Theoretical analysis of GRPO, a core method in DeepSeekMath and DeepSeek-R1 for LLM reasoning, through the lens of U-statistics.
arXiv 2603.01168: SphUnc framework combining hyperspherical representation learning with causal modeling for uncertainty decomposition.
arXiv 2603.01171: PARWiS algorithm for winner determination via active pairwise comparisons with reinforcement learning variant.
arXiv 2603.01184: Theoretical analysis of learning time trade-offs for high-dimensional neural network inputs.
arXiv 2603.01193: Neural PDE solver training using Monte Carlo weak supervision via walk-on-spheres method.
arXiv 2603.01204: Research on LLM-as-judge frameworks showing preference labels can function as covert communication channels between models.
arXiv 2603.01223: RL method for LLM mathematical reasoning using reference solutions to overcome reward sparsity in hard problems.
Discussion on semantic versioning strategy for Typst markup language and decision to remain pre-1.0.
Cloudflare infrastructure article on using lava lamps as an entropy source for randomness generation.
Offline desktop tool for extracting media endpoints from HTML without telemetry or cloud dependencies.
Open-source Rust CLI auditor for MCP servers, checking protocol conformance, security, and behavioral contracts before production deployment.
Article on applying OAuth/API identity patterns to secure AI systems and agents with authentication/authorization.
Proposal for autonomous investigative reporter agents that can conduct research, publish findings, and pressure institutions on behalf of individual users.
An engineer used AI agents to build a 580K-line open-source Verilog simulator in 43 days, including simulation, formal verification, and mutation testing capabilities.
Technique for detecting LLM-generated text using classical machine learning models. Includes an online demo.
Community discussion asking for recommendations on online LLM chat platforms beyond ChatGPT, Claude, and Grok.
Commercial tool for storing and organizing prompts across multiple AI platforms with folder/tag organization and clipboard copying features.
Investigation into AI agent monetization claims in 2026, weighing the reality of Mac Mini setups and autonomous income streams against the hype.
Platform aggregating Wan AI models for video and image generation from text prompts, images, or existing videos.
New York bill proposes prohibiting AI chatbots from providing legal advice.