AdaBoN: Adaptive Best-of-N Alignment
Prompt-adaptive Best-of-N alignment strategy using reward models to reduce computational cost of test-time alignment for language models.
Survey on integrating TinyML and LargeML for 6G networks, covering deep learning applications in mobile systems, autonomous vehicles, and smart services.
Attention-aware embedding initialization method for new tokens in LLMs without expensive retraining, addressing vocabulary limitations in specialized domains.
Conditional marked point processes for reliable object detection uncertainty quantification, addressing miscalibrated confidence scores in neural networks.
Self-supervised learning approach adapting joint embedding architecture from video to EEG signals for brain activity analysis with limited labeled data.
Quantum-informed ML framework combining quantum generative models with classical predictors for long-term spatiotemporal chaos prediction.
Supervised fine-tuning method to align LLM agents with rational and moral preferences in strategic economic games, addressing systematic behavioral deviations.
Generative approach to bid shading in real-time bidding advertising using non-convex surplus optimization instead of traditional two-stage methods.
Object-centric representations for visual RL policies using dynamic tokens to improve generalization under visual condition changes without fixed-size slots.
Security evaluation of ML model sharing frameworks and hubs, assessing vulnerabilities in loading shared models and security awareness gaps among practitioners.
Neural quantum states impurity solver for quantum embedding problems. Graph transformer-based NQS for solving Hamiltonians in quantum chemistry.
Dynamic Aware: out-of-distribution detection for trajectory prediction in autonomous vehicles. Adaptive multi-mode approach for distribution shift in AVs.
AutoClimDS: agentic AI system for climate data science. Knowledge graph-based workflows for discovering climate patterns from fragmented data sources.
Formal language theory applied to statistical learning. Proves subregular language classes are linearly separable with simple models.
DataMind: scalable data-analytic AI agents for automated discovery. Open-source agent framework handling diverse-format data files and multi-step reasoning.
HoneyBee: data curation approaches for vision-language reasoning datasets. Analyzes impact of context, content, and format on VLM reasoning capabilities.
CBF-RL: integrates control barrier functions into reinforcement learning training. Enforces dynamic safety constraints during RL policy training, not just at inference.
RobotArena ∞: scalable robot benchmarking via real-to-sim translation. Enables rigorous evaluation of robot policies across diverse tasks and environments.
Verifying LLM inference to detect model weight exfiltration via steganography. Defends inference servers against model theft and anomalous behavior.
Portfolio optimization under stochastic dominance constraints with S-shaped utilities. Investigates first- and second-order dominance constraints.
AnatomiX: anatomy-aware multimodal LLM for chest X-ray interpretation. Improves spatial reasoning and anatomical understanding in medical imaging.
Study of bioelectrical properties for malignancy detection. Systematic review of 535 datasets on cellular bioelectric parameters across frequencies.
FARM framework for malware family classification under concept drift. Uses triplet autoencoder for few-shot adaptation to covariate and label drift.
LatentChem: latent reasoning interface for chemical LLMs. Decouples chemical computation from discrete tokens to improve efficiency and performance in chemical reasoning.
FastLSQ framework for solving PDEs using Fourier features with analytical derivatives. Achieves high accuracy on 1-6D problems without autodiff.
Pyramid MoA: probabilistic framework for cost-optimized LLM inference via cascading and routing. Balances inference cost and reasoning capability for large language models.
Enhancements to projection pursuit tree classifier with visual diagnostic methods for high-dimensional classification. Addresses limitations in multi-class settings.
IROSA: framework combining foundation models with imitation learning for robot skill adaptation via natural language. LLM application to robotics.
Disentangled Safety Hypothesis: mechanistic study of LLM safety showing decoupling between harmfulness detection and refusal. ML interpretability research.
Benchmark evaluating frontier AI models on multi-step cyber attack scenarios. Agent capability measurement across extended action sequences.
Agentic framework for multimodal query processing with adaptive tool orchestration across text/image/audio/video. Research on agent coordination and tool selection.
Proof-Carrying Materials: falsifiable safety certificates for machine-learned interatomic potentials. ML research on reliability guarantees for scientific models.
Codex Security: AI agent for code security that analyzes repository architecture and trust boundaries before validating findings with humans.
AI-as-Code approach for agent factories.
Open-source AgentFactory orchestrates fleet of coding agents (Claude, Codex, Spring AI) through automated pipeline for issue resolution and code shipping.
Open-source framework for personal AI agents running entirely on-device with efficiency-aware evaluations and learning loop using local trace data.
NPM package enabling free OpenAI API access via ChatGPT OAuth tokens. Creates localhost proxy to ChatGPT backend API with Vercel AI SDK provider support.
AI automation tool to summarize Datadog monitoring alerts and escalate issues, reducing manual dashboard review.
Multi-threaded Redis replacement in Rust (5.6x faster, 1MB Docker image) with drop-in compatibility and concurrent architecture.
LessWrong editor UI update with Lexical framework and WYSIWYG improvements.
Video about sewage facility becoming bird sanctuary. Off-topic.
Discussion of mental fatigue and workflow challenges when working with LLMs like Claude and Codex, and recovery strategies.
Multi-agent workflow orchestration system supporting Gemini, Qwen, Claude with role-based agents, background execution, and visual workflow editing.
Report on Iranian drone strikes against UAE-based AWS data centers used for AI infrastructure.
GitHub Action detecting LLM output drift in CI/CD by replaying workflows and diffing outputs to prevent silent model changes reaching production.
CLI tool for managing ETL transformation pipelines with artifact versioning and SQLite provenance tracking.
Anecdotal story about data scientist using AI and ChatGPT to develop cancer vaccine for dog.
Dashboard for real-time observability into Claude Code sessions, tracking costs, tool usage, and subagent execution without code changes.
Hypergraph data structure implementation in Zig language with research community modeling example.
ByteDance delays Seedance 2.0 video generation model launch due to copyright disputes with Hollywood studios.