Show HN: ContextUI open sourced – Local first AI workflows for humans and agents
Analysis piece on bottlenecks in exponential AI output growth. Limited technical depth.
Research on designing programming languages optimized for LLM code generation and interaction.
Slack MCP Server v2.0.0 with deterministic diagnostics and stable tool contracts. AI agent infrastructure for Slack integration.
API for compressing LLM prompts achieving 40-60% token savings with minimal overhead.
Article title about frontier model training methodologies. No content for evaluation.
Fine-tuned 14B model achieving 30% accuracy on NYT Connections puzzle vs GPT-4o's 22.7%. Original ML benchmark result.
LM Studio tool for loading and using local LLMs remotely with end-to-end encryption.
Physics-based simulator for distributed LLM training and inference optimization.
Netcup domain registrar DNSSEC infrastructure failure causing DS record mismatches. Technical but unrelated to AI/ML interests.
Open-source 32B model demonstrates introspection capabilities through logit analysis. Improved prompting enhances performance on detecting injected concepts in activations.
OpenAI Codex and Figma integration using the MCP standard to enable bidirectional conversion between code and design. Uses AI agents to interface with external systems.
Real2Sim2Real framework for deformable linear object manipulation using likelihood-free inference and visual perception for robotic agents.
Quantum machine learning framework for predicting ADME drug properties using quantum circuit search with imbalanced data handling.
Proposes a metric for AI ability to complete long software tasks, comparing model performance against the time human domain experts need to complete them.
Test data generation method for SQL code generation services using high-fidelity synthetic data to model complex data structures and semantic relationships.
kDOT: discrete optimal transport framework for voice conversion using barycentric projection in pretrained speech embedding space instead of averaging strategies.
BARREL identifies pathological reasoning patterns in Large Reasoning Models and improves factual reliability. Enables models to admit ignorance instead of confident false answers.
Knowledge fusion method for LLMs via modular SkillPacks. Enables efficient cross-capability transfer for multi-task integration, compression, and continual learning.
First non-Euclidean neural quantum state ansatz using hyperbolic GRU for Variational Monte Carlo approximation of quantum many-body ground states.
Novel multi-Boolean architecture for weight-binarized LLMs. Framework with multi-kernel Boolean parameters reduces complexity without severe post-training performance loss.
Shows RLVR with GRPO can improve LLM mathematical reasoning even with spurious rewards that have little or no correlation with correct answers, challenging assumptions about reward signals.
Establishes foundations for provable copyright protection in generative models. Revisits near access-freeness and defines conditions for copyright guarantee.
Comprehensive benchmark for ECG time-series data addressing unique characteristics and specialized downstream applications of bioelectrical signals.
CASCADE: hybrid LLM-powered JavaScript deobfuscator at Google combining Gemini coding capabilities with compiler IR transformations for code comprehension.
Position paper arguing ML fairness research should quantify structural injustice via social determinants rather than focusing only on sensitive attributes.
Controlled experiments examining whether LLMs incorporate external label definitions or rely on parametric knowledge. Tests expert-curated, LLM-generated, and perturbed definitions.
MedicalPatchNet: self-explainable architecture for chest X-ray classification using patch-based independent classification and aggregation for transparency.
PeruMedQA benchmark dataset of Peruvian medical exam questions in Spanish. Evaluates LLM performance on non-English, Latin American medical domain tasks.
Investigates sim-to-real gap in visual navigation by comparing simulator-trained and real-world-trained policies. Demonstrates simulator policies can match real-world performance.
Preference-aligned audio captioning framework using RLHF with CLAP-based reward model trained on human-labeled preferences. Addresses gap between supervised learning and real preferences.
DivEye detector for AI-generated text using diversity metrics. Improves detection of synthetic text while providing interpretability over black-box classifiers.
Theoretical analysis proving minimax convergence rates for learning pairwise interactions in single-layer attention models. Rate independent of embedding dimension and token count.
Automated data generation framework for multi-step bimanual mobile manipulation tasks. Uses imitation learning to reduce costly human teleoperation data collection.
Framework for generating multimodal datasets with controllable mutual information between modalities. Enables systematic study of MI estimators and multimodal self-supervised learning.
Largest multilingual scaling laws study with 774 experiments across 400+ languages. Introduces Adaptive Transfer Scaling Law (ATLAS) for monolingual and multilingual pretraining.
Sequential transducer model for recommendation systems handling ultra-long user histories. Explores memorization with transformer-like architecture at scale.
Likelihood-free inference approach adapting domain support for stochastic systems. Addresses misspecified support in robotics agent deployment scenarios.
Cosmos-Predict2.5 foundation model for world simulation unifying text/image/video generation. Leverages vision-language model for grounded physical AI predictions.
Theoretical proof that optimal learning rates transfer across widths in MLPs with μP parameterization. Shows learning rate converges to nonzero constant at infinite width.
Graph neural network approach for estimating nonstabilizerness in quantum circuits. Addresses quantum advantage through stabilizer Rényi entropy estimation.
Knowledge distillation approach for efficient retinal OCT classification models. Reduces computational demands while maintaining clinical-grade performance.
Deep learning model for classifying neoplastic tubular adenomas in colonoscopy. Uses Mamba architecture for digital pathology and colorectal cancer risk stratification.
Deep reinforcement learning application for 5G RAN energy optimization. Uses GNN-based agent to optimize radio unit sleep scheduling and resource slicing.
LLM agent for symbolic equation discovery using multi-step scientific reasoning. Guides model through hypothesis formation, data analysis, and equation validation.
Evaluates HiFloat low-bit formats on Ascend NPUs for LLM inference. Compares INT8 and 4-bit floating-point for efficiency-accuracy tradeoffs.
Evolutionary System Prompt Learning method jointly improves LLM contexts and weights via reinforcement learning. Enables autonomous self-improvement for agentic systems.
Multi-agent LLM framework for robotic manipulation with closed-loop visual feedback. Integrates language and vision models for task planning in dynamic environments.
LLM agents diagnose and repair infeasible supply chain optimization models. Demonstrates closed-loop agent task decomposition for operations research problems.
Research on prompt injection vulnerabilities in LLM agents via skill files. Identifies security risks in agent supply chains and skill-based attacks.
Economic analysis of AGI's impact on labor and growth. Argues human verification becomes the bottleneck as AI decouples cognition from biology.