A Mechanistic Analysis of Looped Reasoning Language Models
Mechanistic analysis of looped reasoning language models examining internal dynamics and latent state evolution compared to standard feedforward models.
Uses reinforcement learning on physics simulators to train models solving Physics Olympiad problems, addressing lack of large-scale physics QA datasets for reasoning models.
SHANG++: Accelerated stochastic gradient descent methods robust to multiplicative noise in gradient updates.
LABBench2: Improved benchmark for evaluating AI systems and agents on biology research tasks with real-world capabilities.
VTC: DNN compilation method using virtual tensors to eliminate data movement in neural network workloads including LLMs.
Pipeline and best practices for log analysis in AI systems to understand model behaviors, with code examples in Inspect framework.
User study evaluating effectiveness of interval-based counterfactual explanations for improving understanding and trust in black-box models.
Benchmark measuring humanization and anti-detection capabilities of mobile GUI agents against platform countermeasures.
Demonstrates that LLMs can generate user interfaces and content together, given proper prompting and tool integration.
Object-Oriented Programmatic World Modeling (OOWM) for embodied reasoning and planning in robotic tasks using LLMs.
MobiFlow: Benchmark for mobile agents using trajectory fusion for real-world GUI task evaluation without system-level APIs.
Architecture for maintaining persistent identity in AI agents through multi-anchor memory to prevent catastrophic forgetting.
Spatial Competence Benchmark (SCBench) evaluating large models on spatial reasoning, environment representation, and planning tasks.
ECHO: Speculative decoding optimization for LLM inference in high-concurrency serving with sparse gating.
Benchmark evaluating LLMs as text-only controllers for exploration and navigation in gridworlds under partial observability.
Method for self-calibrating LLMs at test-time through discriminative distillation to reduce overconfidence without labeled data.
Dataset and annotation study of COVID-19 vaccination regret experiences from YouTube comments.
ML framework for predicting 5G network downlink performance using measurements from commercial smartphones.
Machine learning models to assess media literacy skills and identify disinformation among students.
Deep learning model using TCN and attention-based LSTM for predicting stock repurchases in Chinese financial markets.
Study of backdoor security vulnerabilities in flow-matching Vision-Language-Action models used for robotics, exploiting vector field dynamics.
NeuroPath system for motor imagery decoding from EEG signals for brain-computer interfaces in prosthetics and rehabilitation.
Lightweight speech activity-based approach for real-time voicemail detection in telephony using tree ensemble classification.
Theoretical paper establishing mathematical isomorphisms between ant colony behavior, ensemble learning, and gradient descent in deep networks.
Zero-shot modular pipeline for traffic accident detection, localization, and classification without labeled training data.
Quantifies data complexity in crowded scenes, measuring face density (instance count) as a driver of machine learning hardness.
Addresses surrogate-to-hard transfer gap in spiking neural networks for on-sensor vision using sharpness-aware training.
Vision transformer patch aggregation method for weakly-supervised anomaly and target segmentation in industrial and agricultural applications.
Deep learning approach for fair disease diagnosis in chest CT addressing compound failures from class imbalance and demographic underrepresentation.
ASTRA silicon-photonic accelerator for transformers using stochastic computing to reduce computation and memory demands.
VaFES framework uses generative deep learning to directly model free energy surfaces for rare event simulation without dataset sampling.
Method for disentangling physical signals from measurement artifacts in multi-sensor astrophysics data using machine learning.
Pioneer Agent automates continuous improvement of small language models in production through closed-loop data curation, failure diagnosis, and iteration control.
COMPOSITE-STEM benchmark with 70 expert-written tasks for evaluating AI agents on physics, biology, chemistry, and materials science problems.
Research on activation steering in LLMs showing steered states are non-surjective, with implications for interpretability and safety.
Studies BERT pretraining for DNS exfiltration detection, isolating effects of domain-specific pretraining on security classification.
MEMENTO teaches LLMs to compress reasoning into dense summaries, reducing context and compute requirements. Releases OpenMementos dataset of 228K examples.
Method combining stereo vision and language models for 3D volume estimation of objects from images.
Cable-driven field robot design for agricultural operations with energy efficiency analysis and lifecycle impact assessment.
Proposes hybrid fine-tuning paradigm for LLMs combining full and parameter-efficient approaches with convergence analysis framework.
Evaluates reproducibility of ColBERT-v2 and ConstBERT retrieval models across different query types, finding architectural limitations on long narrative queries.
Commercial tool aggregating multiple AI model APIs behind single interface. Generic LLM comparison service.
Clickbait video title about Opus 4.6 codec. No substantive content provided.
Context compression daemon (entroly) for LLM applications, claimed to reduce token usage by 90% through self-evolving compression. Also claims token-negative learning.
MCP server integrating Claude with personal finance data via Model Context Protocol. Enables AI agents to access bank accounts, cards, and investments read-only.
Legal news about attorney disciplinary charges for AI misuse. Not relevant to AI/tech development.
Rust open-source PAM module replacing SSH keys with short-lived OIDC tokens secured via DPoP cryptographic proof. Developer tool for infrastructure.
Cognitive architecture claiming 83+ modules running on Mac with consciousness theories and affective steering. Lacks technical depth and verifiable claims.
Meta training photorealistic AI avatar of Mark Zuckerberg for internal employee engagement. Corporate PR, limited technical detail.
Integration making Shopee e-commerce products machine-readable for ChatGPT and Perplexity through structured data. Enables AI agent product discovery.