WhisperRT -- Turning Whisper into a Causal Streaming Model
Modification of Whisper ASR model to enable low-latency streaming transcription through architectural and training changes.
Vision-language model for robotic manipulation using embodiment-agnostic pointing representation to address generalization in embodied AI.
System co-design for efficient on-device LLM inference on NPU hardware, optimizing attention operations for privacy-preserving deployment.
Diffusion-based causal inference method for spatio-temporal data with unmeasured confounders and multi-resolution observations.
Theoretical work on distributed mean estimation with 1-bit communication constraints using interval queries, achieving near-optimal sample complexity.
Google developer tool using deep learning to automatically adapt copy/pasted code, predicting the required edits, from formatting fixes to cross-language translation.
Knowledge editing method for LLMs enabling sequential updates through null-space alignment, improving robustness in continual model editing scenarios.
AI system for credit scoring of Malaysian MSMEs using bank statement data as alternative to traditional credit bureau data.
Scientific machine learning approach using implicit neural representations for 3D gravity inversion, modeling subsurface density as continuous field.
Data envelopment analysis method for dynamic efficiency evaluation across multiple organizational dimensions with regularization for large-scale settings.
Data-driven nonlinear state estimation method for model-free processes using RNNs with noisy nonlinear measurements.
Image hashing method using foundation models for efficient large-scale retrieval with compact binary codes instead of high-dimensional embeddings.
Study of RAG limitations in the healthcare domain, showing how retrieval-augmented generation fails when source documents contain contradictory or outdated information.
Sphinx synthetic environment for visual perception and reasoning with procedurally generated puzzles covering 25 task types, enabling precise evaluation and large-scale dataset construction.
Analysis showing optical context compression via vision tokens is functionally equivalent to lossy autoencoding, questioning effectiveness of DeepSeek-OCR's compression pipeline.
Human-in-the-loop approach for visual classification through iterative concept deliberation, addressing subjective vision tasks in content moderation and curation.
Application of Kolmogorov-Arnold neural networks to model thermal decomposition kinetics in lithium-ion batteries with state-of-charge dependence.
Synthetic Aperture Radar dataset for ship type classification using deep learning models, focused on maritime activity monitoring.
DDFT protocol for measuring the epistemic robustness of language models under stress conditions, distinguishing knowledge gaps from verification-mechanism failures beyond static benchmarks.
Mechanistic interpretability study of how Diffusion Transformers generate correct spatial relations in text-to-image generation.
ConvoLearn dataset of 2,134 tutor-student dialogues for fine-tuning dialogue-based AI tutors grounded in knowledge-building theory.
Open-source educational platform teaching ML fundamentals to students aged 12-17 using LEGO robotics.
Pretraining approach using post-trained models to incorporate reasoning and safety behaviors earlier in LLM development.
Regularization techniques for improving multimodal representation learning by addressing collapse and inconsistency issues.
Few-shot fine-tuned language models for diagnosing intermittent CI pipeline failures in software development.
Compact embeddings for fast text-based wildlife observation retrieval from large biodiversity archives.
Cross-modal learning for bird species recognition using audio-to-image retrieval without paired training data.
Training LLMs to resist cognitive biases in reasoning via reinforcement learning rather than prompting.
Dexterous robotic manipulation policy fine-tuning using diffusion models and normalizing flows for real-world scenarios.
Research on predicting LLM success from internal pre-generation activations to optimize inference efficiency in reasoning tasks.
SSLogic agentic meta-synthesis framework where LLM agents iteratively generate and refine task specifications for logic reasoning.
Training-free few-shot anomaly detection using subspace modeling of vision foundation model features.
Analysis of noise models and mitigation strategies in photonic quantum machine learning systems.
Training framework for geometric and neuromorphic AI using alternative arithmetic substrates.
SwiftGS system for rapid 3D satellite surface reconstruction via meta-learned Gaussian primitives.
Canonical Security Telemetry Substrate for standardizing cybersecurity data formats for AI-driven detection.
Firefly algorithm adaptation for mixed-variable optimization problems.
Weakly convex ridge regularizer for 3D non-Cartesian MRI reconstruction.
Early warning system for GPU hardware failures using structural observability beyond numeric telemetry.
OptiMer framework for optimizing data mixture ratios during continual LLM pre-training without manual tuning.
Brain tissue segmentation from MRI using deep learning and foundation models.
Multimodal dataset of 601k text annotations and 385k audio recordings across 10 African languages.
Realistic backdoor attack methods for federated learning using semantically meaningful triggers.
Novel neural architecture primitive based on field theory and metriplectic dynamics.
S0 tuning method for efficient LLM adaptation via state matrix optimization, outperforming LoRA on code generation tasks.
Neural architecture for generating online handwriting with stroke continuity and stylistic consistency.
Stub article about mempalace AI memory system benchmark.
OpenClaw provider plugin that routes LLM requests through the Claude Code CLI with a persistent worker pool and OAuth, enabling Claude Pro/Max access without API credentials.
LLM-based workout plan generator for personal trainers in India with WhatsApp integration and exercise library.
Ship Safe v7.0.0: AI-powered security platform running 19 specialized agents to scan code for vulnerabilities including LLM/agentic AI security risks.