Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates
Method for finding sparse subnetworks in neural networks using continuously relaxed Bernoulli gates, making the search for lottery-ticket subnetworks more efficient.
Research framework quantifying uncertainty in AI visibility metrics for generative search, addressing non-deterministic citation behavior.
Benchmark evaluating vision language models on generating plant simulation configurations for digital twins via in-context learning.
PathoScribe framework using LLMs for semantic retrieval and clinical reasoning over pathology reports to unlock institutional knowledge.
VoxEmo benchmark for evaluating speech emotion recognition using speech LLMs with generative interfaces, addressing prompt sensitivity and emotional ambiguity.
BiCLIP extends vision-language models to specialized domains using structured geometric transformations based on canonical relationships.
Automated tensor-relational decomposition method for large-scale sparse tensor computation on relational database systems.
Semantic Level of Detail (SLoD) framework enables continuous resolution control for knowledge graphs in AI memory systems via heat kernel diffusion.
Arbiter framework detects interference patterns in LLM coding agent system prompts using formal evaluation rules, tested on Claude Code, Codex CLI, and Gemini CLI.
Security framework identifying distinct vulnerabilities in multi-agent systems with delegated tool authority and inter-agent communication.
Analysis of gender fairness disparities in audio deepfake detection systems.
AI phenomenology framework examining subjective human-AI experiences beyond performance metrics and usability scales.
Research framing LLM context windows as an L1 cache; proposes demand paging and a virtual memory hierarchy for efficient token reuse.
Automated detection and root-cause analysis pipeline for flaky tests in quantum software systems.
PlayWorld: autonomous pipeline training action-conditioned video models for robot simulators from large-scale datasets.
WS-Net uses state-space modeling and attention mechanisms for hyperspectral image unmixing, addressing weak signal collapse in abundance estimation tasks.
Sim2Act improves simulation-to-reality transfer for robot policies using adversarial calibration and perturbation methods to handle prediction errors in decision-critical regions.
Doki is a text-native interface for generative video creation, enabling users to author videos through natural language writing instead of specialized video editing tools.
GST-VLA introduces Gaussian spatial tokenization for vision-language-action models, adding 3D geometric structure awareness to improve robot perception and decision-making.
Finetuned LLMs extract sentiment signals from textual data to forecast aluminum commodity prices, exploring when these signals are most predictive.
Survey paper on latent world models and vision-language-action systems for autonomous driving, covering taxonomy, evaluation frameworks, and challenges.
Vision-language retrieval framework for skin cancer case search using composed image-text queries with global and local representation alignment.
VIVID-Med uses frozen LLMs as structured semantic teachers for pretraining medical vision transformers, improving clinical image analysis.
Language-driven embodied navigation system using semantic priori-maps and chain-of-thought prompting for functional buildings.
Human-in-the-loop framework for post-training vision-language-action models in robotic dexterous manipulation tasks.
Diffusion model for image super-resolution with quality-aware and uncertainty-guided modules for real-world degradation.
Research on mitigating catastrophic forgetting in class-incremental learning using causal feature expansion methods.
Rubric-guided reinforcement learning framework for dense image captioning that improves diversity and generalization over supervised distillation from VLMs.
Method learning netlist representations from LLM-generated imperfect RTL code, scaling beyond small circuits using self-correction and structural learning.
Geologically-informed attention transformer for lithology identification from well logs, integrating domain priors with interpretable deep learning.
Visuomotor control for humanoid robots learning natural whole-body behaviors from human egocentric video without expensive teleoperation data.
Full-duplex speech-to-speech dialogue system combining cascaded ASR-LLM-TTS without VAD segmentation, enabling natural conversational interaction.
Framework bridging discrete diffusion language models with autoregressive models to enable non-sequential global reasoning and plan revision in multi-agent systems.
Study analyzing emotion as a latent representational factor in LLM reasoning and attention mechanisms, rather than just a prediction target.
Simulation framework for analyzing human-robot interaction dynamics by modeling human biomechanics and motor responses in physical collaborative systems.
Virtual try-off system reconstructing flat-garment representations from dressed person images by bridging on-body appearance and canonical layouts.
Deep learning approach combining traffic sign, vehicle, and lane detection with behavioral cloning for autonomous vehicle perception and control.
Energy-efficient spiking neural network architecture for temporal event-based sensory processing with improved decoding capabilities.
3D scene reconstruction method extending Gaussian Splatting with noise robustness for multi-view image synthesis under real-world artifacts.
Medical image segmentation framework handling missing modalities through consistency learning among expert models for robust multimodal fusion.
Multi-modal benchmark dataset for spacecraft perception and pose estimation using synthetic data for autonomous space operations and debris removal.
VR agent pipeline that integrates prosodic emotional context from speech into LLM dialogue processing for emotionally-aware conversational responses.
Audio effect control system using gram-guided retrieval for digital audio workstations, focused on bridging the semantic gap between user intent and signal-processing parameters.
Research evaluating LLMs as interactive agents in adversarial, time-sensitive zero-sum environments, assessing strategic reasoning and decision-making beyond static benchmarks.
TaSR-RAG uses taxonomy-guided structured reasoning to improve retrieval-augmented generation systems, addressing context redundancy and multi-hop reasoning challenges in LLM-based knowledge systems.
Test-time adaptive graph neural network for cross-domain anomaly detection, addressing domain shift.
Dataset condensation technique for classical clinical models enabling privacy-preserving synthetic data generation.
Contrastive learning method for skeleton-based action recognition using a multi-view minimax game framework.
Multiple instance learning approach for mammography classification using foundation model features with weak supervision.
Offline-to-online reinforcement learning method for safe robot policy alignment using action space constraints.