Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Soft-embedding method improving one-step image generation in distilled masked diffusion models by enabling gradient flow for post-distillation refinement.
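The soft-embedding idea can be illustrated in general terms (a minimal NumPy sketch of a softmax-weighted embedding lookup, not the paper's implementation): replacing the non-differentiable argmax token lookup with a probability-weighted mixture of embedding rows keeps the mapping differentiable, so gradients can flow back to the generator.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # stabilized softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_embed(logits, embedding, tau=1.0):
    # Soft selection: a softmax-weighted mixture of embedding rows.
    # Unlike a hard argmax lookup, this is differentiable in the logits;
    # as tau -> 0 it approaches the hard one-hot lookup.
    probs = softmax(logits / tau, axis=-1)   # (seq, vocab)
    return probs @ embedding                 # (seq, dim)

rng = np.random.default_rng(0)
vocab, dim = 16, 8
embedding = rng.standard_normal((vocab, dim))
logits = rng.standard_normal((4, vocab))

out = soft_embed(logits, embedding)          # soft token embeddings, (4, 8)
```

At low temperature the soft lookup collapses to the hard one, which is why it can serve as a drop-in replacement during gradient-based refinement.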
Analysis of positional and language bias in mid-layer representations of vision-language encoders for zero-shot language-grounded spatial understanding.
Artificial Age Score framework formalizing memory aging in LLMs, modeling how semantic and episodic information degrades across conversational sessions.
Benchmark using equivalence scoring for ground-truth-free evaluation of formally verifiable code generated by LLMs in languages like Dafny.
Knowledge distillation method for small language models that balances exploration and guidance through adaptive switching to address exposure bias.
Benchmark evaluating hallucinations in audio-visual multimodal LLMs with spoken queries under diverse acoustic conditions.
Information-Determined Scoring framework using LLMs to score free-text psychological assessment responses and augment rating-scale measures.
National Weather Service implementation of automated translation using LLMs and LILT's training process to serve non-English speakers.
Robotic assembly system using vision-language models to handle connector-aware assembly from instruction manuals with focus on physical constraints.
Video reasoning model with explicit spatio-temporal evidence grounding, extending evidence-centered reasoning from images to videos with temporal tracking and spatial localization.
Framework for composing synergistic multi-agent LLM teams by analyzing model interaction geometry to optimize collaboration and surpass single-model capabilities.
Method for LLM ownership verification using encrypted fingerprinting with protection against attacks during verification processes.
Multimodal framework combining ECG signals and anatomical knowledge for cardiac myocardial scar segmentation from MRI images.
Knowledge distillation-based membership inference attack against LLM-based recommendation systems that determines whether data samples appeared in the training set.
Study comparing genetic algorithms and other methods for generating sample weights to mitigate bias in ML models, examining trade-offs between fairness and accuracy.
Research on models' ability to detect activation steering vectors injected into their residual streams during forward passes, revealing steering awareness in instruction-tuned models.
MRD fusion approach for high-resolution image understanding in MLLMs combining retrieval-augmented generation with detection to prevent object fragmentation and false positives.
Survey of cell-cell communication inference from single-cell omics data, covering biological mechanisms and computational approaches for ligand-receptor interaction analysis.
Continual learning study revealing asymmetry in experience replay between feature-level and classifier-level forgetting, showing minimal buffers preserve representations but not predictions.
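For context, the experience-replay memory such continual-learning studies build on can be sketched as a generic reservoir-sampling buffer (an illustrative sketch; the `ReplayBuffer` name and API are hypothetical, not the paper's code):

```python
import random

class ReplayBuffer:
    """Tiny reservoir-sampling replay buffer: every example seen so far
    has an equal chance of remaining in the fixed-size memory."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            j = self.rng.randrange(self.seen)  # reservoir sampling step
            if j < self.capacity:
                self.memory[j] = example

    def sample(self, k):
        return self.rng.sample(self.memory, min(k, len(self.memory)))

buf = ReplayBuffer(capacity=8)          # a "minimal buffer" in the entry's sense
for task in range(3):                   # stream of sequential tasks
    for i in range(100):
        buf.add((task, i))              # (task id, example id)
batch = buf.sample(4)                   # replayed examples mixed into training
```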
ClinicalTrialsHub platform consolidating ClinicalTrials.gov with PubMed data extraction, increasing structured trial data access by 83.8% for patients and clinicians.
Benchmark of multiple instance learning models for lymphoma subtyping from whole slide images, comparing deep learning approaches for pathology diagnosis.
Adaptive Accountability Framework for networked multi-agent systems using cryptographic provenance tracking and runtime detection of emergent norms like collusion and unfairness.
Neuron-level interpretability study of code LLMs identifying language-specific neurons and concept layers, adapting NLP techniques to formal programming language structure.
GeoMotionGPT aligns motion space geometry with embedding space in LLM-based motion understanding by coupling discrete motion tokenization with semantic learning.
Systematic evaluation of LLM susceptibility to persuasion across six models using SMCR communication framework, testing adoption of counterfactual beliefs.
Forest-Chat integrates vision-language agents with satellite imagery for interactive forest change analysis, combining LLMs with computer vision for environmental monitoring.
Mechanistic study comparing internal algorithmic changes when post-training autoregressive models into masked diffusion models, investigating genuine bidirectional reasoning acquisition.
Analysis of diffusion language models showing arbitrary token generation order doesn't unlock reasoning improvements over autoregressive models, revealing limitations of flexibility.
STELLAR framework guides LLM-based generation of SystemVerilog Assertions for formal verification using structural similarity from hardware design ASTs.
One-shot data augmentation method combining geometric perturbations with noise injection for few-shot learning generalization to novel classes.
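The combination the entry describes, geometric perturbation plus noise injection, can be sketched generically in NumPy (an assumed minimal form with random flips, integer shifts, and Gaussian noise; not the paper's augmentation pipeline):

```python
import numpy as np

def augment(image, rng, max_shift=2, noise_std=0.05):
    """One augmented view: a small geometric perturbation
    (random flip and integer shift) followed by Gaussian noise."""
    out = image.copy()
    if rng.random() < 0.5:                             # horizontal flip
        out = out[:, ::-1]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))    # translation
    out = out + rng.normal(0.0, noise_std, out.shape)  # noise injection
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # the single support example
augmented = [augment(image, rng) for _ in range(5)]
```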
Sheaf Neural Network algorithm with a biomedical case study, outperforming GCNs, GATs, and GraphSAGE on graph-structured biomedical data.
Analysis of gender dynamics and homophily patterns in Chirper.ai, a social network of 70K+ autonomous LLM agents generating 140M posts, examining how AI agent identity develops in networks.
Theoretical study of expand-and-sparsify sparse representations for density and mode estimation, analyzing biological sensory system models with random projections and sparsification.
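The expand-and-sparsify representation this entry analyzes is commonly modeled as a random projection into a much higher dimension followed by winner-take-all sparsification; a minimal sketch under that standard assumption (dimensions chosen for illustration):

```python
import numpy as np

def expand_and_sparsify(x, proj, k):
    """Expand a stimulus with a random projection (d -> m, m >> d),
    then keep only the top-k responses (winner-take-all)."""
    h = proj @ x                           # expansion
    out = np.zeros_like(h)
    top = np.argpartition(h, -k)[-k:]      # indices of the k largest responses
    out[top] = h[top]                      # sparsification
    return out

rng = np.random.default_rng(0)
d, m, k = 16, 512, 16                      # input dim, expanded dim, active units
proj = rng.standard_normal((m, d))         # random expansion weights
x = rng.standard_normal(d)
code = expand_and_sparsify(x, proj, k)     # sparse high-dimensional code
```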
Krause Attention, a new transformer attention mechanism that addresses representation collapse and attention-sink phenomena through bounded normalization inspired by Krause dynamics.
SF-RAG improves retrieval-augmented generation for academic QA by preserving hierarchical document structure instead of flattening papers into chunks, enabling better evidence allocation under token constraints.
Deep reinforcement learning stability improvement using isotropic Gaussian representations to handle non-stationary training dynamics.
Parameter-efficient fine-tuning method using manifold expansion to overcome linear limitations of LoRA in complex reasoning tasks.
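For reference, the standard linear LoRA update that such methods set out to extend can be sketched as follows (a generic NumPy illustration of LoRA itself, not the proposed manifold-expansion method):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Standard LoRA: frozen weight W plus a trainable low-rank
    update B @ A, so only r * (d_in + d_out) parameters are tuned."""
    return x @ (W + alpha * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 32, 16, 4
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # zero-init: adapter starts as a no-op
x = rng.standard_normal((2, d_in))

y = lora_forward(x, W, A, B)
# With B = 0 the adapted layer matches the frozen layer exactly.
```

The update B @ A is a purely linear correction to W, which is the limitation the entry's manifold-expansion approach targets.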
Analysis of transformer training dynamics under AdamW optimizer identifying low-dimensional stable drift patterns in parameter evolution.
Cognitive psychology-based study showing LLMs exhibit proactive interference dominance, with early information overriding recent conflicting context.
Benchmark evaluating whether code agents can understand multi-file software architecture through codebase exploration under partial observability.
Analysis of LLM internal representations showing increased sparsity with task difficulty and out-of-distribution shift across contexts.
Domain-specific enhancement of vision-language models for ophthalmic diagnosis by injecting expert knowledge to address perception and reasoning gaps.
Reinforcement learning robustness method using adversarial latent-state training for partially observable environments.
Theoretical analysis connecting drifting models and score-based models through kernel-induced mean-shift discrepancy.
Task and motion planning approach combining scheduling with incremental learning for warehouse automation under resource and motion constraints.
Large-scale distributed training infrastructure for embodied AI at thousand-GPU scale, using the LeRobot framework and optimization recipes.
Security vulnerability analysis of LLM multi-agent systems showing inference attacks can extract communication topology without administrative access.
Parameter-efficient fine-tuning method using representation finetuning for continual learning on pre-trained models with explicit optimization dynamics.
Incremental learning framework using vision-language models with multi-adapter fine-tuning to improve efficiency and reduce memory requirements.
Study on decoding emotional affect from surface EMG during speech production using machine learning.