Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Black-box visual prompting method for parameter-efficient transfer learning of foundation models without full parameter access.
SPRIG: Genetic algorithm for optimizing system prompts in LLMs to improve task performance.
MissNODAG: Framework for learning cyclic causal graphs from incomplete data using differentiable methods.
Sparse Gradient Descent algorithm for variable selection in convex piecewise linear regression models.
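The paper's algorithm for convex piecewise linear regression isn't reproduced here; as a generic illustration of the underlying idea (sparse gradient descent with soft-thresholding, i.e. ISTA on a lasso objective), a minimal sketch:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm: shrink entries toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_gd(X, y, lam=0.1, n_iter=500):
    """ISTA-style sparse gradient descent for lasso-penalized least squares.

    Alternates a gradient step on the smooth loss 0.5*||Xw - y||^2 with
    soft-thresholding, which drives irrelevant coefficients exactly to zero.
    """
    # step size from the Lipschitz constant of the gradient
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - lr * grad, lr * lam)
    return w
```

The thresholding step is what performs variable selection: coordinates whose gradients stay below the threshold are set to exactly zero rather than merely shrunk.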
Score-matching causal discovery algorithm extended for temporal data on networks.
XAI-based method combining explainability with concept drift detection for monitoring model performance degradation.
Framework for constructing confidence sets for changepoints in sequential analysis using data-dependent stopping times.
World models using disentangled representations to transfer semantic knowledge from distracting videos for RL agents.
Digital twins framework for optimizing CI/CD build processes to reduce duration, failures, and flakiness.
Online test-time adaptation for spiking neural networks on neuromorphic chips to handle distribution shifts.
FSD framework combining vision-language models with robotic action models for zero-shot manipulation in novel scenarios.
Review of ML/AI applications in food processing, classification systems, and food informatics.
Neural network surrogate for learning evolution operators in time-dependent Schrödinger equations with unitarity constraints.
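The standard way to make a learned Schrödinger propagator unitary by construction (not necessarily the paper's exact parameterization) is to learn a free matrix, symmetrize it into a Hermitian generator H, and exponentiate: U = exp(-iH dt). A minimal sketch:

```python
import numpy as np

def unitary_propagator(A, dt):
    """Build U = exp(-i H dt) from a free complex parameter matrix A.

    H = (A + A^dagger)/2 is Hermitian by construction, so U is exactly
    unitary and norm conservation holds for any learned parameters.
    """
    H = 0.5 * (A + A.conj().T)
    evals, V = np.linalg.eigh(H)          # H = V diag(evals) V^dagger
    return (V * np.exp(-1j * evals * dt)) @ V.conj().T
```

In a surrogate model, `A` would be the trainable weights; the eigendecomposition route avoids needing a matrix-exponential routine while keeping unitarity exact.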
Gaussian mixture models as a computationally efficient proxy for LLM+RAG systems that combine multiple models.
COinCO dataset with 97,722 images created via diffusion-based inpainting for training context-aware vision models.
Machine learning methods for learning Hamiltonian components of open quantum systems.
Large deviations approach to accelerate constrained sampling algorithms for probability distributions.
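The large-deviations acceleration itself is theoretical; for context, the baseline it speeds up looks like a projected Langevin sampler, where each noisy gradient step on log p is followed by projection onto the constraint set. A minimal sketch (generic background, not the paper's algorithm):

```python
import numpy as np

def projected_langevin(grad_logp, project, x0, step=1e-2, n_steps=5000, seed=0):
    """Projected unadjusted Langevin dynamics.

    Each iteration takes a noisy gradient-ascent step on log p, then
    projects back onto the constraint set, yielding approximate samples
    from p restricted to that set.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = x + step * grad_logp(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        x = project(x)
        samples.append(x.copy())
    return np.array(samples)

# Example: standard normal restricted to [0, inf)
chain = projected_langevin(grad_logp=lambda x: -x,
                           project=lambda x: np.maximum(x, 0.0),
                           x0=np.array([1.0]))
```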
Technique to recover LLM training on decentralized/spot nodes from partial model loss without full checkpoints.
Method for LLMs to reliably cite source documents seen during training without external retrievers at inference time.
Vision Transformer framework reconstructs cloud-obscured satellite imagery using time-series data for crop mapping.
SciGA-145k dataset for training models to automatically design graphical abstracts for academic papers using visual data.
CATNet applies graph convolutional networks to predict catastrophe bond spreads using relational data structures.
Modification of Whisper ASR model to enable low-latency streaming transcription through architectural and training changes.
Vision-language model for robotic manipulation using embodiment-agnostic pointing representation to address generalization in embodied AI.
System co-design for efficient on-device LLM inference on NPU hardware, optimizing attention operations for privacy-preserving deployment.
Diffusion-based causal inference method for spatio-temporal data with unmeasured confounders and multi-resolution observations.
Theoretical work on distributed mean estimation with 1-bit communication constraints using interval queries, achieving near-optimal sample complexity.
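The paper's interval-query protocol isn't detailed in this summary; the classical baseline it refines can be sketched in a few lines: each client with a value in [0, 1] compares it to a random threshold and sends the single resulting bit, and the server averages the bits to get an unbiased mean estimate. A hedged sketch of that baseline:

```python
import random

def one_bit_mean(xs, seed=0):
    """1-bit distributed mean estimation via randomized thresholds.

    Each client with x in [0, 1] draws a uniform threshold u and sends the
    single bit 1{u < x} (a threshold query). Since P(u < x) = x, the
    average of the bits is an unbiased estimate of the mean of xs.
    """
    rng = random.Random(seed)
    bits = [1 if rng.random() < x else 0 for x in xs]
    return sum(bits) / len(bits)
```

The estimator's error shrinks as O(1/sqrt(n)) in the number of clients; the cited work's interval queries aim at tighter, near-optimal sample complexity.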
Google developer tool using deep learning to automatically fix pasted code, predicting the edits it needs, from formatting fixes to cross-language translation.
Knowledge editing method for LLMs enabling sequential updates through null-space alignment, improving robustness in continual model editing scenarios.
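Null-space alignment for model editing is a linear-algebra idea that can be sketched generically (this is the common construction, not necessarily the paper's exact procedure): project each weight update onto the null space of the preserved keys, so edited layers still produce identical outputs on the knowledge you want to keep.

```python
import numpy as np

def project_update(delta, K, tol=1e-10):
    """Project a weight update so it vanishes on preserved key vectors.

    K's columns are input (key) vectors whose outputs must not change.
    P = I - U U^T projects onto null(K^T), the orthogonal complement of
    col(K); applying the update through P guarantees (W + delta P) k = W k
    for every preserved key k.
    """
    U, s, _ = np.linalg.svd(K, full_matrices=False)
    U = U[:, s > tol]                       # orthonormal basis of col(K)
    P = np.eye(K.shape[0]) - U @ U.T        # projector onto null(K^T)
    return delta @ P
```

Because each sequential edit is confined to the shrinking null space of previously preserved keys, updates cannot interfere with retained knowledge, which is what makes the approach attractive for continual editing.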
AI system for credit scoring of Malaysian MSMEs using bank statement data as alternative to traditional credit bureau data.
Scientific machine learning approach using implicit neural representations for 3D gravity inversion, modeling subsurface density as continuous field.
Data envelopment analysis method for dynamic efficiency evaluation across multiple organizational dimensions with regularization for large-scale settings.
Data-driven nonlinear state estimation method for model-free processes using RNNs with noisy nonlinear measurements.
Image hashing method using foundation models for efficient large-scale retrieval with compact binary codes instead of high-dimensional embeddings.
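Whatever hashing scheme the paper learns, the retrieval pipeline it replaces embeddings with can be illustrated by the classic SimHash construction: binarize each embedding by the signs of random projections, then rank candidates by Hamming distance. A minimal sketch (generic baseline, not the paper's method):

```python
import numpy as np

def binary_codes(embeddings, n_bits=64, seed=0):
    """SimHash-style codes: sign of random hyperplane projections."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((embeddings.shape[1], n_bits))
    return (embeddings @ planes > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, k=5):
    """Indices of the k database codes closest in Hamming distance."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable")[:k]
```

Compact codes make distance computation a bitwise operation and cut index memory by orders of magnitude versus float embeddings, at the cost of some recall.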
Study of RAG limitations in healthcare domain, showing how retrieval-augmented generation fails when source documents contain contradictory or outdated information.
Sphinx synthetic environment for visual perception and reasoning with procedurally generated puzzles covering 25 task types, enabling precise evaluation and large-scale dataset construction.
Analysis showing optical context compression via vision tokens is functionally equivalent to lossy autoencoding, questioning effectiveness of DeepSeek-OCR's compression pipeline.
Human-in-the-loop approach for visual classification through iterative concept deliberation, addressing subjective vision tasks in content moderation and curation.
Application of Kolmogorov-Arnold neural networks to model thermal decomposition kinetics in lithium-ion batteries with state-of-charge dependence.
Synthetic Aperture Radar dataset for ship type classification using deep learning models, focused on maritime activity monitoring.
DDFT protocol measures epistemic robustness of language models under stress conditions, distinguishing knowledge gaps from failures of verification mechanisms rather than relying on static benchmarks.
Mechanistic interpretability study of how Diffusion Transformers generate correct spatial relations in text-to-image generation.
ConvoLearn dataset of 2,134 tutor-student dialogues for fine-tuning dialogue-based AI tutors grounded in knowledge-building theory.
Open-source educational platform teaching ML fundamentals to students aged 12-17 using LEGO robotics.
Pretraining approach using post-trained models to incorporate reasoning and safety behaviors earlier in LLM development.
Regularization techniques for improving multimodal representation learning by addressing collapse and inconsistency issues.
Few-shot fine-tuned language models for diagnosing intermittent CI pipeline failures in software development.
Compact embeddings for fast text-based wildlife observation retrieval from large biodiversity archives.
Cross-modal learning for bird species recognition using audio-to-image retrieval without paired training data.
Training LLMs to resist cognitive biases in reasoning via reinforcement learning rather than prompting.