Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
Self-supervised learning approach for ECG signal representation using masked modeling from unlabeled medical data.
Investigation of gender bias in Bangla language models with benchmark datasets for sentiment analysis, toxicity detection, hate speech, and sarcasm.
Method for learning disentangled visual concepts in image generation to improve multi-aspect creative generation while reducing concept confusion.
Analysis of polysemanticity in LLMs revealing that neurons exhibit multiple semantic meanings, challenging discrete neuron attribution for model interpretation.
Research on large-scale simulation of LLM-driven generative agents as a computational approach to studying human behavior and social dynamics.
Sequential model editing method with editing anchor compression to constrain parameter drift and maintain LLM general abilities during knowledge updates.
Budget-friendly proxy model framework for post-hoc interpretability of LLMs, enabling actionable explanations for prompt engineering and optimization.
Autoregressive super-resolution framework decomposing extreme upsampling into intermediate scales with preference alignment for improved scalability.
Agentic framework for synthetic image data generation and validation addressing data scarcity and label noise in vision tasks like detection and segmentation.
Safety enhancement for medical vision-language models using synthetic demonstrations to improve rejection of harmful clinical queries.
Listener-rewarded thinking approach using reinforcement learning to train robust reward models for generative text-to-image and video models.
Theoretical analysis providing quantitative guarantees for post-training quantization methods OPTQ and Qronos applied to LLMs and neural networks.
Keyframe selection method using visual subtitles for improved long video understanding with multimodal LLMs under context length constraints.
DINOv2-based segmentation framework for plant species and damage detection in herbicide trials, addressing domain drift across real-world conditions.
Investigation of multimodal LLMs for automating usability evaluation of user interfaces by analyzing visual UI context and textual instructions.
Kolmogorov-Arnold Network variant with autoregressive weights for time series forecasting, comparing performance against LLMs and ARIMA.
Spatial-temporal weather forecasting model with adaptive boundary alignment for improved global and regional predictions.
Configuration-aware LoRA adaptation for efficient fine-tuning of quantized LLMs on heterogeneous edge devices with privacy preservation.
Monte Carlo Tree Search approach for multi-attribute controllable summarization without per-attribute fine-tuning, enabling flexible constraint satisfaction.
Co-denoising framework for transferring manipulation skills from human videos to robots by bridging morphological differences.
Security research on defending AI-based videoconferencing systems against pose-expression latent hijacking attacks using biometric detection.
Automated pipeline for scaling reinforcement learning datasets to pretraining scale, addressing the data bottleneck in RL for LLM training.
Post-deployment learning framework for Vision-Language-Action policies using retrieved execution memories to improve embodied agent performance.
Data augmentation framework for robotic manipulation using Vision-Language-Action models to improve learning from limited demonstration datasets.
LLM-based framework for predicting flight delays using textual aeronautical information and aircraft trajectory data for air traffic management.
Computational analysis of 17,790 articles comparing Grokipedia (AI-generated) and Wikipedia, examining textual and structural biases.
EGMOF: hybrid diffusion-transformer for metal-organic framework generation with inverse design capabilities for materials discovery.
Inference-time optimization using evolutionary algorithms on prompt embeddings for diffusion model control without fine-tuning.
Structured uncertainty framework for LLM agents with tool-calling to generate principled clarifying questions for ambiguous user instructions.
Language-conditioned humanoid robot control using LLM with unified motion vocabulary for free-form command execution and embodied AI.
Bharat Scene Text dataset and benchmark for Indian language scene text recognition addressing script diversity and font variations.
AV-SpeakerBench: multimodal LLM benchmark with 3,212 questions evaluating audiovisual speech understanding and speaker-speech alignment in video.
Analysis of flow-based diffusion models revealing two-stage behavior through oracle velocity field computation and memorization-generalization tradeoffs.
Research on adversarial perturbations for object detectors using black-box attacks to expose vulnerabilities and understand attack mechanisms.
Research on self-distillation methods for teaching language models to leverage cognitive skills like verification and backtracking without base model exposure.
Research on relational visual similarity in computer vision showing how humans perceive analogical relationships beyond attribute similarity.
Framework combining mechanism design and online learning for sequential mechanism design, in which the principal learns agents' beliefs while ensuring truthfulness.
Mechanistic study of self-reflection emergence in RL-trained LLMs, proposing a two-stage decision-sampling hypothesis to explain how unified optimization produces distinct capabilities.
White-box adversarial attack method on computer vision models using SHAP values to generate imperceptible evasion attacks.
Training-free framework for human video animation using cached reference frames to model long-range dependencies while preserving temporal coherence.
Analysis showing that layer pruning of LLMs degrades performance on generative reasoning tasks beyond surface-level degradation, causing loss of algorithmic capabilities.
Method addressing prompt misguidance in diffusion-based super-resolution by using tiled prompts for localized semantic guidance.
Multi-agent framework for smart contract auditing using specialized agents for planning, execution, and recovery with coordination protocols.
Study demonstrating LLM biases when simulating misinformation susceptibility, showing that models overstate attitudes and ignore the network effects observed in humans.
Qualitative study of 33 K-12 teachers' perspectives on using conversational AI agents to scaffold group collaboration in classrooms.
Adaptive framework for demand forecasting model selection addressing horizon-induced performance degradation in inventory planning.
Pipeline combining subquadratic retrieval and GPU-accelerated kernels for analyzing immune repertoires at population scale.
Dataset of parasitoid wasps and Hymenoptera for taxonomic identification and biodiversity monitoring.
Knowledge distillation method for compressing RL-trained LLMs with chain-of-thought reasoning into smaller student models while preserving reasoning capabilities.
Theoretical analysis explaining why Adam optimizer outperforms SGD through second-moment normalization using stopping-time and martingale analysis.