Large-scale online deanonymization with LLMs
LLM agent with internet access performs large-scale deanonymization of Hacker News users and other pseudonymous profiles.
Using reference-guided LLM evaluators as soft verifiers to improve LLM alignment in non-verifiable domains through RLVR.
DemosQA benchmark evaluating monolingual and multilingual LLMs on Greek question answering tasks.
Flow-based continuous denoising language models outperform discrete diffusion in generation speed and quality with fewer steps.
Randomized trial showing AI-mediated feedback via FeedbackWriter improves student revisions compared to human-only feedback.
Two-timescale analysis showing that a feedback-truth gap arises when noisy feedback is absorbed faster than task structure can be evaluated.
Verbalized Action Masking technique for controlling exploration in RL post-training of LLMs with iterative action-space pruning.
User studies on designing informative but non-overwhelming trace interfaces for human oversight of agentic AI systems.
AdaptOrch framework that optimizes multi-agent orchestration topology rather than individual model selection, as LLM performance converges.
MALLVi multi-agent framework combines LLMs and vision for closed-loop robotic manipulation. Integrated planning with environmental feedback.
Xray-Visual: unified vision model trained on 15B image-text and 10B video-hashtag pairs from Meta. Large-scale multimodal ML.
Uses LLMs to automatically discover multi-agent RL algorithms for imperfect-information games. Automates MARL algorithm design.
Explores user control in conversational VQA systems for blind users. Studies customization and steering in assistive AI.
RankEvolve uses LLM-driven evolutionary search to discover improved retrieval algorithms. Automates algorithm discovery for information retrieval.
Proposes symbolic alternative to message-passing GNNs for interpretability. Addresses expressivity limits and explainability in graph learning.
Cross-lingual study of euphemism detection between Turkish and English. Multilingual NLP research.
Spectral graph analysis framework for psychological patterns in Persian poetry using ML annotation. NLP application to literary analysis.
Unified framework for exploiting locality in scalable multi-agent RL. Addresses dimensionality curse in MARL systems.
Studies early-warning signals of grokking in neural networks using loss-landscape geometry. Analyzes generalization phase transitions.
DDiT proposes dynamic patch scheduling for efficient diffusion transformers. Reduces computational cost via adaptive tokenization.
Research on using LLMs to extract user stories from UI mockups. Explores automated requirement generation via ML.
Persona2Web benchmark for evaluating personalized web agents using LLMs with user context. First benchmark for personalization in web agents.
LLM-powered conversational agents with error recovery through dialogue diagnosis and recovery planning, improving robustness beyond error prevention.
AI-enhanced tensor methods and in-context learning applied to behavioral neuroscience discovery pipelines, automating data preparation and annotation.
FATE: ensemble method for proactive anomaly detection in time-series with uncertainty quantification for early warning signals.
Wink: framework for recovering coding agents from misbehaviors including instruction deviation, infinite loops, and tool misuse.
Study evaluating cross-lingual text classification approaches for multilingual social media analysis across nine million tweets.
Research showing that the signs of weights at initialization persist in neural networks and create a bottleneck for sub-bit model compression.
Statistical research on sample size analysis for probabilities of causation using a delta-method approach.
AdvSynGNN: graph neural network architecture for robust node classification handling structural noise and non-homophilous topologies.
FLoRG: federated fine-tuning technique for LLMs using low-rank Gram matrices and Procrustes alignment across distributed clients.
Research comparing deep reinforcement learning to mean-variance optimization for portfolio asset allocation and risk management.
TIFO: machine learning method for nonstationary time series forecasting addressing distribution shift using frequency operators.
Novel vector quantization approach for VAEs that decouples representation learning from discretization to address training instability and codebook collapse.
Unified multimodal model for time series that bridges numerical generation and semantic understanding tasks using a vision-centric approach.
Empirical study comparing how linear and quadratic attention mechanisms perform in-context learning on linear regression tasks, analyzing learning quality, convergence, and generalization.
Continual uncertainty learning approach for robust control of mechanical systems using deep reinforcement learning.
Deep learning method for crystal structure prediction using universal fine-grained symmetry inference.
Systematic evaluation of LLM robustness in long-context code question answering across multiple programming languages.
Study examining how conversational agents' linguistic personality expressions affect user perceptions and decisions.
ASTERIS self-supervised transformer-based denoising algorithm for astronomical imaging with spatiotemporal information.
Federated latent space alignment approach for multi-agent semantic communications in AI-native systems.
X-Value benchmark for cross-lingual values assessment evaluating LLMs' ability to assess content values globally.
Flickering Multi-Armed Bandits framework where available actions change dynamically based on agent history.
Study examining how lexical and syntactic perturbations affect LLM evaluation benchmark scores and model ranking reliability.
WebFAQ 2.0 multilingual dataset with 198M FAQ pairs across 108 languages and mined hard negatives for dense retrieval.
SubQuad pipeline for immune repertoire analysis addressing quadratic cost of affinity evaluations.
Prompt-driven self-improving optimization technique for graph out-of-distribution detection in neural networks.
Analysis of security vulnerabilities in embodied AI systems including LLM-driven agents, autonomous vehicles, and service robots.