Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits
Theoretical and empirical evaluation of using LLM-generated preferences to warm-start contextual bandits, examining alignment with actual user preferences.
Analysis of stability in post-hoc feature attribution methods for vision systems under input perturbations, introducing evaluation suite.
LLM-based code generation for security vulnerabilities using CAPEC and CWE frameworks, addressing gaps in existing vulnerability datasets.
Study of cultural bias in LLM text generation, introducing task of culturally-adapted artwork descriptions for different audience groups.
Integrative review of generative AI impact on entrepreneurship across opportunity recognition, evaluation, resource assembly, and venture launch stages.
Research on safety alignment vulnerabilities in LLMs, examining jailbreak-tuning and weight orthogonalization methods that can disable safety guardrails.
Comparative study of LLM vs human coordination in group games, revealing volatility and action bias differences in adaptive strategies.
Vision-language model extension for referring image segmentation using autoregressive decoding and reinforcement learning refinement.
System grounding LLM-generated explanations in formal representations to enable interactive exploration of mathematical proofs.
Tool for developing research ideas through dynamic literature contextualization and critique using LLMs.
Security analysis of memory-based LLM web agents, demonstrating environment-injected poisoning attacks through persistent memory exploitation.
Vision foundation model applied to rapid building damage mapping from post-earthquake imagery for disaster response.
Continual graph learning method addressing feature drift in non-exemplar settings using analytic continual learning.
Self-supervised depth estimation for articulated vehicles using cross-vehicle 3D geometric consistency.
Game benchmark with 124 bugs for evaluating LLMs' ability to autonomously discover bugs as QA engineers in dynamic environments.
Distributed training approach for graph neural networks using communication-free sampling and hybrid parallelism.
Theoretical analysis of reinforcement learning alignment limitations in LLMs, demonstrating generalization failures through compound jailbreak attacks.
Efficient model compression using randomized subspace iteration for low-rank decomposition of pretrained models.
Study of sycophancy propagation in multi-agent LLM systems, examining how agents' awareness of others' biases affects collaborative discussions.
Large-scale empirical study of coordination dynamics in LLM multi-agent systems, analyzing scaling behavior and power laws in collective cognition.
Agentic framework using LLMs for automated clinical trial evidence synthesis and meta-analysis with eligibility-aware study selection.
Using sparse autoencoders to understand geometric structure of belief representations in transformer models and LLMs.
Token-space adversarial attacks on reward models used in RLHF, introducing token mapping perturbation attack paradigm beyond semantic manipulation.
Framework for reducing computational overhead in 3D multimodal LLMs through adaptive token reduction for resource-constrained deployment.
AI agent system for document forgery detection using evidence-grounded reasoning, combining detection, localization, and explanation for document safety.
Controlled replication study examining vocabulary constraints versus linguistic structures in LLM reasoning, testing E-Prime effects on cognition.
Systematic evaluation framework for LLM formal reasoning capabilities using Chomsky hierarchy and computational complexity theory.
Multimodal LLM benchmark for autonomous driving with vehicle, infrastructure, and cooperative viewpoints, evaluating reasoning across V2X conditions.
Multi-sensor foundation model merging HiRISE, CTX, and THEMIS Mars remote sensing data via equal validation loss alignment strategy.
Multi-domain benchmark for industry code generation across finance, automation, and aerospace using LLMs, addressing single-domain limitations.
Evaluation of active preference learning versus random sampling in online DPO for modern LLMs, showing random sampling is surprisingly competitive.
Formal framework for verifiable delegation chains in multi-agent AI systems, defining properties for authorization tracking and policy enforcement.
Framework for improving data literacy in AI-assisted analysis by disrupting cognitive passivity through guided reasoning rather than direct answers.
Diffusion transformer method for inverse tone-mapping, converting 8-bit SDR video content to perceptually accurate 10-bit HDR.
Rubric-based RL framework bridging response-level and token-level rewards for LLM alignment in instruction following tasks.
Benchmark dataset for pavement distress assessment using vision-language models, requiring quantitative analysis and interactive decision support.
Task-specific LLM framework for generating SystemVerilog assertions for hardware verification, addressing data scarcity and accuracy challenges.
Quantization-aware vision token pruning for multimodal LLMs, optimizing coupled compression techniques for resource-constrained deployment.
Framework for synthesizing novel-view video sequences from single images using diffusion models with geometry-aware expansion strategy.
First comprehensive security analysis of Agent Skills, an open standard for modular LLM agent packages, covering threat taxonomy and vulnerabilities.
Conditional diffusion model for reconstructing 3D ocean states from sparse surface observations using satellite and in situ data.
End-to-end training method for localizing temporal video segments matching sentence queries, addressing task discrepancy in video backbone optimization.
Workshop on integrating LLMs with graph-structured data, covering algorithms and systems for bridging LLMs, graph databases, and ML for practical applications.
Study of weight-space model merging for multilingual machine translation, evaluating behavior when combining independently fine-tuned models.
Procedural geometry data generation and visual grounding with vision-language models for geometry education, framed as referring image segmentation.
Legal analysis of Anthropic's AI constitution document as governance framework, discussing limitations in military and surveillance contexts.
Split-and-conquer framework for detecting partial deepfake speech using boundary detection and segment-level classification stages.
Council Mode: multi-agent consensus approach mitigating hallucinations and bias in MoE LLMs through coordinated expert activation.
Learning method using provenance-based input gradient guidance to improve model discrimination robustness with synthetic training data.
Study of annotator competence development and subjective judgment changes during social influence recognition annotation tasks.