What Breaks Embodied AI Security: LLM Vulnerabilities, CPS Flaws, or Something Else?
Analysis of security vulnerabilities in embodied AI systems including LLM-driven agents, autonomous vehicles, and service robots.
Machine learning framework for trustworthy clinical decision-making with feature stability under incomplete data.
Framework for integrating voice communications with UAV-assisted emergency networks using semantic perception.
SpectralGCD method for generalized category discovery using spectral concept selection and cross-modal representation learning.
Research on improving LLM-based recommendation systems using self-hard negatives from intermediate layers for better preference learning during fine-tuning.
Taxonomy and comparative study of uncertainty quantification methods for detecting hallucinations in long-form LLM outputs.
Study on generative-retrieval architectures in web search and how LLMs have transformed information retrieval practices.
arXiv paper on Jolt Atlas, a zero-knowledge ML framework that uses lookup arguments for verifiable inference over ONNX tensor operations.
Audit of personal data associations in 8 LLMs using the LMP2 privacy probe, examining how models retain and surface personal information.
arXiv paper on detecting manipulated image copies using self-supervised, patch-level contrastive learning.
arXiv paper on training neural networks with Boolean threshold functions where all node values are strictly ±1.
arXiv paper introducing LORA-CRAFT, a parameter-efficient fine-tuning method using Tucker decomposition on transformer attention weights.
arXiv paper identifying transformer attention heads that function as membership filters and analyzing the spectrum of membership-testing strategies they use across language models.
Position paper arguing current ECG representation learning benchmarks must be revised to align with clinically meaningful objectives.
arXiv paper systematically evaluating mechanistic interpretability in single-cell foundation models using 37 analyses and 153 tests.
Position paper proposing AI co-design for autonomous particle accelerator operation with minimal human intervention.
arXiv paper introducing MASPO, a reinforcement learning method improving gradient utilization and probability mass handling for LLM reasoning.
arXiv paper on gyral folding-based cortical networks for Alzheimer's and Lewy body dementia diagnosis.
arXiv paper analyzing how normalization strategies impact Transformer expressivity for time series representation learning.
arXiv paper on Deep-Flow, an unsupervised anomaly detection framework for autonomous vehicles using optimal transport conditional flow matching.
arXiv paper testing whether speech LLMs behave identically to ASR-to-LLM cascades across four models and six tasks.
arXiv paper on relevance-guided online meta-learning for geospatial discovery under resource constraints and dynamic environments.
arXiv paper on anytime-valid statistical watermarking for LLMs, used to distinguish machine-generated text from human-written text.
arXiv paper on variance control in asynchronous off-policy RL for LLMs, addressing high variance from stale rollouts in critic-free methods.
arXiv paper analyzing weak vs strong verification mechanisms in LLM reasoning systems, examining cost-reliability tradeoffs in verification loops.
arXiv paper on Reverso, a time series foundation model for zero-shot forecasting that scales to hundreds of millions of parameters.
FAMOSE: ReAct-based agent for automated feature engineering in tabular data that autonomously explores and generates optimal features without domain expertise.
Black-box adversarial attack method on Large Vision-Language Models using fine-grained detail targeting to address gradient-free optimization challenges.
MARS framework for reward modeling using margin-aware training and self-refinement to reduce reliance on costly human-labeled preference data.
Novel pruning technique for Diffusion Language Models that improves inference efficiency by revisiting the assumption that attention sinks must be preserved.
Research on embodied AI agents using LLMs for open-ended dialog to infer and accomplish diverse user goals efficiently and robustly.
GAI: multi-agent LLM framework with reflection and dialogue for collective reasoning to drive innovation.
Framework studying AI-assisted human decision-making where humans learn through repeated interactions with algorithms.
Method for learning user-specific reward models in RLHF to capture individual preferences in LLM training.
Scalable evaluation framework for assessing the quality of personalized responses from health-focused LLMs.
Theoretical correspondence between bounded GNNs and fragments of first-order logic, characterizing the expressive power of these GNNs.
∞-THOR: framework for long-horizon embodied AI tasks with benchmark testing long-context reasoning across extended trajectories.
SPECS: method for faster test-time scaling in LLMs using speculative drafts to reduce latency while maintaining performance.
Formal causal explanations for image classifier decisions using logic-based approaches.
Bongard-RWR+: benchmark for abstract visual reasoning with fine-grained concepts using real-world images.
Embodied AI system that uses visual language models to enable autonomous drones to make adaptive decisions in response to sudden events.
PROBE: benchmark for measuring proactive problem-solving in LLM agents across extended contexts and time horizons.
SCL: modular architecture for LLM agents that separates cognition into five phases and adds a soft symbolic control governance layer.
CaveAgent: framework that turns the LLM from a text generator into a runtime operator, using a dual-stream architecture for long-horizon task execution.
Offline multi-agent RL using local-to-global world models to enable conservative policies to generalize beyond dataset support.
AUTOBUS: neuro-symbolic AI system combining LLMs with deterministic logic for autonomous business process reconfiguration and execution.
SpikeScore: method for cross-domain hallucination detection in LLMs that generalizes across different domains better than existing approaches.
ADP-MA: framework for autonomous data processing using meta-agents that monitor, manage, and optimize end-to-end pipelines after deployment.
Federated learning framework addressing device heterogeneity and non-IID data with differential privacy using bi-level optimization.
EduEVAL-DB dataset for training AI tutors to evaluate pedagogical quality of educational explanations across K-12 subjects.