Navigating the Concept Space of Language Models
ConceptMap tool enables scalable exploratory discovery of human-interpretable concepts in sparse autoencoders trained on LLM activations.
Konkani-Instruct-100k synthetic dataset and benchmarks address LLM performance gaps for a low-resource Indian language written in multiple scripts via instruction tuning.
Cognitive psychology-inspired study reveals LLMs drop formatting instruction compliance by 2-21% under concurrent task load, identifying prospective memory vulnerabilities.
Fine-tuned lightweight LLM generates hierarchical JSON representations of scientific sentences preserving semantic meaning for structured knowledge extraction.
MDKeyChunker pipeline enables structure-aware chunking of Markdown documents and single-call LLM enrichment with metadata extraction for improved RAG accuracy.
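The MDKeyChunker implementation is not described in detail here, but the core idea of structure-aware Markdown chunking can be illustrated with a minimal sketch: split at heading boundaries and carry the heading path along as metadata, so each chunk remains self-describing for retrieval. The function name and output schema below are illustrative assumptions, not the paper's API.

```python
import re

def chunk_markdown(text):
    """Split Markdown into chunks at ATX heading boundaries, attaching the
    current heading path as metadata so each chunk stays self-describing."""
    chunks, path, buf = [], [], []

    def flush():
        if buf:
            chunks.append({"headings": " > ".join(path),
                           "text": "\n".join(buf).strip()})
            buf.clear()

    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]          # pop deeper/equal headings
            path.append(m.group(2).strip())
        else:
            buf.append(line)
    flush()
    return [c for c in chunks if c["text"]]
```

A real pipeline would add overlap, size limits, and the paper's single-call LLM metadata enrichment on top of this skeleton.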
Philosophical comparison of how LLMs gather data versus human scientific knowledge construction and discovery processes.
Mixture of Demonstrations approach improves GraphRAG performance for domain-specific QA by selecting high-quality demonstrations to reduce irrelevant retrieved information.
Computational analysis of upper entropy algorithms for uncertainty quantification in credal set-based probability models.
Native GUI agent framework ReCAP adds CAPTCHA-solving capability to vision-language models using self-corrective training and automated reasoning-action data generation.
Synthetic Mixed Training combines synthetic QAs and documents to improve LLM knowledge acquisition beyond RAG performance in data-constrained domains.
Safe reinforcement learning approach using preference-based constraint inference for learning complex, subjective safety constraints with minimal expert demonstrations.
AI agent optimizes operator performance on Huawei Ascend NPUs by addressing knowledge bottleneck through episodic learning for tiling and kernel programs.
StateLinFormer: linear-attention navigation model with persistent memory for long-term navigation tasks, combining the flexibility of attention with linear-time efficiency.
Dual-Criterion Curriculum Learning proposes a meta-learning approach using dual criteria for difficulty assessment in temporal data training.
PoiCGAN introduces poisoning attack methods against federated learning systems using feature-label joint perturbation.
APreQEL proposes adaptive mixed precision quantization to reduce memory and computational costs of LLMs for edge device deployment while maintaining performance.
Time-LLM model for predicting wafer-level spatial etch depth distributions in plasma etching process monitoring.
Analysis of deep learning generalization gap in sleep disorder staging with Grad-CAM interpretability and iSLEEPS clinical dataset.
LLMORPH automated testing tool for LLMs using metamorphic testing to detect NLP task failures without human-labeled oracles.
LLMLOOP framework automating iterative refinement of LLM-generated code and test cases through automated feedback loops.
Theory of LLM information susceptibility analyzing fundamental limits of LLM-mediated optimization in agentic systems.
Ukrainian Visual Word Sense Disambiguation benchmark in which each item requires choosing among ten candidate images, for evaluating word sense disambiguation in Ukrainian.
Swiss-Bench SBP-002: trilingual benchmark of 395 expert-crafted regulatory compliance tasks across FINMA, Legal-CH, and EFK domains.
Self-supervised learning method for spectral unmixing in fluorescence microscopy using a data-driven approach.
Probing study revealing how LLMs internally represent different ethical frameworks with asymmetric transfer patterns across model sizes.
Echoes dataset with 3,577 music tracks for deepfake detection spanning multiple AI music generation systems.
BIRCH-Trees benchmark for estimating individual tree height and species from RGB UAV imagery for forest monitoring.
Training-free out-of-distribution detection using multi-layer prototype fusion approach for robust deep learning deployment.
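The paper's exact fusion rule is not given here, but the general prototype-fusion idea behind training-free OOD detection can be sketched: at each layer, build one prototype (mean feature) per class, score a sample by similarity to its nearest prototype, and fuse the per-layer scores. Averaging as the fusion step is an assumption for illustration.

```python
import numpy as np

def prototype_ood_score(layer_feats_by_class, layer_feats_test):
    """Training-free OOD scoring sketch. For each layer: compute class
    prototypes as mean features, take the test sample's max cosine
    similarity to any prototype, then average across layers.
    Higher score = more in-distribution."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    per_layer = []
    for feats_by_class, x in zip(layer_feats_by_class, layer_feats_test):
        protos = [np.mean(f, axis=0) for f in feats_by_class]
        per_layer.append(max(cos(x, p) for p in protos))
    return float(np.mean(per_layer))
```

Because no gradients or extra training are involved, such a detector can be bolted onto a frozen deployed network using only cached training features.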
Privacy-preserving LLM system for disambiguating clinical acronyms in healthcare without transmitting data to external servers.
Machine learning approach for robotic fruit harvesting using active reachability estimation to improve efficiency in unstructured environments.
Measurement methodology for identifying assessment items where LLMs perform differently than humans using theory-grounded evaluation.
Analysis of early-exit decoding in modern LLMs showing reduced efficiency gains due to improved architectures with lower layer redundancy.
Study of filtered vector search algorithms in PostgreSQL for semantic search and GenAI applications, evaluating real-world database performance.
Continuous-time diffusion models for generating synthetic electronic health records with mixed numerical and categorical features.
Self-paced curriculum learning for RL using closed-form Gaussian updates to improve efficiency in high-dimensional contexts.
Intent-Based Networking using AI to translate high-level natural language intents into network policies with automated compliance assurance.
Human-in-the-loop Pareto optimization for motor skill training and rehabilitation, characterizing task difficulty vs. performance trade-offs.
Bayesian latent transport framework for domain-adaptive foundation models addressing distribution mismatch and uncertainty propagation in limited-supervision scenarios.
Cognitive Firewall: hybrid edge-cloud architecture for securing browser-based LLM agents against indirect prompt injection attacks using split-compute security checks.
LLM-informed model-based planning for object search using LLM likelihood estimates and environment costs.
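The planner's internals are not specified in this summary, but the core trade-off it names (LLM likelihood estimates versus environment costs) can be sketched as a simple scoring rule: prefer locations the LLM believes likely contain the object, discounted by how expensive they are to reach. The function name, log-likelihood form, and `weight` parameter are illustrative assumptions.

```python
import math

def next_search_target(llm_likelihoods, travel_costs, weight=1.0):
    """Model-based object-search sketch: score each candidate location by
    the LLM's prior probability that the object is there (in log space),
    penalized by a weighted travel cost, and visit the best location."""
    scores = {
        loc: math.log(max(p, 1e-9)) - weight * travel_costs[loc]
        for loc, p in llm_likelihoods.items()
    }
    return max(scores, key=scores.get)
```

In a full planner this greedy step would sit inside a receding-horizon loop, re-querying likelihoods as the map is explored.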
Neural Regression Collapse phenomenon across network layers, showing feature sparsity and low-rank structure.
AgentPex: Framework for detecting procedural failures in agentic traces including workflow routing and tool usage violations.
Method for finding representations in language models via adversarial perturbation without implausible constraints.
Benchmark (PoliticsBench) measuring political bias in eight LLMs using multi-turn roleplay evaluation.
Research on the role of activation-function curvature in adversarial robustness, using a Recursive Curvature-Tunable Activation Family.
Discussion of user experience design for generative AI in education emphasizing human-AI epistemic partnership.
Investigation of vision-language model robustness under distribution shifts using visual deductive reasoning tasks.
HDPO method augmenting RL with privileged self-distillation for LLM mathematical reasoning on unsolvable cliff prompts.
Luna: C++ implementation of alpha-CROWN bound propagation for neural network formal verification.
Multi-agent robotic platform using AI agents for adaptive chemical laboratory automation handling diverse experimental tasks.