GroupRank: A Groupwise Paradigm for Effective and Efficient Passage Reranking with LLMs
GroupRank: Efficient passage reranking paradigm using LLMs with groupwise ranking to balance efficiency and accuracy.
LiveCLKTBench: Benchmark pipeline for reliably measuring cross-lingual knowledge transfer in multilingual LLMs with time-sensitive queries.
Framework for process-centric evaluation of agentic software systems, analyzing execution trajectories and reasoning beyond outcome metrics.
Theoretical framework for sparse dictionary learning in neural networks, analyzing piecewise biconvexity and spurious minima in mechanistic interpretability.
WisPaper: AI agent system for academic paper discovery and organization, addressing semantic search and workflow fragmentation challenges.
Multimodal expert fusion approach for interpretable Alzheimer's disease diagnosis from neuroimaging data.
VPR-AttLLM framework using LLM semantic reasoning to improve geo-localization of crowdsourced flood imagery.
Method for multi-subject image generation with distinction capability, integrating composition and distinction in subject-driven synthesis.
Multimodal RAG system enhanced with knowledge graphs for audio-visual retrieval, extending LLM capabilities to multimodal domains.
Study on imitation learning for autonomous driving, addressing the gap between privileged expert demonstrations and sensor-limited student observations in simulation.
Research on variance-aware tree policies for Monte Carlo Tree Search, improving upon UCB-based methods used in AlphaZero-style algorithms.
CricBench: benchmark for evaluating LLMs on multilingual cricket analytics and domain-specific Text-to-SQL tasks.
Survey of Brazilian K-12 teachers' perceptions of AI in education, examining AI literacy and adoption across 346 educators.
Examines whether training small proxy models reliably guides data curation decisions for full-scale frontier AI model pretraining.
Disco-RAG improves retrieval-augmented generation by capturing discourse structure and synthesizing knowledge from dispersed evidence.
Enhanced-FQL(λ) reinforcement learning framework with fuzzy eligibility traces and interpretable fuzzy rules for continuous control.
Defensive poisoning technique merges triggers to remove backdoors in instruction-tuned LLMs vulnerable to data poisoning attacks.
HAERAE-Vision benchmark with 653 real-world underspecified visual questions reveals vision-language model limitations with informal queries.
SODACER: Safe reinforcement learning framework with dual-buffer adaptive clustering for nonlinear system control.
GanitLLM: Bengali mathematical reasoning model with difficulty-aware curriculum-based GRPO training pipeline.
Coverage-enhanced latent actions framework for controlling multimodal conversational agents with reinforcement learning.
Game-theoretic analysis of how expanding AI agent capabilities affects strategic interaction in bargaining, negotiation, and persuasion.
EZ-MIA: Training-free membership inference attack against fine-tuned language models to audit privacy risks from data memorization.
Deep learning model for drug response prediction combining chemical substructures with cellular pathway states using differential attention.
Cross-modal domain adaptation approach transferring image dataset knowledge to LiDAR for synthetic training data generation.
Analyzes parallelism and generation order in Masked Diffusion Language Models across 8 models and 58 benchmarks.
Uses persona-based evaluation with LLMs to support inclusive cycling infrastructure design by simulating diverse user experiences.
Systematic analysis of demographic bias in LLM-generated targeted messaging across GPT-4o, Llama-3.3, and Mistral-Large models.
MERMAID: Multi-agent system for fact-checking using LLMs with memory-enhanced retrieval and iterative reasoning to assess the veracity of claims.
Agent memory system beyond RAG addressing agent-specific needs: bounded coherent dialogue retrieval with decoupling and aggregation.
Unified framework explaining LLM steering methods (fine-tuning, LoRA, activation interventions) as dynamic weight updates from control signals.
El Agente Estructural: multimodal agent for autonomous molecular geometry generation and manipulation using natural language and vision.
Fake-HR1 hybrid-reasoning model for synthetic image detection balancing chain-of-thought reasoning with computational efficiency.
Study of electromagnetic fault injection attacks on embedded deep learning models, analyzing the influence of number representations on resilience.
AdvSynGNN resilient graph neural network architecture addressing structural noise and non-homophilous topologies via adversarial synthesis.
SubQuad pipeline for immune repertoire analysis combining subquadratic retrieval with GPU-accelerated affinity kernels and multimodal fusion.
UBio-MolFM universal molecular foundation model framework for bio-system simulation bridging quantum accuracy and biological scale.
Pyramid MoA hierarchical mixture-of-agents architecture with decision-theoretic router for cost-optimized anytime LLM inference.
Gome agent for machine learning engineering using gradient-based optimization instead of tree search, scaling LLM-based reasoning.
SteerEval benchmark for evaluating LLM controllability across language features, sentiment, and personality at multiple specification levels.
Physics-informed surrogate model for ferroelectric NAND retention analysis reducing computational cost from day-scale to second-scale.
BiCLIP extends vision-language models to specialized domains via structured geometric transformation and domain canonicalization.
Causally-informed feature expansion method for class-incremental learning addressing catastrophic forgetting through feature collision mitigation.
Analysis of prompt injection attacks as role confusion, where models infer a text's source from its content style rather than its origin.
Survey of resource consumption threats in LLMs including excessive generation attacks, resource efficiency requirements, and mitigation strategies.
Study of LLM alignment evaluation focusing on routing from concept detection to behavioral policy, using Chinese language models as a case study.
Diffusion-based image super-resolution framework addressing the trade-off between inference efficiency and reconstruction quality with learnable noise prediction.
Suiren-1.0 family of molecular foundation models for organic systems with 1.8B parameters pre-trained on 70M samples.
Cross-view geo-localization method for UAV navigation in GNSS-denied environments using aerial and satellite image matching.
Curriculum learning framework using cross-entropy games to automatically build general capabilities and discover skills in language models.