Language-Guided Structure-Aware Network for Camouflaged Object Detection
Language-guided network for camouflaged object detection in computer vision using textual semantic priors.
MolEvolve combines LLM guidance with evolutionary search for interpretable molecular optimization, addressing activity cliffs.
LLMs assess teacher-child interactions in Chinese preschools for scalable early childhood education monitoring.
Studies fairness in recommender systems, examining the relationship between fair model representations and fair recommendations.
ClawKeeper adds safety mechanisms to OpenClaw autonomous agent runtime, addressing vulnerabilities in tool integration and command execution.
OneSearch-V2: Industrial-scale framework that improves generative retrieval for search systems with latent reasoning and self-distillation.
Large-scale dataset of annotated, continuous video demonstrations that enables computer-use agents to automate complex desktop workflows.
Integration of causal machine learning into clinical decision support systems with clinician-facing interfaces for interpretable treatment-specific reasoning.
Multimodal system for reuniting lost pets using animal vocalizations and cognitive science insights beyond appearance-only matching.
Autoresearch pipeline that uses a Claude Code LLM agent to autonomously discover novel white-box adversarial attack algorithms, outperforming 30+ existing methods.
Multi-dimensional evaluation framework for uncertainty attribution methods in explainable AI with aligned proxy tasks and metrics.
Mobile GUI agent using rejection fine-tuning to learn from failed trajectories and improve credit assignment for long-horizon tasks.
Video-language foundation model pretrained on surgical procedure videos for zero-shot event recognition in intraoperative settings.
Framework combining diffusion-based world models with selective enhancement for temporally coherent augmented reality applications.
Sociolinguistic analysis of bias in automatic speech recognition systems using Newcastle English dialect data.
Empirical study comparing chunking strategies for RAG systems in oil and gas documents, evaluating fixed-size, recursive, semantic, and structure-aware approaches.
Agentic video understanding framework using Vision-Language Models with active planning to seek evidence from raw video during reasoning.
Free-Market Algorithm metaheuristic using distributed supply-and-demand dynamics for open-ended optimization with emergent fitness.
Adversarial attack methods to protect images from malicious diffusion-based image-to-video generation models.
Vision-Language Models for automatically converting rasterized figures into editable SVG vector graphics.
Study of RAG systems applied to AI policy analysis using AGORA corpus, examining reliability challenges in dense legal language domains.
Vision-Language Models for supporting human decision-making in high-stakes domains like medical diagnosis through collaborative human-AI systems.
Study of LLM-powered AI agents in a multi-echelon supply chain simulation, investigating emergent strategic behavior and dynamics such as the bullwhip effect.
Graph-based evaluation framework for domain-specific LLM benchmarking using clinical guidelines transformed into queryable knowledge graphs with dynamic query instantiation.
GeoSketch: Neural-symbolic approach for geometric reasoning in MLLMs using auxiliary line construction and affine transformations for problem solving.
SAG-Agent: LLM-based agent using dynamic knowledge graphs for long-horizon reasoning in strategy games via GUI interaction without APIs.
CastMind: Agentic reasoning framework for time series forecasting using iterative refinement with temporal features, domain knowledge, and case-based references.
Pharos-ESG: Multimodal framework for parsing and labeling ESG reports with hierarchical document understanding and narrative generation.
Generative Adversarial Reasoner: Framework using adversarial RL to improve LLM reasoning capabilities and reduce mathematical errors through co-evolved reasoner-discriminator training.
Research on enabling ultra-long-horizon autonomous agents with cognitive accumulation for multi-week ML engineering experiments.
Evaluates LLM performance on perspective-taking and knowledge-state estimation tasks, comparing LLMs' cognitive abilities to those of chimpanzees.
CollectiveKV framework reduces inference latency in Transformer-based sequential recommendation systems through KV cache optimization.
CIRCLE framework for evaluating AI systems across six lifecycle stages, bridging the gap between benchmarks and real-world deployment outcomes.
Framework for evaluating logical reasoning agents with agentified assessment, standardized interfaces, and structured failure tracking.
TikZilla: Dataset and reinforcement learning approach for scaling text-to-TikZ scientific figure generation from high-quality training data.
GPT4o-Receipt: Benchmark of 1,235 AI-generated and authentic receipt images, evaluated by both LLMs and humans.
Framework for relationship-aware safety unlearning in multimodal LLMs addressing relational safety failures without collateral damage.
DomAgent: Framework combining knowledge graphs and case-based reasoning with LLMs for domain-specific code generation tasks.
PhySe-RPO: Diffusion-based framework for surgical smoke removal using physics and semantics-guided relative policy optimization.
Counterfactual learning approach for conversion rate (CVR) estimation in recommender systems, addressing data sparsity and sample selection bias.
Survey on enterprise financial risk analysis using big data and LLM technologies for financial prediction and management.
Moonwalk: Inverse-forward differentiation technique addressing backpropagation's memory limitation for training deeper neural networks.
DIDLM: Multi-sensor SLAM dataset with infrared, depth, LiDAR, 4D radar for adverse weather and low-light robotic navigation scenarios.
Theoretical analysis of feature learning in Leaky ResNets using Hamiltonian mechanics and representation geodesics.
Method for heterogeneous treatment effect estimation from observational data using local proximity balancing to reduce treatment selection bias.
Dynamic Neural Potential Field: Learning-enhanced MPC framework coupling Transformer-based predictor with classical optimization for robot obstacle avoidance.
SGMA: Framework combining symmetry-guided experience augmentation and memory inference to improve reinforcement learning efficiency for legged locomotion.
Evaluation framework for large language models that addresses randomization in coupled token generation using a causal modeling approach.
Unicorn: Multi-agent reinforcement learning approach for adaptive traffic signal control in heterogeneous urban networks.
KINESIS: Model-free reinforcement learning framework for human motion imitation with musculoskeletal constraints and biomechanical joint modeling.