CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration
CRFT: transformer-based framework using feature flow learning for robust cross-modal image registration in a coarse-to-fine manner.
SemLink tool using Siamese Sentence-BERT for semantic-aware automated test oracles detecting hyperlink rot and semantic drift in web applications.
Systematic analysis and benchmark comparing LLM-based automated penetration testing frameworks for autonomous security testing.
Analysis of diffusion-based image compressors' robustness to bit-flip errors compared to classical and learned codecs.
CAKE benchmark with 188 expert-validated questions evaluating LLMs' understanding of cloud-native software architecture across Bloom's taxonomy levels.
Fine-tuning technique using instance-level knowledge scores to reduce LLM hallucinations by aligning pre-training and fine-tuning knowledge.
Demographics-agnostic training method for mitigating bias in wake-up word detection across diverse speaker populations.
EEG-MFTNet deep learning architecture combining multi-scale temporal convolutions and transformers for cross-session motor imagery decoding in BCIs.
Representation-level evaluation metric for learner representations in educational AI systems measuring distinctiveness between students.
Neural network pruning formulated as a QUBO problem with principled objective formulations capturing filter interactions.
Swiss-Bench 003 benchmark extending HAAS framework to evaluate LLM reliability and adversarial security in Swiss regulatory and financial contexts.
Method for automated dental superimposition comparing 3D intraoral scans and 2D photos for human identification in forensic contexts.
Technique for improving text-to-image diffusion model interpretability through selective aggregation of cross-attention maps from relevant attention heads.
Neural network method using ReLU networks for generating graphs constrained by specified graph edit distance for cheminformatics and data augmentation.
Benchmark evaluating vision-language models' ability to understand multimodal puns combining visual and textual elements.
Successor representation method for zero-shot unsupervised RL in visual environments using saliency-guided representations and consistency policy learning.
Evaluation method for LLM-based issue resolution agents beyond pass rates, assessing compliance with implicit design constraints and architectural conventions.
Formal security framework for MCP-based AI agents, including threat taxonomy, verification models, and defense mechanisms for tool-connected LLM systems.
Study on surface compliance in LLMs: models agree with knowledge edits but don't internalize changes, affecting reliability of edited parametric memory.
Qualitative case study examining legal professionals' perceptions of AI governance, regulatory gaps, and institutional readiness in Nigeria.
CritBench: evaluation framework for cybersecurity capabilities of LLM agents in operational technology (OT) environments like IEC 61850 digital substations.
Multi-stage validation framework for trustworthy clinical information extraction using LLMs at scale without annotation-intensive reference standards.
Evaluation of LLM personality simulation using psychometric profiles and life story generation, comparing model outputs against real human psychological data.
Framework using graph priors to improve structural coherence in part-based image synthesis by modeling spatial and semantic relationships.
Method using dual self-consistency reinforcement learning to synthesize TikZ graphics code from images, addressing precision challenges in multimodal LLM code generation.
Framework modeling paraphrasing as affine transformations in transformer embedding spaces to improve interpretability of language model latent spaces.
Research on how social dynamics in multi-agent LLM systems (conformity, expertise perception, dominance) undermine objective decision-making by representative agents.
Research paper introducing LLM4CodeRE, which uses domain-adapted LLMs for malware decompilation analysis and reverse engineering of obfuscated code.
Research paper on lightweight multimodal VLM adaptation for thermal drone imagery species recognition and habitat analysis via projector alignment.
Research paper on Gym-Anything, a framework converting any software into agent environments for training computer-use agents on complex, long-horizon tasks.
Research paper introducing Polynomial Mixer (PoM), a linear-time token mixing mechanism replacing self-attention in transformers with preserved universality.
Shot-based quantum encoding distributes quantum resources for efficient data loading in quantum neural networks.
Synthetic pipeline generates doctor-patient conversations for training and evaluating long-form audio summarization models.
MIGT taxonomy addresses governance of machine identities and automated agents in enterprise and geopolitical contexts.
Analyzes the gradient inductive bias of multi-token prediction, compared with next-token prediction, for developing coherent world models.
MMEmb-R1 incorporates chain-of-thought reasoning into multimodal embeddings with pair-aware selection and adaptive control mechanisms.
Diffusion model approach for converting low dynamic range video to HDR through scene radiance estimation.
Test-time training method updates LLM fast weights at inference to adapt dynamically to new information streams.
UserCentrix is a hybrid agentic orchestration framework for smart spaces combining memory augmentation with multi-agent coordination.
ARIEL framework pairs expert-vetted biomedical tasks with LLMs for evaluation and optimization of AI research assistants.
Fine-tunes open-source LLMs for smartphone app control by learning action semantics rather than syntax, reducing API costs.
URSA framework enables LLMs to conduct autonomous research through complex reasoning, planning, coding, and multi-agent collaboration.
MedGemma is a medical vision-language foundation model collection designed for healthcare AI tasks with privacy preservation.
Agent-based model framework for simulating cascading climate risks in supply chains with adaptive firm behavior and economic network effects.
Extends Nash learning from human feedback to the multiplayer setting, capturing non-transitive and heterogeneous preferences in LLM alignment.
DeepSearch applies Monte Carlo Tree Search to overcome training plateaus in reinforcement learning from verifiable rewards for language model reasoning.
Introduces Supervised Multi-Dimensional Scaling to analyze and compare feature manifold hypotheses in language models' latent spaces.
TS-Agent enables LLMs to reason over raw time series data directly without converting to text/images, reducing hallucination and knowledge leakage.
DRIFT method automates mathematical theorem formalization for LLMs by decomposing statements and retrieving prerequisite knowledge in formal languages.
Critiques rule-based and reward-based approaches in RL ethics and proposes a virtue ethics framework for more robust machine ethics.