Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL
Knowledge distillation method using structured chain-of-thought to improve text-to-SQL performance in small language models while reducing cost and security risks.
SDUM presents a universal deep learning framework for MRI reconstruction across diverse protocols using Restormer-based architecture.
KnowVal combines visual-language reasoning, driving knowledge, and value alignment for autonomous driving using knowledge-augmented approaches beyond data-driven learning.
Dataset (judgeWEL) for named entity recognition in Luxembourgish using LLMs to verify automatically labeled data for underrepresented languages.
Benchmark and framework (Grand-SMOT) for semantic multi-object tracking using multimodal LLMs to handle complex relational queries.
Wavelet-based Transformer for intraday trading using multi-scale decomposition and reinforcement learning on financial time series.
arXiv: LMMRec framework using LLMs to model user motivations in multimodal recommendation systems from heterogeneous information sources.
arXiv: LatentChem decouples chemical reasoning from natural language by using latent representations instead of chain-of-thought prompting.
arXiv: GOT-JEPA, visual object tracking using a joint-embedding predictive architecture with model adaptation and occlusion handling.
arXiv: Hybrid-policy reinforcement learning approach for enhancing multi-modal LLM reasoning with controlled exploration strategies.
arXiv: ECHOSAT, global tree height mapping using vision transformers on multi-sensor satellite data for forest monitoring.
arXiv: Evaluating small language models for zero-shot and one-shot role classification in robot leader-follower interaction tasks.
arXiv: FlashOptim, memory-efficient optimizers for mixed-precision neural network training that reduce parameter storage overhead.
arXiv: ProtoDCS framework for robust test-time adaptation of Vision-Language Models under distribution shift and open-set scenarios.
arXiv: Using LLMs with structured prompts to generate auxiliary lemmas for constraint solving with inductive definitions.
Study of parental moderation preferences for children's GenAI chatbot interactions using LLM-generated probe scenarios.
BLOCK: open-source two-stage MLLM pipeline generating Minecraft character skins from text descriptions via 3D preview synthesis.
Multi-dimensional evaluation of LLM safety benchmarks assessing their academic influence and code repository quality.
Analysis of performative chain-of-thought in reasoning LLMs showing discrepancy between generated explanations and internal model beliefs.
Design study on LLM-assisted tools for systematic literature reviews to reduce cognitive load and enable iterative exploration.
Investigation of tokenizer pretraining impact on physics foundation models for emulating complex multiphysics simulations.
Expert perspectives on integrating foundation models and AI agents into clinical computational pathology with translational readiness assessment.
Framework using structured perturbations to evaluate LLM performance on high-stakes grant proposal reviews across quality dimensions.
Human-aware behavior design for mobile robot chemists in autonomous self-driving laboratories.
Weakly supervised deep learning for gland segmentation in colorectal cancer histopathology without pixel-level annotations.
Neurobiologically-inspired learning algorithm for temporal pattern recognition as alternative to backpropagation through time.
EXPLORE-Bench benchmark evaluating multimodal LLMs' ability to reason about long-horizon physical consequences from egocentric viewpoints.
AraModernBERT adapter for Arabic NLP with transtokenized embeddings and 8K token context window support.
Dataset creation using Wikidata to detect cultural biases in LLMs across Latin American languages and contexts.
Semantic embedding injection in neural transducers for low-latency streaming automatic speech recognition.
Epistemic Support-Point Filter for recursive estimation using maximum entropy and falsification principles.
Proposes harmonic loss as alternative to cross-entropy for training deep neural networks with improved interpretability.
Method to provably compute adversarial examples for black-box neural networks using Contract And Conquer approach.
Human-inspired reasoning approach for robust speech deepfake detection with improved generalization to unseen audio domains and interpretability.
Analysis of cue-conflict benchmarks for measuring neural network shape-texture bias, identifying instability in stylization-based bias estimation methods.
Historical Consensus approach for preventing posterior collapse in VAEs through iterative Gaussian mixture prior selection based on data covariance spectral properties.
Uncertainty quantification framework for neural operator PDE surrogates emphasizing computational efficiency and spatial localization of epistemic uncertainty.
Interventional time series data generator for training causal foundation models, extending prior-data fitted networks to time series with ground-truth interventions.
Graph tokenization framework combining reversible serialization with BPE to enable transformer models to process graph-structured data as token sequences.
Analysis of routing signatures in sparse Mixture-of-Experts transformers to understand task-conditioned expert selection patterns in large language models.
Gradient descent-based approach for learning interpretable tree-based decision models, addressing combinatorial complexity of traditional discrete tree learning.
Data-driven superposition operator for non-renewal arrival processes in queueing networks using moment-based learning instead of classical analytical methods.
Group Resonance Network for EEG emotion recognition combining individual dynamics with group-level regularization to handle cross-subject variability.
Surrogate modeling approach for building energy prediction using weather-guided models to reduce computational cost of physics-based EnergyPlus simulations.
Quantum convolutional neural network architecture using localized cost functions and tensor-network initialization to address barren plateau optimization challenges.
Higher-Order Modular Attention (HOMA) mechanism extending transformer attention to capture triadic interactions in protein sequences beyond pairwise dependencies.
REOPOLD: framework for on-policy distillation of reasoning capabilities to smaller models using policy optimization and token-level rewards for efficient scaling.
H2LooP Spark Preview: continual pretraining framework for LLMs specialized in low-level embedded systems code generation with hardware-specific domain adaptation.
Mechanistic interpretability analysis of video vision transformers to understand how internal circuits represent action-outcome relationships in classification tasks.
Scaling-law framework for jailbreak attacks on LLMs treating each attack as compute-bounded optimization across methods and model families.