NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM Inference
Framework addressing LLMs' tendency to collapse ambiguous inputs prematurely by mapping text to non-collapsing state spaces for better dialogue reasoning.
Study introducing the VAPT toolkit to evaluate, through user perception research, how LLMs extract, embody, and explain human values from conversations.
Benchmark for evaluating multimodal LLMs on handwritten STEM student solutions with mathematical formulas and diagrams, addressing authentic domain-specific evaluation gaps.
TernaryLM: Language model trained natively with 1.5-bit quantization achieving memory-efficient deployment on edge devices while maintaining language modeling capability.
Video generation model for precise instance insertion with sparse control in filmmaking applications, moving beyond prompt-engineering toward controllable generation.
Benchmark evaluating LLM-based coding agents on their ability to learn from context and reuse experience across related software engineering tasks in repositories.
Administrative law analysis of how government agencies balance technological capability with democratic oversight and accountability mechanisms.
Comparative study of CNN architectures (VGG, ResNet, GoogLeNet) analyzing relationship between depth and trainability in image recognition.
DUET-VLM: dual-stage token reduction framework for vision-language models reducing computational cost while maintaining accuracy during training and inference.
PedaCo-Gen: pedagogically-informed human-AI system for collaborative instructional video generation using Cognitive Theory of Multimedia Learning.
Layer gradient analysis method for identifying optimal layers in LLMs for knowledge editing while preserving model behavior on unrelated inputs.
Extension of ptychographic imaging to overlap-free single-shot coherent diffractive imaging using physics-informed neural networks.
SpotIt+: open-source verification tool for Text-to-SQL evaluation using bounded equivalence checking and constraint-mining for practical query discrepancies.
DiFlowDubber: two-stage approach for automated video dubbing using discrete flow matching for expressive prosody and precise audio-visual synchronization.
Method for measuring physical frame rate from visual dynamics in generative video models to improve temporal consistency.
AgentTrace: lightweight framework for post-hoc root cause analysis in deployed multi-agent systems using causal graph tracing from execution logs.
Study showing that LLMs struggle with code generation for private libraries even when given API documentation; proposes teaching methods for private-library-oriented code generation.
Analysis of multimodal LLMs generating natural language explanations for face verification decisions on unconstrained images.
Goedel-Code-Prover: hierarchical proof search framework for automated code verification in Lean 4 using LLMs to decompose complex verification goals.
Analysis of how AI scaling laws reshape classical Amdahl's Law for modern heterogeneous computer architectures with specialized accelerators and tensor datapaths.
KG-Hopper: reinforcement learning framework enabling compact open-source LLMs to perform knowledge graph reasoning for multi-hop KBQA tasks.
mSFT: iterative algorithm for multi-task supervised fine-tuning that addresses heterogeneous overfitting by dynamically adjusting compute budget across datasets.
KALAVAI: quantitative model predicting when independently trained specialist LLMs can be fused post-hoc with measurable performance gains; includes practical prediction formula.
EVA: reinforcement learning framework for video agents using MLLMs with adaptive reasoning to handle long video sequences and temporal dependencies efficiently.
MDKeyChunker: structure-aware chunking pipeline for Markdown documents with single-call LLM enrichment to improve RAG accuracy and reduce metadata extraction overhead.
Study showing that deep learning models for automated sleep staging generalize poorly to clinical populations with comorbid sleep disorders; proposes iSLEEPS to address these limitations.
arXiv paper on SM-Net, a machine learning model that generates stellar spectra from fundamental stellar parameters using multiple libraries.
arXiv paper analyzing response homogenization in RLHF-aligned LLMs and its effects on uncertainty estimation methods.
arXiv paper introducing scalability coefficients for detecting problematic items in large-scale AI benchmarks using isotonic regression.
arXiv paper on Few TensoRF, a 3D reconstruction framework combining tensor representations with few-shot learning for NeRF.
arXiv paper demonstrating dual-layer side-channel attacks on local Vision-Language Models exploiting dynamic preprocessing vulnerabilities.
arXiv survey on reinforcement learning applications for infectious disease control and epidemic response optimization.
arXiv paper on physics-guided deep learning for groundwater level prediction using spatio-temporal modeling.
MAGNET: decentralized system for autonomous generation and training of domain-expert language models using autoresearch and BitNet ternary quantization.
Theoretical analysis of simplicity bias in neural networks using minimum description length principle and compression framework.
Investigation of whether LLMs perform genuine in-context molecular property prediction or rely on memorization, given potential training data contamination.
Analysis of activation-based probes for detecting misaligned AI systems, showing blind spots in detecting coherent misalignment versus deception.
DRiffusion: parallel sampling framework accelerating diffusion model inference through draft-and-refine process with skip transitions.
Data-driven framework using wavelet analysis on acoustic emission data to model plastic deformation in metals.
Transformer model with factorized attention to predict defensive coverage assignments in NFL football plays.
Bandit algorithm approach for dynamic regret minimization in unconstrained adversarial linear settings.
Deep learning framework using transformers to predict patient outcomes from EEG while preventing data leakage in survival prediction.
Game-based learning system using adaptive mechanisms to personalize mathematical education for children.
Machine learning for satellite network topology configuration under dynamic orbital movement.
EngineAD: real-world multivariate anomaly detection dataset from vehicle fleet sensor telemetry, with expert annotations for a safety-critical domain.
ARTA: joint training framework for adversarially robust multivariate time-series anomaly detection using min-max optimization and information retention.
Theoretical analysis of Minkowski weighted k-means showing that its objective is a power-mean aggregation of within-cluster dispersions controlled by the exponent.
Somax: composable Optax-native stack for second-order curvature-aware training with modular APIs for operators, estimators, and preconditioners.
QuitoBench: open benchmark for time series forecasting covering eight trend-seasonality-forecastability regimes with regime-balanced dataset design.
GLU: framework for sparse spatiotemporal reconstruction and forecasting using global-local-uncertainty fusion with a unified state representation.