Ask HN: How do you programmatically evaluate if an LLM sounds "too AI"?
Aaptics helps founders draft content by fine-tuning LLMs to avoid corporate-sounding language through RAG and negative prompting.
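Negative prompting of the kind described here can be sketched generically. The snippet below is a minimal illustration of the general idea, not Aaptics' implementation: a banned-phrase list is injected into the prompt as explicit "do not use" instructions, and a simple checker flags any leakage post hoc. The `BANNED` list and function names are illustrative assumptions.

```python
# Illustrative sketch of negative prompting for tone control
# (generic technique; not Aaptics' actual pipeline).
BANNED = ["synergy", "leverage", "circle back", "best-in-class", "paradigm"]

def build_negative_prompt(task: str) -> str:
    """Append explicit negative instructions to a drafting task."""
    avoid = ", ".join(f'"{p}"' for p in BANNED)
    return (f"{task}\n\nWrite plainly, in first person. "
            f"Do NOT use these phrases or anything like them: {avoid}.")

def flags_corporate_tone(text: str) -> list[str]:
    """Post-hoc check: return any banned phrases that slipped through."""
    lower = text.lower()
    return [p for p in BANNED if p in lower]
```

A retrieval (RAG) step would typically prepend examples of the founder's own writing to the same prompt, so the model imitates a concrete voice rather than just avoiding a blocklist.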
kbot is an open-source terminal AI agent with 23 agents, 290 tools, and 20 providers. Multi-model, local-first, works with MCP-compatible IDEs.
Benchmark evaluating multimodal LLMs' ability to process discrete symbols like math formulas and chemical structures, addressing a gap in symbol understanding.
Introduces PRISM for intent-based persona routing in LLMs, improving both alignment and accuracy in multi-agent systems through selective persona application.
Proposes correlation-weighted multi-reward optimization to improve compositional generation in text-to-image models by reducing concept interference.
Studies how reasoning AI agents can avoid game-theoretic failures in interactive economic environments without post-training alignment methods.
Presents CAPSUL benchmark dataset for protein subcellular localization with 3D structural information for structure-based ML models.
Proposes Interplay, training independent simulators for conversational recommendation systems to generate reference-free dialogue data at scale.
Proposes MedForge for interpretable medical deepfake detection using MLLMs with explainable forgery-aware reasoning for healthcare applications.
Introduces ZebraArena, a procedurally generated diagnostic environment for evaluating reasoning-action coupling in tool-augmented LLMs with minimal dataset contamination.
Presents AFS-Search for text-to-image generation using agentic flow steering and parallel rollout search to improve spatial reasoning and reduce error accumulation.
Introduces D-Mem, a dual-process memory system for LLM agents enabling high-fidelity memory access for long-horizon reasoning and autonomous operation.
Discusses governance frameworks for synthetic minds and AI regulation, focusing on conceptual foundations beyond tool-centric approaches.
Proposes SCALe method to improve chain-of-thought training in vision-language models by addressing token imbalance between reasoning traces and answer segments.
Benchmark and policy optimization for visual-text geometric reasoning with dynamic construction. Addresses strategic diagram generation in multimodal LLM agents.
Memory-augmented attention layer inspired by Global Workspace Theory for contextualization. Cognitive model-based improvements to multi-head attention mechanisms.
Sparse attention architecture for multi-channel time series forecasting. Machine learning for finance/supply chain, not LLM or agent-focused.
Multi-agent memory coordination framework optimizing construction, retrieval, and utilization cycles. Applies multi-agent reasoning to improve memory-augmented LLM agent performance.
Analysis of dialect-sensitive stereotypes in single and multi-agent LLM architectures. Studies bias variation across Standard American and African-American English inputs.
LLM agent system that autonomously designs task-specific agents through memory-based RL and stateful prompts. Meta-agent framework with skill-based continual learning.
Method for concept unlearning in text-to-image diffusion models beyond keyword-based approaches. Addresses selective content removal from generative models.
Workshop proceedings on Theory of Mind in AI research. Collection of papers on cognitive modeling and AI understanding.
Policy optimization technique for diffusion LLMs reducing trajectory computation cost. Improves efficiency of preference alignment in generative language models.
Evaluation of LLM capability to generate novel mathematical research problems. Studies mathematical creativity and problem generation in language models.
Service architecture for distributed RL training of multi-turn LLM agents. Decouples rollout orchestration from training for scalable agent development.
Topology-aware reward propagation for RL training of LLM agents. Addresses sparse reward problem in agentic LLM reasoning with graph-based methods.
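Graph-based credit assignment for sparse rewards can be illustrated with a minimal sketch, assuming the common pattern of propagating a terminal reward backwards through a step-dependency DAG with decay. This is a generic textbook construction, not the paper's algorithm; the edge format and decay factor are assumptions.

```python
from collections import defaultdict

def propagate_reward(edges, terminal_rewards, decay=0.9):
    """Propagate sparse rewards backwards through a step-dependency DAG.

    edges: list of (parent, child) pairs, parent precedes child.
    terminal_rewards: {node: reward} for the few nodes with direct reward.
    Each node's score = its own reward + decayed scores of its children.
    """
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))
    memo = {}

    def value(n):
        if n not in memo:
            memo[n] = terminal_rewards.get(n, 0.0) + sum(
                decay * value(c) for c in children[n])
        return memo[n]

    return {n: value(n) for n in nodes}

# Toy chain a -> b -> c where only the final step c earns reward 1.0:
scores = propagate_reward([("a", "b"), ("b", "c")], {"c": 1.0})
```

Intermediate steps `b` and `a` now receive dense, decayed signal (0.9 and 0.81), which is the basic effect any topology-aware propagation scheme is after.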
Multi-agent path finding algorithm with asynchronous action support. Graph search problem unrelated to LLMs or AI agents.
DRL framework for UAV network deployment in vehicular networks. Reinforcement learning application outside core AI/LLM focus areas.
Study analyzing how ChatGPT represents and reasons about geographic knowledge. Evaluates factual reasoning and world modeling in LLMs.
Research on LLM mathematical reasoning with formal expression derivation. Addresses structured reasoning in STEM via language models.
Develops quantitative introspection methods inspired by psychology to track internal state changes in LLMs across conversations using numeric self-report.
Evaluates whether multi-agent LLM governance systems follow institutional rules when granted authority, finding integrity requires pre-deployment safeguards.
Studies cross-model alignment of LLM representations for downstream objectives with applications in privacy-preserving and security-constrained settings.
Research manifesto proposing Agentic Business Process Management paradigm extending BPM for governing autonomous agents executing organizational processes.
Extends structural causal models with intentional interventions operator for teleological inference about goal-directed agent behavior in causal systems.
Evaluates PPS (5W3H-based structured prompting framework) for reducing intent transmission loss between users and LLMs across business, technical, and travel domains.
GAN-based simulation framework measuring racial bias propagation in predictive policing systems across multiple US cities with temporal analysis.
Uses Stochastic Gumbel AlphaZero to evaluate difficulty in Tetris Block Puzzle variants, applying game-playing AI as evaluator for puzzle design.
Analyzes online resource allocation among interacting modules with endogenous costs under uniform, gated, and competitive allocation paradigms with regret bounds.
Stability Monitor: behavioral fingerprinting system tracking LLM endpoint identity changes caused by model updates, quantization, and inference-engine swaps, going beyond traditional uptime metrics.
Evaluates whether cross-domain mapping interventions increase creativity equally in humans and LLMs through product feature generation experiments.
LuMamba: self-supervised Mamba architecture for EEG modeling with topology-invariant electrode handling and improved computational efficiency over Transformers.
Studies how uncertainty estimation scales with parallel sampling in reasoning models using self-consistency and verbalized confidence across mathematics and STEM tasks.
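The self-consistency side of this can be sketched in a few lines. Assuming the standard recipe (sample N parallel answers, take the majority vote, and read confidence off the vote distribution), a minimal estimator looks like the following; it is a generic illustration, not the paper's method.

```python
from collections import Counter
import math

def self_consistency_uncertainty(samples):
    """From N sampled final answers, return (majority answer,
    agreement confidence, vote-distribution entropy in bits)."""
    counts = Counter(samples)
    answer, top = counts.most_common(1)[0]
    n = len(samples)
    confidence = top / n
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return answer, confidence, entropy

# Toy example: 8 parallel samples of a final numeric answer.
votes = ["42", "42", "42", "41", "42", "42", "40", "42"]
ans, conf, ent = self_consistency_uncertainty(votes)
```

Here `conf` is 0.75 (6 of 8 samples agree). The scaling question the paper studies is how estimates like these behave as N grows, versus simply asking the model to verbalize a confidence number.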
Large-scale trace-level study showing multi-pass LLM reasoning in binary vulnerability analysis exhibits structured, token-level exploration patterns across hundreds of steps.
D5P4: generalized beam-search framework using determinantal point processes for diverse parallel decoding in discrete diffusion text generation models.
cuGenOpt: GPU-accelerated metaheuristic framework for combinatorial optimization balancing generality, performance, and usability across logistics and scheduling problems.
Box Maze framework enforces LLM reasoning integrity through process-control architecture to mitigate hallucination and unreliable reasoning under adversarial prompting.
OS-Themis: scalable multi-agent critic framework using decomposed trajectory milestones for training robust GUI agents with reinforcement learning.
Uses optimal transport as alignment objective for fine-tuning multilingual contextualized embeddings to improve cross-lingual word representations.
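Optimal transport as an alignment objective usually means computing a soft matching between two sets of embeddings via an entropy-regularized OT plan (Sinkhorn iterations) and pushing matched pairs together. The sketch below shows only the Sinkhorn plan on toy vectors, under the assumption of uniform marginals and a squared-Euclidean cost; it is a generic building block, not the paper's training objective.

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iter=200):
    """Entropy-regularized OT plan between two uniform distributions."""
    n, m = cost.shape
    a = np.ones(n) / n          # source marginal (uniform)
    b = np.ones(m) / m          # target marginal (uniform)
    K = np.exp(-cost / reg)     # Gibbs kernel
    u, v = np.ones(n) / n, np.ones(m) / m
    for _ in range(n_iter):     # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)

# Toy alignment: three "target-language" vectors are a permuted,
# lightly perturbed copy of three "source-language" vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Y = X[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 4))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-Euclidean cost
P = sinkhorn(C)
```

The plan `P` concentrates its mass on the true permutation, and a fine-tuning loss can then penalize the OT cost `(P * C).sum()` to pull cross-lingual representations together.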
Comparative study evaluating whether LLMs demonstrate Theory of Mind capabilities using psychological paradigms.