F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
F2LLM-v2 multilingual embedding models (80M-14B parameters) supporting 200+ languages with emphasis on low-resource language coverage.
FinTradeBench benchmark for evaluating LLM reasoning on financial decision-making using company fundamentals and trading signals.
NavTrust benchmark evaluating robustness of embodied navigation agents (VLN and OGN) under real-world data corruptions.
Heuristic methods for constructing restricted decision diagrams to approximate Pareto frontiers in multiobjective optimization.
Framework combining machine learning with automated reasoning for generating and selecting explanations in scientific discovery tasks.
Analysis of LLM-based world models for decision-making in reasoning systems, identifying evaluation gaps and methodological issues.
Framework using LLM agents to simulate decision discourse by representing diverse stakeholder perspectives in complex problem-solving.
Manus AI general-purpose autonomous agent combining LLM reasoning with execution capabilities for complex end-to-end tasks.
Deep reinforcement learning approach for multi-objective combinatorial optimization using conditional computation and preference decomposition.
Multimodal learning framework for solving Generalized Traveling Salesman Problem in robotic task planning.
Single-agent reinforcement learning framework for bus fleet control addressing traffic stochasticity and demand variability.
MMSearch-Plus benchmark for multimodal browsing agents requiring genuine vision-text reasoning and iterative retrieval verification.
CausalARC testbed for evaluating AI reasoning on abstract tasks with limited data and distribution shift using causal world models.
Bayesian evaluation framework replacing Pass@k metric for more stable and reliable LLM reasoning performance assessment.
Theoretical framework for measuring conflicts in random permutation sets using order-dependent uncertainty fusion.
SynBullying dataset uses multiple LLMs to generate synthetic conversational data for cyberbullying detection research.
AgroCoT benchmark evaluates reasoning capabilities of vision-language models for agricultural applications like crop monitoring and pest detection.
Memory Bear system applies cognitive science principles to address LLM memory limitations, hallucinations, and context window constraints.
VR-based discrete-event simulator for school security evaluation using behavioral data.
Certification protocol ensuring consistent semantic understanding between agents using stimulus-meaning model and empirical testing.
Data-centric framework learning optimal verbalization for converting user interaction logs into natural language for LLM-based recommendation systems.
Variable isolation study examining prompt architecture layers enabling LLMs to solve reasoning benchmarks like the car wash problem.
CIRCLE lifecycle framework bridging the gap between AI model metrics and real-world deployment outcomes through six-stage evaluation.
AI4S-SDS system combining LLM agents with sparse MCTS and differentiable physics for automated chemical solvent design.
CliqueFlowmer approach for computational materials discovery using neural networks for offline optimization of material properties.
MEMO framework reducing variance in multi-turn multi-agent LLM game evaluations through memory augmentation and context optimization.
MedMASLab unified framework and benchmark for multimodal medical multi-agent systems with standardized integration and cross-specialty evaluation.
SoLA framework for reversible lifelong model editing in LLMs using semantic routing with LoRA modules to prevent knowledge forgetting.
Method to reduce overthinking and underthinking in Large Reasoning Models through balanced token allocation for efficient inference.
VTC-Bench evaluating multimodal LLM agents on complex visual tool composition, addressing limitations in existing tool-use benchmarks.
Hybrid scalar-verbal RL approach for emotional support dialogue systems using user reactions as learning signals instead of expert-defined rewards.
AsgardBench benchmark for evaluating visually-grounded interactive planning and plan adaptation based on visual observations.
Formal proof that safety is non-compositional when combining agents with conjunctive capability dependencies.
ARISE hierarchical RL framework for mathematical reasoning in LLMs that learns reusable strategies across problem instances.
Machine learning approach for predicting and discovering error patterns in vehicle diagnostic trouble codes using temporal sequence analysis.
Study of nonstandard errors in AI coding agents that deploys 150 Claude agents on market analysis tasks, showing agent-to-agent variation in analytical choices.
IET framework for attributing multi-agent system outputs to specific agents without execution logs, enabling accountability in agent interactions.
SQLBench benchmark for evaluating Text-to-SQL capabilities of LLMs across sub-tasks, addressing gaps in prompt templates and performance assessment.
Graph learning model for drug-drug interaction prediction addressing generalization and robustness in extreme cases.
Deep learning framework mitigating perception latency in vision-based lane-keeping for autonomous vehicles using imitation learning.
Experimental study measuring how partisan biases in LLMs influence human political opinions and decision-making.
Structured transformer approach for offline model-based optimization combining reinforcement learning and generative modeling for design problems.
Framework addressing limitations of contrastive distillation for 3D representation learning by capturing modality-specific features.
Autoregressive transformer approach for component-based colored SVG generation from text descriptions.
Dataset for hierarchical KPI extraction from earnings filings using iXBRL structured financial documents.
LLM-based index advisor for database optimization using in-context learning to iteratively refine index recommendations.
Equilibrium finding algorithms in polymatrix games under differential privacy constraints with hardness results.
Survey of AI-based detection and mitigation methods for DDoS attacks with taxonomy of attack categories.
Ensemble of language models for automated tumor group classification from unstructured pathology reports in cancer registries.
Federated learning system balancing privacy-utility tradeoffs with incentive mechanisms and heterogeneous resource accommodation across organizations.