🚀 DataAgent

Data + AI Agent: Enterprise Data Task Solution

🚀 DataAgent is an enterprise data intelligence platform for Data + AI scenarios that rebuilds the data engineering pipeline around the Agent paradigm. It combines NL2SQL, a unified semantic layer, and multi-agent collaboration to deliver end-to-end data analysis and feature mining for financial Q&A, AI for Science, and other core domains.

🌟 Why DataAgent

🏆 Scenario Advantages

Scenario	Traditional Approach	The DataAgent Edge	Typical Applications
📊 Financial Q&A	Business request → data team queue → manual SQL → manual verification; even a single metric query typically takes until T+1	Four-stage NL2SQL pipeline (Perception→Generation→Validation→Reflection) turns natural-language questions into instant answers; semantic metric mapping, 74%+ execution accuracy on the BIRD dev set, sub-second response	✅ Enterprise financial analytics assistant
🔬 AI for Science	Scientific data is scattered across many sources; cross-database correlation requires manual exports; literature and data cannot be queried together	Federated queries across sources plus joint retrieval over structured and unstructured data, enabling natural-language-driven exploration of scientific data	✅ Scientific data exploration platform
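The four-stage NL2SQL loop can be pictured with a minimal sketch. This is an illustration under stated assumptions, not DataAgent's implementation: it uses Python's built-in `sqlite3`, the names `perceive`, `validate`, `reflect`, and `run_pipeline` are hypothetical, and the generation stage is any caller-supplied function (in practice an LLM call) so the sketch runs without a model. Validation here only checks that a read-only query compiles via SQLite's `EXPLAIN QUERY PLAN`; reflection turns the error into feedback for the next generation round.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical generator signature: (question, schema_text, feedback) -> SQL.
# In practice this would wrap an LLM call; here any callable works.
Generator = Callable[[str, str, Optional[str]], str]


@dataclass
class Attempt:
    sql: str
    error: Optional[str]  # None means the SQL passed validation


@dataclass
class PipelineResult:
    sql: Optional[str]  # final validated SQL, or None if every round failed
    rows: list
    attempts: List[Attempt] = field(default_factory=list)


def perceive(conn: sqlite3.Connection) -> str:
    """Perception (hypothetical): render each table and its columns as one line."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        lines.append(f"{table}(" + ", ".join(f"{c[1]} {c[2]}" for c in cols) + ")")
    return "\n".join(lines)


def validate(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Validation (hypothetical): return an error message, or None if the SQL is OK."""
    # Crude read-only guard: only statements starting with SELECT or WITH pass.
    if not sql.lstrip().lower().startswith(("select", "with")):
        return "only read-only SELECT queries are allowed"
    try:
        # Compiles and plans the query without executing it.
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
    except sqlite3.Error as exc:
        return str(exc)
    return None


def reflect(sql: str, error: str) -> str:
    """Reflection (hypothetical): turn a validation error into feedback text."""
    return f"Previous SQL {sql!r} failed with: {error}. Revise it using only the schema."


def run_pipeline(conn: sqlite3.Connection, question: str, generate: Generator,
                 max_rounds: int = 3) -> PipelineResult:
    schema = perceive(conn)
    feedback: Optional[str] = None
    attempts: List[Attempt] = []
    for _ in range(max_rounds):
        sql = generate(question, schema, feedback)  # Generation
        error = validate(conn, sql)                 # Validation
        attempts.append(Attempt(sql, error))
        if error is None:
            return PipelineResult(sql, conn.execute(sql).fetchall(), attempts)
        feedback = reflect(sql, error)              # Reflection
    return PipelineResult(None, [], attempts)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revenue (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO revenue VALUES (?, ?)",
                     [("north", 120.0), ("north", 150.0)])

    def stub_generator(question, schema, feedback):
        # Stand-in for an LLM: the first try uses a wrong column name,
        # and the retry after feedback uses the correct one.
        if feedback is None:
            return "SELECT SUM(amt) FROM revenue"
        return "SELECT SUM(amount) FROM revenue"

    result = run_pipeline(conn, "What is total revenue?", stub_generator)
    print(result.sql, result.rows, len(result.attempts))
```

Passing the generator in as a parameter keeps the loop independent of any particular model or prompting strategy; a stricter validator could also run the query on sample data and check the shape of the result.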

⚡ Core Capabilities

Capability	Description
🧠 NL2SQL Intelligent Engine	Four-stage pipeline: Perceptor→Generator→Validator→Reflector; multi-strategy fusion (Prompt / ICL / Skeleton / DC); supports SQLite / MySQL / PostgreSQL / Hive; 74%+ execution accuracy on the BIRD benchmark
🔬 Automated Feature Engineering	Agents autonomously explore relationships across hundreds of tables and discover latent feature combinations, with importance ranking and visualization; 10x+ efficiency gain
🏭 Full-Pipeline Data Factory	Data ingestion→Schema perception→Feature mining→Model training→Report generation, with the complete data engineering pipeline driven by a single YAML config
🧩 Unified Semantic Layer	Uses GaussVector as the preferred vector retrieval backend of the semantic layer, turning tables, columns, metric definitions, and business descriptions into retrievable schema signals for NL2SQL and multi-source semantic alignment
🔌 Plugin Tool Ecosystem	Three tool types (local functions, MCP over stdio/SSE, and A2A) share unified registration and invocation, with auto-discovery and on-demand loading; built-in data analysis SKILLs
📡 Native Multi-Agent Collaboration	Full A2A 1.0 protocol support: automatic agent discovery, capability mapping, and standardized communication, supporting distributed collaboration on complex business tasks
🧩 YAML as Agent	Model, tools, memory, workflow, and scenario prompts are all orchestrated declaratively; go from idea to a running Agent in minutes
🛡️ Enterprise Security Sandbox	Workspace isolation, path whitelisting, and a full audit trail to meet financial-grade compliance requirements
⚡ Out of the Box	20+ example configs for industry scenarios; start with zero code and be up and running in minutes
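To illustrate what unified registration and invocation across local functions, MCP tools, and A2A agents can look like, here is a minimal sketch. `ToolRegistry`, `Tool`, and the `transport` callable are hypothetical names, not DataAgent's API. The remote transport is a plain function standing in for a real MCP client or A2A connection, so no protocol details are implied.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: in a real deployment this would talk to an MCP
# server or a remote A2A agent. Here it is just a callable taking
# (tool_name, arguments) so the sketch runs without network access.
Transport = Callable[[str, Dict[str, Any]], Any]


@dataclass
class Tool:
    name: str
    kind: str  # "local", "mcp", or "a2a"
    description: str
    invoke: Callable[[Dict[str, Any]], Any]


class ToolRegistry:
    """Hypothetical registry giving every tool kind the same call interface."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def _add(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def local(self, description: str):
        """Decorator that registers a plain Python function as a tool."""
        def wrap(fn):
            self._add(Tool(fn.__name__, "local", description, lambda args: fn(**args)))
            return fn  # the function stays directly callable
        return wrap

    def remote(self, name: str, kind: str, description: str, transport: Transport) -> None:
        """Register an MCP tool or A2A agent behind the same interface."""
        if kind not in ("mcp", "a2a"):
            raise ValueError(f"unsupported remote kind: {kind}")
        self._add(Tool(name, kind, description, lambda args: transport(name, args)))

    def call(self, name: str, **args: Any) -> Any:
        return self._tools[name].invoke(args)

    def describe(self) -> List[Tuple[str, str, str]]:
        """List (name, kind, description) for every registered tool."""
        return [(t.name, t.kind, t.description) for t in self._tools.values()]


if __name__ == "__main__":
    registry = ToolRegistry()

    @registry.local("Compute the mean of a list of numbers")
    def mean(values):
        return sum(values) / len(values)

    # Fake transport standing in for an MCP client.
    registry.remote("run_sql", "mcp", "Run SQL on the warehouse",
                    transport=lambda tool, args: f"{tool} called with {args}")

    print(registry.call("mean", values=[1, 2, 3]))
    print(registry.call("run_sql", query="SELECT 1"))
    print(registry.describe())
```

Keeping the transport behind a single callable means the agent-facing `call` interface does not change when a tool moves from a local function to a remote server.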

🚀 Quick Links

📚 Documentation

Installation

Install with uv or pip, set up the environment, and configure models. If your scenario needs databases, continue with Elasticsearch, PostgreSQL, and MySQL deployment, GaussVector integration (the preferred vector backend), scenario data import, and Semantic Service setup.

Start Installation → · Database Installation →
Quick Start

Run the examples to get the end-to-end pipeline working quickly.

Quick Start →
Features

Learn about core capabilities, module structure, and tool and model support, including the Semantic Service, semantic-layer vector retrieval (with GaussVector as the preferred backend), and openJiuwen.
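Semantic-layer retrieval comes down to embedding schema descriptions (tables, columns, metric definitions) and returning the ones closest to a question, which can then be handed to NL2SQL as schema context. The sketch below is a simplified illustration: it assumes a toy word-count embedding and an in-memory index in place of a real embedding model and GaussVector. `SchemaItem`, `embed`, and `SemanticIndex` are hypothetical names and do not reflect GaussVector's API.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class SchemaItem:
    kind: str  # "table", "column", or "metric"
    name: str
    description: str


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticIndex:
    """Hypothetical in-memory index; a vector store would fill this role in practice."""

    def __init__(self, items: List[SchemaItem]):
        # Embed name and description together so both contribute to matching.
        self.entries = [(item, embed(f"{item.name} {item.description}")) for item in items]

    def search(self, question: str, k: int = 3) -> List[SchemaItem]:
        """Return up to k schema items most similar to the question (score > 0)."""
        query = embed(question)
        scored = [(cosine(query, vec), item) for item, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for score, item in scored[:k] if score > 0]


if __name__ == "__main__":
    index = SemanticIndex([
        SchemaItem("table", "revenue", "quarterly revenue by region"),
        SchemaItem("column", "revenue.amount", "revenue amount in USD"),
        SchemaItem("metric", "net_profit_margin", "net profit divided by revenue"),
        SchemaItem("table", "employees", "staff headcount and department"),
    ])
    for item in index.search("What was revenue by region last quarter?", k=2):
        print(item.kind, item.name)
```

Indexing metric definitions and business descriptions alongside physical tables and columns is what lets a question phrased in business terms land on the right schema elements.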

View Features → · Semantic Service → · openJiuwen →
Architecture

Learn about the overall architecture, module relationships, and the design of key processes.

View Architecture →
API Design

Learn about key interfaces and integration methods for extending and customizing DataAgent.

View API Design →
Use Cases

Tutorials and best practices for building a dedicated NL2SQL Agent, a data analysis Agent, and more.
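When building an NL2SQL Agent, execution accuracy (EX), the metric behind the BIRD figures quoted earlier, is a practical way to check its output: a generated query counts as correct if it returns the same result as a reference query. Here is a minimal sketch using Python's built-in `sqlite3`. It compares results as sets (ignoring row order and duplicates), an assumption modeled loosely on how the BIRD evaluation compares results; the `revenue` table and the queries are made up, and this is not DataAgent's own evaluator.

```python
import sqlite3
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries produce the same set of result rows.

    Row order and duplicates are ignored (set comparison); a predicted query
    that fails to execute counts as incorrect.
    """
    gold_rows = set(conn.execute(gold_sql).fetchall())
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        return False
    return predicted_rows == gold_rows


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy in-memory database with a made-up revenue table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revenue (region TEXT, quarter TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO revenue VALUES (?, ?, ?)",
        [("north", "Q1", 120.0), ("south", "Q1", 80.0), ("north", "Q2", 150.0)],
    )
    gold = "SELECT SUM(amount) FROM revenue WHERE region = 'north'"
    pairs = [
        # Equivalent query written differently: counts as correct.
        ("SELECT SUM(amount) FROM revenue WHERE region IN ('north')", gold),
        # Missing filter, different result: counts as incorrect.
        ("SELECT SUM(amount) FROM revenue", gold),
        # Nonexistent column, fails to execute: counts as incorrect.
        ("SELECT SUM(amt) FROM revenue", gold),
    ]
    print(execution_accuracy(conn, pairs))  # 1 of 3 pairs match
```

Because EX compares results rather than SQL text, it accepts any query that is semantically equivalent on the evaluation data, which is also why a query can occasionally match by coincidence on a small dataset.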

View Use Cases →
Milestone

View the release plan and roadmap.

View Milestone →
Reference

View common references, version information, and contribution guidelines.

View Reference →