Semantic Service User Guide
Semantic Service organizes business data, tables, columns, metrics, ontology graphs, and business actions into semantic context that Agents can consume. Its goal is not to replace model reasoning, but to provide more reliable business semantics, data semantics, and query boundaries before inference, so Agents know what data is available, what fields mean, how objects relate, and which queries or actions are valid. The current open-source release focuses on data semantics required for NL2SQL.
At this stage, Semantic Service mainly provides MetaVisor enriched metadata. It prioritizes GaussVector-oriented semantic-layer enhancements: vector indexing, recall ranking, and schema perception across table descriptions, column descriptions, metric definitions, and business keywords. These enhancements help NL2SQL recall and align candidate schemas before SQL generation. Ontology service capabilities are under development.
- MetaVisor enriched metadata: Currently available and used only for NL2SQL, both by standalone NL2SQL Agents and by main Agents that call NL2SQL sub-agents via nl2sql_sub_agent_tool. With prioritized GaussVector support, MetaVisor can turn business metadata into recallable, rankable, and reusable schema semantic indexes, giving the semantic layer stronger candidate schema discovery.
- Ontology service: Under development. The goal is to provide ontology/knowledge-graph semantics for business objects, relationships, attributes, paths, statistics, and server-side actions. Related services, Skills, CLI, and integration examples will be added when the capability stabilizes.
They solve different problems: MetaVisor helps NL2SQL understand where data lives, what fields mean, and how tables join; the planned Ontology service will help Agents understand what business objects are, how they relate, and which attributes, paths, statistics, or actions can be queried.
1. Capability Boundaries
| Component | Recommended Entry | Main Role |
|---|---|---|
| MetaVisor | NL2SQL Agent / nl2sql_sub_agent_tool | Provides schema, field semantics, table relationships, and value matching for NL2SQL |
| Ontology service | Under development | Planned: ontology/knowledge-graph schema discovery, entity/relationship queries, path queries, statistical aggregation, and server-side action queries |
MetaVisor gives NL2SQL reliable data semantics before SQL generation: tables, columns, descriptions, types, sample values, and join relationships. The planned Ontology service will provide a business knowledge-graph view so Agents have structured ontology grounding for object relationships, business rules, path retrieval, aggregation, or server-side actions—instead of guessing entity labels, relationship names, attribute names, or UUIDs.
2. Overall Usage Patterns
In Agents, the Semantic Service can be used in three forms based on current and planned capabilities:
| Pattern | Suitable Scenario | Recommended Entry |
|---|---|---|
| Standalone NL2SQL Agent | The user question is natural language to SQL; the Agent only handles schema perception, SQL generation, execution, and result return | AGENT_CONFIG.type: "nl2sql" + METAVISOR |
| Main Agent calling NL2SQL sub-agent | The main Agent plans the overall task and delegates NL2SQL only when a database query is needed | nl2sql_sub_agent_tool + main Agent DATABASE / METAVISOR |
| Ontology/graph queries | Tasks require querying business objects, relationships, attributes, paths, statistics, or server-side actions | Under development |
A complex data task may combine these capabilities, but keep responsibilities clear:
- The main Agent understands user goals, decomposes tasks, and organizes the final answer.
- SQL questions go to the NL2SQL Agent or nl2sql_sub_agent_tool, with MetaVisor supplying data semantics.
- Ontology/graph questions belong to the under-development Ontology service; until integration stabilizes, have users provide explicit objects and constraints before follow-up queries.
- Do not let the main Agent guess fields, table relationships, ontology labels, attribute names, or UUIDs.
3. MetaVisor Enriched Metadata
MetaVisor targets structured data. Its core value is turning database tables and columns into semantic context that models can understand and validate. In the NL2SQL flow, it mainly provides:
- Table-level semantics: table names, descriptions, business meaning.
- Column-level semantics: column names, descriptions, types, sample values.
- Relationship semantics: which tables can join and what the join keys are.
- Value matching: whether literals in SQL exist; if not, recall similar values.
- GaussVector semantic-index enhancement: GaussVector carries vectors for table descriptions, column descriptions, metric definitions, and business keywords, improving candidate schema recall and strengthening semantic matching during NL2SQL perception.
In this project, MetaVisor is part of the NL2SQL flow only; it is not offered as a general perception tool for ReAct Agents.
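The value-matching behavior listed above (check whether a literal exists and, if not, recall similar values) can be illustrated with a minimal sketch. It does not call the real ValueMatch service: the in-memory COLUMN_VALUES store and the helper names value_exists and similar_values are hypothetical, and string similarity from the standard library stands in for whatever matching the service actually performs.

```python
from difflib import get_close_matches

# Hypothetical in-memory stand-in for a column's distinct values; a real
# deployment would query the ValueMatch service instead.
COLUMN_VALUES = {
    ("orders", "status"): ["shipped", "pending", "cancelled", "returned"],
}


def value_exists(table: str, column: str, literal: str) -> bool:
    """Return True if the literal appears verbatim in the column."""
    return literal in COLUMN_VALUES.get((table, column), [])


def similar_values(table: str, column: str, literal: str, top_k: int = 3) -> list:
    """Recall up to top_k stored values that resemble the literal."""
    candidates = COLUMN_VALUES.get((table, column), [])
    return get_close_matches(literal, candidates, n=top_k, cutoff=0.6)


literal = "canceled"  # spelling used in a generated SQL literal
if not value_exists("orders", "status", literal):
    # The literal is absent, so offer close stored values as corrections.
    print(similar_values("orders", "status", literal))
```

A validator built this way can rewrite the SQL literal with the top suggestion or return the candidates to the model for a retry.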
3.1 Semantic-Layer Enhancements with Prioritized GaussVector Support
In the semantic layer, GaussVector is the prioritized vector retrieval enhancement. MetaVisor embeds tables, columns, metric definitions, business descriptions, and related semantic text, then stores and retrieves those vectorized semantic assets through GaussVector. During schema perception, NL2SQL searches semantic indexes with the user question and extracted keywords to recall candidate tables, candidate columns, and table descriptions. These candidates are combined with join relationships, column types, and value matching before SQL generation.
The GaussVector-oriented enhancements upgrade business metadata from static documentation into searchable, rankable, and reusable semantic assets, giving natural-language data queries more stable candidate schema recall and reducing the need for the model to guess table or column names.
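To make the recall step concrete, the following sketch ranks table descriptions against a user question by cosine similarity. It is an illustration only and does not use GaussVector or any MetaVisor API: the bag-of-words embed function is a toy replacement for a learned embedding model, and the table descriptions are invented sample data.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical table descriptions standing in for indexed semantic assets.
TABLE_DESCRIPTIONS = {
    "orders": "customer orders with order date amount and status",
    "customers": "customer profile name region and signup date",
    "inventory": "warehouse stock levels per product",
}
INDEX = {name: embed(desc) for name, desc in TABLE_DESCRIPTIONS.items()}


def recall_tables(question: str, top_k: int = 2) -> list:
    """Return the top_k table names ranked by similarity to the question."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda name: cosine(query, INDEX[name]), reverse=True)
    return ranked[:top_k]


print(recall_tables("total order amount by customer region"))
```

The recalled candidates would then be combined with join relationships, column types, and value matching, as described above, before any SQL is generated.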
3.2 Using MetaVisor in an NL2SQL Agent
When AGENT_CONFIG.type is "nl2sql", the NL2SQL Perceptor reads METAVISOR.metavisor_url to fetch schema and join information from MetaVisor. When metadata_match is enabled, the Validator reads METAVISOR.valuematch_url for value matching.
Key configuration for a dedicated NL2SQL Agent:

    AGENT_CONFIG:
      type: "nl2sql"
    DATABASE:
      db_id: "<your_db_id>"
      engine: "sqlite"
      config:
        path: "/path/to/your.sqlite"
    METAVISOR:
      metavisor_url: "http://host:32000"
      valuematch_url: "host:8000"

For full configuration, run commands, and troubleshooting, see Build a Dedicated NL2SQL Agent.
3.3 Using MetaVisor in an NL2SQL Sub-Agent
For a general ReAct main Agent that calls NL2SQL only when SQL is needed, register nl2sql_sub_agent_tool. This tool reads the built-in source config:
dataagent/agents/nl2sql/nl2sql_agent.yaml
At runtime it overwrites the temporary NL2SQL sub-agent YAML with DATABASE and METAVISOR from the main Agent config, then starts the NL2SQL sub-agent via sub_agent_tool. So the main Agent needs its own DATABASE and METAVISOR; you do not edit the source NL2SQL YAML directly.
The main Agent only needs three configuration areas:
| Configuration | Role |
|---|---|
| TOOLS.local_functions[].function: nl2sql_sub_agent_tool | Register the NL2SQL sub-agent tool. |
| DATABASE | Business database for the main Agent; overwritten onto the NL2SQL sub-agent at runtime. |
| METAVISOR | Enriched metadata service; overwritten onto the NL2SQL sub-agent at runtime. |
nl2sql_sub_agent_tool does three key things:
- Reads dataagent/agents/nl2sql/nl2sql_agent.yaml as the NL2SQL sub-agent base config.
- Reads DATABASE and METAVISOR from the main Agent config_manager and overwrites the temporary sub-agent YAML.
- If config.llm_model is set on the tool, reads MODEL.<llm_model> from the main Agent and writes it into the sub-agent config.
So the main Agent config is the runtime source of truth for the NL2SQL sub-agent:
| Main Agent Config | Effect on Sub-Agent |
|---|---|
| DATABASE | Overwrites NL2SQL sub-agent database config |
| METAVISOR | Overwrites NL2SQL sub-agent enriched metadata config |
| TOOLS.local_functions[].config.llm_model | Binds the model slot used by the sub-agent |
| MODEL.<llm_model> | Written into the temporary sub-agent YAML MODEL |
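The overlay behavior in this section can be sketched with plain dictionaries. This is not the code of nl2sql_sub_agent_tool: the function name build_sub_agent_config, the sample values, the model slot name analysis_llm, and the choice to key the copied model under the same slot name are all assumptions made for illustration.

```python
import copy

# Hypothetical base config, standing in for the built-in sub-agent YAML.
base_sub_agent = {
    "AGENT_CONFIG": {"type": "nl2sql"},
    "DATABASE": {"db_id": "placeholder"},
    "METAVISOR": {"metavisor_url": "http://placeholder:32000"},
    "MODEL": {},
}

# Hypothetical main Agent config, already parsed from YAML.
main_agent = {
    "DATABASE": {"db_id": "sales", "engine": "sqlite",
                 "config": {"path": "/data/sales.sqlite"}},
    "METAVISOR": {"metavisor_url": "http://host:32000",
                  "valuematch_url": "host:8000"},
    "MODEL": {"analysis_llm": {"model_name": "example-model"}},
}


def build_sub_agent_config(base: dict, main: dict, llm_model=None) -> dict:
    """Overlay main Agent sections onto a copy of the base sub-agent config."""
    merged = copy.deepcopy(base)  # never mutate the built-in source config
    for section in ("DATABASE", "METAVISOR"):
        merged[section] = copy.deepcopy(main[section])
    if llm_model is not None:
        # Assumption: the model entry is copied under the same slot name.
        merged.setdefault("MODEL", {})[llm_model] = copy.deepcopy(
            main["MODEL"][llm_model]
        )
    return merged


sub_config = build_sub_agent_config(base_sub_agent, main_agent, "analysis_llm")
print(sub_config["DATABASE"]["db_id"], sub_config["METAVISOR"]["valuematch_url"])
```

Working on a deep copy mirrors the guidance above: the source NL2SQL YAML stays untouched, and only the temporary sub-agent config carries the main Agent's values.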
For full main Agent YAML, tool parameters, run instructions, and troubleshooting, see Build a Data Analysis Agent.
3.4 Capabilities MetaVisor Provides to NL2SQL
NL2SQL calls MetaVisor and ValueMatch via dataagent/agents/nl2sql/utils/metavisor_client.py:
| Capability | Role |
|---|---|
| get_table_list(db) | List tables and descriptions in the database. |
| get_table_columns_info(table_name) | Get column names, descriptions, types, and sample values. |
| get_joinable_tables(table_names) | Get joinable column relationships between tables. |
| semantic_search_column(db, keywords, top_k) | Recall relevant columns by keyword semantics. |
| vector_search_table_desc(db, keywords, top_k) | Recall relevant tables by table-description vectors. |
| semantic_search_tables(db, keywords, top_k) | Recall relevant tables by keywords. |
| check_value_exist(db, val) | Check whether a literal in SQL exists. |
| check_value_match(db, table, column, question, top_k) | Recall similar values in a column to correct SQL literals. |
By default, if no fixed user_schema is provided, the NL2SQL Perceptor fetches schema from MetaVisor and converts tables, columns, and join info into model-readable SQL context.
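As a rough illustration of that conversion, the sketch below builds a comment-style SQL context from a stub client. The three method names come from the capability table above, but everything else is assumed: the FakeMetaVisorClient class, the return shapes (lists of dictionaries and column-pair tuples), the sample data, and the output layout are hypothetical and may differ from metavisor_client.py and the real Perceptor.

```python
class FakeMetaVisorClient:
    """Stub with assumed return shapes; not the real MetaVisor client."""

    def get_table_list(self, db):
        return [
            {"name": "orders", "description": "Customer orders"},
            {"name": "customers", "description": "Customer profiles"},
        ]

    def get_table_columns_info(self, table_name):
        columns = {
            "orders": [
                {"name": "customer_id", "type": "INTEGER", "description": "Buyer"},
                {"name": "amount", "type": "REAL", "description": "Order total"},
            ],
            "customers": [
                {"name": "id", "type": "INTEGER", "description": "Customer key"},
                {"name": "region", "type": "TEXT", "description": "Sales region"},
            ],
        }
        return columns[table_name]

    def get_joinable_tables(self, table_names):
        return [("orders.customer_id", "customers.id")]


def build_sql_context(client, db: str) -> str:
    """Render tables, columns, and joins as SQL comments for a prompt."""
    tables = client.get_table_list(db)
    lines = []
    for table in tables:
        lines.append(f"-- {table['name']}: {table['description']}")
        for col in client.get_table_columns_info(table["name"]):
            lines.append(f"--   {col['name']} {col['type']}: {col['description']}")
    for left, right in client.get_joinable_tables([t["name"] for t in tables]):
        lines.append(f"-- JOIN {left} = {right}")
    return "\n".join(lines)


context = build_sql_context(FakeMetaVisorClient(), "sales")
print(context)
```

Keeping the context as plain text makes it easy to prepend to the SQL-generation prompt, and the explicit join lines spare the model from guessing join keys.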
4. Ontology Capabilities
The Ontology service targets business knowledge graphs. Unlike MetaVisor, ontology is not for SQL table/column context—it expresses business objects, relationships, attribute constraints, path rules, metric definitions, and server-side actions.
Ontology is an under-development Semantic Service capability. Service implementation, ontology_service Skill, CLI scripts, and query examples will be added when the capability stabilizes; this document describes goals and integration boundaries for now.
Planned Ontology capabilities include:
| Capability | Description |
|---|---|
| Schema discovery | Query entity types, relationship types, node attributes, and edge attributes in the current scenario. |
| Entity queries | List node instances by object type, or query node details by UUID. |
| Relationship queries | Query relationship types, edge instances, and one-hop relationships from source/target. |
| Attribute filtering | Filter nodes or edges by attribute conditions (name contains, numeric range, enum match, etc.). |
| Attribute explanation | Query attribute names, meanings, and values on nodes or edges to help Agents understand field semantics. |
| Path queries | Multi-hop queries, subgraph queries, or source-relationship-target pattern queries. |
| Statistical aggregation | Count, sort, and aggregate nodes or edges that match conditions. |
| Server-side actions | Query declared server actions and execute them once parameters are clear. |
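Because the Ontology service is still under development, no real API can be shown. The sketch below only illustrates what attribute filtering and a path query mean on a small in-memory graph; the node and edge data, the rating attribute, and the function names filter_nodes and find_path are invented for the example.

```python
from collections import deque
from typing import Optional

# Hypothetical toy graph: nodes keyed by id, edges as (source, relation, target).
NODES = {
    "s1": {"type": "Supplier", "name": "Acme Metals", "rating": 4},
    "p1": {"type": "Product", "name": "Steel Bolt", "rating": 5},
    "w1": {"type": "Warehouse", "name": "East Hub", "rating": 3},
}
EDGES = [("s1", "SUPPLIES", "p1"), ("p1", "STORED_IN", "w1")]


def filter_nodes(node_type: str, min_rating: int) -> list:
    """Attribute filtering: ids of nodes of a type with rating >= min_rating."""
    return [
        node_id
        for node_id, attrs in NODES.items()
        if attrs["type"] == node_type and attrs["rating"] >= min_rating
    ]


def find_path(source: str, target: str) -> Optional[list]:
    """Path query: breadth-first search along directed edges."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for src, _relation, dst in EDGES:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None  # no directed path exists


print(filter_nodes("Supplier", 4))
print(find_path("s1", "w1"))
```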
4.1 Planned Integration Pattern
When the ontology service is open-sourced or integrated, expose it to the main Agent via deterministic tools or Skills—not by letting the model guess ontology labels, attribute names, UUIDs, or action parameters. Recommended flow:
- Discover entity types, relationship types, and queryable attributes in the business scenario.
- Parse candidate business objects, relationships, and filters from the user question.
- Confirm object identifiers, attribute meanings, and relationship boundaries in the ontology service.
- Run relationship queries, path queries, aggregation, or server-side actions on confirmed objects.
- Return query basis and results to the main Agent for answers or as business constraints for follow-up NL2SQL queries.
This flow is a design direction under development. Commands, environment variables, and service URLs will be documented when the ontology capability stabilizes.
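The confirm-before-query idea in the flow above can be sketched as follows. Nothing here reflects an actual ontology API: DISCOVERED_SCHEMA, UnconfirmedTermError, confirm_terms, and plan_query are hypothetical names, and the schema contents are sample data. The point is only that parsed terms are checked against discovered schema, and unknown terms are rejected rather than guessed.

```python
# Hypothetical schema that a discovery step might return.
DISCOVERED_SCHEMA = {
    "entity_types": {"Supplier", "Product"},
    "relationship_types": {"SUPPLIES"},
}


class UnconfirmedTermError(ValueError):
    """Raised when a parsed term is not present in the discovered schema."""


def confirm_terms(entity_type: str, relationship: str) -> dict:
    """Confirm parsed candidates against the schema instead of guessing."""
    if entity_type not in DISCOVERED_SCHEMA["entity_types"]:
        raise UnconfirmedTermError(f"unknown entity type: {entity_type}")
    if relationship not in DISCOVERED_SCHEMA["relationship_types"]:
        raise UnconfirmedTermError(f"unknown relationship: {relationship}")
    return {"entity_type": entity_type, "relationship": relationship}


def plan_query(entity_type: str, relationship: str) -> dict:
    """Return the confirmed query basis plus the action to run on it."""
    confirmed = confirm_terms(entity_type, relationship)
    return {"basis": confirmed, "action": "relationship_query"}


print(plan_query("Supplier", "SUPPLIES"))
```

Returning the confirmed basis alongside the action matches the last step of the flow, where the main Agent receives both the query basis and the results.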
5. Capability Selection Guide
In production Agents, choose Semantic Service capabilities by task type:
| User Question Type | Recommended Approach |
|---|---|
| “Query a business table and aggregate metrics” | Main Agent calls nl2sql_sub_agent_tool; MetaVisor supplies schema and join info to NL2SQL. |
| “What related objects does this business object have?” | Under-development Ontology scenario; for now, have users provide explicit objects and relationship constraints. |
| “Confirm business objects first, then query table statistics” | Use business rules or manual constraints to clarify objects, then hand the query to the NL2SQL sub-agent; automatic ontology confirmation is planned later. |
| “Natural language to SQL only” | Use an NL2SQL Agent with type: "nl2sql". |
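The selection table can be read as a small routing function. The sketch below is one possible encoding, not project code: the Route enum, the choose_route function, and its three boolean flags are hypothetical, and the ontology branch reflects the current interim guidance of asking the user for explicit objects and constraints.

```python
from enum import Enum


class Route(Enum):
    NL2SQL_AGENT = "standalone NL2SQL Agent"
    NL2SQL_SUB_AGENT = "nl2sql_sub_agent_tool"
    ASK_USER = "ask the user for explicit objects and constraints"


def choose_route(needs_sql: bool, needs_ontology: bool, sql_only_agent: bool) -> list:
    """Return the ordered steps for a task, following the selection table."""
    steps = []
    if needs_ontology:
        # Ontology is under development, so clarify objects manually first.
        steps.append(Route.ASK_USER)
    if needs_sql:
        steps.append(Route.NL2SQL_AGENT if sql_only_agent else Route.NL2SQL_SUB_AGENT)
    return steps


# "Confirm business objects first, then query table statistics"
print(choose_route(needs_sql=True, needs_ontology=True, sql_only_agent=False))
```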
Full walkthrough tutorials: Build a Dedicated NL2SQL Agent and Build a Data Analysis Agent.
6. Configuration Checklist
- For standalone NL2SQL, confirm AGENT_CONFIG.type: "nl2sql".
- When the main Agent calls an NL2SQL sub-agent, confirm nl2sql_sub_agent_tool is registered, not the generic sub_agent_tool.
- Put MetaVisor config under METAVISOR on the runtime Agent; for sub-agent scenarios, put it in the main Agent YAML.
- DATABASE.db_id must match the database name registered in MetaVisor.
- METAVISOR.metavisor_url should be http://host:port.
- METAVISOR.valuematch_url should be host:port.
- Ontology/knowledge-graph query capabilities are under development; configure ontology_service, ONTOLOGY_URL, or SCENE after integration stabilizes.
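Parts of this checklist can be automated before an Agent starts. The following sketch checks only the URL formats and the presence of db_id on an already-parsed config dictionary; check_config is a hypothetical helper, not part of the project, and it cannot verify that db_id actually matches a database registered in MetaVisor.

```python
import re
from urllib.parse import urlparse


def check_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    metavisor = config.get("METAVISOR", {})

    parsed = urlparse(metavisor.get("metavisor_url", ""))
    try:
        port = parsed.port  # raises ValueError for a malformed port
    except ValueError:
        port = None
    if parsed.scheme != "http" or not parsed.hostname or port is None:
        problems.append("METAVISOR.metavisor_url should look like http://host:port")

    # valuematch_url is expected without a scheme, for example host:8000.
    if not re.fullmatch(r"[^/:\s]+:\d+", metavisor.get("valuematch_url", "")):
        problems.append("METAVISOR.valuematch_url should look like host:port")

    if not config.get("DATABASE", {}).get("db_id"):
        problems.append("DATABASE.db_id is missing")
    return problems


good = {
    "DATABASE": {"db_id": "sales"},
    "METAVISOR": {"metavisor_url": "http://host:32000", "valuematch_url": "host:8000"},
}
print(check_config(good))
```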
7. Related Code and Examples
- NL2SQL Agent config: dataagent/agents/nl2sql/nl2sql_agent.yaml
- NL2SQL Perceptor: dataagent/agents/nl2sql/nodes/perceptor.py
- MetaVisor / ValueMatch client: dataagent/agents/nl2sql/utils/metavisor_client.py
- NL2SQL Validator: dataagent/agents/nl2sql/nodes/validator.py
- NL2SQL sub-agent tool: dataagent/actions/tools/local_tool/tools.py
- Main Agent example calling NL2SQL sub-agent: dataagent/core/flex/examples/nl2sql_flex_e2e_subagent.yaml