Semantic Service Deployment Guide¶
Semantic Service is the metadata service used by NL2SQL. It does not store real business data; it stores which tables exist in your database, what columns mean, how tables join, and which fields support metric aggregation.
In short:
- Real business data stays in SQLite, MySQL, PostgreSQL, and other databases.
- Semantic Service stores the "database manual" for models.
- Before querying, DataAgent / NL2SQL Agent fetches tables, columns, join relationships, and value-matching information from Semantic Service.
If you want to understand NL2SQL first, read Build a Dedicated NL2SQL Agent. When you are ready to connect your own database, deploy Semantic Service using this guide.
If you are following the database installation guides from the start, complete these first:
- Pull Docker Images: prepare Docker images.
- Deploy Database Services: start MySQL, PostgreSQL, Elasticsearch, and other base services.
- Scenario Data Import: optional sample business data.
This guide follows those steps and answers "how does NL2SQL know what tables and columns mean?" Earlier guides prepare real data and base services; this one prepares the Semantic Service metadata service.
1. What You Need at the End¶
After deployment, you need two service URLs in Agent YAML:
METAVISOR:
metavisor_url: "http://localhost:32000"
valuematch_url: "http://localhost:8000"
Also ensure Agent database settings match metadata imported into Semantic Service:
DATABASE:
db_id: "sales_db"
engine: "sqlite"
config:
path: "/path/to/sales.sqlite"
Three fields matter most:
| Field | Plain-language meaning |
|---|---|
DATABASE.db_id |
Name of this database registered in Semantic Service, for example sales_db. |
DATABASE.engine |
Real database type, for example sqlite, mysql, or postgres. |
METAVISOR.metavisor_url |
Semantic Service URL; NL2SQL reads table and column descriptions through it. |
db_id and engine must match imported Semantic Service metadata; otherwise NL2SQL cannot resolve tables and columns.
2. Overall Flow¶
The integration breaks down into five steps:
1. 准备 PostgreSQL
用来存放 Semantic Service 的元数据
2. 启动 Semantic Service
提供 HTTP 接口,供 DataAgent 查询元数据
3. 验证服务可访问
curl 健康检查返回 HTTP 200
4. 导入业务库元数据
告诉 Semantic Service:有哪些表、字段、关系
5. 修改 Agent YAML
配置 DATABASE 和 METAVISOR,然后运行 NL2SQL
This guide follows that order.
3. Prepare the Environment¶
You need:
| Tool | Purpose | Recommendation |
|---|---|---|
| Docker | Start PostgreSQL quickly | Docker reduces installation friction. |
| Java 8+ | Run Semantic Service | Check with java -version. |
| Semantic Service distribution | The service itself | semantic-layer-0.1.0.tar.gz (download in section 6). |
| A business database | Data source NL2SQL queries | Start with a SQLite file if you are new. |
Check Java:
java -version
If Java is missing, install Java 8 or newer.
4. Start PostgreSQL¶
Semantic Service needs a database for metadata and vector indexes. The default deployment recommends PostgreSQL with pgvector; in enterprise environments, you can also connect Semantic Service to an existing GaussVector database as the semantic-retrieval backend:
docker run -d \
--name semantic-service-pg \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=semantic_layer \
-p 54321:5432 \
pgvector/pgvector:pg16
This starts PostgreSQL with:
| Setting | Value |
|---|---|
| Database name | semantic_layer |
| Username | postgres |
| Password | postgres |
| Host port | 54321 |
Verify connectivity:
PGPASSWORD=postgres psql -h localhost -p 54321 -U postgres -d semantic_layer -c "SELECT version();"
If you see the PostgreSQL version, the database is up.
4.1 Enable Required Extensions¶
Run:
PGPASSWORD=postgres psql -h localhost -p 54321 -U postgres -d semantic_layer -c 'CREATE EXTENSION IF NOT EXISTS "uuid-ossp";'
PGPASSWORD=postgres psql -h localhost -p 54321 -U postgres -d semantic_layer -c 'CREATE EXTENSION IF NOT EXISTS vector;'
PGPASSWORD=postgres psql -h localhost -p 54321 -U postgres -d semantic_layer -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm;'
These extensions support UUIDs, vector search, and fuzzy text search. You only need to confirm the commands succeed. If you use GaussVector, make sure the target database has the required vector type, distance operators, and index capabilities enabled, and put the connection details in the Semantic Service configuration.
5. Initialize Semantic Service Schema¶
The Semantic Service distribution usually includes initialization SQL named like:
create_semantic_layer.sql
Find that file in the distribution, then run:
PGPASSWORD=postgres psql -h localhost -p 54321 -U postgres -d semantic_layer \
-f /path/to/create_semantic_layer.sql
Replace /path/to/create_semantic_layer.sql with the path on your machine.
Verify tables were created:
PGPASSWORD=postgres psql -h localhost -p 54321 -U postgres -d semantic_layer -c "\dt"
If you see tables such as data_table, data_column, semantic_term, and sql_process, initialization succeeded.
Warning
Some initialization scripts drop and recreate the database. Read the SQL before running it in production so you do not delete existing data.
6. Extract and Configure Semantic Service¶
Download and extract semantic-layer-0.1.0.tar.gz:
wget https://datagallery.obs.cn-southwest-2.myhuaweicloud.com/semantic-service/semantic-layer-0.1.0.tar.gz
mkdir -p ~/semantic-service
tar xzf semantic-layer-0.1.0.tar.gz -C ~/semantic-service
cd ~/semantic-service/semantic-layer-0.1.0
Typical layout:
semantic-layer-0.1.0/
├── bin/
│ ├── start.sh
│ └── stop.sh
├── conf/
│ └── semantic-service-application.properties
├── lib/
└── webapp/
Edit:
conf/semantic-service-application.properties
Configure the database connection first:
semantic_service.db.url=jdbc:postgresql://localhost:54321/semantic_layer
semantic_service.db.user=postgres
semantic_service.db.password=postgres
If PostgreSQL is not on localhost, change localhost:54321 to the actual host and port.
7. Vector Model Setup¶
Semantic Service can use embedding models for semantic search. The generated vectors can be stored in pgvector, GaussVector, or a compatible vector backend for table-description, column-description, and semantic-keyword recall. On first deploy, if you have no model yet, disable vector features and bring the service up first:
semantic_service.vector.embedding.service.enable=false
This simplifies deployment and is enough to validate the service and metadata import flow.
When a local model is ready, configure its path:
semantic_service.vector.embedding.model.name=BAAI/bge-base-zh-v1.5
semantic_service.vector.embedding.model.path=/opt/models/BAAI/bge-base-zh-v1.5
semantic_service.vector.embedding.dimensions=768
semantic_service.vector.embedding.cache.size=1000
Model output dimension, the configured dimensions, and the database vector(N) or GaussVector vector-field dimension must match.
8. Start Semantic Service¶
From the extracted service directory:
./bin/start.sh -p 32000
Port 32000 is an example; use any free port.
Stop the service:
./bin/stop.sh
If startup fails, check whether the port is in use:
lsof -nP -iTCP:32000 -sTCP:LISTEN
9. Verify the Service¶
Run:
curl -sS -o /dev/null -w "HTTP %{http_code}\n" \
http://localhost:32000/api/metaVisor/v3/types/typedefs
Expected:
HTTP 200
If the status is not 200, check logs:
tail -100 logs/jetty-console.log
tail -100 logs/application.log
Common causes:
- Wrong PostgreSQL host, username, or password.
- Initialization SQL not run; tables missing.
- Port already in use.
- Incorrect vector model path.
10. Import Business Database Metadata¶
After the service starts, import the "business database manual." Without it, NL2SQL can reach Semantic Service but cannot see your tables and columns.
Semantic Service stores metadata, not business rows. For example, if your database has an orders table, tell Semantic Service:
- Which database it belongs to:
sales_db - Table name:
orders - Columns:
order_id,order_amount,order_time - What each column means
- How tables join
10.1 Minimal Table Metadata Example¶
Minimal example for one table:
{
"entities": [
{
"typeName": "data_table",
"attributes": {
"qualifiedName": "sales_db.orders@sqlite",
"databaseName": "sales_db",
"schemaName": "main",
"tableName": "orders",
"tableNameEn": "orders",
"sourceType": "sqlite",
"llmContext": "订单表,记录订单金额、下单时间和客户信息",
"status": "Active"
}
}
],
"relationships": []
}
Save as metadata.json and import:
curl -X POST \
-H "Content-Type: application/json" \
-d @metadata.json \
"http://localhost:32000/api/metaVisor/v3/entity/bulk"
10.2 Import Columns and Relationships Too¶
Tables alone are not enough. Column descriptions and relationships make NL2SQL useful.
Column entities are usually data_column; table–column links use table_has_column. Table joins use table_join_relationship.
Naming convention:
| Object | qualifiedName example |
|---|---|
| Table | sales_db.orders@sqlite |
| Column | sales_db.orders.order_amount@sqlite |
Describe business meaning clearly in column metadata. Example:
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "sales_db.orders.order_amount@sqlite",
"databaseName": "sales_db",
"tableNameEn": "orders",
"columnNameEn": "order_amount",
"sourceType": "sqlite",
"value_type": "number",
"llmContext": "订单实付金额,单位为元,可用于销售额、GMV 等统计",
"status": "Active"
}
}
11. Align with DataAgent Configuration¶
The most common mistake is mismatched names.
If DataAgent YAML has:
DATABASE:
db_id: "sales_db"
engine: "sqlite"
config:
path: "/path/to/sales.sqlite"
Semantic Service metadata should satisfy:
| DataAgent | Semantic Service metadata |
|---|---|
db_id: sales_db |
databaseName: "sales_db" |
engine: sqlite |
sourceType: "sqlite" |
Table orders |
tableNameEn: "orders" |
| SQLite engine | qualifiedName suffix @sqlite |
The SQLite file path lives only in DataAgent YAML; Semantic Service does not store .sqlite paths.
12. Use in DataAgent¶
After deployment and metadata import, configure NL2SQL Agent or main Agent YAML:
DATABASE:
db_id: "sales_db"
engine: "sqlite"
config:
path: "/path/to/sales.sqlite"
METAVISOR:
metavisor_url: "http://localhost:32000"
valuematch_url: "http://localhost:8000"
Then follow the case tutorials:
13. Common Issues¶
13.1 curl does not return HTTP 200¶
Check:
- Service is running.
- Port is correct.
- PostgreSQL is reachable.
- Logs for database or model errors.
13.2 NL2SQL cannot find tables¶
Usually config and metadata are misaligned. Verify:
DATABASE.db_idequals metadatadatabaseName.DATABASE.engineequals metadatasourceType.qualifiedNamesuffix is correct, for example@sqlitefor SQLite.- Table and column names match the real database.
13.3 Generated SQL does not match business definitions¶
Enrich field and metric semantics:
- Write clear meanings in
data_column.llmContext. - Add units, aggregation rules, and filters for metric fields.
- Add
table_join_relationshipfor common joins.
13.4 Vector model errors¶
To get deployment working first, set:
semantic_service.vector.embedding.service.enable=false
After the service and metadata import work, configure a local model.
14. Checklist¶
- [ ] PostgreSQL container is running.
- [ ]
semantic_layerdatabase is reachable. - [ ] Initialization SQL has been executed.
- [ ] JDBC URL in Semantic Service config is correct.
- [ ]
/api/metaVisor/v3/types/typedefsreturns HTTP 200. - [ ] At least one table and its column metadata are imported.
- [ ] DataAgent
DATABASE.db_idmatches metadatadatabaseName. - [ ] DataAgent
DATABASE.enginematches metadatasourceType. - [ ] DataAgent
METAVISOR.metavisor_urlpoints to Semantic Service.