Import Scenario Data
Scenario Data and Metadata Preparation¶
This is step 3 of the database installation flow. It prepares retail sample data and Semantic Service metadata for NL2SQL tutorials. When you finish, you will have:
- A local SQLite business database:
~/demo_retail.sqlite - A metadata file ready for Semantic Service import:
~/demo_seed_sqlite.json
If you only need to start MySQL / PostgreSQL / Elasticsearch, read Database Service Deployment first. If Semantic Service is not running yet, complete Semantic Service Deployment Guide, then return here to import metadata.
1. Sample Schema Overview¶
The sample uses two tables:
| Table | Role | Key columns |
|---|---|---|
retail_customers |
Customer dimension | customer_id, customer_name, city |
retail_orders |
Order fact table | order_id, customer_id, order_amount, order_date |
The tables join on retail_orders.customer_id = retail_customers.customer_id. This schema covers common NL2SQL questions such as:
- Total transaction amount.
- GMV by city.
- Recent orders.
- Order count per customer.
2. Generate SQLite Sample Data¶
The script below uses only the Python standard library and creates demo_retail.sqlite in your home directory.
cat > ~/create_demo_retail.py << 'EOF'
import sqlite3
from pathlib import Path
db_path = Path.home() / "demo_retail.sqlite"
if db_path.exists():
db_path.unlink()
conn = sqlite3.connect(db_path)
cur = conn.cursor()
cur.executescript(
"""
CREATE TABLE retail_customers (
customer_id TEXT PRIMARY KEY,
customer_name TEXT NOT NULL,
city TEXT NOT NULL
);
CREATE TABLE retail_orders (
order_id TEXT PRIMARY KEY,
customer_id TEXT NOT NULL,
order_amount REAL NOT NULL,
order_date TEXT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES retail_customers(customer_id)
);
"""
)
customers = [
("C001", "Alice Zhang", "Beijing"),
("C002", "Bob Li", "Shanghai"),
("C003", "Carol Wang", "Shenzhen"),
("C004", "David Chen", "Beijing"),
("C005", "Eva Zhao", "Hangzhou"),
]
orders = [
("O1001", "C001", 1280.50, "2025-01-03"),
("O1002", "C002", 860.00, "2025-01-05"),
("O1003", "C001", 320.00, "2025-01-08"),
("O1004", "C003", 2199.00, "2025-01-11"),
("O1005", "C004", 640.00, "2025-01-12"),
("O1006", "C005", 1580.75, "2025-01-16"),
("O1007", "C002", 450.00, "2025-01-18"),
("O1008", "C004", 980.20, "2025-01-21"),
]
cur.executemany("INSERT INTO retail_customers VALUES (?, ?, ?)", customers)
cur.executemany("INSERT INTO retail_orders VALUES (?, ?, ?, ?)", orders)
conn.commit()
for table in ("retail_customers", "retail_orders"):
count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
print(f"{table}: {count} rows")
conn.close()
print(f"created: {db_path}")
EOF
python ~/create_demo_retail.py
Verify the data:
python - << 'EOF'
import sqlite3
from pathlib import Path
db_path = Path.home() / "demo_retail.sqlite"
conn = sqlite3.connect(db_path)
for row in conn.execute(
"""
SELECT c.city, ROUND(SUM(o.order_amount), 2) AS gmv
FROM retail_orders o
JOIN retail_customers c ON o.customer_id = c.customer_id
GROUP BY c.city
ORDER BY gmv DESC
"""
):
print(row)
conn.close()
EOF
If you see city and GMV aggregates, the business data is ready.
3. Generate Semantic Service Metadata¶
Business data tells the database which tables and rows exist; it does not yet tell NL2SQL what they mean. The generated demo_seed_sqlite.json describes:
- Table semantics: order fact table, customer dimension.
- Column semantics: order amount, order date, customer city, and so on.
- Table–column ownership: which columns belong to which table.
- Join relationships: how orders link to customers.
- Golden SQL example: standard SQL for GMV by city.
cat > ~/demo_seed_sqlite.json << 'EOF'
{
"entities": [
{
"typeName": "data_table",
"attributes": {
"qualifiedName": "demo_db.retail_customers@sqlite",
"name": "retail_customers",
"databaseName": "demo_db",
"schemaName": "main",
"tableName": "retail_customers",
"tableNameEn": "retail_customers",
"sourceType": "sqlite",
"tableNameCh": "零售客户表",
"tableDescription": "存储客户基础信息",
"llmContext": "客户维表 retail_customers,包含客户 ID、客户姓名和所在城市,可与订单表按 customer_id 关联。",
"layer": "DIM",
"status": "Active"
}
},
{
"typeName": "data_table",
"attributes": {
"qualifiedName": "demo_db.retail_orders@sqlite",
"name": "retail_orders",
"databaseName": "demo_db",
"schemaName": "main",
"tableName": "retail_orders",
"tableNameEn": "retail_orders",
"sourceType": "sqlite",
"tableNameCh": "零售订单表",
"tableDescription": "存储零售订单明细",
"llmContext": "订单事实表 retail_orders,包含订单 ID、客户 ID、订单金额和下单日期,可用于销售额、GMV、订单数等统计。",
"layer": "DWD",
"status": "Active"
}
},
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "demo_db.retail_customers.customer_id@sqlite",
"databaseName": "demo_db",
"tableNameEn": "retail_customers",
"columnNameEn": "customer_id",
"sourceType": "sqlite",
"value_type": "string",
"llmContext": "客户唯一标识,可与 retail_orders.customer_id 关联。",
"status": "Active"
}
},
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "demo_db.retail_customers.customer_name@sqlite",
"databaseName": "demo_db",
"tableNameEn": "retail_customers",
"columnNameEn": "customer_name",
"sourceType": "sqlite",
"value_type": "string",
"llmContext": "客户姓名。",
"status": "Active"
}
},
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "demo_db.retail_customers.city@sqlite",
"databaseName": "demo_db",
"tableNameEn": "retail_customers",
"columnNameEn": "city",
"sourceType": "sqlite",
"value_type": "string",
"llmContext": "客户所在城市,可用于按城市统计订单金额或客户数量。",
"status": "Active"
}
},
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "demo_db.retail_orders.order_id@sqlite",
"databaseName": "demo_db",
"tableNameEn": "retail_orders",
"columnNameEn": "order_id",
"sourceType": "sqlite",
"value_type": "string",
"llmContext": "订单唯一标识。",
"status": "Active"
}
},
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "demo_db.retail_orders.customer_id@sqlite",
"databaseName": "demo_db",
"tableNameEn": "retail_orders",
"columnNameEn": "customer_id",
"sourceType": "sqlite",
"value_type": "string",
"llmContext": "下单客户 ID,可与 retail_customers.customer_id 关联。",
"status": "Active"
}
},
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "demo_db.retail_orders.order_amount@sqlite",
"databaseName": "demo_db",
"tableNameEn": "retail_orders",
"columnNameEn": "order_amount",
"sourceType": "sqlite",
"value_type": "number",
"llmContext": "订单实付金额,单位为元,可用于销售额、GMV、客单价等统计。",
"status": "Active"
}
},
{
"typeName": "data_column",
"attributes": {
"qualifiedName": "demo_db.retail_orders.order_date@sqlite",
"databaseName": "demo_db",
"tableNameEn": "retail_orders",
"columnNameEn": "order_date",
"sourceType": "sqlite",
"value_type": "date",
"llmContext": "订单下单日期,格式为 YYYY-MM-DD,可用于按日、月、年筛选和统计。",
"status": "Active"
}
},
{
"typeName": "sql_process",
"attributes": {
"sqlId": "demo_retail_city_gmv",
"expression": "SELECT c.city, SUM(o.order_amount) AS gmv FROM retail_orders o JOIN retail_customers c ON o.customer_id = c.customer_id GROUP BY c.city ORDER BY gmv DESC",
"relatedTables": ["main.retail_orders", "main.retail_customers"],
"intent": "按城市汇总 GMV",
"query": "各城市成交额排名",
"llmContext": "关联订单表和客户表,按客户城市汇总订单金额。",
"status": "Active"
}
}
],
"relationships": [
{
"typeName": "table_has_column",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers.customer_id@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers@sqlite" } }
},
{
"typeName": "table_has_column",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers.customer_name@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers@sqlite" } }
},
{
"typeName": "table_has_column",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers.city@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers@sqlite" } }
},
{
"typeName": "table_has_column",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders.order_id@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders@sqlite" } }
},
{
"typeName": "table_has_column",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders.customer_id@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders@sqlite" } }
},
{
"typeName": "table_has_column",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders.order_amount@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders@sqlite" } }
},
{
"typeName": "table_has_column",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders.order_date@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders@sqlite" } }
},
{
"typeName": "table_join_relationship",
"end1": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders@sqlite" } },
"end2": { "typeName": "data_table", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers@sqlite" } },
"attributes": {
"join_type": "INNER JOIN",
"expression": "{source}.customer_id = {target}.customer_id",
"cardinality": "N:1",
"intent": "订单客户外键关联"
}
},
{
"typeName": "column_join_relationship",
"end1": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_orders.customer_id@sqlite" } },
"end2": { "typeName": "data_column", "uniqueAttributes": { "qualifiedName": "demo_db.retail_customers.customer_id@sqlite" } },
"attributes": {
"join_type": "INNER JOIN",
"expression": "{source} = {target}",
"cardinality": "N:1",
"intent": "订单表客户 ID 关联客户表客户 ID"
}
}
]
}
EOF
4. Import into Semantic Service¶
Confirm Semantic Service is running and replace the port with yours. The example below uses 32000; if you started with the default port, use 31000.
SEMANTIC_SERVICE_URL="http://localhost:32000"
curl -sS -o /dev/null -w "HTTP %{http_code}\n" \
"$SEMANTIC_SERVICE_URL/api/metaVisor/v3/types/typedefs"
After you see HTTP 200, import metadata:
curl -X POST \
-H "Content-Type: application/json" \
-d @"$HOME/demo_seed_sqlite.json" \
"$SEMANTIC_SERVICE_URL/api/metaVisor/v3/entity/bulk"
Verify that tables can be retrieved:
curl -X POST \
-H "Content-Type: application/json" \
-d '{"typeName":"data_table","query":"retail_orders","limit":5}' \
"$SEMANTIC_SERVICE_URL/api/metaVisor/v3/search/basic"
If the response includes demo_db.retail_orders@sqlite, metadata import succeeded.
5. Align with DataAgent Configuration¶
DataAgent reads real business data; Semantic Service supplies table, column, and relationship semantics. Both sides must use the same names:
DATABASE:
db_id: "demo_db"
engine: "sqlite"
config:
path: "/home/your_user/demo_retail.sqlite"
METAVISOR:
metavisor_url: "http://localhost:32000"
valuematch_url: "http://localhost:8000"
Alignment:
| DataAgent config | Metadata field | This guide |
|---|---|---|
DATABASE.db_id |
databaseName |
demo_db |
DATABASE.engine |
sourceType and qualifiedName suffix |
sqlite, @sqlite |
DATABASE.config.path |
SQLite file path | ~/demo_retail.sqlite |
| SQLite table names | tableNameEn |
retail_orders, retail_customers |
The SQLite file path is only in DataAgent YAML; Semantic Service does not store it.
6. Next Steps¶
After this guide, continue with:
To use your own business database, change three things:
- Replace
demo_retail.sqlitewith your database. - Rewrite
demo_seed_sqlite.jsonfor your real tables, columns, and join relationships. - Keep DataAgent
DATABASE.db_idandDATABASE.enginealigned with metadatadatabaseNameandsourceType.