Python SDK Interface

DataAgent is the core Python SDK entry point exposed by the DataAgent framework. Instantiate an Agent by loading YAML configuration via from_config, then interact through chat or astream.

DataAgent.from_config

Interface Definition

class DataAgent:
    @classmethod
    def from_config(cls, config: str | Path) -> "DataAgent":
        ...

Creates an Agent instance from a YAML configuration file. The config path can be absolute or relative.

Parameters

Parameter	Type	Description
`config`	`str | Path`	Path to YAML config file (required)

Returns

A DataAgent instance.

Example

from dataagent.interface.sdk.agent import DataAgent

agent = DataAgent.from_config("path/to/ecommerce_agent.yaml")

DataAgent.chat

Interface Definition

async def chat(
    self,
    user_query: str,
    session_id: str | None = None,
    workspace: Path | str | None = None,
    initial_state: dict[str, Any] | None = None,
) -> dict[str, Any]:
    ...

Triggers a single-turn agent conversation. When debug=True (the default), conversation logs, including intermediate results, are streamed to the terminal via the Rich renderer.

Parameters

Parameter	Type	Default	Description
`user_query`	`str`	required	User query text
`session_id`	`str | None`	`None`	Session ID. When omitted, the SDK first tries `initial_state.session_id`, then reuses `self.session_id`, and finally auto-generates one
`workspace`	`Path | str | None`	`None`	Workspace override. Takes precedence over the workspace setting in the config file
`initial_state`	`dict \\| None`	`None`	Initial state dict, can carry `user_id`, `session_id`, `messages`, etc.

Returns

dict[str, Any] — final state dict. Key fields:

Field	Type	Description
`messages`	`list`	Complete message history for this turn
`final_answer`	`str`	Present only on error, contains error description
`complete`	`bool`	Whether the conversation ended normally
`user_query`	`str`	The original user query
`error`	`str`	Present only on exception, exception info string
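Because `final_answer` and `error` appear only on failure, callers may want one place that decides what text to show. The following is a minimal sketch under that reading of the table; `extract_answer` is a hypothetical helper, and the stand-in message objects only assume a `.content` attribute.

```python
from types import SimpleNamespace
from typing import Any


def extract_answer(response: dict[str, Any]) -> str:
    """Hypothetical helper: pick the text to show from a chat() result."""
    # On failure, the error description is carried by final_answer / error.
    if "error" in response or "final_answer" in response:
        return response.get("final_answer") or response.get("error", "")
    # On success, the answer is the content of the last message.
    messages = response.get("messages") or []
    return messages[-1].content if messages else ""


# Stand-in message objects; real ones are produced by the framework.
ok = {"messages": [SimpleNamespace(content="Product A")], "complete": True}
failed = {"final_answer": "query failed", "error": "TimeoutError", "complete": False}
print(extract_answer(ok))
print(extract_answer(failed))
```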

Example

# Call from within an async function (e.g. one started with asyncio.run)
response = await agent.chat("What was the top-selling product last month?")
# On success, extract the final answer from the last message
if response.get("messages"):
    last_msg = response["messages"][-1]
    print(last_msg.content)

DataAgent.astream

Interface Definition

def astream(self, *args, **kwargs):
    ...

Triggers a streaming agent conversation, yielding events one by one via async generator. Suitable for web frontend-backend interaction scenarios.

Parameters

Supports two calling conventions:

LangGraph native: astream(input={...}, config={...}, stream_mode=...)
openJiuwen: astream(initial_state={...}, start_at=..., checkpoint_id=...)

Both support passing state fields such as session_id and workspace via initial_state.

Returns

AsyncGenerator: an async generator yielding (stream_mode, event_data) tuples:

- stream_mode="values": event_data is the current complete state
- stream_mode="updates": event_data is incremental updates
- stream_mode="custom": event_data is custom events (e.g., Rich render events)
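To show how a consumer might dispatch on the stream mode, here is a sketch that runs without the SDK. `fake_stream` is a hand-written stand-in for `agent.astream(...)`, and the event payloads are invented for illustration only.

```python
import asyncio
from typing import Any, AsyncIterator


async def fake_stream() -> AsyncIterator[tuple[str, Any]]:
    """Stand-in for agent.astream(...); yields hand-written events."""
    yield "updates", {"planner": {"step": 1}}
    yield "custom", {"type": "render", "text": "thinking"}
    yield "values", {"messages": ["final answer"], "complete": True}


async def consume(stream: AsyncIterator[tuple[str, Any]]) -> dict[str, list[Any]]:
    """Group events by stream mode so each kind can be handled separately."""
    seen: dict[str, list[Any]] = {"values": [], "updates": [], "custom": []}
    async for mode, data in stream:
        seen.setdefault(mode, []).append(data)
    return seen


events = asyncio.run(consume(fake_stream()))
print(events["values"][-1])
```

In a web backend, the "values" branch would typically feed the final response while "updates" and "custom" events drive progress indicators.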

Example

async for mode, data in agent.astream(input={"messages": [("human", "Analyze customer data")]}):
    if mode == "values":
        print(data)

YAML Configuration Reference

The following sections show the complete YAML configuration structure by module. All fields reflect actual code behavior. Fields not marked "optional" are required.

AGENT_CONFIG — Agent Base Configuration

AGENT_CONFIG:
  name: "Ecommerce Analysis Agent"       # Agent name
  type: "react"                          # Agent engine type: react (FlexAgent) | nl2sql (NL2SQLAgent)
  backend: "langgraph"                   # Backend engine, default "langgraph"
  max_iter: 50                           # Max iterations, unlimited if unset
  token_limit: 100000                    # Token limit, unlimited if unset
  enable_human_feedback: false           # Enable HITL human-in-the-loop, default false
  enable_portrait: false                 # Enable user portrait memory, default false

Code Behavior:

- type determines engine selection in select_engine(): react → dataagent.core.flex.agent.FlexAgent, nl2sql → dataagent.agents.nl2sql.agent.NL2SQLAgent
- When max_iter is set, it is written to FlexRouter; exceeding it raises LimitReachedError, and the current state is returned with a termination message appended
- enable_human_feedback=true creates HumanFeedbackNode and registers the request_human_feedback tool
- enable_portrait=true writes user characteristics to Memory via the portraiter hook

MODEL — Model Configuration

MODEL:
  deepseek:                              # Model slot name (referenced by chat_model.name in Planner)
    name: "DEEPSEEK_CHAT"                # Model identifier
    model_type: "chat"                   # chat | embedding
    provider: "deepseek"                 # Platform identifier, used to read DEEPSEEK_BASE_URL / DEEPSEEK_API_KEY env vars
    tool_call_mode: "native"             # Tool call mode, default "native"
    params:
      model: "deepseek-chat"             # Actual model name (passed to litellm)
      temperature: 0.7
      max_tokens: 8192
      timeout: 90
      max_retries: 3

  qwen3:                                 # Auxiliary model slot (for hooks or standalone nodes)
    name: "QWEN3_CHAT"
    model_type: "chat"
    provider: "openai"                   # OpenAI-protocol-compatible service
    params:
      model: "qwen3-235b"
      temperature: 0.3

Code Behavior:

- Each model slot is a dict; the key (e.g., deepseek) is the slot name
- provider is uppercased to construct env var names: {PROVIDER}_BASE_URL and {PROVIDER}_API_KEY. Actual API keys and base URLs are injected via .env
- params.model is required; other parameters (temperature, max_tokens, etc.) are optional and passed directly to litellm
- Nodes reference model slots via chat_model.name; the referenced slots are merged into AgentEnv.llm_configs
- Model slots not referenced by child nodes are still included in llm_configs, so hooks can use them via runtime.llm("<slot_name>")

SCENARIO — Scenario Description

SCENARIO:
  chat:                                  # Scenario mode key, corresponds to mode="chat"
    instructions: |
      You are a professional data analysis assistant.
      Prioritize using available tools to obtain real data; note missing information when uncertain.
      Answers must be based on actual query results; do not fabricate data.

Code Behavior:

- instructions is written to AgentEnv.instructions for use by the Planner node's prompt template

ACTOR_LOOP — Workflow Nodes

ACTOR_LOOP:                              # Main loop workflow (required, at least one node)
  - node: "planner"
    module: "dataagent.core.flex.nodes.planner.Planner"
    chat_model:
      name: "deepseek"                   # References MODEL.deepseek
    prompt_template:                     # Optional, append prompt
      system:                            # Only system / user supported
        content: "Extra text injected into system prompt (Jinja2 template)"

  - node: "executor"
    module: "dataagent.core.flex.nodes.executor.Executor"
    max_tool_result_length: 8192         # Max tool result length (truncation)
    max_concurrency: 5                   # Max concurrent tool calls

Code Behavior:

- FlexAgent._create_nodes_from_config dynamically imports each node's module, using node as the node name
- Reserved keys (node, module, chat_model, prompt_template) are not passed to the constructor; all other key-value pairs are passed as **kwargs
- chat_model can be a string (shorthand for name) or a dict (with a name key)
- prompt_template supports only system / user message types, each with either content (inline) or path (absolute path); the two are mutually exclusive
- FlexRouter loops through ACTOR_LOOP nodes until state.complete is True or max_iter is reached

TOOLS — Tool Configuration

TOOLS:
  local_functions:                       # Custom local Python function tools
    - module: "dataagent.actions.tools.local_tool.tools"
      function: "natural_language_to_sql"
    - module: "dataagent.actions.tools.local_tool.tools"
      function: "natural_language_to_plot"
    - module: "dataagent.actions.tools.local_tool.tools"
      function: "report_generator"

  mcp_servers:                           # MCP server tools
    - name: "my_mcp_server"
      url: "http://localhost:8000/mcp"

  A2A:                                   # Agent-to-Agent protocol tools
    - name: "other_agent"
      url: "http://localhost:9000/a2a"

  builtin:                               # Builtin tool override (the 6 tools below are registered by default)
    - module: "dataagent.actions.tools.local_tool.bash_tool"
      function: "bash"
    - module: "dataagent.actions.tools.local_tool.file_tools"
      function: "edit_file"
    - module: "dataagent.actions.tools.local_tool.file_tools"
      function: "read_file"
    - module: "dataagent.actions.tools.local_tool.file_tools"
      function: "write_file"
    - module: "dataagent.actions.tools.local_tool.search_tools"
      function: "grep"
    - module: "dataagent.actions.tools.local_tool.search_tools"
      function: "glob"

Code Behavior:

- 6 builtin tools are registered by default: bash, edit_file, read_file, write_file, grep, glob
- Setting TOOLS.builtin overrides the default list
- Each local_functions entry is dynamically imported via module + function and registered
- mcp_servers starts MCP client connections and auto-discovers tools
- A2A registers remote Agent tools
- The builtin skill data_analysis_report is active by default (dataagent/actions/skills/data_analysis_report/)
- All tools are registered with ToolManager; the executor calls them via runtime.tool_manager
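The override rule for TOOLS.builtin can be sketched at the level of tool names. `function_tool_names` is a hypothetical helper; treating a missing builtin key as "use the defaults" follows the text above, while the handling of an explicitly empty list is an assumption.

```python
from typing import Any

DEFAULT_BUILTINS = ["bash", "edit_file", "read_file", "write_file", "grep", "glob"]


def function_tool_names(tools_cfg: dict[str, Any]) -> list[str]:
    """Hypothetical helper: names of builtin + local function tools to register."""
    builtin = tools_cfg.get("builtin")
    if builtin is None:
        # No override given: fall back to the six default builtin tools.
        names = list(DEFAULT_BUILTINS)
    else:
        # An explicit TOOLS.builtin list replaces the defaults entirely.
        names = [entry["function"] for entry in builtin]
    names += [entry["function"] for entry in tools_cfg.get("local_functions") or []]
    return names


cfg = {
    "builtin": [
        {"module": "dataagent.actions.tools.local_tool.file_tools", "function": "read_file"},
    ],
    "local_functions": [
        {"module": "dataagent.actions.tools.local_tool.tools", "function": "report_generator"},
    ],
}
print(function_tool_names(cfg))
print(function_tool_names({}))
```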

CONTEXT — Context Management

CONTEXT:
  compress_token_limit: 32768            # Trigger LLM compression when message tokens exceed this value ×1.2
  compress_message_cnt: 200              # Trigger compression when message count exceeds this value
  file_node_threshold: 500               # Min chars for long text to be persisted as FileNode during IR conversion

Code Behavior:

- All three are optional; no limit applies if unset
- The actual trigger threshold for compress_token_limit is compress_token_limit * 1.2
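A worked example of the thresholds: with compress_token_limit: 32768 the effective token trigger is 32768 × 1.2 = 39321.6. The sketch below encodes that arithmetic; `should_compress` is hypothetical, and combining the two conditions with "or" is an assumption.

```python
from typing import Any


def should_compress(token_count: int, message_count: int, context_cfg: dict[str, Any]) -> bool:
    """Hypothetical check mirroring the documented CONTEXT thresholds."""
    token_limit = context_cfg.get("compress_token_limit")
    message_limit = context_cfg.get("compress_message_cnt")
    # The token trigger is the configured limit scaled by 1.2.
    over_tokens = token_limit is not None and token_count > token_limit * 1.2
    over_messages = message_limit is not None and message_count > message_limit
    # Unset limits never trigger compression.
    return over_tokens or over_messages


cfg = {"compress_token_limit": 32768, "compress_message_cnt": 200}
print(should_compress(39321, 10, cfg))
print(should_compress(39322, 10, cfg))
```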

WORKSPACE — Working Directory

WORKSPACE:
  path: "/data/agent_workspace"          # Agent workspace root directory (use absolute path)
  allow_path:                            # Allowed directories (Bash tool can only access these)
    - "/data/shared"
    - "/home/user/datasets"

Code Behavior:

- Paths in path and allow_path must be absolute (a leading ~/ is supported)
- ConfigManager._validate_workspace_yaml_config validates them during config loading
- allow_path must be a list, not a single string
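The validation rules can be approximated in a few lines. This is a sketch, not the body of `_validate_workspace_yaml_config`: `validate_workspace` and its error strings are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import Path
from typing import Any


def validate_workspace(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems: list[str] = []
    allow_path = cfg.get("allow_path", [])
    if not isinstance(allow_path, list):
        problems.append("allow_path must be a list, not a single string")
        allow_path = []
    candidates = [cfg["path"]] if "path" in cfg else []
    for raw in candidates + allow_path:
        # expanduser() resolves a leading "~/" before the absolute-path check.
        if not Path(raw).expanduser().is_absolute():
            problems.append(f"not an absolute path: {raw}")
    return problems


print(validate_workspace({"path": "/data/agent_workspace", "allow_path": ["/data/shared"]}))
print(validate_workspace({"path": "relative/dir", "allow_path": "/data/shared"}))
```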

BASH_TOOL_WHITELIST — Bash Command Whitelist

BASH_TOOL_WHITELIST:
  - ls
  - cat
  - head
  - python
  - pip

Code Behavior:

- When configured, only commands in the list are allowed in the Bash tool
- No restriction applies if unset or null

Complete Example

A ready-to-use complete YAML configuration:

AGENT_CONFIG:
  name: "Ecommerce Data Analysis Agent"
  type: "react"
  backend: "langgraph"
  max_iter: 50

MODEL:
  deepseek:
    name: "DEEPSEEK_CHAT"
    model_type: "chat"
    provider: "deepseek"
    params:
      model: "deepseek-chat"
      temperature: 0.7
      max_tokens: 8192
      timeout: 90
      max_retries: 3

  jina_v3:
    name: "jina_v3"
    model_type: "embedding"
    provider: "embedding"
    params:
      model: "jina-embeddings-v3"

SCENARIO:
  chat:
    instructions: |
      You are an ecommerce data analysis assistant. Prioritize using tools for real data; note missing information when uncertain.

ACTOR_LOOP:
  - node: "planner"
    module: "dataagent.core.flex.nodes.planner.Planner"
    chat_model:
      name: "deepseek"

  - node: "executor"
    module: "dataagent.core.flex.nodes.executor.Executor"
    max_tool_result_length: 8192

TOOLS:
  local_functions:
    - module: "dataagent.actions.tools.local_tool.tools"
      function: "natural_language_to_sql"
    - module: "dataagent.actions.tools.local_tool.tools"
      function: "report_generator"

WORKSPACE:
  path: "/data/agent_workspace"

Usage Example:

import asyncio

from dataagent.interface.sdk.agent import DataAgent


async def main() -> None:
    # Create Agent from config
    agent = DataAgent.from_config("ecommerce_agent.yaml")

    # Single-turn conversation
    response = await agent.chat("What was the top-selling product last month?")
    if response.get("messages"):
        last_msg = response["messages"][-1]
        print(last_msg.content)

    # Streaming conversation
    async for mode, data in agent.astream(
        input={"messages": [("human", "Analyze customer retention trends")]},
        stream_mode="values",
    ):
        if mode == "values" and data.get("messages"):
            print(data["messages"][-1])


# await is only valid inside a coroutine, so run everything through asyncio.run
asyncio.run(main())