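The Complete Guide to General-Purpose Agent Architecture Design

Based on Context Engineering, Claude Code, and enterprise best practices
A complete implementation path from zero to a production-grade agent

Core Design Principles

Andrej Karpathy's Context Engineering Principle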

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step."

Claude Code's Philosophy of Simplicity

"Claude Code is relatively simple. It is a standard agentic pattern for a single agent, combined with a host of tricks to enable running long sessions."

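Three Core Principles

1. Simplicity > complexity
   └─ Start with the simplest possible while(tool_use) loop

2. Context > prompts
   └─ Design a complete context engineering system

3. Specialization > generalization
   └─ Several specialized sub-agents beat one do-everything agent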

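Architecture Design

1. High-Level Architecture Diagram

[Diagram: high-level architecture]

2. Core Execution Flow

[Diagram: core execution flow]

3. Layered Context Engineering Architecture

[Diagram: layered Context Engineering architecture]

Key insights:

Level 0: Basic building blocks (required)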
Level 1: Cognitive tools (IBM research reports GPT-4.1 pass@1 on AIME 2024 rising from 26.7% to 43.3%)
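Level 2: Neural field theory supports long-term memory and semantic understanding
Level 3: A protocol system enables modularity and composability
Level 4: Meta-recursion enables self-improvement

Design Dimensions

Dimension 1: Context Engineering

1.1 Token Budget Management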

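# Token budget allocation strategy
from typing import Dict


class TokenBudget:
    """
    Total budget: 200k tokens (Claude Sonnet 4.5)
    """
    TOTAL = 200_000           # 200k - sum of all slots below
    SYSTEM_PROMPT = 2000      # 2k - system instructions
    USER_TASK = 5000          # 5k - user task description
    EXAMPLES = 10000          # 10k - few-shot examples
    MEMORY = 30000            # 30k - relevant memories
    TOOLS_DEF = 15000         # 15k - tool definitions
    CONVERSATION = 50000      # 50k - conversation history
    WORKING = 70000           # 70k - working space
    RESPONSE = 18000          # 18k - response generation

    @classmethod
    def validate(cls, context: Dict[str, int]) -> bool:
        """Check that total token usage stays within the budget."""
        total = sum(context.values())
        return total <= cls.TOTAL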
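The validate method only checks the grand total, so one slot can silently crowd out the others. A minimal sketch of a per-slot check follows; the `BUDGET` dictionary mirrors the numbers in the class above, while the function name `over_budget_slots` and the lowercase slot keys are assumptions made for illustration.

```python
from typing import Dict, List

# Per-slot limits copied from the TokenBudget numbers above.
BUDGET: Dict[str, int] = {
    "system_prompt": 2_000,
    "user_task": 5_000,
    "examples": 10_000,
    "memory": 30_000,
    "tools_def": 15_000,
    "conversation": 50_000,
    "working": 70_000,
    "response": 18_000,
}


def over_budget_slots(usage: Dict[str, int]) -> List[str]:
    """Return the names of slots whose usage exceeds their limit, sorted."""
    # Unknown slot names get a limit of 0, so any usage there is flagged.
    return sorted(
        slot for slot, used in usage.items()
        if used > BUDGET.get(slot, 0)
    )


usage = {"system_prompt": 1_800, "conversation": 64_000, "memory": 12_000}
print(over_budget_slots(usage))  # ['conversation']
```

A check like this can trigger compaction of the offending slot (for example, summarizing conversation history) before the total limit is ever reached.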

1.2 Memory System Design

[Diagram: memory system design]

Implementation strategy:

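from datetime import datetime
from typing import Any, Dict, List, Optional, Protocol


class VectorDB(Protocol):
    """Minimal interface assumed for the long-term store (placeholder)."""

    def add(self, entry: Dict) -> None: ...

    def search(self, embedding: Any, top_k: int) -> List[Dict]: ...


class MemorySystem:
    """
    MEM1-style memory system
    Reference: https://arxiv.org/pdf/2506.15841
    """

    def __init__(self, long_term: Optional[VectorDB] = None):
        self.short_term: List[Dict] = []  # the 10 most recent interactions
        self.working: Dict = {}            # context for the current task
        self.long_term = long_term         # vector database; required by consolidate() and retrieve()
        self.episodic: List[Dict] = []     # episodic memory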

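    def consolidate(self, interaction: Dict) -> Dict:
        """
        Memory consolidation: compress short-term memory into long-term memory.

        Key idea: keep only the essential information and discard redundancy.
        """
        # 1. Extract key entities and relations
        entities = self._extract_entities(interaction)
        relations = self._extract_relations(entities)

        # 2. Semantic compression
        compressed = self._semantic_compress(
            interaction,
            compression_ratio=0.3  # compress to 30% of the original size
        )

        # 3. Generate the memory embedding
        embedding = self._generate_embedding(compressed)

        # 4. Store in long-term memory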
        memory_entry = {
            "timestamp": datetime.now(),
            "summary": compressed,
            "entities": entities,
            "relations": relations,
            "embedding": embedding,
            "importance": self._calculate_importance(interaction)
        }

        self.long_term.add(memory_entry)
        return memory_entry

    def retrieve(self, query: str, k: int = 5) -> List[Dict]:
        """
        Memory retrieval: return the k most relevant memories.
        """
        # 1. Embed the query
        query_embedding = self._generate_embedding(query)

        # 2. Semantic similarity search; over-fetch so that the
        #    re-ranking steps below have more than k candidates to reorder
        semantic_results = self.long_term.search(
            query_embedding,
            top_k=k * 3
        )

        # 3. Temporal decay
        decayed_results = self._apply_temporal_decay(
            semantic_results
        )

        # 4. Importance weighting
        weighted_results = self._apply_importance_weight(
            decayed_results
        )

        return weighted_results[:k]

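    def _calculate_importance(self, interaction: Dict) -> float:
        """
        Compute the importance of an interaction.

        Factors considered:
        - Task success rate
        - User satisfaction
        - Information density
        - Novelty

        Missing factors default to 0.5.
        """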
        factors = {
            "task_success": 0.3,
            "user_satisfaction": 0.3,
            "information_density": 0.2,
            "novelty": 0.2
        }

        score = sum(
            factors[k] * interaction.get(k, 0.5)
            for k in factors
        )

        return score
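The retrieve method relies on two helpers, temporal decay and importance weighting, that the class leaves undefined. One way they could combine is sketched below, assuming each candidate carries `similarity`, `importance`, and `timestamp` fields and that decay is exponential with a 30-day half-life. The function name `rerank`, the field names, and the half-life are all illustrative assumptions, not part of the original design.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def rerank(
    results: List[Dict],
    now: datetime,
    half_life_days: float = 30.0,
) -> List[Dict]:
    """Order results by similarity * recency decay * importance, best first."""

    def final_score(item: Dict) -> float:
        age_days = (now - item["timestamp"]).total_seconds() / 86_400
        # Exponential decay: the weight halves every half_life_days.
        decay = 0.5 ** (age_days / half_life_days)
        return item["similarity"] * decay * item["importance"]

    return sorted(results, key=final_score, reverse=True)


now = datetime(2025, 1, 31)
candidates = [
    {"id": "old", "similarity": 0.9, "importance": 0.8,
     "timestamp": now - timedelta(days=90)},
    {"id": "recent", "similarity": 0.7, "importance": 0.8,
     "timestamp": now - timedelta(days=3)},
]
print([c["id"] for c in rerank(candidates, now)])  # ['recent', 'old']
```

The older memory is more similar to the query, yet three half-lives of decay push it below the recent one. This is also why retrieval needs more than k candidates before re-ranking: otherwise the re-ranking can only reorder, never replace.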

Dimension 2: Cognitive Tools

IBM research: with cognitive tools, GPT-4.1 pass@1 on AIME 2024 rose from 26.7% to 43.3%

[Diagram: cognitive tools]

Implementation:

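from typing import Dict, Optional


class CognitiveTool:
    """
    Base class for cognitive tools

    Reference: IBM Zurich - Eliciting Reasoning in Language Models with Cognitive Tools
    """

    def __init__(self, name: str, prompt_template: str):
        self.name = name
        self.template = prompt_template

    def llm_call(self, prompt: str) -> str:
        """Send the prompt to the model. Placeholder: supply a real client call here."""
        raise NotImplementedError("connect llm_call to a model client")

    def __call__(self, problem: str, context: Optional[Dict] = None) -> str:
        """Run the cognitive tool."""
        # str.format ignores unused keyword arguments, so templates that
        # do not reference {problem} still render correctly
        prompt = self.template.format(
            problem=problem,
            **(context or {})
        )

        return self.llm_call(prompt)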


# Tool 1: understand the question
understand_question = CognitiveTool(
    name="understand_question",
    prompt_template="""
    Analyze the following problem and extract:

    Problem: {problem}

    1. Main Concepts: What are the key concepts?
    2. Given Information: What do we know?
    3. Unknown: What do we need to find?
    4. Constraints: What are the limitations?
    5. Relevant Theorems/Techniques: What might help?

    Provide a structured analysis:
    """
)

# Tool 2: recall related knowledge
recall_related = CognitiveTool(
    name="recall_related",
    prompt_template="""
    Given the problem analysis:

    {problem_analysis}

    Recall and list:
    1. Similar problems you've solved
    2. Relevant mathematical/logical principles
    3. Known solution patterns
    4. Common pitfalls to avoid

    Provide relevant knowledge:
    """
)

# Tool 3: examine the answer
examine_answer = CognitiveTool(
    name="examine_answer",
    prompt_template="""
    Verify the proposed solution:

    Problem: {problem}
    Proposed Solution: {solution}

    Check:
    1. Does it satisfy all constraints?
    2. Is the logic sound?
    3. Are there edge cases?
    4. Can it be simplified?

    Provide verification:
    """
)

# Tool 4: backtrack
backtracking = CognitiveTool(
    name="backtracking",
    prompt_template="""
    The current approach seems stuck:

    Current Path: {current_path}
    Issue: {issue}

    Consider:
    1. Alternative approaches
    2. Relaxing constraints
    3. Decomposing differently
    4. Using different techniques

    Suggest backtracking strategy:
    """
)


class CognitiveToolchain:
    """Orchestrates the cognitive tools."""

    def __init__(self):
        self.tools = {
            "understand": understand_question,
            "recall": recall_related,
            "examine": examine_answer,
            "backtrack": backtracking
        }

    def solve(self, problem: str, max_iterations: int = 5) -> str:
        """Solve a problem with the cognitive toolchain."""

        # Step 1: understand the problem
        analysis = self.tools["understand"](problem)

        # Step 2: recall related knowledge
        knowledge = self.tools["recall"](
            problem,
            context={"problem_analysis": analysis}
        )

        # Steps 3-N: iterate toward a solution
        solution = ""  # returned unchanged if max_iterations is 0
        for i in range(max_iterations):
            # Attempt a solution
            solution = self._attempt_solution(
                problem, analysis, knowledge
            )

            # Examine the answer
            verification = self.tools["examine"](
                problem,
                context={"solution": solution}
            )

            if self._is_valid(verification):
                return solution

            # Backtrack
            backtrack_strategy = self.tools["backtrack"](
                problem,
                context={
                    "current_path": solution,
                    "issue": verification
                }
            )

            # Adjust the strategy
            knowledge += f"\n\nBacktrack Insight: {backtrack_strategy}"

        return solution  # the last attempt; none passed verification
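The attempt, verify, and backtrack loop can be exercised without a real model. The sketch below reduces it to its control flow and uses a scripted stand-in for the LLM; `solve_with_verification`, `scripted_llm`, and the `SOLVE`/`VERIFY` prompt prefixes are invented for this illustration and are not part of the toolchain above.

```python
from typing import Callable, List

# Stand-in for an LLM call; any function from prompt to text works here.
LLM = Callable[[str], str]


def solve_with_verification(problem: str, llm: LLM, max_iterations: int = 3) -> str:
    """Attempt, verify, and retry with the verifier's feedback appended."""
    notes: List[str] = []
    solution = ""
    for _ in range(max_iterations):
        hints = "\n".join(notes)
        solution = llm(f"SOLVE\nProblem: {problem}\nHints: {hints}")
        verdict = llm(f"VERIFY\nProblem: {problem}\nSolution: {solution}")
        if verdict.startswith("VALID"):
            return solution
        # Feed the failure back so the next attempt can change course.
        notes.append(verdict)
    return solution  # last attempt, not verified


def scripted_llm(prompt: str) -> str:
    """Toy model: answers wrongly until it sees a hint, then answers correctly."""
    if prompt.startswith("VERIFY"):
        return "VALID" if prompt.endswith("Solution: 4") else "INVALID: recheck the sum"
    return "4" if "INVALID" in prompt else "5"


print(solve_with_verification("2 + 2", scripted_llm))  # 4
```

The first attempt fails verification, the verdict is carried into the second prompt as a hint, and the second attempt passes. Scripting the model this way makes the loop's termination behavior testable in isolation.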

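Dimension 3: Sub-Agent Architecture

3.1 Agent Specialization Pattern

[Diagram: agent specialization pattern]

Key principles:

Single responsibility: each agent does exactly one thing
Clear boundaries: explicit input/output interfaces
Composability: agents can be freely combined and chained
Token efficiency: lightweight agents (under 3k tokens)

3.2 Sub-Agent Implementation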

from typing import Protocol, Dict, List, Optional
from dataclasses import dataclass
from enum import Enum

class AgentRole(Enum):
    """Agent role definitions"""
    ORCHESTRATOR = "orchestrator"
    ANALYST = "analyst"
    ARCHITECT = "architect"
    DEVELOPER = "developer"
    TESTER = "tester"
    REVIEWER = "reviewer"


@dataclass
class AgentSpec:
    """Agent specification"""
    role: AgentRole
    name: str
    description: str
    system_prompt: str
    tools: List[str]
    model: str = "claude-sonnet-4-5"
    max_tokens: int = 4000
    temperature: float = 0.7


class SubAgent(Protocol):
    """Sub-agent interface"""

    def execute(
        self,
        task: str,
        context: Dict,
        tools: List["Tool"]  # forward reference: Tool is defined under Dimension 4
    ) -> Dict:
        """Execute a task."""
        ...

    def get_cost(self) -> float:
        """Return the cost incurred."""
        ...


class SpecialistAgent:
    """Specialist agent implementation"""

    def __init__(self, spec: AgentSpec):
        self.spec = spec
        self.conversation_history = []
        self.token_usage = 0

    def execute(
        self,
        task: str,
        context: Dict,
        tools: List["Tool"]  # forward reference: Tool is defined under Dimension 4
    ) -> Dict:
        """
        Execute a task.

        Claude Code pattern: a while(tool_use) loop
        """
        # Build the initial messages
        messages = self._build_messages(task, context)

        # Main execution loop
        max_iterations = 10
        iteration = 0

        while iteration < max_iterations:
            # Call the LLM
            response = self._llm_call(messages)

            # Record token usage
            self.token_usage += response.usage.total_tokens

            # Check whether the model requested any tools
            if not response.tool_calls:
                # No tool calls: the task is complete
                return {
                    "success": True,
                    "result": response.content,
                    "iterations": iteration,
                    "tokens": self.token_usage
                }

            # Execute the tool calls
            tool_results = self._execute_tools(
                response.tool_calls,
                tools
            )

            # Append the results to the message history
            messages.append({
                "role": "assistant",
                "content": response.content,
                "tool_calls": response.tool_calls
            })
            messages.append({
                "role": "tool",
                "content": tool_results
            })

            iteration += 1

        # Reached the maximum number of iterations
        return {
            "success": False,
            "result": "Max iterations reached",
            "iterations": iteration,
            "tokens": self.token_usage
        }

    def _build_messages(
        self,
        task: str,
        context: Dict
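    ) -> List[Dict]:
        """Build the message list."""

        # System prompt. Note: format() requires any literal braces in
        # the prompt text to be doubled ({{ and }})
        system_prompt = self.spec.system_prompt.format(**context)

        # User task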
        user_message = f"""
        Task: {task}

        Context:
        {self._format_context(context)}

        Please complete this task step by step.
        Use the available tools as needed.
        """

        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]


# Predefined specialist agents

SPEC_ANALYST = AgentSpec(
    role=AgentRole.ANALYST,
    name="spec-analyst",
    description="Requirements analysis specialist",
    system_prompt="""
    You are a requirements analyst expert.

    Your job:
    1. Understand user needs deeply
    2. Extract functional & non-functional requirements
    3. Identify edge cases and constraints
    4. Create clear, testable specifications

    Guidelines:
    - Ask clarifying questions
    - Use structured formats (User Stories, Use Cases)
    - Think about security, performance, scalability
    - Document assumptions
    """,
    tools=["read_file", "write_file"],
    temperature=0.5
)

ARCHITECT = AgentSpec(
    role=AgentRole.ARCHITECT,
    name="architect",
    description="System architecture designer",
    system_prompt="""
    You are a software architect expert.

    Your job:
    1. Design system architecture
    2. Choose appropriate technologies
    3. Plan data models and APIs
    4. Consider scalability and maintainability

    Guidelines:
    - Follow SOLID principles
    - Use design patterns appropriately
    - Create diagrams (Mermaid)
    - Document technical decisions
    """,
    tools=["read_file", "write_file", "search_docs"],
    temperature=0.6
)

DEVELOPER = AgentSpec(
    role=AgentRole.DEVELOPER,
    name="developer",
    description="Code implementation specialist",
    system_prompt="""
    You are an expert software developer.

    Your job:
    1. Implement features according to specs
    2. Write clean, maintainable code
    3. Follow coding standards
    4. Add appropriate comments

    Guidelines:
    - TDD: Write tests first
    - Keep functions small and focused
    - Handle errors gracefully
    - Optimize for readability
    """,
    tools=["read_file", "write_file", "bash", "search_code"],
    temperature=0.3
)

TESTER = AgentSpec(
    role=AgentRole.TESTER,
    name="tester",
    description="Quality assurance specialist",
    system_prompt="""
    You are a QA engineer expert.

    Your job:
    1. Write comprehensive tests
    2. Find edge cases and bugs
    3. Ensure code coverage
    4. Verify requirements

    Guidelines:
    - Unit + Integration + E2E tests
    - Aim for >80% coverage
    - Test failure scenarios
    - Document test rationale
    """,
    tools=["read_file", "write_file", "bash"],
    temperature=0.4
)

REVIEWER = AgentSpec(
    role=AgentRole.REVIEWER,
    name="reviewer",
    description="Code review specialist",
    system_prompt="""
    You are a senior code reviewer.

    Your job:
    1. Review code quality
    2. Check for security issues
    3. Verify best practices
    4. Suggest improvements

    Guidelines:
    - Be constructive, not critical
    - Focus on maintainability
    - Check for common vulnerabilities
    - Ensure documentation
    """,
    tools=["read_file", "write_file"],
    temperature=0.5
)
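The execute method above depends on helpers (`_llm_call`, `_execute_tools`, `_format_context`) that are not shown. To make the while(tool_use) pattern concrete, here is a self-contained sketch of the same loop driven by a scripted model. The `Reply` type, `run_agent`, and `scripted_llm` are hypothetical names for this illustration, and the message shape is simplified rather than taken from any specific provider API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Reply:
    """Hypothetical model reply: final text plus any requested tool calls."""
    content: str
    tool_calls: List[Dict] = field(default_factory=list)


def run_agent(
    llm: Callable[[List[Dict]], Reply],
    tools: Dict[str, Callable[..., str]],
    task: str,
    max_iterations: int = 10,
) -> Dict:
    """Call the model until it stops requesting tools or the cap is hit."""
    messages: List[Dict] = [{"role": "user", "content": task}]
    for iteration in range(max_iterations):
        reply = llm(messages)
        if not reply.tool_calls:
            # No tool request: the model's text is the final answer.
            return {"success": True, "result": reply.content, "iterations": iteration}
        messages.append({"role": "assistant", "content": reply.content})
        for call in reply.tool_calls:
            output = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": output})
    return {"success": False, "result": "Max iterations reached",
            "iterations": max_iterations}


def scripted_llm(messages: List[Dict]) -> Reply:
    """Toy model: asks for one tool call, then answers from the tool output."""
    if messages[-1]["role"] == "tool":
        return Reply(content=f"The sum is {messages[-1]['content']}")
    return Reply(content="", tool_calls=[{"name": "add", "args": {"a": 2, "b": 3}}])


outcome = run_agent(scripted_llm, {"add": lambda a, b: str(a + b)}, "Add 2 and 3")
print(outcome)  # {'success': True, 'result': 'The sum is 5', 'iterations': 1}
```

The iteration cap is the only thing that stops a model that keeps requesting tools, so it is worth testing that path as well as the happy path.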

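Dimension 4: Tool Design

4.1 The Tool Minimization Principle

Lessons from Claude Code:
- Provide only the tools that are necessary
- Give each tool a single responsibility
- Keep tool definitions clear
- Avoid redundant tools

Core tool set: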

from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

class Tool(ABC):
    """Base class for tools"""

    @abstractmethod
    def name(self) -> str:
        """Tool name"""
        pass

    @abstractmethod
    def description(self) -> str:
        """Tool description"""
        pass

    @abstractmethod
    def parameters(self) -> Dict:
        """Parameter schema (JSON Schema)"""
        pass

    @abstractmethod
    def execute(self, **kwargs) -> Any:
        """Run the tool"""
        pass


class ReadFileTool(Tool):
    """File-reading tool"""

    def name(self) -> str:
        return "read_file"

    def description(self) -> str:
        return "Read the contents of a file"

    def parameters(self) -> Dict:
        return {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "File path to read"
                },
                "start_line": {
                    "type": "integer",
                    "description": "Optional: start line (1-indexed)"
                },
                "end_line": {
                    "type": "integer",
                    "description": "Optional: end line (inclusive)"
                }
            },
            "required": ["path"]
        }

    def execute(
        self,
        path: str,
        start_line: Optional[int] = None,
        end_line: Optional[int] = None
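    ) -> str:
        """Read a file, optionally limited to a 1-indexed, inclusive line range."""
        with open(path, 'r') as f:
            if start_line is None and end_line is None:
                return f.read()

            lines = f.readlines()
            # Either bound may be omitted on its own
            start = (start_line or 1) - 1
            end = end_line if end_line is not None else len(lines)

            return ''.join(lines[start:end])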


class WriteFileTool(Tool):
    """File-writing tool"""

    def name(self) -> str:
        return "write_file"

    def description(self) -> str:
        return "Write content to a file"

    def parameters(self) -> Dict:
        return {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "File path to write"
                },
                "content": {
                    "type": "string",
                    "description": "Content to write"
                },
                "mode": {
                    "type": "string",
                    "enum": ["write", "append"],
                    "description": "Write mode",
                    "default": "write"
                }
            },
            "required": ["path", "content"]
        }

    def execute(
        self,
        path: str,
        content: str,
        mode: str = "write"
    ) -> str:
        """Write content to a file."""
        file_mode = 'w' if mode == "write" else 'a'

        with open(path, file_mode) as f:
            f.write(content)

        return f"Successfully wrote to {path}"


class BashTool(Tool):
    """Tool for running Bash commands"""

    def name(self) -> str:
        return "bash"

    def description(self) -> str:
        return "Execute a bash command"

    def parameters(self) -> Dict:
        return {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "Bash command to execute"
                },
                "cwd": {
                    "type": "string",
                    "description": "Working directory",
                    "default": "."
                }
            },
            "required": ["command"]
        }

    def execute(
        self,
        command: str,
        cwd: str = "."
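    ) -> str:
        """Run a command (permission checks are required!)"""
        import shlex
        import subprocess

        # Dangerous-command check (a coarse heuristic, not a security boundary).
        # "dd" is matched as a whole token, because a substring test
        # would also block harmless commands such as "git add".
        dangerous = ["rm -rf", "mkfs", "> /dev"]
        try:
            tokens = shlex.split(command)
        except ValueError:
            return "ERROR: Could not parse command"
        if "dd" in tokens or any(d in command for d in dangerous):
            return "ERROR: Dangerous command blocked"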

        try:
            result = subprocess.run(
                command,
                shell=True,
                cwd=cwd,
                capture_output=True,
                text=True,
                timeout=30
            )

            return f"STDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}"

        except subprocess.TimeoutExpired:
            return "ERROR: Command timeout"
        except Exception as e:
            return f"ERROR: {str(e)}"


# Tool registry
class ToolRegistry:
    """Registers and manages tools"""

    def __init__(self):
        self.tools: Dict[str, Tool] = {}

    def register(self, tool: Tool):
        """Register a tool."""
        self.tools[tool.name()] = tool

    def get(self, name: str) -> Optional[Tool]:
        """Look up a tool by name."""
        return self.tools.get(name)

    def list_tools(self) -> List[Dict]:
        """List all tools (for the LLM)."""
        return [
            {
                "name": tool.name(),
                "description": tool.description(),
                "parameters": tool.parameters()
            }
            for tool in self.tools.values()
        ]


# Create the global tool registry
registry = ToolRegistry()
registry.register(ReadFileTool())
registry.register(WriteFileTool())
registry.register(BashTool())

Dimension 5: Permissions and Security

5.1 Permission System Design

[Diagram: permission system design]

Implementation:

from enum import Enum
from typing import Callable, Dict, List, Optional, Set

class PermissionLevel(Enum):
    """Permission levels"""
    READ_ONLY = 1
    SAFE_WRITE = 2
    COMMAND_EXEC = 3
    DANGEROUS = 4


class Permission:
    """Permission definition"""

    def __init__(
        self,
        level: PermissionLevel,
        paths: Optional[List[str]] = None,
        commands: Optional[List[str]] = None
    ):
        self.level = level
        self.paths = paths or []
        self.commands = commands or []

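    def allows(self, action: str, target: str) -> bool:
        """Check whether an action on a target is allowed."""

        if action == "read":
            return self.level.value >= PermissionLevel.READ_ONLY.value

        if action == "write":
            if self.level.value < PermissionLevel.SAFE_WRITE.value:
                return False

            # Check the path allowlist. Compare whole path components so
            # that "/src" does not also match "/src_backup"; targets are
            # assumed to be normalized already (no ".." segments).
            if self.paths:
                return any(
                    target == p or target.startswith(p.rstrip("/") + "/")
                    for p in self.paths
                )
            return True

        if action == "execute":
            if self.level.value < PermissionLevel.COMMAND_EXEC.value:
                return False

            # Check the command allowlist (an empty command is never allowed)
            if self.commands:
                parts = target.split()
                return bool(parts) and parts[0] in self.commands
            return True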

        return False


class PermissionManager:
    """Permission manager"""

    def __init__(self):
        self.agent_permissions: Dict[str, Permission] = {}
        # Keys of actions the user has already approved
        self.approval_required: Set[str] = set()

    def grant(self, agent_name: str, permission: Permission):
        """Grant a permission to an agent."""
        self.agent_permissions[agent_name] = permission

    def check(
        self,
        agent_name: str,
        action: str,
        target: str
    ) -> bool:
        """Check a permission."""

        perm = self.agent_permissions.get(agent_name)
        if not perm:
            return False

        return perm.allows(action, target)

    def require_approval(
        self,
        agent_name: str,
        action: str,
        target: str,
        callback: Callable
    ) -> bool:
        """Ask a human for approval, remembering earlier approvals."""

        key = f"{agent_name}:{action}:{target}"

        if key in self.approval_required:
            # Already approved
            return True

        # Ask the user for approval
        approved = callback(agent_name, action, target)

        if approved:
            self.approval_required.add(key)

        return approved


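# Usage example

perm_manager = PermissionManager()

# Analyst: read-only permission
perm_manager.grant(
    "spec-analyst",
    Permission(
        level=PermissionLevel.READ_ONLY
    )
)

# Developer: safe-write permission
perm_manager.grant(
    "developer",
    Permission(
        level=PermissionLevel.SAFE_WRITE,
        paths=["/src", "/tests"],  # may write only to these directories
        # Allowed commands. Note: allows() consults this list only at
        # COMMAND_EXEC or above, so it has no effect at SAFE_WRITE.
        commands=["git", "npm", "pytest"]
    )
)

# Reviewer: command-execution permission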
perm_manager.grant(
    "reviewer",
    Permission(
        level=PermissionLevel.COMMAND_EXEC,
        paths=["/src", "/tests", "/docs"],
        commands=["git", "npm", "pytest", "lint"]
    )
)
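A prefix test on raw path strings has two well-known weaknesses: "/src" also matches "/src_backup", and "/src/../etc/passwd" starts with "/src" while pointing elsewhere. The sketch below shows a containment check that normalizes first and then compares whole path components. It is purely lexical (symlinks are not followed), uses POSIX path rules, and the name `is_within` is an assumption for illustration.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Iterable


def is_within(target: str, allowed_roots: Iterable[str]) -> bool:
    """True if target normalizes to one of the roots or a path beneath it."""
    # normpath collapses "." and ".." segments without touching the file system.
    resolved = PurePosixPath(posixpath.normpath(target))
    for root in allowed_roots:
        root_path = PurePosixPath(posixpath.normpath(root))
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False


roots = ["/src", "/tests"]
print(is_within("/src/app/main.py", roots))     # True
print(is_within("/src_backup/main.py", roots))  # False
print(is_within("/src/../etc/passwd", roots))   # False
```

Where symlinks matter, resolving against the real file system (for example with `pathlib.Path.resolve`) before comparing would be the stricter choice.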

Dimension 6: Observability

6.1 Logging and Monitoring

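import json
import logging
import uuid
from datetime import datetime
from typing import Any, Dict

class AgentLogger:
    """Logging system dedicated to agents"""

    def __init__(self, agent_name: str):
        self.agent_name = agent_name
        self.logger = logging.getLogger(f"agent.{agent_name}")
        self.session_id = self._generate_session_id()

    def _generate_session_id(self) -> str:
        """Return a random, unique session identifier."""
        return uuid.uuid4().hex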

    def log_task_start(self, task: str, context: Dict):
        """Log the start of a task."""
        self.logger.info({
            "event": "task_start",
            "agent": self.agent_name,
            "session": self.session_id,
            "task": task,
            "context": context,
            "timestamp": datetime.now().isoformat()
        })

    def log_tool_call(
        self,
        tool_name: str,
        parameters: Dict,
        result: Any
    ):
        """Log a tool call."""
        self.logger.info({
            "event": "tool_call",
            "agent": self.agent_name,
            "session": self.session_id,
            "tool": tool_name,
            "parameters": parameters,
            "result": str(result)[:200],  # truncated to 200 characters
            "timestamp": datetime.now().isoformat()
        })

    def log_error(self, error: Exception, context: Dict):
        """Log an error."""
        self.logger.error({
            "event": "error",
            "agent": self.agent_name,
            "session": self.session_id,
            "error_type": type(error).__name__,
            "error_message": str(error),
            "context": context,
            "timestamp": datetime.now().isoformat()
        })

    def log_metrics(self, metrics: Dict):
        """Log metrics."""
        self.logger.info({
            "event": "metrics",
            "agent": self.agent_name,
            "session": self.session_id,
            "metrics": metrics,
            "timestamp": datetime.now().isoformat()
        })


class MetricsCollector:
    """Metrics collector"""

    def __init__(self):
        self.metrics = {
            "total_tasks": 0,
            "successful_tasks": 0,
            "failed_tasks": 0,
            "total_tokens": 0,
            "total_cost": 0.0,
            "total_time": 0.0,
            "tool_calls": {},
            "agent_usage": {}
        }

    def record_task(
        self,
        agent_name: str,
        success: bool,
        tokens: int,
        cost: float,
        time: float
    ):
        """Record the metrics for one task."""
        self.metrics["total_tasks"] += 1

        if success:
            self.metrics["successful_tasks"] += 1
        else:
            self.metrics["failed_tasks"] += 1

        self.metrics["total_tokens"] += tokens
        self.metrics["total_cost"] += cost
        self.metrics["total_time"] += time

        if agent_name not in self.metrics["agent_usage"]:
            self.metrics["agent_usage"][agent_name] = {
                "count": 0,
                "tokens": 0,
                "cost": 0.0
            }

        self.metrics["agent_usage"][agent_name]["count"] += 1
        self.metrics["agent_usage"][agent_name]["tokens"] += tokens
        self.metrics["agent_usage"][agent_name]["cost"] += cost

    def record_tool_call(self, tool_name: str):
        """Record a tool call."""
        if tool_name not in self.metrics["tool_calls"]:
            self.metrics["tool_calls"][tool_name] = 0

        self.metrics["tool_calls"][tool_name] += 1

    def get_summary(self) -> Dict:
        """Return the aggregated metrics."""
        return {
            **self.metrics,
            "success_rate": (
                self.metrics["successful_tasks"] /
                max(self.metrics["total_tasks"], 1)
            ),
            "avg_cost_per_task": (
                self.metrics["total_cost"] /
                max(self.metrics["total_tasks"], 1)
            ),
            "avg_time_per_task": (
                self.metrics["total_time"] /
                max(self.metrics["total_tasks"], 1)
            )
        }
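Passing a dict straight to `logger.info` writes its Python repr, which log pipelines generally cannot parse. A minimal sketch of emitting one JSON object per line instead is shown below. `log_event` and `ListHandler` are hypothetical helpers written for this example, and the in-memory handler exists only to make the output inspectable.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict, List


class ListHandler(logging.Handler):
    """Collects formatted records in memory (handy for demos and tests)."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def log_event(logger: logging.Logger, event: str, **fields: Any) -> None:
    """Emit one JSON object per log line so the output stays machine-parseable."""
    payload: Dict[str, Any] = {
        "event": event,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    # default=str keeps non-JSON values (exceptions, datetimes) from raising.
    logger.info(json.dumps(payload, default=str))


logger = logging.getLogger("agent.demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

log_event(logger, "tool_call", tool="read_file", error=ValueError("bad path"))
record = json.loads(handler.lines[0])
print(record["event"], record["tool"], record["error"])  # tool_call read_file bad path
```

In a real deployment the in-memory handler would be replaced by a file or stream handler; the JSON payload stays the same.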

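Complete Implementation Tutorial

Step 1: Project Initialization

# Create the project structure (brace expansion requires a single word with no spaces or line breaks)
mkdir -p universal-agent/{src/{agents,tools,memory,orchestrator,utils},tests,config,logs,data/{vector_db,cache}}

cd universal-agent

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate  # Windows

# Install dependencies (pyyaml is needed for the yaml import in the orchestrator)
pip install anthropic numpy chromadb pydantic python-dotenv pyyaml

Project structure: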

universal-agent/
├── src/
│   ├── agents/          # Agent implementations
│   │   ├── base.py
│   │   ├── specialist.py
│   │   └── orchestrator.py
│   ├── tools/           # Tool implementations
│   │   ├── file_ops.py
│   │   ├── bash.py
│   │   └── registry.py
│   ├── memory/          # Memory system
│   │   ├── short_term.py
│   │   ├── long_term.py
│   │   └── retrieval.py
│   ├── orchestrator/    # Orchestrator
│   │   ├── main.py
│   │   └── context.py
│   └── utils/           # Utility functions
│       ├── logging.py
│       ├── metrics.py
│       └── permissions.py
├── tests/               # Tests
├── config/              # Configuration
│   └── agents.yaml
├── logs/                # Logs
├── data/                # Data
└── main.py              # Entry point

Step 2: Configure the Agents

config/agents.yaml:

# Agent configuration file

orchestrator:
  model: claude-sonnet-4-5
  max_tokens: 8000
  temperature: 0.7

agents:
  - role: analyst
    name: spec-analyst
    model: claude-sonnet-4-5
    temperature: 0.5
    max_tokens: 4000
    tools:
      - read_file
      - write_file
    permissions:
      level: READ_ONLY

  - role: architect
    name: architect
    model: claude-sonnet-4-5
    temperature: 0.6
    max_tokens: 6000
    tools:
      - read_file
      - write_file
    permissions:
      level: SAFE_WRITE
      paths:
        - /design
        - /docs

  - role: developer
    name: developer
    model: claude-haiku-4-5 # use Haiku to reduce cost
    temperature: 0.3
    max_tokens: 8000
    tools:
      - read_file
      - write_file
      - bash
    permissions:
      level: SAFE_WRITE
      paths:
        - /src
        - /tests
      commands:
        - git
        - npm
        - pytest

  - role: tester
    name: tester
    model: claude-haiku-4-5
    temperature: 0.4
    max_tokens: 6000
    tools:
      - read_file
      - write_file
      - bash
    permissions:
      level: COMMAND_EXEC
      commands:
        - pytest
        - npm       # the allowlist matches only the first word of a command
        - coverage

  - role: reviewer
    name: reviewer
    model: claude-sonnet-4-5
    temperature: 0.5
    max_tokens: 6000
    tools:
      - read_file
      - write_file
    permissions:
      level: READ_ONLY

cognitive_tools:
  enabled: true
  tools:
    - understand_question
    - recall_related
    - examine_answer
    - backtracking

memory:
  short_term:
    max_size: 10
  long_term:
    type: chromadb
    path: ./data/vector_db
  consolidation:
    threshold: 5
    compression_ratio: 0.3
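Tool lists and permission levels are declared separately in this file, so they can drift apart: an agent may be handed write_file while its level is READ_ONLY. A small consistency check at load time can surface this. The sketch below works on the already-parsed dictionary; the mapping from tools to minimum levels (`TOOL_MIN_LEVEL`) is an assumption made for this example, not something the configuration format defines.

```python
from typing import Dict, List

# Numeric ranks matching the PermissionLevel enum.
LEVELS = {"READ_ONLY": 1, "SAFE_WRITE": 2, "COMMAND_EXEC": 3, "DANGEROUS": 4}
# Minimum level each tool needs (an assumed mapping for this sketch).
TOOL_MIN_LEVEL = {"read_file": 1, "write_file": 2, "bash": 3}


def find_conflicts(config: Dict) -> List[str]:
    """List every agent tool that needs more permission than the agent has."""
    problems = []
    for agent in config.get("agents", []):
        level_name = agent["permissions"]["level"]
        granted = LEVELS[level_name]
        for tool in agent.get("tools", []):
            if TOOL_MIN_LEVEL.get(tool, 0) > granted:
                problems.append(f"{agent['name']}: {tool} needs more than {level_name}")
    return problems


# Same shape that yaml.safe_load would return for a fragment of agents.yaml.
config = {
    "agents": [
        {"name": "spec-analyst", "tools": ["read_file", "write_file"],
         "permissions": {"level": "READ_ONLY"}},
        {"name": "tester", "tools": ["read_file", "write_file", "bash"],
         "permissions": {"level": "COMMAND_EXEC"}},
    ]
}
for problem in find_conflicts(config):
    print(problem)  # spec-analyst: write_file needs more than READ_ONLY
```

Run against the full file above, a check of this kind would also flag the developer entry (bash with SAFE_WRITE) and the reviewer entry (write_file with READ_ONLY).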

Step 3: Implement the Core Components

src/orchestrator/main.py - the main orchestrator:

from datetime import datetime
from typing import List, Dict, Optional
import yaml
from ..agents.specialist import SpecialistAgent
from ..memory.short_term import ShortTermMemory
from ..memory.long_term import LongTermMemory
from ..tools.registry import ToolRegistry
from ..utils.logging import AgentLogger
from ..utils.metrics import MetricsCollector

class MainOrchestrator:
    """
    Main orchestrator

    Responsibilities:
    1. Task decomposition
    2. Agent selection and scheduling
    3. Context management
    4. Result aggregation
    """

    def __init__(self, config_path: str):
        # Load the configuration
        with open(config_path) as f:
            self.config = yaml.safe_load(f)

        # Initialize the components
        self.agents = self._init_agents()
        self.tools = ToolRegistry()
        self.short_memory = ShortTermMemory()
        self.long_memory = LongTermMemory(
            self.config['memory']['long_term']['path']
        )
        self.logger = AgentLogger("orchestrator")
        self.metrics = MetricsCollector()

    def execute(self, user_request: str) -> Dict:
        """
        Execute a user request.

        This is the main entry point.
        """
        self.logger.log_task_start(user_request, {})

        try:
            # Step 1: analyze the task
            task_plan = self._analyze_task(user_request)

            # Step 2: execute the task plan
            results = self._execute_plan(task_plan)

            # Step 3: aggregate the results
            final_result = self._aggregate_results(results)

            # Step 4: update memory
            self._update_memory(user_request, final_result)

            return {
                "success": True,
                "result": final_result,
                "metrics": self.metrics.get_summary()
            }

        except Exception as e:
            self.logger.log_error(e, {"request": user_request})
            return {
                "success": False,
                "error": str(e)
            }

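    def _analyze_task(self, request: str) -> Dict:
        """
        Task analysis

        Use the analyst agent to analyze the task and create an execution plan.
        """
        import json

        analyst = self.agents.get("spec-analyst")

        # Gather the relevant context
        context = self._build_context(request)

        # Analyze the task
        analysis = analyst.execute(
            task=f"""
            Analyze this user request and create an execution plan:

            Request: {request}

            Create a plan with:
            1. Required agents (spec-analyst/architect/developer/tester/reviewer)
            2. Sequence of steps
            3. Dependencies between steps
            4. Success criteria

            Format as JSON with a top-level "steps" list; give each step
            the keys "name", "agent", "description", and "dependencies".
            """,
            context=context,
            tools=[self.tools.get(name) for name in ("read_file", "write_file")]
        )

        # execute() returns a result dict; the plan is the JSON text in "result"
        return json.loads(analysis["result"])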

    def _execute_plan(self, plan: Dict) -> List[Dict]:
        """Execute the task plan."""
        results = []

        for step in plan["steps"]:
            agent_name = step["agent"]
            agent = self.agents.get(agent_name)

            # Collect the results of the steps this one depends on
            dependencies = self._resolve_dependencies(
                step.get("dependencies", []),
                results
            )

            # Build the context
            context = self._build_context(
                step["description"],
                dependencies
            )

            # Run the step with the tools listed in the agent's spec
            result = agent.execute(
                task=step["description"],
                context=context,
                tools=[self.tools.get(name) for name in agent.spec.tools]
            )

            results.append({
                "step": step["name"],
                "agent": agent_name,
                "result": result
            })

        return results

    def _build_context(
        self,
        query: str,
        additional: Optional[Dict] = None
    ) -> Dict:
        """
        Build the context.

        Combines:
        - Short-term memory
        - Long-term memory (retrieval)
        - Additional context
        """
        context = {}

        # Short-term memory
        context["recent_interactions"] = self.short_memory.get_recent(5)

        # Long-term memory (semantic retrieval)
        context["relevant_memories"] = self.long_memory.retrieve(
            query,
            k=3
        )

        # Additional context
        if additional:
            context["additional"] = additional

        return context

    def _update_memory(self, request: str, result: Dict):
        """Update the memory system."""

        interaction = {
            "request": request,
            "result": result,
            "timestamp": datetime.now()
        }

        # Short-term memory
        self.short_memory.add(interaction)

        # Long-term memory (consolidation)
        if self.short_memory.should_consolidate():
            consolidated = self.short_memory.consolidate()
            self.long_memory.add(consolidated)
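
`_execute_plan` calls a `_resolve_dependencies` helper that is not shown in this section, and it runs steps in the order the plan lists them. The following is a minimal sketch of both concerns, not the guide's own implementation. It assumes each dependency is identified by the `name` of an earlier step and that each entry in `results` has the `step`/`result` shape built above; the function names `resolve_dependencies` and `order_steps` are hypothetical. Ordering uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List


def resolve_dependencies(names: List[str], results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Map each dependency name to the result of the step that produced it."""
    by_step = {entry["step"]: entry["result"] for entry in results}
    missing = [name for name in names if name not in by_step]
    if missing:
        raise KeyError(f"Dependencies not yet executed: {missing}")
    return {name: by_step[name] for name in names}


def order_steps(steps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the steps sorted so that every step comes after its dependencies."""
    by_name = {step["name"]: step for step in steps}
    graph = {step["name"]: set(step.get("dependencies", [])) for step in steps}
    # static_order() raises graphlib.CycleError on circular dependencies;
    # a dependency that names no known step surfaces as a KeyError here.
    return [by_name[name] for name in TopologicalSorter(graph).static_order()]
```

Sorting the plan once before the loop means a plan that lists steps out of order still executes correctly, and a circular plan fails before any agent is invoked.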

Step 4: Usage Example

main.py:

#!/usr/bin/env python3
"""
Universal Agent - main entry point

Usage:
    python main.py "Create a todo list web app"
"""

import sys
from src.orchestrator.main import MainOrchestrator

def main():
    if len(sys.argv) < 2:
        print("Usage: python main.py '<your request>'")
        sys.exit(1)

    user_request = sys.argv[1]

    # Create the orchestrator
    orchestrator = MainOrchestrator("config/agents.yaml")

    # Execute the request
    print(f"🚀 Processing: {user_request}\n")

    result = orchestrator.execute(user_request)

    # Print the outcome; exit non-zero on failure so shell callers can detect it
    if result["success"]:
        print("✅ Success!")
        print(f"\nResult:\n{result['result']}")
        print(f"\nMetrics:\n{result['metrics']}")
    else:
        print(f"❌ Error: {result['error']}")
        sys.exit(1)

if __name__ == "__main__":
    main()

Run:

# Example 1: create a web application
python main.py "Create a todo list web application with React and FastAPI"

# Example 2: code review
python main.py "Review the authentication module for security issues"

# Example 3: add a feature
python main.py "Add dark mode to the existing UI"

Advanced Patterns

Pattern 1: 3 Amigo Pattern

Implementation:

class ThreeAmigoPattern:
    """3 Amigo agent pattern."""

    def __init__(self):
        self.pm_agent = SpecialistAgent(PM_SPEC)
        self.ux_agent = SpecialistAgent(UX_SPEC)
        self.dev_agent = SpecialistAgent(DEV_SPEC)

    def execute(self, user_request: str) -> Dict:
        """Run the 3 Amigo pattern."""

        # 1. PM: create the product spec
        spec = self.pm_agent.execute(
            task=f"Create product spec for: {user_request}",
            context={},
            tools=[read_file, write_file]
        )

        # 2. UX: design the interface
        design = self.ux_agent.execute(
            task="Design UI based on spec",
            context={"spec": spec},
            tools=[read_file, write_file, create_mockup]
        )

        # 3. Dev: implement the code
        implementation = self.dev_agent.execute(
            task="Implement the design",
            context={"spec": spec, "design": design},
            tools=[read_file, write_file, bash]
        )

        # 4. UX: verify the implementation against the design
        validation = self.ux_agent.execute(
            task="Verify implementation matches design",
            context={"design": design, "impl": implementation},
            tools=[read_file, screenshot]
        )

        return {
            "spec": spec,
            "design": design,
            "implementation": implementation,
            "validation": validation
        }
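
The data flow in the pattern above is a linear pipeline in which each role receives the artifacts of the roles before it. A minimal sketch of that flow follows, with plain functions standing in for the agents. The `run_pipeline` helper and the stub stages are hypothetical and exist only to show how context accumulates from stage to stage.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage takes (task, context) and returns its artifact.
Stage = Callable[[str, Dict[str, Any]], Any]


def run_pipeline(request: str, stages: List[Tuple[str, Stage]]) -> Dict[str, Any]:
    """Run stages in order; each stage sees every earlier stage's artifact."""
    artifacts: Dict[str, Any] = {}
    for name, stage in stages:
        # Pass a copy so a stage cannot alter what later stages will see
        artifacts[name] = stage(request, dict(artifacts))
    return artifacts


# Stub stages standing in for the PM, UX, and Dev agents
def pm(task: str, ctx: Dict[str, Any]) -> str:
    return f"spec for {task}"


def ux(task: str, ctx: Dict[str, Any]) -> str:
    return f"design from [{ctx['spec']}]"


def dev(task: str, ctx: Dict[str, Any]) -> str:
    return f"code from [{ctx['design']}]"


result = run_pipeline(
    "todo app",
    [("spec", pm), ("design", ux), ("implementation", dev)],
)
```

Keeping the artifacts in one dictionary makes it easy to add a final validation stage that compares `design` against `implementation`, as step 4 of the pattern does.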

Pattern 2: Spec Workflow System

class SpecWorkflowOrchestrator:
    """
    Spec workflow orchestrator.

    Covers the full software development lifecycle.
    """

    def __init__(self):
        self.agents = {
            "analyst": SpecialistAgent(SPEC_ANALYST),
            "architect": SpecialistAgent(ARCHITECT),
            "planner": SpecialistAgent(PLANNER),
            "developer": SpecialistAgent(DEVELOPER),
            "tester": SpecialistAgent(TESTER),
            "reviewer": SpecialistAgent(REVIEWER)
        }
        self.quality_gates = QualityGateSystem()

    def execute(self, project_idea: str) -> Dict:
        """Run the full workflow."""

        # === Planning phase ===
        print("📋 Planning Phase...")

        # 1. Requirements analysis
        requirements = self.agents["analyst"].execute(
            task=f"Analyze requirements for: {project_idea}"
        )

        # 2. Architecture design
        architecture = self.agents["architect"].execute(
            task="Design system architecture",
            context={"requirements": requirements}
        )

        # 3. Task planning
        tasks = self.agents["planner"].execute(
            task="Break down into development tasks",
            context={
                "requirements": requirements,
                "architecture": architecture
            }
        )

        # Quality Gate 1
        if not self.quality_gates.check("planning", {
            "requirements": requirements,
            "architecture": architecture,
            "tasks": tasks
        }):
            return {"success": False, "error": "Failed Quality Gate 1"}

        # === Development phase ===
        print("💻 Development Phase...")

        # 4. Implement every task
        implementations = []
        for task in tasks["task_list"]:
            impl = self.agents["developer"].execute(
                task=task["description"],
                context={"architecture": architecture}
            )
            implementations.append(impl)

        # 5. Write tests
        tests = self.agents["tester"].execute(
            task="Write comprehensive tests",
            context={"implementations": implementations}
        )

        # Quality Gate 2
        if not self.quality_gates.check("development", {
            "implementations": implementations,
            "tests": tests
        }):
            return {"success": False, "error": "Failed Quality Gate 2"}

        # === Validation phase ===
        print("✅ Validation Phase...")

        # 6. Code review
        review = self.agents["reviewer"].execute(
            task="Review all code for quality and security",
            context={"code": implementations, "tests": tests}
        )

        # Quality Gate 3
        if not self.quality_gates.check("validation", {
            "review": review
        }):
            return {"success": False, "error": "Failed Quality Gate 3"}

        # === Done ===
        print("🎉 Project Complete!")

        return {
            "success": True,
            "artifacts": {
                "requirements": requirements,
                "architecture": architecture,
                "tasks": tasks,
                "code": implementations,
                "tests": tests,
                "review": review
            },
            "metrics": self._collect_metrics()
        }
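
The workflow above depends on a `QualityGateSystem` whose `check(stage, artifacts)` method is not defined in this section. One possible shape is sketched below, under the assumption that a gate is a list of predicate functions registered per stage. The `register` method and the example rule are hypothetical, not part of the original design.

```python
from typing import Any, Callable, Dict, List

# A check receives the stage's artifacts and returns True when they pass.
Check = Callable[[Dict[str, Any]], bool]


class QualityGateSystem:
    """Hypothetical gate registry: a stage passes only if all its checks pass."""

    def __init__(self) -> None:
        self._checks: Dict[str, List[Check]] = {}

    def register(self, stage: str, check: Check) -> None:
        """Attach one more check to the named stage."""
        self._checks.setdefault(stage, []).append(check)

    def check(self, stage: str, artifacts: Dict[str, Any]) -> bool:
        checks = self._checks.get(stage)
        if not checks:
            # Fail closed: a stage with no registered checks does not pass
            return False
        return all(check(artifacts) for check in checks)


gates = QualityGateSystem()
# Example rule: every planning artifact must be present and non-empty
gates.register(
    "planning",
    lambda a: all(a.get(k) for k in ("requirements", "architecture", "tasks")),
)
```

Failing closed on an unconfigured stage is a deliberate choice in this sketch: a typo in a stage name then blocks the workflow instead of silently skipping the gate.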

Pattern 3: TDD with Agents

class TDDWorkflow:
    """Test-driven development workflow."""

    def __init__(self):
        self.tester = SpecialistAgent(TESTER)
        self.developer = SpecialistAgent(DEVELOPER)

    def implement_feature(
        self,
        feature_spec: str
    ) -> Dict:
        """Implement a feature the TDD way."""

        # Step 1: write failing tests
        print("🧪 Writing tests...")
        tests = self.tester.execute(
            task=f"""
            Write tests for this feature (TDD style):
            {feature_spec}

            IMPORTANT:
            - Do NOT create mock implementations
            - Tests should FAIL initially
            - Use pytest
            """,
            context={},
            tools=[write_file, bash]
        )

        # Step 2: run the tests and confirm they fail
        print("❌ Confirming tests fail...")
        test_result = self._run_tests()

        if test_result["passed"]:
            return {"success": False, "error": "Tests should fail initially!"}

        # Step 3: commit the tests
        self._commit("Add failing tests for feature")

        # Step 4: implement the feature
        print("💻 Implementing feature...")
        implementation = self.developer.execute(
            task=f"""
            Implement the feature to make tests pass:
            {feature_spec}

            IMPORTANT:
            - Do NOT modify the tests
            - Make tests pass with minimal code
            - Follow TDD principles
            """,
            context={"tests": tests},
            tools=[write_file, read_file, bash]
        )

        # Step 5: run the tests and confirm they pass
        print("✅ Verifying tests pass...")
        test_result = self._run_tests()

        if not test_result["passed"]:
            # Iterate on fixes until the tests pass
            return self._iterate_until_pass(implementation)

        # Step 6: commit the code
        self._commit("Implement feature (tests passing)")

        return {
            "success": True,
            "tests": tests,
            "implementation": implementation,
            "test_results": test_result
        }
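
`TDDWorkflow` calls `_run_tests` and `_iterate_until_pass` without defining them. A minimal sketch of the two helpers as standalone functions follows. It assumes tests are run through `pytest` as a subprocess and that a "fix" is any callable that receives the failing test output (for example, a wrapper around the developer agent). The names, the retry limit, and the return shape are assumptions.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Optional


def run_tests(command: Optional[List[str]] = None, timeout: int = 300) -> Dict:
    """Run the test command and report whether it exited with status 0."""
    command = command or [sys.executable, "-m", "pytest", "-q"]
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return {"passed": proc.returncode == 0, "output": proc.stdout + proc.stderr}


def iterate_until_pass(
    fix: Callable[[str], None],
    run: Callable[[], Dict] = run_tests,
    max_attempts: int = 3,
) -> Dict:
    """Call fix(test_output) and re-run the tests until they pass or attempts run out."""
    result = run()
    attempts = 0
    while not result["passed"] and attempts < max_attempts:
        fix(result["output"])  # e.g. ask the developer agent to repair the code
        attempts += 1
        result = run()
    return {"success": result["passed"], "attempts": attempts, "test_results": result}
```

Capping the number of attempts keeps a stuck agent from looping forever and burning tokens. Note that `pytest` also exits non-zero when it collects no tests, so "the tests fail" in step 2 is worth confirming against the captured output rather than the exit code alone.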

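Production Deployment

Deployment Checklist

# deployment-checklist.yaml

security:
  - [ ] Implement a permission system
  - [ ] Add input validation
  - [ ] Set rate limits
  - [ ] Enable audit logging
  - [ ] Isolate dangerous operations
  - [ ] Use container sandboxing

reliability:
  - [ ] Add error recovery
  - [ ] Implement timeout handling
  - [ ] Set up retry logic
  - [ ] Add health checks
  - [ ] Configure graceful shutdown

observability:
  - [ ] Structured logging
  - [ ] Metrics collection
  - [ ] Distributed tracing
  - [ ] Alerting rules
  - [ ] Performance monitoring

cost_optimization:
  - [ ] Use Haiku 4.5 for simple tasks
  - [ ] Implement response caching
  - [ ] Optimize token usage
  - [ ] Batch requests
  - [ ] Monitor cost metrics

scalability:
  - [ ] Design for horizontal scaling
  - [ ] Load balancing
  - [ ] Asynchronous task queue
  - [ ] Database sharding
  - [ ] CDN integration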
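
The reliability items on retry logic and timeout handling can be illustrated with a small wrapper. This is a sketch under stated assumptions, not a prescribed implementation: the function name `retry_with_backoff`, the default exception types, and the delay values are all hypothetical and should be tuned to the API client actually in use.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying the listed exceptions with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out retries from concurrent workers
            sleep(random.uniform(0, delay))
```

Only transient errors belong in `retry_on`. Retrying on validation or permission errors repeats a request that cannot succeed and adds cost without benefit.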

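Docker Deployment

Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install dependencies as root so packages land in the system site-packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Security: create a non-root user that owns the application directory
RUN useradd -m -u 1000 agent && chown -R agent:agent /app

# Copy the application code
COPY --chown=agent:agent . .

# Drop privileges before the application starts
USER agent

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV LOG_LEVEL=INFO

# Health check: urlopen raises on connection errors and on HTTP 4xx/5xx,
# so the command exits non-zero when the service is unhealthy
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)"

# Start the server
CMD ["python", "-m", "uvicorn", "src.api:app", "--host", "0.0.0.0", "--port", "8000"]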
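
The health check and the `CMD` above assume that `src/api.py` exposes an ASGI application named `app` with a `/health` route; that module is not shown in this guide. A dependency-free sketch of such a module is given below, written as a bare ASGI callable that `uvicorn` can serve. A real project would more likely build the route with a web framework, and the response body here is an assumption.

```python
import json


async def app(scope, receive, send):
    """Bare ASGI app: 200 on GET /health, 404 for everything else."""
    if scope["type"] != "http":
        # This sketch only speaks HTTP (no lifespan or websocket handling)
        raise RuntimeError(f"Unsupported scope type: {scope['type']}")

    healthy = scope["method"] == "GET" and scope["path"] == "/health"
    status = 200 if healthy else 404
    body = json.dumps({"status": "ok" if healthy else "not found"}).encode()

    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})
```

A health endpoint is most useful when it stays cheap: it should confirm the process can answer requests without calling the model API, so that container health probes do not generate token costs.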

docker-compose.yml:

version: '3.8'

services:
  agent:
    build: .
    ports:
      - '8000:8000'
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - LOG_LEVEL=INFO
    volumes:
      - ./logs:/app/logs
      - ./data:/app/data
    restart: unless-stopped
    networks:
      - agent-network

    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

  vector-db:
    image: chromadb/chroma:latest
    ports:
      - '8001:8000'
    volumes:
      - chroma-data:/chroma/chroma
    networks:
      - agent-network

  monitoring:
    image: prom/prometheus:latest
    ports:
      - '9090:9090'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    networks:
      - agent-network

volumes:
  chroma-data:
  prometheus-data:

networks:
  agent-network:
    driver: bridge

Evaluation and Optimization

Evaluation Framework

from typing import Dict, List
import numpy as np

class AgentEvaluator:
    """Agent evaluation system."""

    def __init__(self):
        self.metrics = {}

    def evaluate(
        self,
        agent_name: str,
        test_cases: List[Dict]
    ) -> Dict:
        """
        Evaluate agent performance.

        Metrics:
        - Accuracy
        - Success rate
        - Average cost
        - Average latency
        - Token efficiency
        """
        results = []

        for case in test_cases:
            result = self._run_test_case(agent_name, case)
            results.append(result)

        # Compute the metrics
        metrics = {
            "accuracy": self._calculate_accuracy(results),
            "success_rate": self._calculate_success_rate(results),
            "avg_cost": np.mean([r["cost"] for r in results]),
            "avg_latency": np.mean([r["latency"] for r in results]),
            "token_efficiency": self._calculate_token_efficiency(results),
            "test_count": len(test_cases)
        }

        self.metrics[agent_name] = metrics
        return metrics

    def compare_agents(
        self,
        agent_names: List[str]
    ) -> Dict:
        """Compare several agents on the shared metrics."""
        comparison = {}

        for metric in ["accuracy", "success_rate", "avg_cost"]:
            comparison[metric] = {
                name: self.metrics[name][metric]
                for name in agent_names
            }

        return comparison

    def generate_report(self) -> str:
        """Generate the evaluation report as Markdown."""
        report = "# Agent Evaluation Report\n\n"

        for agent_name, metrics in self.metrics.items():
            report += f"## {agent_name}\n\n"
            report += f"- Accuracy: {metrics['accuracy']:.2%}\n"
            report += f"- Success Rate: {metrics['success_rate']:.2%}\n"
            report += f"- Avg Cost: ${metrics['avg_cost']:.4f}\n"
            report += f"- Avg Latency: {metrics['avg_latency']:.2f}s\n"
            report += f"- Token Efficiency: {metrics['token_efficiency']:.2f}\n"
            report += "\n"

        return report
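
`AgentEvaluator` relies on three `_calculate_*` helpers that are not defined in this section. The sketch below shows one way they could work as standalone functions. It assumes each test result is a dictionary with `correct`, `success`, and `tokens` fields, and it defines token efficiency as successful cases per 1,000 tokens. Both the field names and that definition are assumptions chosen for illustration.

```python
from typing import Dict, List


def calculate_accuracy(results: List[Dict]) -> float:
    """Fraction of cases whose output matched the expected answer."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.get("correct")) / len(results)


def calculate_success_rate(results: List[Dict]) -> float:
    """Fraction of cases that finished without an error."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.get("success")) / len(results)


def calculate_token_efficiency(results: List[Dict]) -> float:
    """Successful cases per 1,000 tokens consumed (one possible definition)."""
    total_tokens = sum(r.get("tokens", 0) for r in results)
    if total_tokens == 0:
        return 0.0
    successes = sum(1 for r in results if r.get("success"))
    return successes / (total_tokens / 1000)
```

Separating accuracy from success rate matters in practice: an agent can complete every run without errors and still produce wrong answers, and the two numbers diverge exactly in that case.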

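Summary and Best Practices

Core Principles Recap

1. Start simple
   └─ A while(tool_use) loop is enough for most scenarios

2. Specialization beats generalization
   └─ Several small, specialized Agents > one do-everything Agent

3. Context engineering is key
   └─ Token budget management + memory system + cognitive tools

4. Prioritize human-agent collaboration
   └─ Always keep a human in the loop

5. Observability is essential
   └─ Logs + metrics + traces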

Implementation Roadmap

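Next Steps

Start now ✅
- Clone the project template
- Configure the API key
- Run the first example

Week 1 goals
- Implement a basic Agent
- Add 2-3 tools
- Complete a simple task

Week 2 goals
- Add Sub-agents
- Implement cognitive tools
- Integrate the memory system

Week 3 goals
- Add a permission system
- Implement monitoring
- Write tests

Week 4 goals
- Containerized deployment
- Performance optimization
- Documentation polish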

Document version: v1.0
Last updated: 2025-11-15
Author: Based on Context Engineering, Claude Code, and Enterprise Best Practices

Appendix: Complete Code Repository

The complete runnable code is available at: [your GitHub repository]

git clone https://github.com/your-username/universal-agent
cd universal-agent
pip install -r requirements.txt
python main.py "Your first request"

Good luck building a great Agent system! 🚀