
Multi-Agent Collaboration

Agent Hierarchy

```mermaid
graph TB
    User[User Request] --> Orchestrator

    subgraph "Orchestrator (Lead Agent)"
        Orchestrator[Orchestrator]
        Thompson[Thompson Sampling<br/>✓ KEPT - Core Algorithm]
        RLlib[Ray RLlib<br/>Multi-Agent PPO]
    end

    subgraph "Specialized Agents"
        Discovery[Discovery Agent]
        Quality[Quality Guardian]
        Archivist[Personal Archivist]
        Serendipity[Serendipity Engine]
        Forecaster[Engagement Forecaster]
    end

    subgraph "Observability"
        LangSmith[LangSmith<br/>Agent Tracing]
    end

    Orchestrator --> Thompson
    Orchestrator --> RLlib

    RLlib --> Discovery
    RLlib --> Quality
    RLlib --> Archivist
    RLlib --> Serendipity
    RLlib --> Forecaster

    LangSmith -.monitors.-> Discovery
    LangSmith -.monitors.-> Quality
    LangSmith -.monitors.-> Archivist
    LangSmith -.monitors.-> Serendipity
    LangSmith -.monitors.-> Forecaster

    Discovery --> Content[Content Pool]
    Quality --> Scores[Quality Scores]
    Archivist --> Memory[User Memory]
    Serendipity --> Diversity[Diversity Metrics]
    Forecaster --> Notifications[Push Notifications]

    Content --> Orchestrator
    Scores --> Orchestrator
    Memory --> Orchestrator
    Diversity --> Orchestrator
    Notifications --> User
```
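
The hierarchy keeps Thompson Sampling as the orchestrator's core selection algorithm. The sketch below is a minimal Beta-Bernoulli version of that idea: each candidate arm holds a posterior over its engagement rate, and the orchestrator surfaces the arm whose sampled rate is highest. The `ThompsonSampler` class and the arm names are illustrative, not the production implementation.

```python
import numpy as np


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over candidate arms.

    Each arm keeps a Beta(alpha, beta) posterior over its engagement rate;
    the orchestrator picks the arm whose sampled rate is highest.
    """

    def __init__(self, arms):
        self.alpha = {arm: 1.0 for arm in arms}  # prior "successes" + 1
        self.beta = {arm: 1.0 for arm in arms}   # prior "failures" + 1
        self._rng = np.random.default_rng()

    def select_arm(self):
        # Sample one plausible engagement rate per arm, then exploit the best.
        samples = {a: self._rng.beta(self.alpha[a], self.beta[a]) for a in self.alpha}
        return max(samples, key=samples.get)

    def update(self, arm, engaged):
        # Binary reward from the user's interaction: engaged or not.
        if engaged:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


# Illustrative arm names only; real candidates come from the specialized agents.
sampler = ThompsonSampler(["discovery", "serendipity", "archivist"])
arm = sampler.select_arm()
sampler.update(arm, engaged=True)
```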

Multi-Agent PPO Training Flow

```mermaid
sequenceDiagram
    participant User
    participant Orchestrator
    participant Agents as Specialized Agents
    participant RLlib as Ray RLlib (Multi-Agent PPO)
    participant LangSmith

    User->>Orchestrator: Request feed
    Orchestrator->>Agents: Delegate subtasks
    LangSmith->>Agents: Trace agent decisions
    Agents->>Orchestrator: Return results
    Orchestrator->>User: Present feed

    User->>Orchestrator: Interaction (reward signal)

    Orchestrator->>RLlib: Report episode data
    RLlib->>RLlib: PPO policy updates
    RLlib->>Agents: Updated policy networks

    LangSmith->>LangSmith: Log training metrics

    loop Scheduled Training (2 AM daily)
        RLlib->>RLlib: Batch policy optimization
        RLlib->>LangSmith: Training metrics
    end
```
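
The training flow maps naturally onto Ray RLlib's multi-agent API. The sketch below shows one plausible wiring, assuming a `MultiAgentEnv` (a hypothetical `FeedEnv`, registered elsewhere) that emits one observation per specialized agent and converts user interactions into per-agent rewards; apart from the RLlib classes, every name here is an assumption rather than the project's actual code.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import PolicySpec

AGENT_IDS = ["discovery", "quality", "archivist", "serendipity", "forecaster"]

config = (
    PPOConfig()
    # "FeedEnv" is a placeholder for a registered MultiAgentEnv: one sub-agent
    # per AGENT_ID, with user interactions mapped to per-agent rewards.
    .environment("FeedEnv")
    .multi_agent(
        # One independent PPO policy per specialized agent.
        policies={agent_id: PolicySpec() for agent_id in AGENT_IDS},
        # Route each sub-agent's experience to the policy with the same name.
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
    )
    .training(train_batch_size=4000)
)

algo = config.build()

# Nightly batch optimization (the "2 AM daily" loop in the sequence diagram):
# each call runs PPO policy updates and returns metrics to forward to LangSmith.
for _ in range(10):
    metrics = algo.train()
```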

Observability Architecture

```mermaid
graph LR
    subgraph "Modal Platform"
        Agents[Multi-Agent System]
        RLlib[Ray RLlib Training]
    end

    subgraph "LangSmith (5K traces/month free)"
        Traces[Agent Traces]
        Metrics[Training Metrics]
        Dashboard[Dashboard UI]
    end

    subgraph "console.vows.social"
        AdminDash[Admin Dashboard]
        RLMetrics[RL Performance]
    end

    Agents --> Traces
    RLlib --> Metrics
    Traces --> Dashboard
    Metrics --> Dashboard
    Dashboard --> AdminDash
    Metrics --> RLMetrics
```
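
Agent decisions reach LangSmith by wrapping each agent's entry point in a traced function. Below is a minimal sketch using the `langsmith` SDK's `traceable` decorator; it assumes `LANGSMITH_API_KEY` is configured as a secret in the Modal environment, and `rank_candidates` is an illustrative stand-in for a real agent step, not the actual Discovery Agent code.

```python
import os

from langsmith import traceable

# The LangSmith client reads its API key and tracing flag from the environment,
# so the Modal deployment only needs LANGSMITH_API_KEY set as a secret.
os.environ.setdefault("LANGSMITH_TRACING", "true")


@traceable(name="discovery_agent", run_type="chain")
def rank_candidates(user_id: str, candidates: list[str]) -> list[str]:
    # Placeholder for the Discovery Agent's ranking step; each call is
    # recorded as a trace and shows up in the LangSmith dashboard.
    return sorted(candidates)


rank_candidates("user-123", ["venue-a", "venue-b"])
```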