Phase 2: Agent Crew & MAGRPO
Timeline: Weeks 5-8
Status: Planned
Prerequisites: Phase 1 complete with users on platform
Overview
Phase 2 introduces the specialized AI agents and multi-agent coordination. We evolve from a single orchestrator to a collaborative crew of agents powered by MAGRPO (Multi-Agent Group Relative Policy Optimization).
Goal: Improve content quality and discovery by 15-25% through specialized agent collaboration.
Deliverables
Week 5: Discovery Agent
Mission: Find exceptional wedding vendors on Instagram before they're popular
Capabilities:
1. Instagram API integration via curator accounts
2. Semantic search across Instagram content
3. Quality evaluation with vision models
4. Emerging vendor detection
Implementation:
```typescript
class DiscoveryAgent {
  private curatorAccounts = [
    { username: "vows_au_modern", region: "AU", style: "modern" },
    { username: "vows_au_boho", region: "AU", style: "bohemian" }
  ];

  async discoverVendors(region: string, style: string): Promise<Vendor[]> {
    // 1. Use curator account to get Instagram suggestions
    const curator = this.selectCurator(region, style);
    const suggestions = await this.getInstagramSuggestions(curator);

    // 2. Filter by quality using Gemini
    const quality = await this.evaluateQuality(suggestions);

    // 3. Return high-quality vendors
    return suggestions.filter(s => quality[s.id] > 0.75);
  }
}
```
Instagram Strategy:
- Create 3-5 curator accounts per region
- Follow high-quality vendors in target niches
- Let Instagram's algorithm surface similar accounts
- Quality filter with our models

Tasks:
1. Set up Instagram API access and curator accounts
2. Implement content discovery pipeline
3. Integrate Gemini for quality scoring
4. Build vendor database and tracking
5. Deploy as Cloudflare Worker

Success Criteria:
- Discover 50+ new quality vendors per week
- 80%+ quality score for discovered content
- No Instagram rate limit violations
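The "no rate limit violations" criterion implies the discovery pipeline must pace its Instagram calls. A minimal token-bucket sketch of that pacing; the capacity and refill numbers below are illustrative assumptions, not real Instagram quotas:

```typescript
// Token-bucket limiter for spacing outbound Instagram API calls.
// Capacity = allowed burst; refillPerSec = sustained request rate (assumed values).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // maximum burst size
    private refillPerSec: number, // sustained requests per second
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a request may proceed now, consuming one token.
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The discovery worker would call `tryAcquire()` before each `getInstagramSuggestions` request and back off when it returns false.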
Week 6: Quality Guardian
Mission: Ensure only exceptional vendors surface to users
Quality Dimensions:
1. Visual Quality - professional photography, composition
2. Authenticity - real weddings vs. styled shoots
3. Professionalism - vendor communication and reliability
4. Consistency - portfolio quality across posts
Implementation:
```typescript
class QualityGuardian {
  async evaluateVendor(vendor: Vendor): Promise<QualityScore> {
    // Multi-modal analysis with GPT-4V/Gemini
    const visual = await this.analyzePortfolio(vendor.posts);
    const authenticity = await this.detectAuthenticity(vendor);
    const consistency = this.measureConsistency(vendor.posts);

    return {
      overall: visual * 0.5 + authenticity * 0.3 + consistency * 0.2,
      visual,
      authenticity,
      consistency,
      reasoning: "..." // LLM explanation
    };
  }
}
```
Tiered Quality Evaluation:
- Tier 1: Fast Gemini filter (99% of content)
- Tier 2: GPT-4V for edge cases (1% of content)
- Tier 3: Human review for top 0.1%
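The tier routing can be sketched as a score-gated function: confident Tier 1 verdicts stand, the ambiguous middle band escalates, and only the very top slice reaches humans. The specific thresholds below are assumptions for illustration:

```typescript
type Tier = "fast" | "escalate" | "human";

// Route a vendor based on the cheap Tier 1 (Gemini) score.
// Thresholds are illustrative: the 0.4-0.9 band is treated as ambiguous,
// and only scores at or above 0.995 are queued for human review.
function routeEvaluation(fastScore: number): Tier {
  if (fastScore >= 0.995) return "human";                      // Tier 3: top slice
  if (fastScore >= 0.4 && fastScore <= 0.9) return "escalate"; // Tier 2: GPT-4V
  return "fast";                                               // Tier 1 verdict stands
}
```

In practice the band boundaries would be tuned so roughly 1% of content escalates and 0.1% reaches human review.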
Tasks:
1. Implement multi-modal quality scoring
2. Build authenticity detection
3. Create consistency measurement
4. Set up tiered evaluation pipeline
5. Add quality threshold enforcement

Success Criteria:
- Average quality score > 0.85
- False positive rate < 5%
- Evaluation cost < $0.01 per vendor
Week 7: Personal Archivist & MAGRPO
Personal Archivist Mission: Foster personal connection through memory and timing
Capabilities:
1. User planning phase inference
2. Style preference evolution tracking
3. Milestone-based recommendations
4. Time-sensitive vendor suggestions
Implementation:
```typescript
class PersonalArchivist {
  async identifyRelevantMoment(
    user: User,
    content: Content
  ): Promise<number> {
    // Infer planning phase from interactions
    const phase = await this.inferPlanningPhase(user);

    // Match content to phase (venue → photographer → decor)
    const phaseRelevance = this.matchPhase(content, phase);

    // Time-sensitive scoring (6 months before for photographer)
    const timingScore = this.scoreTiming(content, user.weddingDate);

    // Style evolution tracking
    const styleMatch = this.computeStyleMatch(
      content.embedding,
      user.styleHistory
    );

    return phaseRelevance * 0.4 + timingScore * 0.3 + styleMatch * 0.3;
  }
}
```
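One possible shape for `scoreTiming` is a per-category booking lead time with a falloff around the ideal moment. The lead times and the Gaussian decay below are assumptions for illustration, not a confirmed design:

```typescript
// Assumed booking lead times per category (months before the wedding).
// Only the photographer figure (6 months) comes from the plan; the rest
// are illustrative.
const LEAD_TIME_MONTHS: Record<string, number> = {
  venue: 12,
  photographer: 6,
  decor: 3,
};

// Score peaks at 1.0 when the user is exactly at a category's ideal booking
// window and decays (Gaussian falloff, sigma = 2 months) as the gap grows.
function scoreTiming(category: string, monthsUntilWedding: number): number {
  const ideal = LEAD_TIME_MONTHS[category] ?? 6;
  const gap = monthsUntilWedding - ideal;
  const sigma = 2;
  return Math.exp(-(gap * gap) / (2 * sigma * sigma));
}
```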
MAGRPO Integration:
Multi-Agent Group Relative Policy Optimization coordinates all agents:
```typescript
// 1. Group sampling: each agent proposes a group of candidates
const groups = await Promise.all(
  agents.map(agent => agent.propose(context))
);

// 2. Orchestrator selects a feed via Thompson Sampling
const selectedFeed = orchestrator.thompsonSample(groups);

// 3. User interaction produces one reward per group
const groupRewards = groups.map(group => computeReward(group, userInteraction));

// 4. Group-relative advantage: each group's reward vs. the group mean
const advantages = groupRewards.map(r => r - mean(groupRewards));

// 5. All agents update their policies simultaneously
await Promise.all(
  agents.map((agent, i) => agent.updatePolicy(advantages[i]))
);
```
Benefits:
- No manual per-agent reward design
- More stable than traditional MARL
- 50% less memory than PPO
- Optimizes for collaborative success
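Step 2's `orchestrator.thompsonSample` can be sketched as a Beta-Bernoulli bandit over agents. This is a minimal sketch assuming the orchestrator tracks per-agent engagement successes and failures (an assumed schema, not confirmed by the plan):

```typescript
// Each agent's posterior over "my proposals get positive engagement".
interface ArmStats { successes: number; failures: number; }

// For integer shape parameters, a Beta(a, b) draw equals the a-th smallest
// of a + b - 1 independent uniforms (order-statistic identity).
function sampleBeta(a: number, b: number, rand: () => number = Math.random): number {
  const u = Array.from({ length: a + b - 1 }, rand).sort((x, y) => x - y);
  return u[a - 1];
}

// Draw once from each agent's Beta(successes+1, failures+1) posterior
// and serve the proposals of whichever agent drew highest.
function thompsonSelect(arms: ArmStats[], rand: () => number = Math.random): number {
  let best = 0;
  let bestDraw = -Infinity;
  arms.forEach((arm, i) => {
    const draw = sampleBeta(arm.successes + 1, arm.failures + 1, rand);
    if (draw > bestDraw) { bestDraw = draw; best = i; }
  });
  return best;
}
```

The order-statistic sampler is simple but O(n log n) per draw; a production orchestrator would likely use a gamma-based Beta sampler instead.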
Tasks:
1. Implement Personal Archivist agent
2. Build MAGRPO coordination framework
3. Create group sampling mechanism
4. Add simultaneous policy updates
5. Integrate multi-objective rewards

Success Criteria:
- Agents collaborate effectively (no conflicts)
- Quality improves 15% over single agent
- Discovery rate increases 25%
- System remains stable over time
Week 8: Integration & Testing
Focus: Validate multi-agent system with real users
- Deploy all agents to production
- A/B test: Multi-agent vs single agent
- Monitor collaboration dynamics
- Tune agent weights and policies
- Collect performance metrics
Testing Scenarios:
- Cold-start users (no history)
- Active users (100+ interactions)
- Different planning phases (early vs. late)
- Various style preferences

Monitoring Dashboard:
- Individual agent performance
- Collaboration effectiveness
- Quality scores over time
- Discovery metrics
- User satisfaction (NPS)
Multi-Objective Reward Function
The MAGRPO system optimizes a weighted sum of four reward components:

R_total = w_e * R_engagement + w_q * R_quality + w_d * R_discovery + w_v * R_diversity

Components:
- R_engagement - Clicks, saves, shares, time on content
- R_quality - Quality Guardian score, user feedback
- R_discovery - Novel vendor exposures, cross-regional finds
- R_diversity - Category distribution, style variety
Weight Learning:
- Start with equal weights (0.25 each)
- Use Multi-Objective Utility-UCB to learn per-user weights
- Adapt weights based on user behavior
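The combination step can be sketched as a scalarization over a per-user weight vector; the Utility-UCB weight learner itself is out of scope here, and the interface names are assumptions:

```typescript
// Per-user weights over the four reward components; starts at 0.25 each
// and is later adapted by the Utility-UCB learner (not shown).
interface RewardComponents {
  engagement: number;
  quality: number;
  discovery: number;
  diversity: number;
}

// Scalarize the multi-objective reward into a single MAGRPO training signal.
// Weights are normalized so the scale stays comparable as they drift.
function scalarize(r: RewardComponents, w: RewardComponents): number {
  const total = w.engagement + w.quality + w.discovery + w.diversity;
  return (
    (w.engagement * r.engagement +
      w.quality * r.quality +
      w.discovery * r.discovery +
      w.diversity * r.diversity) /
    total
  );
}
```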
Success Metrics
Agent Performance:
- Discovery Agent: 50+ vendors/week, 80% quality
- Quality Guardian: 85% accuracy, <$0.01/evaluation
- Personal Archivist: 20% better timing relevance
- MAGRPO: stable convergence in 1,000 steps

System Performance:
- 15% quality improvement vs. Phase 1
- 25% increase in vendor discovery
- Session duration > 5 minutes
- 7-day retention > 50%

Technical:
- Latency stays < 500ms (p99)
- Agent collaboration overhead < 100ms
- Memory usage < 256MB per agent
- Costs < $0.05/user/month
Cost Analysis
New Services:
- Workers AI (GPT-4V): ~$100/month (quality eval)
- Workers AI (Gemini): ~$50/month (fast inference)
- Additional compute: ~$25/month

Total Phase 2 Cost: ~$200-250/month at 1K users
Per-User Cost: ~$0.20-0.25/month

Optimization Strategies:
1. Aggressive caching (quality scores cached for 30 days)
2. Batch processing (10 evaluations at once)
3. Tiered evaluation (cheap filter first; expensive models only when needed)
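Strategy 1, the long-lived quality-score cache, can be sketched with a small TTL map. In production this would more likely sit in Workers KV or the Cache API, so treat this in-memory version as a stand-in with assumed key/value shapes:

```typescript
// Minimal TTL cache for quality scores (30-day TTL per the strategy above).
// Lazily evicts expired entries on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key); // expired: drop and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

The Quality Guardian would check this cache (e.g. keyed by vendor ID) before running any model evaluation, so re-scored vendors cost nothing for 30 days.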
Migration to Phase 3
Once Phase 2 succeeds:
1. Add Serendipity Engine for diversity
2. Add Engagement Forecaster for notifications
3. Enhance multi-objective optimization
4. Scale to 10K users

Scaling Path:
- 1K users: current infrastructure is sufficient
- 5K users: upgrade Qdrant ($99/month)
- 10K users: add Workers AI Pro ($500/month)
Resources
Reference Docs:
- MAGRPO Research Paper
- Discovery Agent Component
- Quality Guardian Component
- Personal Archivist Component

Development Guides:
- Building Agents
- Testing Multi-Agent Systems
- Deployment Guide
Next: Phase 3: Optimization