Phase 2: Agent Crew & MAGRPO
Timeline: Weeks 5-8
Status: Planned
Prerequisites: Phase 1 complete with users on platform
Overview
Phase 2 introduces the specialized AI agents and multi-agent coordination. We evolve from a single orchestrator to a collaborative crew of agents powered by MAGRPO (Multi-Agent Group Relative Policy Optimization).
Goal: Improve content quality and discovery by 15-25% through specialized agent collaboration.
Deliverables
Week 5: Discovery Agent
Mission: Find exceptional wedding vendors on Instagram before they're popular
Capabilities:
1. Instagram API integration via curator accounts
2. Semantic search across Instagram content
3. Quality evaluation with vision models
4. Emerging vendor detection
Implementation:
```typescript
class DiscoveryAgent {
  private curatorAccounts = [
    { username: "vows_au_modern", region: "AU", style: "modern" },
    { username: "vows_au_boho", region: "AU", style: "bohemian" }
  ];

  async discoverVendors(region: string, style: string): Promise<Vendor[]> {
    // 1. Use curator account to get Instagram suggestions
    const curator = this.selectCurator(region, style);
    const suggestions = await this.getInstagramSuggestions(curator);

    // 2. Filter by quality using Gemini
    const quality = await this.evaluateQuality(suggestions);

    // 3. Return high-quality vendors
    return suggestions.filter(s => quality[s.id] > 0.75);
  }
}
```
Instagram Strategy:
- Create 3-5 curator accounts per region
- Follow high-quality vendors in target niches
- Let Instagram's algorithm surface similar accounts
- Quality filter with our models

Tasks:
1. Set up Instagram API access and curator accounts
2. Implement content discovery pipeline
3. Integrate Gemini for quality scoring
4. Build vendor database and tracking
5. Deploy as Cloudflare Worker

Success Criteria:
- Discover 50+ new quality vendors per week
- 80%+ quality score for discovered content
- No Instagram rate limit violations
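The "no rate limit violations" criterion implies the discovery pipeline must pace its Instagram calls. A minimal token-bucket sketch of that pacing; the capacity and refill numbers below are illustrative assumptions, not real Instagram quotas:

```typescript
// Token-bucket limiter for spacing outbound Instagram API calls.
// Capacity = allowed burst; refillPerSec = sustained request rate (assumed values).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // maximum burst size
    private refillPerSec: number, // sustained requests per second
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a request may proceed now, consuming one token.
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The discovery worker would call `tryAcquire()` before each `getInstagramSuggestions` request and back off when it returns false.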
Week 6: Quality Guardian
Mission: Ensure only exceptional vendors surface to users
Quality Dimensions:
1. Visual Quality - professional photography, composition
2. Authenticity - real weddings vs. styled shoots
3. Professionalism - vendor communication and reliability
4. Consistency - portfolio quality across posts
Implementation:
```typescript
class QualityGuardian {
  async evaluateVendor(vendor: Vendor): Promise<QualityScore> {
    // Multi-modal analysis with GPT-4V/Gemini
    const visual = await this.analyzePortfolio(vendor.posts);
    const authenticity = await this.detectAuthenticity(vendor);
    const consistency = this.measureConsistency(vendor.posts);

    return {
      overall: visual * 0.5 + authenticity * 0.3 + consistency * 0.2,
      visual,
      authenticity,
      consistency,
      reasoning: "..." // LLM explanation
    };
  }
}
```
Tiered Quality Evaluation:
- Tier 1: Fast Gemini filter (99% of content)
- Tier 2: GPT-4V for edge cases (1% of content)
- Tier 3: Human review for top 0.1%
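The tier routing can be sketched as a score-gated function: confident Tier 1 verdicts stand, the ambiguous middle band escalates, and only the very top slice reaches humans. The specific thresholds below are assumptions for illustration:

```typescript
type Tier = "fast" | "escalate" | "human";

// Route a vendor based on the cheap Tier 1 (Gemini) score.
// Thresholds are illustrative: the 0.4-0.9 band is treated as ambiguous,
// and only scores at or above 0.995 are queued for human review.
function routeEvaluation(fastScore: number): Tier {
  if (fastScore >= 0.995) return "human";                      // Tier 3: top slice
  if (fastScore >= 0.4 && fastScore <= 0.9) return "escalate"; // Tier 2: GPT-4V
  return "fast";                                               // Tier 1 verdict stands
}
```

In practice the band boundaries would be tuned so roughly 1% of content escalates and 0.1% reaches human review.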
Tasks:
1. Implement multi-modal quality scoring
2. Build authenticity detection
3. Create consistency measurement
4. Set up tiered evaluation pipeline
5. Add quality threshold enforcement

Success Criteria:
- Average quality score > 0.85
- False positive rate < 5%
- Evaluation cost < $0.01 per vendor
Week 7: Personal Archivist & MAGRPO
Personal Archivist Mission: Foster personal connection through memory and timing
Capabilities:
1. User planning phase inference
2. Style preference evolution tracking
3. Milestone-based recommendations
4. Time-sensitive vendor suggestions
Implementation:
```typescript
class PersonalArchivist {
  async identifyRelevantMoment(
    user: User,
    content: Content
  ): Promise<number> {
    // Infer planning phase from interactions
    const phase = await this.inferPlanningPhase(user);

    // Match content to phase (venue → photographer → decor)
    const phaseRelevance = this.matchPhase(content, phase);

    // Time-sensitive scoring (6 months before for photographer)
    const timingScore = this.scoreTiming(content, user.weddingDate);

    // Style evolution tracking
    const styleMatch = this.computeStyleMatch(
      content.embedding,
      user.styleHistory
    );

    return phaseRelevance * 0.4 + timingScore * 0.3 + styleMatch * 0.3;
  }
}
```
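One possible shape for `scoreTiming` is a per-category booking lead time with a falloff around the ideal moment. The lead times and the Gaussian decay below are assumptions for illustration, not a confirmed design:

```typescript
// Assumed booking lead times per category (months before the wedding).
// Only the photographer figure (6 months) comes from the plan; the rest
// are illustrative.
const LEAD_TIME_MONTHS: Record<string, number> = {
  venue: 12,
  photographer: 6,
  decor: 3,
};

// Score peaks at 1.0 when the user is exactly at a category's ideal booking
// window and decays (Gaussian falloff, sigma = 2 months) as the gap grows.
function scoreTiming(category: string, monthsUntilWedding: number): number {
  const ideal = LEAD_TIME_MONTHS[category] ?? 6;
  const gap = monthsUntilWedding - ideal;
  const sigma = 2;
  return Math.exp(-(gap * gap) / (2 * sigma * sigma));
}
```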
MAGRPO Integration:
Multi-Agent Group Relative Policy Optimization coordinates all agents:
```typescript
// 1. Group sampling: each agent proposes a group of candidates
const groups = await Promise.all(
  agents.map(agent => agent.propose(context))
);

// 2. Orchestrator selects a feed via Thompson Sampling
const selectedFeed = orchestrator.thompsonSample(groups);

// 3. User interaction produces one reward per group
const groupRewards = groups.map(group => computeReward(group, userInteraction));

// 4. Group-relative advantage: each group's reward vs. the group mean
const advantages = groupRewards.map(r => r - mean(groupRewards));

// 5. All agents update their policies simultaneously
await Promise.all(
  agents.map((agent, i) => agent.updatePolicy(advantages[i]))
);
```
Benefits:
- No manual per-agent reward design
- More stable than traditional MARL
- 50% less memory than PPO
- Optimizes for collaborative success
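Step 2's `orchestrator.thompsonSample` can be sketched as a Beta-Bernoulli bandit over agents. This is a minimal sketch assuming the orchestrator tracks per-agent engagement successes and failures (an assumed schema, not confirmed by the plan):

```typescript
// Each agent's posterior over "my proposals get positive engagement".
interface ArmStats { successes: number; failures: number; }

// For integer shape parameters, a Beta(a, b) draw equals the a-th smallest
// of a + b - 1 independent uniforms (order-statistic identity).
function sampleBeta(a: number, b: number, rand: () => number = Math.random): number {
  const u = Array.from({ length: a + b - 1 }, rand).sort((x, y) => x - y);
  return u[a - 1];
}

// Draw once from each agent's Beta(successes+1, failures+1) posterior
// and serve the proposals of whichever agent drew highest.
function thompsonSelect(arms: ArmStats[], rand: () => number = Math.random): number {
  let best = 0;
  let bestDraw = -Infinity;
  arms.forEach((arm, i) => {
    const draw = sampleBeta(arm.successes + 1, arm.failures + 1, rand);
    if (draw > bestDraw) { bestDraw = draw; best = i; }
  });
  return best;
}
```

The order-statistic sampler is simple but O(n log n) per draw; a production orchestrator would likely use a gamma-based Beta sampler instead.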
Tasks:
1. Implement Personal Archivist agent
2. Build MAGRPO coordination framework
3. Create group sampling mechanism
4. Add simultaneous policy updates
5. Integrate multi-objective rewards

Success Criteria:
- Agents collaborate effectively (no conflicts)
- Quality improves 15% over single agent
- Discovery rate increases 25%
- System remains stable over time
Week 8: Integration & Testing
Focus: Validate multi-agent system with real users
- Deploy all agents to production
- A/B test: Multi-agent vs single agent
- Monitor collaboration dynamics
- Tune agent weights and policies
- Collect performance metrics
Testing Scenarios:
- Cold-start users (no history)
- Active users (100+ interactions)
- Different planning phases (early vs. late)
- Various style preferences

Monitoring Dashboard:
- Individual agent performance
- Collaboration effectiveness
- Quality scores over time
- Discovery metrics
- User satisfaction (NPS)
Multi-Objective Reward Function
The MAGRPO system optimizes a weighted sum of four reward components:

R_total = w_e * R_engagement + w_q * R_quality + w_d * R_discovery + w_v * R_diversity

Components:
- R_engagement - Clicks, saves, shares, time on content
- R_quality - Quality Guardian score, user feedback
- R_discovery - Novel vendor exposures, cross-regional finds
- R_diversity - Category distribution, style variety
Weight Learning:
- Start with equal weights (0.25 each)
- Use Multi-Objective Utility-UCB to learn per-user weights
- Adapt weights based on user behavior
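The combination step can be sketched as a scalarization over a per-user weight vector; the Utility-UCB weight learner itself is out of scope here, and the interface names are assumptions:

```typescript
// Per-user weights over the four reward components; starts at 0.25 each
// and is later adapted by the Utility-UCB learner (not shown).
interface RewardComponents {
  engagement: number;
  quality: number;
  discovery: number;
  diversity: number;
}

// Scalarize the multi-objective reward into a single MAGRPO training signal.
// Weights are normalized so the scale stays comparable as they drift.
function scalarize(r: RewardComponents, w: RewardComponents): number {
  const total = w.engagement + w.quality + w.discovery + w.diversity;
  return (
    (w.engagement * r.engagement +
      w.quality * r.quality +
      w.discovery * r.discovery +
      w.diversity * r.diversity) /
    total
  );
}
```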
Success Metrics
Agent Performance:
- Discovery Agent: 50+ vendors/week, 80% quality
- Quality Guardian: 85% accuracy, <$0.01/evaluation
- Personal Archivist: 20% better timing relevance
- MAGRPO: stable convergence in 1,000 steps

System Performance:
- 15% quality improvement vs. Phase 1
- 25% increase in vendor discovery
- Session duration > 5 minutes
- 7-day retention > 50%

Technical:
- Latency stays < 500ms (p99)
- Agent collaboration overhead < 100ms
- Memory usage < 256MB per agent
- Costs < $0.05/user/month
Cost Analysis
New Services:
- Workers AI (GPT-4V): ~$100/month (quality eval)
- Workers AI (Gemini): ~$50/month (fast inference)
- Additional compute: ~$25/month

Total Phase 2 Cost: ~$200-250/month at 1K users
Per-User Cost: ~$0.20-0.25/month

Optimization Strategies:
1. Aggressive caching (quality scores cached for 30 days)
2. Batch processing (10 evaluations at once)
3. Tiered evaluation (cheap filter first; expensive models only when needed)
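Strategy 1, the long-lived quality-score cache, can be sketched with a small TTL map. In production this would more likely sit in Workers KV or the Cache API, so treat this in-memory version as a stand-in with assumed key/value shapes:

```typescript
// Minimal TTL cache for quality scores (30-day TTL per the strategy above).
// Lazily evicts expired entries on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key); // expired: drop and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

The Quality Guardian would check this cache (e.g. keyed by vendor ID) before running any model evaluation, so re-scored vendors cost nothing for 30 days.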
Migration to Phase 3
Once Phase 2 succeeds:
1. Add Serendipity Engine for diversity
2. Add Engagement Forecaster for notifications
3. Enhance multi-objective optimization
4. Scale to 10K users

Scaling Path:
- 1K users: current infrastructure is sufficient
- 5K users: upgrade Qdrant ($99/month)
- 10K users: add Workers AI Pro ($500/month)
Resources
Reference Docs:
- MAGRPO Research Paper
- Discovery Agent Component
- Quality Guardian Component
- Personal Archivist Component

Development Guides:
- Building Agents
- Testing Multi-Agent Systems
- Deployment Guide
Next: Phase 3: Optimization