Phase 3: Optimization & Intelligence
Timeline: Weeks 9-12
Status: Planned
Prerequisites: Phase 2 agent crew operational
Overview
Phase 3 adds advanced intelligence features: serendipitous discovery, notification timing, and sophisticated multi-objective optimization. The system becomes anticipatory rather than reactive: predicting needs, preventing filter bubbles, and reaching users with well-timed notifications.
Goal: Achieve 80%+ feed diversity, 23%+ notification CTR lift, and NPS > 50.
Deliverables
Week 9: Serendipity Engine
Mission: Prevent filter bubbles and enable delightful discoveries
Strategies:
1. Cross-Regional Discovery - a Melbourne couple sees a stunning Sydney venue
2. Style Exploration - a modern couple discovers tasteful bohemian elements
3. Emerging Vendor Boost - surface exceptional new vendors
4. Collaborative Filtering - "Users like you also loved..."
Implementation:
// Tunable thresholds (values illustrative)
const DIVERSITY_THRESHOLD = 0.8;

class SerendipityEngine {
  async generateSerendipitousCandidates(
    user: User,
    recentFeed: Content[]
  ): Promise<Content[]> {
    // Measure current diversity
    const diversity = this.measureDiversity(recentFeed);
    if (diversity < DIVERSITY_THRESHOLD) {
      // Explore embedding space for novel but relevant content
      const novelCandidates = await this.exploreEmbeddingSpace(
        user.embedding,
        recentFeed.map(c => c.embedding),
        0.7 // exploration radius (semantic distance)
      );
      // Quality filter (only show good surprises)
      return novelCandidates.filter(c => c.qualityScore > 0.75);
    }
    return [];
  }

  private measureDiversity(feed: Content[]): number {
    // Category diversity
    const categories = new Set(feed.map(c => c.category));
    const categoryDiversity = categories.size / TOTAL_CATEGORIES;
    // Style diversity (embedding space coverage)
    const styleDiversity = this.computeEmbeddingCoverage(feed);
    // Regional diversity
    const regions = new Set(feed.map(c => c.region));
    const regionalDiversity = regions.size / TOTAL_REGIONS;
    return (categoryDiversity + styleDiversity + regionalDiversity) / 3;
  }
}
Diversity Metrics:
- Category Coverage: % of vendor types shown (photographers, venues, florists, etc.)
- Embedding Diversity: coverage of the style embedding space (a sketch of one way to compute this follows below)
- Regional Balance: distribution across locations
- Novelty Rate: % of content dissimilar to anything the user has already seen
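The engine above leans on a computeEmbeddingCoverage helper that is not defined here. A minimal sketch, assuming embeddings are plain number[] vectors and using mean pairwise cosine distance as a coverage proxy (both the helper's shape and the metric choice are illustrative):

// Hypothetical sketch: embedding coverage as mean pairwise cosine distance.
// A feed whose items cluster tightly in style space scores near 0;
// a stylistically varied feed scores higher.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function computeEmbeddingCoverage(feed: { embedding: number[] }[]): number {
  if (feed.length < 2) return 0;
  let totalDistance = 0, pairs = 0;
  for (let i = 0; i < feed.length; i++) {
    for (let j = i + 1; j < feed.length; j++) {
      totalDistance += 1 - cosineSimilarity(feed[i].embedding, feed[j].embedding);
      pairs++;
    }
  }
  return totalDistance / pairs; // ~0 means a homogeneous feed
}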
Tasks:
1. Implement diversity measurement
2. Build embedding space exploration
3. Create novelty detection
4. Add collaborative filtering signals
5. Integrate with Orchestrator
Success Criteria:
- Diversity score > 0.8
- Novel discoveries > 20% of feed
- User delight rating > 8/10
- No engagement drop from diversity injection
Week 10: Engagement Forecaster
Mission: Intelligent notification timing for maximum welcome probability
Key Innovation: Fully integrated into MARL system (not a separate service)
Context Signals (200+ in total; a representative subset is shown below):
interface NotificationContext {
  // Temporal
  timeOfDay: number;
  dayOfWeek: number;
  daysUntilWedding: number;
  planningPhase: string;

  // Behavioral
  avgSessionDuration: number;
  lastActive: Date;
  notificationOpenRate: number;
  preferredContentTypes: string[];
  engagementPatterns: TimeSeries;

  // Device/Environment
  deviceType: string;
  osVersion: string;
  batteryLevel: number;
  networkQuality: string;
  timezone: string;

  // Content
  contentType: string;
  qualityScore: number;
  personalizationScore: number;
  urgency: number;
  vendorAvailability: boolean;

  // Social
  friendActivity: number;
  trendingInRegion: boolean;
  vendorResponsePending: boolean;
}
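Before prediction, these signals must be flattened into a numeric vector. A minimal sketch of that step, assuming simple min-max style scaling (the chosen fields and scaling constants are illustrative; a full implementation would also encode the categorical fields):

// Hypothetical sketch: flatten a NotificationContext into a numeric
// feature vector for the welcome-probability model. Categorical fields
// (deviceType, planningPhase, etc.) would be hashed or one-hot encoded.
function contextToFeatures(ctx: NotificationContext): number[] {
  return [
    ctx.timeOfDay / 24,                        // normalize to [0, 1]
    ctx.dayOfWeek / 7,
    Math.min(ctx.daysUntilWedding, 365) / 365, // cap long horizons
    ctx.avgSessionDuration / 3600,             // seconds -> hours
    ctx.notificationOpenRate,                  // already in [0, 1]
    ctx.batteryLevel / 100,
    ctx.qualityScore,
    ctx.personalizationScore,
    ctx.urgency,
    ctx.vendorAvailability ? 1 : 0,
    ctx.friendActivity,
    ctx.trendingInRegion ? 1 : 0,
    ctx.vendorResponsePending ? 1 : 0,
  ];
}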
Prediction Model:
// Minimum predicted welcome probability before a send is allowed
// (tunable; value illustrative)
const WELCOME_THRESHOLD = 0.6;

class EngagementForecaster {
  async shouldSendNotification(
    user: User,
    content: Content,
    context: NotificationContext
  ): Promise<{ send: boolean; timing: Date | null }> {
    // Predict P(welcome | context)
    const welcomeProb = await this.predictWelcome(user, content, context);
    if (welcomeProb < WELCOME_THRESHOLD) {
      return { send: false, timing: null };
    }
    // Optimize send time within the next 24 hours
    const optimalTime = await this.optimizeSendTime(user, context);
    return {
      send: true,
      timing: optimalTime
    };
  }

  private async predictWelcome(
    user: User,
    content: Content,
    context: NotificationContext
  ): Promise<number> {
    // Neural network trained on historical notification data
    const features = this.extractFeatures(user, content, context);
    return this.model.predict(features);
  }
}
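One simple realization of optimizeSendTime is to scan candidate send times across the next 24 hours and keep the one with the highest predicted welcome probability. A minimal sketch under that assumption (the hourly granularity and the injected predict callback are illustrative):

// Hypothetical sketch: scan hourly send-time candidates over the next
// 24 hours and return the time with the highest predicted welcome
// probability. `predict` would wrap predictWelcome for a fixed user
// and content.
async function optimizeSendTime(
  context: NotificationContext,
  predict: (ctx: NotificationContext) => Promise<number>
): Promise<Date> {
  let bestTime = new Date();
  let bestScore = -Infinity;
  for (let hour = 0; hour < 24; hour++) {
    const candidate = new Date(Date.now() + hour * 3_600_000);
    // Re-score with the candidate hour substituted into the context
    const score = await predict({ ...context, timeOfDay: candidate.getHours() });
    if (score > bestScore) {
      bestScore = score;
      bestTime = candidate;
    }
  }
  return bestTime;
}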
MARL Integration:
The Engagement Forecaster participates in MAGRPO:
- Proposes: notification timing candidates
- Receives Reward: based on user response (open, engage, opt-out); a sketch of the reward follows below
- Learns: when to send, when to hold back
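A minimal sketch of how that reward signal might be computed, assuming the three response types listed above (the NotificationOutcome shape and the specific weightings are assumptions, not specified by this plan):

// Hypothetical sketch: map a notification outcome to a scalar reward
// for MAGRPO. Opt-outs are punished hard so the agent learns that
// holding back beats an unwelcome send.
interface NotificationOutcome {
  opened: boolean;
  engaged: boolean;   // tapped through and interacted with the content
  optedOut: boolean;  // disabled notifications in response
}

function notificationReward(outcome: NotificationOutcome): number {
  if (outcome.optedOut) return -1.0; // trust damage dominates everything
  let reward = 0;
  if (outcome.opened) reward += 0.3;
  if (outcome.engaged) reward += 0.7;
  return reward; // 0 for an ignored send, up to 1.0 for full engagement
}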
Expected Lift: 23%+ CTR improvement (based on Meta research)
Tasks:
1. Build notification context extraction
2. Train welcome probability model
3. Implement send-time optimization
4. Integrate with MAGRPO framework
5. A/B test against naive timing
Success Criteria:
- CTR > 23% lift vs baseline
- Opt-out rate < 5%
- Response time < 100ms
- Maintains user trust (NPS stable or improving)
Week 11: Multi-Objective Refinement
Focus: Advanced reward engineering and weight learning
Multi-Objective Utility-UCB (MOU-UCB):
Learns optimal reward weights per user from behavior:
class MultiObjectiveOptimizer {
  private weights: Map<string, number[]> = new Map();

  async computeReward(
    userId: string,
    engagement: number,
    quality: number,
    discovery: number,
    diversity: number,
    satisfaction: number
  ): Promise<number> {
    // Get or initialize user weights (uniform prior over five objectives)
    const w = this.weights.get(userId) || [0.2, 0.2, 0.2, 0.2, 0.2];
    // Compute weighted reward
    const reward =
      w[0] * engagement +
      w[1] * quality +
      w[2] * discovery +
      w[3] * diversity +
      w[4] * satisfaction;
    // Update weights using MOU-UCB
    await this.updateWeights(userId, [
      engagement, quality, discovery, diversity, satisfaction
    ]);
    return reward;
  }

  private async updateWeights(
    userId: string,
    outcomes: number[]
  ): Promise<void> {
    // Multi-armed bandit over weight combinations: explores different
    // weightings and exploits the best-performing weights for this user
    // (see the sketch below for one possible realization).
  }
}
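The updateWeights stub leaves the bandit unspecified. A minimal sketch of one way to realize it, treating a small fixed grid of candidate weight vectors as UCB1 arms (the grid, the WeightBandit name, and the choice of utility signal are all illustrative assumptions):

// Hypothetical sketch: UCB1 over a discrete set of candidate weight
// vectors. Each "arm" is one weighting of the five objectives; its
// payoff is a downstream outcome observed after serving feeds under
// those weights (e.g. session satisfaction), not the weighted sum
// itself, which would trivially favor an arm's own heaviest objective.
const CANDIDATE_WEIGHTS: number[][] = [
  [0.2, 0.2, 0.2, 0.2, 0.2],     // balanced
  [0.4, 0.15, 0.15, 0.15, 0.15], // engagement-heavy
  [0.15, 0.15, 0.4, 0.15, 0.15], // discovery-heavy
  [0.15, 0.15, 0.15, 0.15, 0.4], // satisfaction-heavy
];

class WeightBandit {
  private counts = CANDIDATE_WEIGHTS.map(() => 0);
  private means = CANDIDATE_WEIGHTS.map(() => 0);
  private totalPulls = 0;

  // Pick the arm with the highest upper confidence bound (UCB1).
  selectArm(): number {
    const scores = CANDIDATE_WEIGHTS.map((_, i) =>
      this.counts[i] === 0
        ? Infinity // pull every arm at least once
        : this.means[i] +
          Math.sqrt((2 * Math.log(this.totalPulls)) / this.counts[i])
    );
    return scores.indexOf(Math.max(...scores));
  }

  // Record the observed utility for the arm that was used.
  update(arm: number, observedUtility: number): void {
    this.counts[arm]++;
    this.totalPulls++;
    this.means[arm] += (observedUtility - this.means[arm]) / this.counts[arm];
  }
}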
Cannibalization-Aware Optimization:
Prevent one feed component from cannibalizing others:
async function optimizeComponent(
  component: FeedComponent,
  otherComponents: FeedComponent[]
): Promise<number> {
  // Base reward for this component
  const baseReward = component.engagement;
  // Penalty if the user didn't engage with other high-priority components
  const cannibalizationPenalty = otherComponents
    .filter(c => c.priority === 'high')
    .map(c => c.engagement === 0 ? -0.2 : 0)
    .reduce((sum, p) => sum + p, 0);
  return baseReward + cannibalizationPenalty;
}
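Worked example: if a component earns engagement of 0.6 while two high-priority components go unengaged, the penalty is 2 × -0.2 = -0.4, leaving a net reward of 0.2. The component is rewarded less for attention won at the expense of the rest of the feed.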
Non-Stationarity Handling:
Detect and adapt to changing user preferences:
async detectConceptDrift(user: User): Promise<boolean> {
  const recentPrefs = user.interactions.slice(-100);
  const oldPrefs = user.interactions.slice(-500, -100);
  const recentEmbed = await this.foundationModel.embed(recentPrefs);
  const oldEmbed = await this.foundationModel.embed(oldPrefs);
  // Cosine similarity is near 1 when preferences are stable; a low
  // similarity between the old and recent windows signals drift
  const similarity = cosineSimilarity(recentEmbed, oldEmbed);
  return similarity < 0.7; // significant drift detected
}

// Adapt to drift: learn faster and explore more broadly
if (await this.detectConceptDrift(user)) {
  this.increaseLearningRate(user);
  this.increaseExploration(user);
}
Tasks:
1. Implement MOU-UCB weight learning
2. Add cannibalization detection
3. Build non-stationarity handling
4. Tune reward functions
5. A/B test optimization strategies
Week 12: LLM-Guided Credit Assignment (Optional)
When to Use: If MAGRPO's group-relative advantage is too noisy
Implementation:
async function assignCredit(
  agents: Agent[],
  actions: Action[],
  outcome: Reward,
  context: Context
): Promise<Map<Agent, number>> {
  const prompt = `
You are evaluating a collaborative content curation task.
Goal: Maximize user satisfaction in wedding vendor discovery

Agents and their actions:
${agents.map((a, i) => `- ${a.name}: ${actions[i].description}`).join('\n')}

User context: ${context.summary}
Outcome: ${outcome.value} (${outcome.description})

For each agent, provide a score from -1.0 to 1.0:
- 1.0: Exceptional positive contribution
- 0.0: Neutral contribution
- -1.0: Harmful contribution

Provide scores in JSON with brief reasoning.
`;
  const llmResponse = await callLLM(prompt, { model: "gpt-4" });
  return parseAgentScores(llmResponse, agents);
}
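A minimal sketch of the parseAgentScores step, assuming the LLM returns JSON shaped like {"scores": [{"agent": "...", "score": 0.5, "reasoning": "..."}]} (the response shape is an assumption; a production parser would catch malformed output and fall back to MAGRPO's group-relative advantage):

// Hypothetical sketch: parse the LLM's JSON response into per-agent
// credit scores, clamped to [-1, 1]. Assumes the response shape
// { "scores": [{ "agent": string, "score": number, "reasoning": string }] }.
function parseAgentScores(
  llmResponse: string,
  agents: Agent[]
): Map<Agent, number> {
  const credit = new Map<Agent, number>();
  const parsed = JSON.parse(llmResponse) as {
    scores: { agent: string; score: number; reasoning: string }[];
  };
  for (const agent of agents) {
    const entry = parsed.scores.find(s => s.agent === agent.name);
    // Missing or out-of-range scores default to a neutral contribution
    const score = entry ? Math.max(-1, Math.min(1, entry.score)) : 0;
    credit.set(agent, score);
  }
  return credit;
}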
Benefits:
- More nuanced credit assignment than group-relative advantage
- Handles complex multi-agent scenarios
- Provides interpretable reasoning

Drawbacks:
- Adds LLM inference cost (~$0.001 per assignment)
- Slower than pure MAGRPO
- Requires careful prompt engineering
Tasks:
1. Implement LLM credit assignment
2. Create prompt templates
3. Build fallback to MAGRPO
4. A/B test vs pure MAGRPO
5. Monitor cost and latency
Success Metrics
Diversity & Discovery:
- Feed diversity score > 0.8
- Novel discoveries > 20% of feed
- Cross-regional discovery > 10% of feed
- User delight with serendipity > 8/10

Notifications:
- CTR > 23% lift vs baseline
- Opt-out rate < 5%
- Response latency < 100ms
- NPS impact: neutral or positive

Multi-Objective:
- Reward weights learned successfully per user
- No cannibalization detected
- Concept drift handled gracefully
- System stability maintained
Overall:
- User satisfaction (NPS) > 50
- 30-day retention > 70%
- Session duration > 8 minutes
- Incremental Phase 3 cost < $0.05/user/month
Cost Analysis
New Services:
- LLM credit assignment (optional): ~$50/month
- Additional compute: ~$25/month

Total Phase 3 Cost: ~$275-325/month at 2K users
Per-User Cost: ~$0.14-0.16/month

This stays well under the target of $0.20/user/month.
Migration to Phase 4
Phase 4 focuses on scale and polish:
1. Edge caching optimization
2. Monitoring and observability
3. A/B testing framework
4. Production hardening
5. Support for 10K users
Resources
Reference Docs:
- Serendipity Engine Component
- Engagement Forecaster Component
- Agent Architecture

Research Papers:
- Multi-Objective Utility-UCB
- Meta Notification Timing
- LLM Credit Assignment

Development:
- Testing Advanced Features
- Deployment Guide
- Monitoring Setup
Next: Phase 4: Scale & Polish