Phase 3: Optimization & Intelligence
Timeline: Weeks 9-12
Status: Planned
Prerequisites: Phase 2 agent crew operational
Overview
Phase 3 adds advanced intelligence features: serendipitous discovery, notification timing, and sophisticated multi-objective optimization. The system becomes anticipatory rather than reactive: predicting needs, preventing filter bubbles, and reaching users with well-timed notifications.
Goal: Achieve 80%+ feed diversity, 23%+ notification CTR lift, and NPS > 50.
Deliverables
Week 9: Serendipity Engine
Mission: Prevent filter bubbles and enable delightful discoveries
Strategies:
1. Cross-Regional Discovery - a Melbourne couple sees a stunning Sydney venue
2. Style Exploration - a modern couple discovers tasteful bohemian elements
3. Emerging Vendor Boost - surface exceptional new vendors
4. Collaborative Filtering - "Users like you also loved..."
Implementation:
// Tunable thresholds (values illustrative)
const DIVERSITY_THRESHOLD = 0.8;

class SerendipityEngine {
  async generateSerendipitousCandidates(
    user: User,
    recentFeed: Content[]
  ): Promise<Content[]> {
    // Measure current diversity
    const diversity = this.measureDiversity(recentFeed);
    if (diversity < DIVERSITY_THRESHOLD) {
      // Explore embedding space for novel but relevant content
      const novelCandidates = await this.exploreEmbeddingSpace(
        user.embedding,
        recentFeed.map(c => c.embedding),
        0.7 // exploration radius (semantic distance)
      );
      // Quality filter (only show good surprises)
      return novelCandidates.filter(c => c.qualityScore > 0.75);
    }
    return [];
  }

  private measureDiversity(feed: Content[]): number {
    // Category diversity
    const categories = new Set(feed.map(c => c.category));
    const categoryDiversity = categories.size / TOTAL_CATEGORIES;
    // Style diversity (embedding space coverage)
    const styleDiversity = this.computeEmbeddingCoverage(feed);
    // Regional diversity
    const regions = new Set(feed.map(c => c.region));
    const regionalDiversity = regions.size / TOTAL_REGIONS;
    return (categoryDiversity + styleDiversity + regionalDiversity) / 3;
  }
}
Diversity Metrics:
- Category Coverage: % of vendor types shown (photographers, venues, florists, etc.)
- Embedding Diversity: coverage of the style embedding space (a sketch of one way to compute this follows below)
- Regional Balance: distribution across locations
- Novelty Rate: % of content dissimilar to anything the user has already seen
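The engine above leans on a computeEmbeddingCoverage helper that is not defined here. A minimal sketch, assuming embeddings are plain number[] vectors and using mean pairwise cosine distance as a coverage proxy (both the helper's shape and the metric choice are illustrative):

// Hypothetical sketch: embedding coverage as mean pairwise cosine distance.
// A feed whose items cluster tightly in style space scores near 0;
// a stylistically varied feed scores higher.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function computeEmbeddingCoverage(feed: { embedding: number[] }[]): number {
  if (feed.length < 2) return 0;
  let totalDistance = 0, pairs = 0;
  for (let i = 0; i < feed.length; i++) {
    for (let j = i + 1; j < feed.length; j++) {
      totalDistance += 1 - cosineSimilarity(feed[i].embedding, feed[j].embedding);
      pairs++;
    }
  }
  return totalDistance / pairs; // ~0 means a homogeneous feed
}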
Tasks:
1. Implement diversity measurement
2. Build embedding space exploration
3. Create novelty detection
4. Add collaborative filtering signals
5. Integrate with Orchestrator
Success Criteria:
- Diversity score > 0.8
- Novel discoveries > 20% of feed
- User delight rating > 8/10
- No engagement drop from diversity injection
Week 10: Engagement Forecaster
Mission: Intelligent notification timing for maximum welcome probability
Key Innovation: Fully integrated into MARL system (not a separate service)
Context Signals (200+ in total; a representative subset is shown below):
interface NotificationContext {
  // Temporal
  timeOfDay: number;
  dayOfWeek: number;
  daysUntilWedding: number;
  planningPhase: string;

  // Behavioral
  avgSessionDuration: number;
  lastActive: Date;
  notificationOpenRate: number;
  preferredContentTypes: string[];
  engagementPatterns: TimeSeries;

  // Device/Environment
  deviceType: string;
  osVersion: string;
  batteryLevel: number;
  networkQuality: string;
  timezone: string;

  // Content
  contentType: string;
  qualityScore: number;
  personalizationScore: number;
  urgency: number;
  vendorAvailability: boolean;

  // Social
  friendActivity: number;
  trendingInRegion: boolean;
  vendorResponsePending: boolean;
}
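Before prediction, these signals must be flattened into a numeric vector. A minimal sketch of that step, assuming simple min-max style scaling (the chosen fields and scaling constants are illustrative; a full implementation would also encode the categorical fields):

// Hypothetical sketch: flatten a NotificationContext into a numeric
// feature vector for the welcome-probability model. Categorical fields
// (deviceType, planningPhase, etc.) would be hashed or one-hot encoded.
function contextToFeatures(ctx: NotificationContext): number[] {
  return [
    ctx.timeOfDay / 24,                        // normalize to [0, 1]
    ctx.dayOfWeek / 7,
    Math.min(ctx.daysUntilWedding, 365) / 365, // cap long horizons
    ctx.avgSessionDuration / 3600,             // seconds -> hours
    ctx.notificationOpenRate,                  // already in [0, 1]
    ctx.batteryLevel / 100,
    ctx.qualityScore,
    ctx.personalizationScore,
    ctx.urgency,
    ctx.vendorAvailability ? 1 : 0,
    ctx.friendActivity,
    ctx.trendingInRegion ? 1 : 0,
    ctx.vendorResponsePending ? 1 : 0,
  ];
}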
Prediction Model:
// Minimum predicted welcome probability before a send is allowed
// (tunable; value illustrative)
const WELCOME_THRESHOLD = 0.6;

class EngagementForecaster {
  async shouldSendNotification(
    user: User,
    content: Content,
    context: NotificationContext
  ): Promise<{ send: boolean; timing: Date | null }> {
    // Predict P(welcome | context)
    const welcomeProb = await this.predictWelcome(user, content, context);
    if (welcomeProb < WELCOME_THRESHOLD) {
      return { send: false, timing: null };
    }
    // Optimize send time within the next 24 hours
    const optimalTime = await this.optimizeSendTime(user, context);
    return {
      send: true,
      timing: optimalTime
    };
  }

  private async predictWelcome(
    user: User,
    content: Content,
    context: NotificationContext
  ): Promise<number> {
    // Neural network trained on historical notification data
    const features = this.extractFeatures(user, content, context);
    return this.model.predict(features);
  }
}
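One simple realization of optimizeSendTime is to scan candidate send times across the next 24 hours and keep the one with the highest predicted welcome probability. A minimal sketch under that assumption (the hourly granularity and the injected predict callback are illustrative):

// Hypothetical sketch: scan hourly send-time candidates over the next
// 24 hours and return the time with the highest predicted welcome
// probability. `predict` would wrap predictWelcome for a fixed user
// and content.
async function optimizeSendTime(
  context: NotificationContext,
  predict: (ctx: NotificationContext) => Promise<number>
): Promise<Date> {
  let bestTime = new Date();
  let bestScore = -Infinity;
  for (let hour = 0; hour < 24; hour++) {
    const candidate = new Date(Date.now() + hour * 3_600_000);
    // Re-score with the candidate hour substituted into the context
    const score = await predict({ ...context, timeOfDay: candidate.getHours() });
    if (score > bestScore) {
      bestScore = score;
      bestTime = candidate;
    }
  }
  return bestTime;
}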
MARL Integration:
The Engagement Forecaster participates in MAGRPO:
- Proposes: notification timing candidates
- Receives Reward: based on user response (open, engage, opt-out); a sketch of the reward follows below
- Learns: when to send, when to hold back
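A minimal sketch of how that reward signal might be computed, assuming the three response types listed above (the NotificationOutcome shape and the specific weightings are assumptions, not specified by this plan):

// Hypothetical sketch: map a notification outcome to a scalar reward
// for MAGRPO. Opt-outs are punished hard so the agent learns that
// holding back beats an unwelcome send.
interface NotificationOutcome {
  opened: boolean;
  engaged: boolean;   // tapped through and interacted with the content
  optedOut: boolean;  // disabled notifications in response
}

function notificationReward(outcome: NotificationOutcome): number {
  if (outcome.optedOut) return -1.0; // trust damage dominates everything
  let reward = 0;
  if (outcome.opened) reward += 0.3;
  if (outcome.engaged) reward += 0.7;
  return reward; // 0 for an ignored send, up to 1.0 for full engagement
}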
Expected Lift: 23%+ CTR improvement (based on Meta research)
Tasks:
1. Build notification context extraction
2. Train welcome probability model
3. Implement send-time optimization
4. Integrate with MAGRPO framework
5. A/B test against naive timing
Success Criteria:
- CTR > 23% lift vs baseline
- Opt-out rate < 5%
- Response time < 100ms
- Maintains user trust (NPS stable or improving)
Week 11: Multi-Objective Refinement
Focus: Advanced reward engineering and weight learning
Multi-Objective Utility-UCB (MOU-UCB):
Learns optimal reward weights per user from behavior:
class MultiObjectiveOptimizer {
  private weights: Map<string, number[]> = new Map();

  async computeReward(
    userId: string,
    engagement: number,
    quality: number,
    discovery: number,
    diversity: number,
    satisfaction: number
  ): Promise<number> {
    // Get or initialize user weights (uniform prior over five objectives)
    const w = this.weights.get(userId) || [0.2, 0.2, 0.2, 0.2, 0.2];
    // Compute weighted reward
    const reward =
      w[0] * engagement +
      w[1] * quality +
      w[2] * discovery +
      w[3] * diversity +
      w[4] * satisfaction;
    // Update weights using MOU-UCB
    await this.updateWeights(userId, [
      engagement, quality, discovery, diversity, satisfaction
    ]);
    return reward;
  }

  private async updateWeights(
    userId: string,
    outcomes: number[]
  ): Promise<void> {
    // Multi-armed bandit over weight combinations: explores different
    // weightings and exploits the best-performing weights for this user
    // (see the sketch below for one possible realization).
  }
}
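The updateWeights stub leaves the bandit unspecified. A minimal sketch of one way to realize it, treating a small fixed grid of candidate weight vectors as UCB1 arms (the grid, the WeightBandit name, and the choice of utility signal are all illustrative assumptions):

// Hypothetical sketch: UCB1 over a discrete set of candidate weight
// vectors. Each "arm" is one weighting of the five objectives; its
// payoff is a downstream outcome observed after serving feeds under
// those weights (e.g. session satisfaction), not the weighted sum
// itself, which would trivially favor an arm's own heaviest objective.
const CANDIDATE_WEIGHTS: number[][] = [
  [0.2, 0.2, 0.2, 0.2, 0.2],     // balanced
  [0.4, 0.15, 0.15, 0.15, 0.15], // engagement-heavy
  [0.15, 0.15, 0.4, 0.15, 0.15], // discovery-heavy
  [0.15, 0.15, 0.15, 0.15, 0.4], // satisfaction-heavy
];

class WeightBandit {
  private counts = CANDIDATE_WEIGHTS.map(() => 0);
  private means = CANDIDATE_WEIGHTS.map(() => 0);
  private totalPulls = 0;

  // Pick the arm with the highest upper confidence bound (UCB1).
  selectArm(): number {
    const scores = CANDIDATE_WEIGHTS.map((_, i) =>
      this.counts[i] === 0
        ? Infinity // pull every arm at least once
        : this.means[i] +
          Math.sqrt((2 * Math.log(this.totalPulls)) / this.counts[i])
    );
    return scores.indexOf(Math.max(...scores));
  }

  // Record the observed utility for the arm that was used.
  update(arm: number, observedUtility: number): void {
    this.counts[arm]++;
    this.totalPulls++;
    this.means[arm] += (observedUtility - this.means[arm]) / this.counts[arm];
  }
}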
Cannibalization-Aware Optimization:
Prevent one feed component from cannibalizing others:
async function optimizeComponent(
  component: FeedComponent,
  otherComponents: FeedComponent[]
): Promise<number> {
  // Base reward for this component
  const baseReward = component.engagement;
  // Penalty if the user didn't engage with other high-priority components
  const cannibalizationPenalty = otherComponents
    .filter(c => c.priority === 'high')
    .map(c => c.engagement === 0 ? -0.2 : 0)
    .reduce((sum, p) => sum + p, 0);
  return baseReward + cannibalizationPenalty;
}
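Worked example: if a component earns engagement of 0.6 while two high-priority components go unengaged, the penalty is 2 × -0.2 = -0.4, leaving a net reward of 0.2. The component is rewarded less for attention won at the expense of the rest of the feed.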
Non-Stationarity Handling:
Detect and adapt to changing user preferences:
async detectConceptDrift(user: User): Promise<boolean> {
  const recentPrefs = user.interactions.slice(-100);
  const oldPrefs = user.interactions.slice(-500, -100);
  const recentEmbed = await this.foundationModel.embed(recentPrefs);
  const oldEmbed = await this.foundationModel.embed(oldPrefs);
  // Cosine similarity is near 1 when preferences are stable; a low
  // similarity between the old and recent windows signals drift
  const similarity = cosineSimilarity(recentEmbed, oldEmbed);
  return similarity < 0.7; // significant drift detected
}

// Adapt to drift: learn faster and explore more broadly
if (await this.detectConceptDrift(user)) {
  this.increaseLearningRate(user);
  this.increaseExploration(user);
}
Tasks:
1. Implement MOU-UCB weight learning
2. Add cannibalization detection
3. Build non-stationarity handling
4. Tune reward functions
5. A/B test optimization strategies
Week 12: LLM-Guided Credit Assignment (Optional)
When to Use: If MAGRPO's group-relative advantage is too noisy
Implementation:
async function assignCredit(
  agents: Agent[],
  actions: Action[],
  outcome: Reward,
  context: Context
): Promise<Map<Agent, number>> {
  const prompt = `
You are evaluating a collaborative content curation task.
Goal: Maximize user satisfaction in wedding vendor discovery

Agents and their actions:
${agents.map((a, i) => `- ${a.name}: ${actions[i].description}`).join('\n')}

User context: ${context.summary}
Outcome: ${outcome.value} (${outcome.description})

For each agent, provide a score from -1.0 to 1.0:
- 1.0: Exceptional positive contribution
- 0.0: Neutral contribution
- -1.0: Harmful contribution

Provide scores in JSON with brief reasoning.
`;
  const llmResponse = await callLLM(prompt, { model: "gpt-4" });
  return parseAgentScores(llmResponse, agents);
}
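A minimal sketch of the parseAgentScores step, assuming the LLM returns JSON shaped like {"scores": [{"agent": "...", "score": 0.5, "reasoning": "..."}]} (the response shape is an assumption; a production parser would catch malformed output and fall back to MAGRPO's group-relative advantage):

// Hypothetical sketch: parse the LLM's JSON response into per-agent
// credit scores, clamped to [-1, 1]. Assumes the response shape
// { "scores": [{ "agent": string, "score": number, "reasoning": string }] }.
function parseAgentScores(
  llmResponse: string,
  agents: Agent[]
): Map<Agent, number> {
  const credit = new Map<Agent, number>();
  const parsed = JSON.parse(llmResponse) as {
    scores: { agent: string; score: number; reasoning: string }[];
  };
  for (const agent of agents) {
    const entry = parsed.scores.find(s => s.agent === agent.name);
    // Missing or out-of-range scores default to a neutral contribution
    const score = entry ? Math.max(-1, Math.min(1, entry.score)) : 0;
    credit.set(agent, score);
  }
  return credit;
}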
Benefits:
- More nuanced credit assignment than group-relative advantage
- Handles complex multi-agent scenarios
- Provides interpretable reasoning

Drawbacks:
- Adds LLM inference cost (~$0.001 per assignment)
- Slower than pure MAGRPO
- Requires careful prompt engineering
Tasks:
1. Implement LLM credit assignment
2. Create prompt templates
3. Build fallback to MAGRPO
4. A/B test vs pure MAGRPO
5. Monitor cost and latency
Success Metrics
Diversity & Discovery:
- Feed diversity score > 0.8
- Novel discoveries > 20% of feed
- Cross-regional discovery > 10% of feed
- User delight with serendipity > 8/10

Notifications:
- CTR > 23% lift vs baseline
- Opt-out rate < 5%
- Response latency < 100ms
- NPS impact: neutral or positive

Multi-Objective:
- Reward weights learned successfully per user
- No cannibalization detected
- Concept drift handled gracefully
- System stability maintained
Overall:
- User satisfaction (NPS) > 50
- 30-day retention > 70%
- Session duration > 8 minutes
- Incremental Phase 3 cost < $0.05/user/month
Cost Analysis
New Services:
- LLM credit assignment (optional): ~$50/month
- Additional compute: ~$25/month

Total Phase 3 Cost: ~$275-325/month at 2K users
Per-User Cost: ~$0.14-0.16/month

This stays well under the target of $0.20/user/month.
Migration to Phase 4
Phase 4 focuses on scale and polish:
1. Edge caching optimization
2. Monitoring and observability
3. A/B testing framework
4. Production hardening
5. Support for 10K users
Resources
Reference Docs:
- Serendipity Engine Component
- Engagement Forecaster Component
- Agent Architecture

Research Papers:
- Multi-Objective Utility-UCB
- Meta Notification Timing
- LLM Credit Assignment

Development:
- Testing Advanced Features
- Deployment Guide
- Monitoring Setup
Next: Phase 4: Scale & Polish