
Phase 4: Scale & Polish

Timeline: Weeks 13-16
Status: Planned
Prerequisites: Phase 3 complete with all agents operational


Overview

Phase 4 prepares the system for production scale. Focus shifts from building features to optimization, observability, and operational excellence.

Goal: A production-ready system handling 10K users at 10K QPS with 99.9% uptime, P99 latency under 500ms, and total cost under $500/month.

Deliverables

Week 13: Performance Optimization

Focus: Reduce latency and cost through intelligent caching and batching

Edge Caching Strategy

Multi-Tier Cache Architecture:

// Cache hierarchy
const CACHE_LAYERS = {
  // L1: Cloudflare edge (ultra-fast, limited size)
  edge: {
    userEmbeddings: '15min',
    feedResults: '5min',
    thompsonParams: '30min'
  },

  // L2: Cloudflare KV (fast, larger capacity)
  kv: {
    contentEmbeddings: '7days',
    qualityScores: '24hours',
    agentScores: '1hour'
  },

  // L3: Qdrant (source of truth)
  qdrant: {
    fullEmbeddings: 'permanent',
    vectorSearch: 'no-cache'  // Always fresh
  }
};
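
A read path over these tiers might look like the sketch below, written for a Cloudflare Worker. The CACHE_KV binding and fetchFromQdrant fallback are illustrative assumptions, not names from the codebase:

// Hypothetical L3 fallback into Qdrant (stand-in for the real lookup)
declare function fetchFromQdrant(key: string): Promise<string>;

interface Env {
  CACHE_KV: KVNamespace;  // assumed KV namespace binding
}

// Read-through lookup across the three tiers
async function cachedLookup(key: string, request: Request, env: Env): Promise<string> {
  // L1: Cloudflare edge cache, keyed by a synthetic URL
  const cacheKey = new Request(new URL(`/cache/${key}`, request.url).toString());
  const edgeHit = await caches.default.match(cacheKey);
  if (edgeHit) return edgeHit.text();

  // L2: Cloudflare KV
  const kvHit = await env.CACHE_KV.get(key);
  if (kvHit !== null) {
    // Promote to the edge with a short TTL, matching the 5-15min L1 windows above
    await caches.default.put(cacheKey, new Response(kvHit, {
      headers: { 'Cache-Control': 'max-age=300' }
    }));
    return kvHit;
  }

  // L3: source of truth
  const fresh = await fetchFromQdrant(key);
  await env.CACHE_KV.put(key, fresh, { expirationTtl: 7 * 24 * 3600 });  // 7 days, as above
  return fresh;
}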

Cache Invalidation Strategy:

// Actions that signal a preference shift (illustrative set)
const SIGNIFICANT_ACTIONS = ['save', 'follow', 'hide', 'share'];

class SmartCacheInvalidator {
  async onUserInteraction(userId: string, action: string) {
    if (SIGNIFICANT_ACTIONS.includes(action)) {
      // Invalidate user embedding (preferences changed)
      await this.invalidate(`user:${userId}:embedding`);

      // Keep feed cache for 5min (balances freshness vs. cost);
      // it auto-refreshes on the next request after expiry
    }
  }

  async onContentUpdate(contentId: string) {
    // Invalidate quality scores
    await this.invalidate(`content:${contentId}:quality`);

    // Invalidate any feeds containing this content
    const affectedUsers = await this.getUsersWithContent(contentId);
    await Promise.all(
      affectedUsers.map(u => this.invalidate(`feed:${u}`))
    );
  }
}

Database Query Optimization

Qdrant Optimization:

// Before: Slow filtered search over one large collection
// ('content' is an assumed collection name)
const results = await qdrant.search('content', {
  vector: userEmbedding,
  filter: {
    must: [
      { key: 'region', match: { value: 'AU' } },
      { key: 'category', match: { value: 'photographer' } },
      { key: 'quality', range: { gte: 0.8 } }
    ]
  },
  limit: 100
});

// After: Pre-filtered collection + plain vector search
const results = await qdrant.search('au_photographers_quality', {
  vector: userEmbedding,
  limit: 100
});
// 10x faster: no per-query payload filtering
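
Pre-filtering shifts work to ingest time: each item is routed into its segment collection when indexed. A minimal sketch with @qdrant/js-client-rest, where the `{region}_{category}s_quality` naming scheme and 0.8 threshold are assumptions mirroring the query above:

import { QdrantClient } from '@qdrant/js-client-rest';

const qdrant = new QdrantClient({ url: process.env.QDRANT_URL ?? 'http://localhost:6333' });

// Route a content item into its pre-filtered collection at ingest time
async function indexContent(item: {
  id: string;
  vector: number[];
  region: string;
  category: string;
  quality: number;
}) {
  if (item.quality < 0.8) return;  // only high-quality items enter the fast path

  const collection = `${item.region.toLowerCase()}_${item.category}s_quality`;
  await qdrant.upsert(collection, {
    points: [{
      id: item.id,  // Qdrant expects a UUID string or unsigned integer
      vector: item.vector,
      payload: { region: item.region, category: item.category, quality: item.quality }
    }]
  });
}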

Supabase Connection Pooling:

// supabase-js speaks to PostgREST over HTTP and does not manage a Postgres
// connection pool itself; pooling comes from Supavisor, Supabase's pooler.
// Sketch: point a Postgres client at the Supavisor connection string.
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.SUPABASE_POOLER_URL,  // Supavisor, transaction mode (port 6543)
  max: 10,                    // cap concurrent connections per worker
  idleTimeoutMillis: 30000    // release idle connections after 30s
});

Batch Processing

// Batch agent scoring (flush at 10 items or after a short delay)
interface ScoringRequest {
  content: Content;
  resolve: (score: number) => void;
}

class BatchedAgentScoring {
  private queue: ScoringRequest[] = [];
  private flushTimer: ReturnType<typeof setTimeout> | null = null;

  async score(content: Content): Promise<number> {
    return new Promise((resolve) => {
      this.queue.push({ content, resolve });

      if (this.queue.length >= 10) {
        this.processBatch();
      } else if (!this.flushTimer) {
        // Flush partial batches after 50ms so no request waits indefinitely
        this.flushTimer = setTimeout(() => this.processBatch(), 50);
      }
    });
  }

  private async processBatch() {
    if (this.flushTimer) {
      clearTimeout(this.flushTimer);
      this.flushTimer = null;
    }
    const batch = this.queue.splice(0, 10);
    if (batch.length === 0) return;

    // Single LLM call evaluates up to 10 items at once
    const scores = await this.batchScore(
      batch.map(r => r.content)
    );

    // Resolve all pending promises in order
    batch.forEach((req, i) => req.resolve(scores[i]));
  }
}
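
Callers simply await score() as if it were a single evaluation; the batching stays invisible. Hypothetical usage:

// Ten concurrent calls collapse into a single LLM request
// (assumes `items: Content[]` from the ranking pipeline)
const scorer = new BatchedAgentScoring();
const scores = await Promise.all(items.map(item => scorer.score(item)));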

Tasks:

  1. Implement edge caching hierarchy
  2. Optimize Qdrant queries with pre-filtering
  3. Set up Supabase connection pooling
  4. Add batching for LLM calls
  5. Measure latency improvements

Success Criteria:

  • P99 latency < 500ms (down from ~1s)
  • Cache hit rate > 80%
  • Database connections stable
  • Cost reduction > 40%

Week 14: Monitoring & Observability

Focus: Comprehensive system visibility

Metrics Dashboard

System Health:

  • Request rate, latency (p50, p99, p99.9)
  • Error rate, success rate
  • Cache hit rates
  • Database query performance
  • Worker/Durable Object utilization

AI Performance:

  • Agent scoring distribution
  • Thompson Sampling convergence
  • MAGRPO stability (policy variance)
  • Embedding quality (clustering metrics)
  • Quality scores over time

Business Metrics:

  • Active users (DAU, MAU)
  • Session duration, retention
  • Content save rate, engagement
  • Vendor discovery rate
  • NPS, user satisfaction

Implementation:

// Metrics collection
class MetricsCollector {
  async recordFeedRequest(userId: string, latency: number) {
    await this.increment('feed_requests_total');
    await this.histogram('feed_request_latency', latency);

    // Track by user segment
    const segment = await this.getUserSegment(userId);
    await this.increment(`feed_requests_${segment}`);
  }

  async recordAgentDecision(
    agent: string,
    score: number,
    latency: number
  ) {
    await this.gauge(`agent_${agent}_score`, score);
    await this.histogram(`agent_${agent}_latency`, latency);
  }
}
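
The increment, histogram, and gauge helpers are left abstract above. A minimal in-memory backing might look like the following sketch; a real deployment would forward these to Prometheus/Grafana (per the tasks below) rather than keep samples in process:

// Minimal in-memory metric primitives backing the collector above
class MetricsStore {
  private counters = new Map<string, number>();
  private gauges = new Map<string, number>();
  private histograms = new Map<string, number[]>();

  increment(name: string, by = 1) {
    this.counters.set(name, (this.counters.get(name) ?? 0) + by);
  }

  gauge(name: string, value: number) {
    this.gauges.set(name, value);
  }

  histogram(name: string, value: number) {
    const samples = this.histograms.get(name) ?? [];
    samples.push(value);
    this.histograms.set(name, samples);
  }

  // Percentile over recorded samples (e.g. p99 latency)
  percentile(name: string, p: number): number {
    const samples = [...(this.histograms.get(name) ?? [])].sort((a, b) => a - b);
    if (samples.length === 0) return 0;
    const idx = Math.min(samples.length - 1, Math.floor((p / 100) * samples.length));
    return samples[idx];
  }
}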

Alerting Rules:

const ALERTS = [
  {
    name: 'high_error_rate',
    condition: 'error_rate > 0.01',
    severity: 'critical',
    channel: 'pagerduty'
  },
  {
    name: 'slow_feed_generation',
    condition: 'feed_latency_p99 > 1000',
    severity: 'warning',
    channel: 'slack'
  },
  {
    name: 'quality_drift',
    condition: 'avg_quality_score < 0.7',
    severity: 'warning',
    channel: 'email'
  }
];
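
Condition strings in this shape can be checked with a small comparator. A sketch, where readMetric is a hypothetical lookup into the metrics store:

// Evaluate "metric > threshold" / "metric < threshold" alert conditions
interface Alert {
  name: string;
  condition: string;
  severity: 'critical' | 'warning';
  channel: string;
}

function isFiring(alert: Alert, readMetric: (name: string) => number): boolean {
  const match = alert.condition.match(/^(\w+)\s*([<>])\s*([\d.]+)$/);
  if (!match) throw new Error(`Unparseable condition: ${alert.condition}`);

  const [, metric, op, threshold] = match;
  const value = readMetric(metric);
  return op === '>' ? value > Number(threshold) : value < Number(threshold);
}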

Logging Strategy

// Structured logging for debugging
logger.info('feed_generated', {
  userId,
  feedSize: 20,
  latency: 450,
  agents: {
    discovery: { score: 0.8, latency: 120 },
    quality: { score: 0.9, latency: 80 },
    archivist: { score: 0.7, latency: 100 },
    serendipity: { score: 0.6, latency: 150 }
  },
  thompsonSampling: {
    exploration: 0.3,
    exploitation: 0.7
  },
  cacheHits: 12,
  cacheMisses: 8
});

Tasks:

  1. Set up metrics collection (Prometheus/Grafana)
  2. Build real-time dashboards
  3. Configure alerting rules
  4. Implement structured logging
  5. Create runbooks for common issues

Success Criteria:

  • All key metrics visible in real-time
  • Alerts fire before users notice issues
  • Mean time to detection (MTTD) < 5 minutes
  • Mean time to resolution (MTTR) < 30 minutes

Week 15: A/B Testing Framework

Focus: Systematic experimentation and optimization

Experiment Framework

class ExperimentFramework {
  async assignVariant(userId: string, experiment: string): Promise<string> {
    // Deterministic assignment based on user ID
    const hash = this.hashUserId(userId, experiment);

    if (hash < 0.5) return 'control';
    return 'treatment';
  }

  async trackExperiment(
    userId: string,
    experiment: string,
    metric: string,
    value: number
  ) {
    const variant = await this.assignVariant(userId, experiment);

    await this.record({
      experiment,
      variant,
      metric,
      value,
      userId,
      timestamp: Date.now()
    });
  }

  async analyzeExperiment(experiment: string): Promise<Results> {
    const control = await this.getMetrics(experiment, 'control');
    const treatment = await this.getMetrics(experiment, 'treatment');
    const pValue = this.tTest(control, treatment);

    return {
      lift: (treatment.mean - control.mean) / control.mean,
      pValue,
      significant: pValue < 0.05,
      sampleSize: { control: control.n, treatment: treatment.n }
    };
  }
}
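
Deterministic assignment relies on hashUserId mapping (userId, experiment) to a stable value in [0, 1). One way to implement it, sketched with Node's built-in crypto (an assumption; the source does not specify the hash):

import { createHash } from 'node:crypto';

// Stable hash of (userId, experiment) into [0, 1). Salting with the
// experiment name decorrelates assignments across experiments.
function hashUserId(userId: string, experiment: string): number {
  const digest = createHash('sha256')
    .update(`${experiment}:${userId}`)
    .digest();
  // Interpret the first 4 bytes as an unsigned 32-bit integer
  return digest.readUInt32BE(0) / 0x100000000;
}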

Planned Experiments

Experiment 1: Foundation Model

  • Control: Sentence-BERT + simple aggregation
  • Treatment: Custom trained foundation model
  • Metrics: Engagement, quality, retention
  • Duration: 2 weeks, 1000 users

Experiment 2: Thompson Sampling

  • Control: Beta-Bernoulli Thompson Sampling
  • Treatment: ENR-based Thompson Sampling
  • Metrics: Regret, sample efficiency
  • Duration: 2 weeks, 1000 users

Experiment 3: Agent Count

  • Control: 4 agents (Discovery, Quality, Archivist, Serendipity)
  • Treatment: 6 agents (+ Engagement Forecaster, custom agent)
  • Metrics: Quality, diversity, cost
  • Duration: 2 weeks, 500 users per variant

Tasks:

  1. Build experiment framework
  2. Set up statistical analysis
  3. Create experiment dashboard
  4. Run 3 key experiments
  5. Document learnings

Success Criteria:

  • Framework supports concurrent experiments
  • Statistical significance detection
  • Learnings documented and actioned
  • At least 1 significant improvement shipped

Week 16: Production Hardening

Focus: Reliability, security, and operational readiness

Reliability Improvements

Graceful Degradation:

class ResilientOrchestrator {
  async rankFeed(userId: string): Promise<Content[]> {
    try {
      // Try full multi-agent ranking
      return await this.multiAgentRank(userId);
    } catch (error) {
      logger.warn('multi_agent_failed', { userId, error });

      try {
        // Fallback: Simple ranking
        return await this.simpleRank(userId);
      } catch (error) {
        logger.error('simple_rank_failed', { userId, error });

        // Last resort: Trending content
        return await this.getTrendingContent();
      }
    }
  }
}

Circuit Breaker:

// Tunable thresholds (illustrative values)
const FAILURE_THRESHOLD = 5;
const RESET_TIMEOUT = 30_000;  // ms before an open breaker allows a retry

class CircuitBreaker {
  private failureCount = 0;
  private lastFailure = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > RESET_TIMEOUT) {
        this.state = 'half-open';  // allow one probe request through
      } else {
        throw new Error('Circuit breaker open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    // A successful call closes the breaker and clears the failure streak
    this.failureCount = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailure = Date.now();

    if (this.failureCount > FAILURE_THRESHOLD) {
      this.state = 'open';
    }
  }
}
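
In use, each flaky downstream gets its own breaker, for example one per agent (hypothetical wiring):

// One breaker per downstream agent call (names are illustrative)
const qualityBreaker = new CircuitBreaker();
const score = await qualityBreaker.call(() => qualityAgent.score(content));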

Security Hardening

  • Rate limiting per user (1000 req/hour; see the sketch after this list)
  • API key rotation
  • Input validation and sanitization
  • SQL injection prevention
  • CORS configuration
  • Secrets management audit
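
The per-user limit could be enforced with a fixed-window counter. A sketch as a Cloudflare Durable Object (an assumed deployment choice, consistent with the Worker/Durable Object mentions above); note the in-memory count resets if the object is evicted, so stricter guarantees would persist via state.storage:

// Fixed-window rate limiter as a Durable Object: one instance per user,
// counting requests in the current hour window. Illustrative sketch.
export class UserRateLimiter {
  private count = 0;
  private windowStart = 0;

  constructor(private readonly state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const now = Date.now();
    const HOUR = 3_600_000;

    if (now - this.windowStart >= HOUR) {
      // Start a new window
      this.windowStart = now;
      this.count = 0;
    }

    this.count++;
    const allowed = this.count <= 1000;
    return new Response(
      JSON.stringify({ allowed, remaining: Math.max(0, 1000 - this.count) }),
      { status: allowed ? 200 : 429, headers: { 'Content-Type': 'application/json' } }
    );
  }
}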

Documentation

  • API documentation (OpenAPI spec)
  • Runbooks for common operations
  • Incident response playbooks
  • Architecture decision records
  • Deployment guides

Tasks:

  1. Add graceful degradation
  2. Implement circuit breakers
  3. Security audit and fixes
  4. Complete documentation
  5. Disaster recovery testing

Success Criteria:

  • System survives agent failures
  • Zero security vulnerabilities
  • All operations documented
  • Disaster recovery tested

Final Metrics

Scale:

  • 10K users supported
  • 10K QPS capacity
  • 99.9% uptime

Performance:

  • P99 latency < 500ms
  • Cache hit rate > 80%
  • Error rate < 0.1%

Cost:

  • Total: < $500/month
  • Per user: < $0.05/month

Quality:

  • NPS > 50
  • 30-day retention > 70%
  • Feed relevance > 0.85

Production Deployment Checklist

  • Performance benchmarks passed
  • Monitoring dashboards live
  • Alerting configured
  • A/B tests run and analyzed
  • Security audit complete
  • Documentation published
  • Disaster recovery tested
  • Team trained on operations
  • Incident response plan ready
  • Cost monitoring active

Post-Launch

Week 17+: Continuous Improvement

  1. Monitor metrics daily
  2. Run weekly experiments
  3. Monthly architecture reviews
  4. Quarterly planning
  5. Scale based on growth

Resources

Monitoring:

  • Grafana Dashboards
  • Cloudflare Analytics
  • Sentry Error Tracking

Security:

  • OWASP Top 10
  • Cloudflare WAF
  • Secrets Management

Operations:

  • Deployment Guide
  • Testing Guide
  • Runbooks


Congratulations! The system is production-ready. Time to scale and delight users.

See also:

  • Implementation Progress
  • Architecture Overview
  • Development Workflow