How It Works - Visual Guide

A plain-English, visual explanation of how Vows Social curates your perfect wedding feed.


The Big Picture

Think of Vows Social as a team of specialized experts working together to find content you'll love:

graph LR
    You[👰 You<br/>Browse & Save] --> AI[🧠 AI Learns<br/>Your Taste]
    AI --> Agents[🤖 Agents<br/>Find Matches]
    Agents --> Rank[🎯 Ranking<br/>Algorithm]
    Rank --> Feed[📱 Perfect<br/>Feed]
    Feed --> You

    style You fill:#FFE0F0
    style AI fill:#E0F2FE
    style Agents fill:#F3E8FF
    style Rank fill:#FEF3C7
    style Feed fill:#D1FAE5

Let's break down each step...


Step 1: You Interact 👰

Every time you browse wedding content, you're teaching our AI:

  • 💾 Save content → "I love this style"
  • 👀 View for 30+ seconds → "This is interesting"
  • 📤 Share with partner → "This is important"
  • ⏭️ Skip quickly → "Not my style"

sequenceDiagram
    actor Sarah as 👰 Sarah
    participant App as vows.social
    participant DB as Database

    Sarah->>App: Saves rustic barn venue 🏚️
    App->>DB: Log interaction:<br/>user=sarah, content=barn_42,<br/>action=save, timestamp=now
    Note over DB: All interactions<br/>stored for learning

Real Example:

Sarah saves 5 rustic barn venues, 3 sage green color palettes, and 2 outdoor ceremony photos. Our AI now knows: Sarah likes rustic + outdoor + earth tones.
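
To make this concrete, here's a minimal sketch of what logging one of these interactions could look like in Python. The schema, field names, and db.insert helper are illustrative assumptions, not the production code:

from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class Interaction:
    user_id: str      # e.g. "sarah"
    content_id: str   # e.g. "barn_42"
    action: str       # "save" | "share" | "long_view" | "skip"
    timestamp: str    # ISO 8601, UTC

def log_interaction(db, user_id: str, content_id: str, action: str) -> None:
    """Append one interaction row; the learning jobs read these back later."""
    event = Interaction(user_id, content_id, action,
                        datetime.now(timezone.utc).isoformat())
    db.insert("interactions", asdict(event))  # hypothetical DB client call

# log_interaction(db, "sarah", "barn_42", "save")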


Step 2: AI Learns Your Style 🧠

The Two-Tower Model

We use the same architecture as Pinterest and YouTube - a Two-Tower Model:

graph TB
    subgraph "User Tower 🧑"
        History[Sarah's History<br/>📚 Last 50 interactions] --> UserEncoder[User Encoder<br/>🧠 Neural Network]
        UserEncoder --> UserEmbed[User Embedding<br/>📊 128 numbers]
    end

    subgraph "Content Tower 🎨"
        Content[Venue Photo<br/>🏚️ Image + Metadata] --> ContentEncoder[Content Encoder<br/>🧠 SigLIP 2 Model]
        ContentEncoder --> ContentEmbed[Content Embedding<br/>📊 384 numbers]
    end

    UserEmbed --> Match[🎯 Match Score<br/>Dot Product]
    ContentEmbed --> Match
    Match --> Score[📈 How well they match<br/>0.0 to 1.0]

    style History fill:#FFE0F0
    style Content fill:#E0F2FE
    style Match fill:#FEF3C7
    style Score fill:#D1FAE5

In Plain English:

  1. User Tower looks at everything Sarah has saved and creates a "taste profile" (128 numbers)
  2. Content Tower understands every piece of content (image, style, colors) as 384 numbers, then projects them into the same 128-number space as the taste profile (a dot product is only defined between vectors of the same length)
  3. Match Score calculates how well they fit together (dot product = multiply matching numbers, then sum)

Example:

Sarah's Taste Profile (simplified):
- rustic_score: 0.92
- modern_score: 0.15
- outdoor_score: 0.88
- sage_green: 0.95

Barn Venue #42:
- rustic_score: 0.89
- modern_score: 0.10
- outdoor_score: 0.92
- sage_green: 0.91

Match = (0.92×0.89) + (0.15×0.10) + (0.88×0.92) + (0.95×0.91)
      = 0.82 + 0.02 + 0.81 + 0.86
      = 2.51 → High match! ✅
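
The same arithmetic in Python. The four named dimensions are the simplification from above; real embeddings are 128 opaque numbers, but the dot product works the same way:

def match_score(user: list[float], content: list[float]) -> float:
    """Dot product: multiply matching dimensions, then sum."""
    return sum(u * c for u, c in zip(user, content))

sarah   = [0.92, 0.15, 0.88, 0.95]  # rustic, modern, outdoor, sage_green
barn_42 = [0.89, 0.10, 0.92, 0.91]

print(round(match_score(sarah, barn_42), 2))  # 2.51 → high match ✅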

Multimodal Understanding (Images + Text)

We use SigLIP 2, the state-of-the-art 2025 model for understanding images:

graph LR
    Image[🖼️ Venue Photo] --> SigLIP[SigLIP 2<br/>400M parameters<br/>Google AI]
    Text[📝 Description<br/>"Rustic barn venue"] --> SigLIP

    SigLIP --> Understanding[🧠 Understanding<br/>• Style: Rustic<br/>• Setting: Outdoor<br/>• Colors: Earth tones<br/>• Lighting: Natural]

    Understanding --> Vector[📊 384-number vector<br/>Computer-friendly format]

    style Image fill:#E0F2FE
    style Text fill:#FFE0F0
    style SigLIP fill:#F3E8FF
    style Understanding fill:#FEF3C7
    style Vector fill:#D1FAE5

Why This Matters:

  • Understands visual style (composition, lighting, colors)
  • Reads text descriptions and tags
  • Works across 89 languages (Jina CLIP v2 variant)
  • Fine-grained understanding (not just "wedding photo")
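
As a rough sketch, embedding content with a SigLIP-family model through Hugging Face Transformers looks like this. The checkpoint name is illustrative (a SigLIP 2 checkpoint would slot in the same way), and the production pipeline may pool and normalize differently:

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

MODEL_ID = "google/siglip-base-patch16-224"  # illustrative checkpoint
model = AutoModel.from_pretrained(MODEL_ID)
processor = AutoProcessor.from_pretrained(MODEL_ID)

image = Image.open("venue_photo.jpg")
text = "Rustic barn venue"

with torch.no_grad():
    image_vec = model.get_image_features(
        **processor(images=image, return_tensors="pt"))
    text_vec = model.get_text_features(
        **processor(text=[text], padding="max_length", return_tensors="pt"))

print(image_vec.shape, text_vec.shape)  # one vector each for image and text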

Step 3: Agents Find Perfect Matches 🤖

Six specialized AI agents work like a team of wedding planners:

graph TB
    Orchestrator[🎯 Orchestrator<br/>"Coordinate the team"]

    Discovery[🔍 Discovery Agent<br/>"Find hidden gems"]
    Quality[✨ Quality Guardian<br/>"Ensure high standards"]
    Archivist[📖 Personal Archivist<br/>"Remember the journey"]
    Serendipity[🎲 Serendipity Engine<br/>"Add variety"]
    Forecaster[⏰ Engagement Forecaster<br/>"Perfect timing"]

    Orchestrator --> Discovery
    Orchestrator --> Quality
    Orchestrator --> Archivist
    Orchestrator --> Serendipity
    Orchestrator --> Forecaster

    Discovery --> Scores[Agent Scores]
    Quality --> Scores
    Archivist --> Scores
    Serendipity --> Scores
    Forecaster --> Scores

    Scores --> Orchestrator

    style Orchestrator fill:#8B5CF6,color:#fff
    style Discovery fill:#06B6D4,color:#fff
    style Quality fill:#F59E0B,color:#fff
    style Archivist fill:#10B981,color:#fff
    style Serendipity fill:#EC4899,color:#fff
    style Forecaster fill:#8B5CF6,color:#fff

What Each Agent Does

🔍 Discovery Agent

Role: Find exceptional vendors before they're popular

Example:

"New photographer in Sydney posted amazing work yesterday. Only 500 Instagram followers but composition is stunning. Recommend to Sarah before they book up."

✨ Quality Guardian

Role: Ensure only high-quality content reaches you

Checks:
  • Image quality (blur, exposure, composition)
  • Professionalism (real weddings vs staged shoots)
  • Authenticity (genuine vendors vs stock photos)

Example:

"This venue photo is blurry and poorly lit. Quality score: 0.3/1.0. REJECT."

📖 Personal Archivist

Role: Remember your journey and planning phase

Tracks:
  • Wedding date (months until wedding)
  • Planning phase (inspiration → vendors → details)
  • Style evolution (preferences change over time)
  • Saved collections

Example:

"Sarah is 4 months out. She's past inspiration phase, now booking vendors. Prioritize photographers and florists with availability."

🎲 Serendipity Engine

Role: Prevent filter bubbles by introducing variety

Why It Matters: Without this, Sarah only sees rustic content and misses an elegant ballroom she'd actually love.

Strategy:
  • 80% proven taste (rustic barns)
  • 20% exploration (elegant venues, modern styles)

Example:

"Sarah loves rustic, but let's show her one elegant venue. If she saves it, we've discovered a secondary style she likes!"

⏰ Engagement Forecaster

Role: Predict perfect timing for notifications

Analyzes:
  • Time of day Sarah is most active
  • Days of week she engages
  • Planning phase timing (venue search → vendor booking → details)

Example:

"Sarah usually browses 7-9 PM on weekdays. She just saved 3 venues. Send 'New venues in your style' notification tomorrow at 7:15 PM."


Step 4: Thompson Sampling Ranks Everything 🎯

This is where the magic happens. Thompson Sampling is the same algorithm Instagram and Pinterest use.

The Problem We're Solving

You want two things:

  1. Exploitation - Show content you'll probably love (safe bets)
  2. Exploration - Try new content you might love (discoveries)

Too much exploitation = filter bubble (you only see what you already know you like).
Too much exploration = bad recommendations (essentially random content).

How Thompson Sampling Works

Think of it as a smart gambling strategy:

graph TB
    Content[🏚️ Venue Content<br/>Has been shown to Sarah 7 times]

    Stats[📊 Track Record<br/>✅ Saved: 5 times<br/>❌ Skipped: 2 times]

    Beta[🎲 Beta Distribution<br/>α=5 (successes), β=2 (failures)]

    Sample[🎯 Sample Score<br/>Random number from distribution<br/>Example: 0.82]

    Quality[✨ Quality Score<br/>From Quality Guardian<br/>Example: 0.90]

    Final[📈 Final Score<br/>0.82 × 0.90 = 0.74]

    Content --> Stats
    Stats --> Beta
    Beta --> Sample
    Sample --> Final
    Quality --> Final

    style Content fill:#E0F2FE
    style Stats fill:#FFE0F0
    style Beta fill:#F3E8FF
    style Sample fill:#FEF3C7
    style Quality fill:#FBCFE8
    style Final fill:#D1FAE5

Real Example: Ranking Sarah's Feed

Let's rank 3 venues for Sarah:

graph TB
    subgraph "Venue A: Rustic Barn"
        A1[Track Record<br/>✅ 5 saves<br/>❌ 2 skips] --> A2[Sample: 0.82<br/>High certainty]
        A2 --> A3[Quality: 0.90]
        A3 --> A4[Final: 0.74<br/>🥇 RANK #1]
    end

    subgraph "Venue B: New Photographer"
        B1[Track Record<br/>✅ 0 saves<br/>❌ 0 skips<br/>Never shown!] --> B2[Sample: 0.73<br/>High uncertainty]
        B2 --> B3[Quality: 0.85]
        B3 --> B4[Final: 0.62<br/>🥈 RANK #2]
    end

    subgraph "Venue C: Elegant Ballroom"
        C1[Track Record<br/>✅ 3 saves<br/>❌ 4 skips] --> C2[Sample: 0.51<br/>Lower certainty]
        C2 --> C3[Quality: 0.95]
        C3 --> C4[Final: 0.48<br/>🥉 RANK #3]
    end

    style A4 fill:#D1FAE5
    style B4 fill:#FEF3C7
    style C4 fill:#FED7AA

What Happened:

  1. Venue A (Rustic Barn) - Ranked #1
     • Sarah has saved this style many times (high α=5)
     • High confidence it's a good match
     • Safe bet ✅

  2. Venue B (New Photographer) - Ranked #2
     • Never shown before, so there's no track record yet (in practice a uniform Beta(1, 1) prior keeps the sample well-defined)
     • High uncertainty = try it!
     • Discovery opportunity 🔍

  3. Venue C (Elegant Ballroom) - Ranked #3
     • Mixed results (3 saves, 4 skips)
     • Lower confidence
     • But still shown for diversity 🎲
The Beauty: The system automatically balances exploration and exploitation!

Self-Learning in Action

After Sarah interacts:

sequenceDiagram
    participant Sarah as 👰 Sarah
    participant Feed as Feed
    participant TS as Thompson Sampling

    Feed->>Sarah: Shows Venue B (new photographer)
    Sarah->>Feed: ❤️ Saves it!
    Feed->>TS: Update α_B: 0 → 1
    Note over TS: Next time, Venue B<br/>will rank higher!

    Feed->>Sarah: Shows Venue C (elegant ballroom)
    Sarah->>Feed: ⏭️ Skips it
    Feed->>TS: Update β_C: 4 → 5
    Note over TS: Next time, Venue C<br/>will rank lower

No manual tuning needed. The system learns from every interaction.
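
Here's a compact sketch of the Beta-Bernoulli bandit described above (class and method names are illustrative). Note the +1 on each parameter: that's the uniform Beta(1, 1) prior that keeps brand-new content like Venue B sampleable:

import random
from collections import defaultdict

class ThompsonRanker:
    """Per-content Beta-Bernoulli bandit: sample, rank, learn."""

    def __init__(self):
        self.alpha = defaultdict(int)  # successes (saves)
        self.beta = defaultdict(int)   # failures (skips)

    def score(self, content_id: str, quality: float) -> float:
        # Sample from Beta(α+1, β+1), then weight by the quality score
        sample = random.betavariate(self.alpha[content_id] + 1,
                                    self.beta[content_id] + 1)
        return sample * quality

    def rank(self, quality: dict[str, float]) -> list[str]:
        """quality maps content_id → Quality Guardian score (0-1)."""
        return sorted(quality, key=lambda c: self.score(c, quality[c]),
                      reverse=True)

    def update(self, content_id: str, saved: bool) -> None:
        if saved:
            self.alpha[content_id] += 1  # save → α += 1
        else:
            self.beta[content_id] += 1   # skip → β += 1

ranker = ThompsonRanker()
ranker.alpha["venue_A"], ranker.beta["venue_A"] = 5, 2  # rustic barn
ranker.alpha["venue_C"], ranker.beta["venue_C"] = 3, 4  # elegant ballroom
print(ranker.rank({"venue_A": 0.90, "venue_B": 0.85, "venue_C": 0.95}))
ranker.update("venue_B", saved=True)  # Sarah saved it → ranks higher next time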


Step 5: Feed Gets Smarter Every Day 📊

Three Learning Mechanisms

graph TB
    subgraph "Real-Time Learning ⚡"
        Interaction[User Interaction] --> Thompson[Thompson Sampling<br/>Updates α/β instantly]
        Thompson --> NextFeed[Next Feed Request<br/>Already improved!]
    end

    subgraph "Nightly Training 🌙 (2 AM)"
        Batch[Daily Interactions] --> TwoTower[Two-Tower Model<br/>Retrains on A100 GPU]
        TwoTower --> BetterEmbeds[Better Embeddings<br/>Deployed automatically]
    end

    subgraph "Agent Training 🤖 (2 AM)"
        Episodes[Agent Episodes] --> RLlib[Ray RLlib<br/>Multi-Agent PPO]
        RLlib --> BetterAgents[Better Policies<br/>Agents collaborate better]
    end

    style Interaction fill:#D1FAE5
    style Batch fill:#E0F2FE
    style Episodes fill:#F3E8FF

Example: Sarah's First Week

| Day   | What Happens                          | Learning                                                   |
|-------|---------------------------------------|------------------------------------------------------------|
| Day 1 | Sarah saves 5 rustic barns            | Thompson α increases for rustic content                    |
| Day 2 | Feed shows 80% rustic, 20% other      | Sarah saves 1 elegant venue (surprise!)                    |
| Day 3 | Thompson explores elegant style more  | Two-Tower trains overnight, learns Sarah → rustic+elegant  |
| Day 4 | Feed now shows both styles            | Sarah's embedding updated, better matches                  |
| Day 5 | Agents learn Sarah is 4 months out    | Engagement Forecaster prioritizes vendor bookings          |
| Day 7 | Two-Tower retrains with 7 days' data  | User embedding refined, content matches improve            |

Result: After one week, Sarah's feed is dramatically more personalized than it was on Day 1.


Complete Flow: Request to Response

Let's watch Sarah's morning coffee browse:

sequenceDiagram
    actor Sarah as 👰☕ Sarah
    participant App as vows.social
    participant API as Modal API
    participant Orch as Orchestrator
    participant TwoTower as Two-Tower Model
    participant Qdrant as Vector DB
    participant Agents as Agent Crew
    participant TS as Thompson Sampling

    Note over Sarah: Opens app, 8:00 AM

    Sarah->>App: Request feed
    App->>API: GET /feed/for-you?user=sarah

    Note over API,Orch: Step 1: Understand Sarah

    API->>Orch: generate_feed(sarah, limit=20)
    Orch->>TwoTower: get_user_embedding(sarah)
    TwoTower->>TwoTower: Load Sarah's 50 recent interactions<br/>Run neural network
    TwoTower-->>Orch: [0.92, -0.43, 0.88, ...] (128 dims)

    Note over Orch,Qdrant: Step 2: Find similar content

    Orch->>Qdrant: vector_search(sarah_embedding, limit=100)
    Qdrant->>Qdrant: ANN search in shared embedding space<br/>Find nearest neighbors
    Qdrant-->>Orch: 100 candidates [venues, photos, vendors]

    Note over Orch,Agents: Step 3: Agent evaluation

    Orch->>Agents: evaluate_batch(candidates, sarah_context)

    par Discovery Agent
        Agents->>Agents: Score content freshness
    and Quality Guardian
        Agents->>Agents: Score visual quality
    and Personal Archivist
        Agents->>Agents: Check planning phase fit
    and Serendipity Engine
        Agents->>Agents: Measure diversity
    and Engagement Forecaster
        Agents->>Agents: Predict engagement
    end

    Agents-->>Orch: agent_scores {discovery: 0.85, quality: 0.92, ...}

    Note over Orch,TS: Step 4: Rank with Thompson Sampling

    Orch->>TS: rank(candidates, agent_scores)

    loop For each candidate
        TS->>TS: Sample from Beta(α, β)<br/>Multiply by quality<br/>Sort
    end

    TS-->>Orch: ranked_ids [42, 87, 19, ...]

    Orch-->>API: {items: [...], metadata: {...}}
    API-->>App: JSON with 20 ranked items
    App-->>Sarah: Beautiful, personalized feed 🎨

    Note over Sarah: Sarah saves a venue

    Sarah->>App: ❤️ Saves venue #42
    App->>API: POST /interactions {action: save, content: 42}
    API->>TS: update_reward(42, reward=1.0)
    TS->>TS: α_42 += 1 ✅

    Note over TS: Next feed will rank<br/>similar content higher!

Total Time: < 500ms from request to response 🚀
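
From the client's side, that whole pipeline is just two HTTP calls. A hedged sketch with the requests library; the base URL is a placeholder and the exact response schema may differ:

import requests

BASE = "https://api.vows.social"  # placeholder base URL

# 1. Fetch the personalized feed (GET /feed/for-you?user=sarah)
feed = requests.get(f"{BASE}/feed/for-you",
                    params={"user": "sarah"}).json()

# 2. Report a save so Thompson Sampling can bump α for venue #42
requests.post(f"{BASE}/interactions",
              json={"user": "sarah", "content": 42, "action": "save"})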


Why This Architecture Works

1. Industry-Proven Approaches

We use the same techniques as the giants:

| Company   | What They Use      | What We Use              |
|-----------|--------------------|--------------------------|
| Pinterest | Two-Tower Model    | ✅ Two-Tower Model       |
| Instagram | Thompson Sampling  | ✅ Thompson Sampling     |
| TikTok    | Contextual Bandits | ✅ Beta-Bernoulli Bandit |
| Google    | SigLIP for images  | ✅ SigLIP 2 (2025 SOTA)  |
| OpenAI    | Multi-Agent PPO    | ✅ Ray RLlib PPO         |

2. Self-Learning Without Manual Tuning

No one at Vows manually adjusts weights or tunes parameters. The system learns from user behavior:

  • Thompson Sampling learns from every save/skip
  • Two-Tower Model retrains nightly on interactions
  • Agent Policies optimize via Multi-Agent PPO

3. Full Observability

Every decision is logged and traceable via LangSmith:

graph LR
    Decision[Agent Decision] --> LangSmith[🔬 LangSmith]
    LangSmith --> Trace[Full Trace<br/>• Input<br/>• Reasoning<br/>• Output]
    Trace --> Debug[Debug Issues<br/>• Why this content?<br/>• Agent scores?<br/>• Match score?]

    style Decision fill:#F3E8FF
    style LangSmith fill:#F59E0B,color:#fff
    style Trace fill:#FEF3C7
    style Debug fill:#D1FAE5

Example Debug:

User complaint: "Why did I see modern venues? I prefer rustic."

LangSmith Trace:
1. Personal Archivist score: 0.45 (user prefers rustic)
2. Serendipity Engine score: 0.95 (diversity injection)
3. Final rank: #18 (shown for variety, not primary)

Action: Working as intended - diversity prevents filter bubble.
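
Getting an agent decision into a trace like that can be as light as a decorator. A minimal sketch using the LangSmith Python SDK's traceable; the scoring logic is purely illustrative:

from langsmith import traceable

@traceable(name="serendipity_engine")
def score_diversity(candidate: dict, user_history: list[dict]) -> float:
    """Toy scorer: reward styles the user hasn't engaged with much."""
    seen_styles = {item["style"] for item in user_history}
    return 0.95 if candidate["style"] not in seen_styles else 0.40

# With a LangSmith API key configured in the environment, every call's
# inputs, output, and timing are logged automatically.
score_diversity({"style": "elegant"}, [{"style": "rustic"}])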

4. Unified Python Stack

All ML/AI code runs on Modal (Python):

  • No JavaScript/Python split (unlike a Cloudflare Workers + Fly.io setup)
  • GPU access for embeddings (SigLIP 2 on A10G)
  • Serverless scaling ($0 when idle)
  • Single deployment platform
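
A minimal sketch of what that can look like with Modal's Python SDK. The app name, image contents, and function bodies are illustrative:

import modal

app = modal.App("vows-ml")  # hypothetical app name
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(image=image, gpu="A10G")
def embed_batch(image_urls: list[str]) -> list[list[float]]:
    """Run SigLIP 2 embeddings on an A10G; scales to zero when idle."""
    ...  # load model, fetch images, return vectors

@app.function(schedule=modal.Cron("0 2 * * *"))
def nightly_retrain():
    """The 2 AM job: retrain the Two-Tower model on the day's interactions."""
    ...  # pull interactions, train, deploy new embeddings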

What Makes This Unique?

vs Pinterest

  • Pinterest: Generic recommendations for millions
  • Vows: Hyper-personalized for couples planning ONE wedding

vs Instagram

  • Instagram: Algorithmic feed you don't control
  • Vows: AI learns YOUR specific taste, no ads, pure curation

vs Google

  • Google: You search for what you know exists
  • Vows: Discovers vendors and ideas you didn't know existed

Next Steps

Ready to dive deeper?

Or start contributing:


Questions? Check the PRD or ADRs for architectural decisions! 🎨