
Foundation Model API

API for generating embeddings and user representations using the Personalization Foundation Model.


Base URL

https://ml-inference.vows.social (Fly.io service)


Endpoints

Embed Content

Generate an embedding for a wedding content item.

Endpoint: POST /embed/content

Request:

{
  "text": "Beautiful botanical garden wedding venue in Melbourne",
  "images": ["https://example.com/image1.jpg"],
  "tags": ["venue", "garden", "melbourne"],
  "vendorType": "venue"
}

Response:

{
  "embedding": [0.123, -0.456, ...],  // 384-dim vector
  "metadata": {
    "model": "sentence-bert-v1",
    "dimension": 384,
    "latency_ms": 45
  }
}

cURL Example:

curl -X POST https://ml-inference.vows.social/embed/content \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "text": "Modern minimalist wedding photographer",
    "tags": ["photographer", "modern"]
  }'


Get User Embedding

Generate or retrieve user embedding based on interaction history.

Endpoint: POST /embed/user

Request:

{
  "userId": "user-123",
  "interactions": [
    {
      "contentId": "content-1",
      "action": "save",
      "duration": 5.2,
      "timestamp": "2025-10-11T10:00:00Z"
    },
    {
      "contentId": "content-2",
      "action": "view",
      "duration": 3.5,
      "timestamp": "2025-10-11T10:05:00Z"
    }
  ],
  "decayRate": 0.01  // Optional, default 0.01
}

Response:

{
  "embedding": [0.789, -0.234, ...],  // 384-dim vector
  "metadata": {
    "model": "simple-aggregation-v1",
    "interactionCount": 127,
    "dominantStyle": "modern_minimalist",
    "confidence": 0.85,
    "latency_ms": 120
  }
}
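The behavior of simple-aggregation-v1 can be approximated as an action-weighted, exponentially time-decayed average of the content embeddings the user interacted with. Below is a minimal sketch in Python; the action weights and the 2-dimensional toy vectors are illustrative assumptions (the service returns 384-dimensional vectors, and the real model's weights are internal):

```python
import math
from datetime import datetime, timezone

# Hypothetical action weights; the real model's weighting is internal.
ACTION_WEIGHTS = {"view": 1.0, "save": 3.0, "share": 4.0}

def user_embedding(interactions, content_embeddings, decay_rate=0.01, now=None):
    """Aggregate content embeddings into a user vector, weighting each
    interaction by action type and exponential time decay (per hour)."""
    now = now or datetime.now(timezone.utc)
    dim = len(next(iter(content_embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for it in interactions:
        ts = datetime.fromisoformat(it["timestamp"].replace("Z", "+00:00"))
        age_hours = (now - ts).total_seconds() / 3600
        w = ACTION_WEIGHTS.get(it["action"], 1.0) * math.exp(-decay_rate * age_hours)
        emb = content_embeddings[it["contentId"]]
        total = [t + w * e for t, e in zip(total, emb)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total
```

With the default decayRate of 0.01, an interaction loses roughly 1% of its weight per hour, so week-old activity still contributes but recent saves dominate.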


Retrieve Cached Embedding

Retrieve a previously computed user embedding from the cache.

Endpoint: GET /embed/user/:userId

Response:

{
  "embedding": [0.789, -0.234, ...],
  "metadata": {
    "cachedAt": "2025-10-11T10:30:00Z",
    "validUntil": "2025-10-11T11:30:00Z",
    "interactionCount": 127
  }
}

Cache Invalidation:

  - User embeddings are cached for 1 hour
  - The cache is invalidated on significant interactions (save, share)
  - Force a refresh with ?refresh=true
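Client-side, the cachedAt/validUntil metadata can be used to decide whether a cached embedding is still usable before falling back to a refresh. A small sketch, using the field names from the response above:

```python
from datetime import datetime, timezone

def is_cache_valid(metadata, now=None):
    """Return True if the cached embedding has not expired.
    Expects ISO-8601 timestamps as in the GET /embed/user/:userId response."""
    now = now or datetime.now(timezone.utc)
    valid_until = datetime.fromisoformat(metadata["validUntil"].replace("Z", "+00:00"))
    return now < valid_until
```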


Batch Embed Content

Embed multiple content items efficiently.

Endpoint: POST /embed/batch

Request:

{
  "items": [
    {
      "id": "content-1",
      "text": "Garden wedding venue Melbourne",
      "tags": ["venue", "garden"]
    },
    {
      "id": "content-2",
      "text": "Modern wedding photographer Sydney",
      "tags": ["photographer", "modern"]
    }
  ]
}

Response:

{
  "embeddings": {
    "content-1": [0.1, -0.2, ...],
    "content-2": [0.3, -0.4, ...]
  },
  "metadata": {
    "batchSize": 2,
    "totalLatency_ms": 80,
    "avgLatency_ms": 40
  }
}


Cold Start Embedding

Generate an embedding for a new user with no interaction history.

Endpoint: POST /embed/cold-start

Request:

{
  "userId": "user-new-123",
  "preferences": {
    "styles": ["modern", "minimalist"],
    "region": "Melbourne",
    "budget": "medium",
    "weddingDate": "2026-06-15"
  }
}

Response:

{
  "embedding": [0.5, -0.3, ...],
  "metadata": {
    "method": "preference_based",
    "confidence": 0.6,
    "note": "Will improve with user interactions"
  }
}
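One plausible reading of the preference_based method is an average of precomputed centroid vectors for the styles the user selected. The sketch below is an assumption-laden illustration: the style vectors are made-up 3-dimensional toys (the service uses 384 dimensions), and the real method may weight region, budget, and date as well:

```python
# Hypothetical precomputed style centroids; the real vectors are model-internal.
STYLE_EMBEDDINGS = {
    "modern":     [0.9, 0.1, 0.0],
    "minimalist": [0.8, 0.0, 0.2],
    "rustic":     [0.1, 0.9, 0.3],
}

def cold_start_embedding(styles):
    """Average the centroid vectors of the user's selected styles."""
    vecs = [STYLE_EMBEDDINGS[s] for s in styles if s in STYLE_EMBEDDINGS]
    if not vecs:
        raise ValueError("no known styles selected")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

This also explains the lower confidence (0.6) in the response: a preference average is a coarse prior that interaction data later sharpens.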


Compute Similarity

Calculate similarity between embeddings.

Endpoint: POST /embed/similarity

Request:

{
  "embedding1": [0.1, -0.2, ...],
  "embedding2": [0.3, -0.4, ...],
  "metric": "cosine"  // cosine, euclidean, or dot
}

Response:

{
  "similarity": 0.87,
  "metric": "cosine",
  "interpretation": "highly_similar"  // highly_similar, similar, somewhat_similar, different
}
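The cosine metric and the interpretation field are easy to reproduce locally when you want to avoid a round trip. Cosine similarity is the standard formula; the interpretation thresholds below are illustrative guesses, not the service's actual cut-offs:

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def interpret(score):
    # Illustrative thresholds only; the API's bands may differ.
    if score >= 0.8:
        return "highly_similar"
    if score >= 0.6:
        return "similar"
    if score >= 0.4:
        return "somewhat_similar"
    return "different"
```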


Model Versions

Current Models

Phase 1 (Simple): sentence-bert-v1

  - Pre-trained Sentence-BERT
  - Dimension: 384
  - No training required
  - Fast inference (~50ms)

Phase 2 (Custom): foundation-model-v1

  - Custom transformer
  - Dimension: 128 (user) / 384 (content)
  - Trained on interaction data
  - Better personalization

Model Selection

Specify model in request:

{
  "text": "...",
  "model": "foundation-model-v1"
}

Or use header:

X-Model-Version: foundation-model-v1


Advanced Features

Style Classification

Get a style classification along with the embedding.

Request:

{
  "text": "Elegant garden wedding with soft romantic florals",
  "classifyStyle": true
}

Response:

{
  "embedding": [...],
  "style": {
    "primary": "romantic",
    "secondary": "garden",
    "confidence": 0.82,
    "scores": {
      "romantic": 0.82,
      "modern": 0.15,
      "rustic": 0.12,
      "bohemian": 0.08
    }
  }
}

Explain Embedding

Get human-readable explanation of what the embedding captures.

Request:

{
  "embedding": [0.1, -0.2, ...],
  "explain": true
}

Response:

{
  "explanation": {
    "dominantFeatures": [
      "Modern aesthetic with clean lines",
      "Garden/outdoor setting preference",
      "Natural light emphasis"
    ],
    "styleProfile": "modern_garden",
    "vendorTypeAffinity": {
      "venue": 0.9,
      "photographer": 0.7,
      "florist": 0.6
    }
  }
}


Performance

Latency Targets

Operation          Target    Typical
Single embed       < 100ms   ~50ms
Batch embed (10)   < 200ms   ~150ms
User embedding     < 150ms   ~120ms
Similarity         < 10ms    ~5ms

Optimization Tips

  1. Batch requests - Embed multiple items at once
  2. Cache embeddings - Content embeddings rarely change
  3. Use simple model - Sentence-BERT for speed
  4. Compress vectors - Use PCA for storage
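Tip 2 can be as simple as memoizing on a stable hash of the request payload, since content embeddings rarely change. A sketch, where embed_fn is a hypothetical stand-in for the POST /embed/content call:

```python
import hashlib
import json

_cache = {}

def embed_content_cached(payload, embed_fn):
    """Memoize embeddings keyed on a stable hash of the request payload.
    embed_fn stands in for the POST /embed/content call."""
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(payload)
    return _cache[key]
```

In production you would back this with Redis or similar rather than an in-process dict, and invalidate entries when the content itself is edited.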

Error Handling

Common Errors

Invalid Input:

{
  "error": {
    "code": "INVALID_INPUT",
    "message": "Text field is required",
    "field": "text"
  }
}

Model Not Found:

{
  "error": {
    "code": "MODEL_NOT_FOUND",
    "message": "Model 'custom-v2' not available",
    "availableModels": ["sentence-bert-v1", "foundation-model-v1"]
  }
}

Timeout:

{
  "error": {
    "code": "INFERENCE_TIMEOUT",
    "message": "Model inference took too long",
    "timeout_ms": 5000
  }
}


Usage Examples

TypeScript

import { VowsMLClient } from '@vows/ml-sdk';

const client = new VowsMLClient({
  apiKey: process.env.API_KEY
});

// Embed content
const contentEmbedding = await client.embed.content({
  text: "Modern wedding venue Melbourne",
  tags: ["venue", "modern"]
});

// Get user embedding
const userEmbedding = await client.embed.user({
  userId: "user-123",
  interactions: recentInteractions
});

// Compute similarity
const similarity = await client.embed.similarity(
  userEmbedding.embedding,
  contentEmbedding.embedding
);

console.log(`Similarity: ${similarity}`);

Python

import os

from vows_ml import VowsMLClient

client = VowsMLClient(api_key=os.environ['API_KEY'])

# Embed content
embedding = client.embed.content(
    text="Garden wedding photographer",
    tags=["photographer", "garden"]
)

# Get user embedding
user_emb = client.embed.user(
    user_id="user-123",
    interactions=recent_interactions
)

# Find similar users
similar = client.embed.find_similar(
    embedding=user_emb,
    collection="users",
    limit=10
)

cURL

# Embed content
curl -X POST https://ml-inference.vows.social/embed/content \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Modern wedding venue", "tags": ["venue"]}'

# Get user embedding
curl -X POST https://ml-inference.vows.social/embed/user \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "user-123",
    "interactions": [...]
  }'

# Compute similarity
curl -X POST https://ml-inference.vows.social/embed/similarity \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "embedding1": [0.1, -0.2, ...],
    "embedding2": [0.3, -0.4, ...],
    "metric": "cosine"
  }'


Resources