
Foundation Model API

API for generating embeddings and user representations using the Personalization Foundation Model.


Base URL

https://ml-inference.vows.social (Fly.io service)


Endpoints

Embed Content

Generate an embedding for a wedding content item.

Endpoint: POST /embed/content

Request:

{
  "text": "Beautiful botanical garden wedding venue in Melbourne",
  "images": ["https://example.com/image1.jpg"],
  "tags": ["venue", "garden", "melbourne"],
  "vendorType": "venue"
}

Response:

{
  "embedding": [0.123, -0.456, ...],  // 384-dim vector
  "metadata": {
    "model": "sentence-bert-v1",
    "dimension": 384,
    "latency_ms": 45
  }
}

cURL Example:

curl -X POST https://ml-inference.vows.social/embed/content \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "text": "Modern minimalist wedding photographer",
    "tags": ["photographer", "modern"]
  }'


Get User Embedding

Generate or retrieve user embedding based on interaction history.

Endpoint: POST /embed/user

Request:

{
  "userId": "user-123",
  "interactions": [
    {
      "contentId": "content-1",
      "action": "save",
      "duration": 5.2,
      "timestamp": "2025-10-11T10:00:00Z"
    },
    {
      "contentId": "content-2",
      "action": "view",
      "duration": 3.5,
      "timestamp": "2025-10-11T10:05:00Z"
    }
  ],
  "decayRate": 0.01  // Optional, default 0.01
}

Response:

{
  "embedding": [0.789, -0.234, ...],  // 384-dim vector
  "metadata": {
    "model": "simple-aggregation-v1",
    "interactionCount": 127,
    "dominantStyle": "modern_minimalist",
    "confidence": 0.85,
    "latency_ms": 120
  }
}
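The behavior of simple-aggregation-v1 can be approximated as an action-weighted, exponentially time-decayed average of the content embeddings the user interacted with. Below is a minimal sketch in Python; the action weights and the 2-dimensional toy vectors are illustrative assumptions (the service returns 384-dimensional vectors, and the real model's weights are internal):

```python
import math
from datetime import datetime, timezone

# Hypothetical action weights; the real model's weighting is internal.
ACTION_WEIGHTS = {"view": 1.0, "save": 3.0, "share": 4.0}

def user_embedding(interactions, content_embeddings, decay_rate=0.01, now=None):
    """Aggregate content embeddings into a user vector, weighting each
    interaction by action type and exponential time decay (per hour)."""
    now = now or datetime.now(timezone.utc)
    dim = len(next(iter(content_embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for it in interactions:
        ts = datetime.fromisoformat(it["timestamp"].replace("Z", "+00:00"))
        age_hours = (now - ts).total_seconds() / 3600
        w = ACTION_WEIGHTS.get(it["action"], 1.0) * math.exp(-decay_rate * age_hours)
        emb = content_embeddings[it["contentId"]]
        total = [t + w * e for t, e in zip(total, emb)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total
```

With the default decayRate of 0.01, an interaction loses roughly 1% of its weight per hour, so week-old activity still contributes but recent saves dominate.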


Retrieve Cached Embedding

Retrieve a previously computed user embedding from the cache.

Endpoint: GET /embed/user/:userId

Response:

{
  "embedding": [0.789, -0.234, ...],
  "metadata": {
    "cachedAt": "2025-10-11T10:30:00Z",
    "validUntil": "2025-10-11T11:30:00Z",
    "interactionCount": 127
  }
}

Cache Invalidation:

  - User embeddings are cached for 1 hour
  - The cache is invalidated on significant interactions (save, share)
  - Force a refresh with ?refresh=true
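Client-side, the cachedAt/validUntil metadata can be used to decide whether a cached embedding is still usable before falling back to a refresh. A small sketch, using the field names from the response above:

```python
from datetime import datetime, timezone

def is_cache_valid(metadata, now=None):
    """Return True if the cached embedding has not expired.
    Expects ISO-8601 timestamps as in the GET /embed/user/:userId response."""
    now = now or datetime.now(timezone.utc)
    valid_until = datetime.fromisoformat(metadata["validUntil"].replace("Z", "+00:00"))
    return now < valid_until
```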


Batch Embed Content

Embed multiple content items efficiently.

Endpoint: POST /embed/batch

Request:

{
  "items": [
    {
      "id": "content-1",
      "text": "Garden wedding venue Melbourne",
      "tags": ["venue", "garden"]
    },
    {
      "id": "content-2",
      "text": "Modern wedding photographer Sydney",
      "tags": ["photographer", "modern"]
    }
  ]
}

Response:

{
  "embeddings": {
    "content-1": [0.1, -0.2, ...],
    "content-2": [0.3, -0.4, ...]
  },
  "metadata": {
    "batchSize": 2,
    "totalLatency_ms": 80,
    "avgLatency_ms": 40
  }
}


Cold Start Embedding

Generate an embedding for a new user with no interaction history.

Endpoint: POST /embed/cold-start

Request:

{
  "userId": "user-new-123",
  "preferences": {
    "styles": ["modern", "minimalist"],
    "region": "Melbourne",
    "budget": "medium",
    "weddingDate": "2026-06-15"
  }
}

Response:

{
  "embedding": [0.5, -0.3, ...],
  "metadata": {
    "method": "preference_based",
    "confidence": 0.6,
    "note": "Will improve with user interactions"
  }
}
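One plausible reading of the preference_based method is an average of precomputed centroid vectors for the styles the user selected. The sketch below is an assumption-laden illustration: the style vectors are made-up 3-dimensional toys (the service uses 384 dimensions), and the real method may weight region, budget, and date as well:

```python
# Hypothetical precomputed style centroids; the real vectors are model-internal.
STYLE_EMBEDDINGS = {
    "modern":     [0.9, 0.1, 0.0],
    "minimalist": [0.8, 0.0, 0.2],
    "rustic":     [0.1, 0.9, 0.3],
}

def cold_start_embedding(styles):
    """Average the centroid vectors of the user's selected styles."""
    vecs = [STYLE_EMBEDDINGS[s] for s in styles if s in STYLE_EMBEDDINGS]
    if not vecs:
        raise ValueError("no known styles selected")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

This also explains the lower confidence (0.6) in the response: a preference average is a coarse prior that interaction data later sharpens.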


Compute Similarity

Calculate similarity between embeddings.

Endpoint: POST /embed/similarity

Request:

{
  "embedding1": [0.1, -0.2, ...],
  "embedding2": [0.3, -0.4, ...],
  "metric": "cosine"  // cosine, euclidean, or dot
}

Response:

{
  "similarity": 0.87,
  "metric": "cosine",
  "interpretation": "highly_similar"  // highly_similar, similar, somewhat_similar, different
}
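The cosine metric and the interpretation field are easy to reproduce locally when you want to avoid a round trip. Cosine similarity is the standard formula; the interpretation thresholds below are illustrative guesses, not the service's actual cut-offs:

```python
import math

def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def interpret(score):
    # Illustrative thresholds only; the API's bands may differ.
    if score >= 0.8:
        return "highly_similar"
    if score >= 0.6:
        return "similar"
    if score >= 0.4:
        return "somewhat_similar"
    return "different"
```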


Model Versions

Current Models

Phase 1 (Simple): sentence-bert-v1

  - Pre-trained Sentence-BERT
  - Dimension: 384
  - No training required
  - Fast inference (~50ms)

Phase 2 (Custom): foundation-model-v1

  - Custom transformer
  - Dimension: 128 (user) / 384 (content)
  - Trained on interaction data
  - Better personalization

Model Selection

Specify model in request:

{
  "text": "...",
  "model": "foundation-model-v1"
}

Or use header:

X-Model-Version: foundation-model-v1


Advanced Features

Style Classification

Get a style classification along with the embedding.

Request:

{
  "text": "Elegant garden wedding with soft romantic florals",
  "classifyStyle": true
}

Response:

{
  "embedding": [...],
  "style": {
    "primary": "romantic",
    "secondary": "garden",
    "confidence": 0.82,
    "scores": {
      "romantic": 0.82,
      "modern": 0.15,
      "rustic": 0.12,
      "bohemian": 0.08
    }
  }
}

Explain Embedding

Get human-readable explanation of what the embedding captures.

Request:

{
  "embedding": [0.1, -0.2, ...],
  "explain": true
}

Response:

{
  "explanation": {
    "dominantFeatures": [
      "Modern aesthetic with clean lines",
      "Garden/outdoor setting preference",
      "Natural light emphasis"
    ],
    "styleProfile": "modern_garden",
    "vendorTypeAffinity": {
      "venue": 0.9,
      "photographer": 0.7,
      "florist": 0.6
    }
  }
}


Performance

Latency Targets

Operation          Target    Typical
Single embed       < 100ms   ~50ms
Batch embed (10)   < 200ms   ~150ms
User embedding     < 150ms   ~120ms
Similarity         < 10ms    ~5ms

Optimization Tips

  1. Batch requests - Embed multiple items at once
  2. Cache embeddings - Content embeddings rarely change
  3. Use simple model - Sentence-BERT for speed
  4. Compress vectors - Use PCA for storage
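Tip 2 can be as simple as memoizing on a stable hash of the request payload, since content embeddings rarely change. A sketch, where embed_fn is a hypothetical stand-in for the POST /embed/content call:

```python
import hashlib
import json

_cache = {}

def embed_content_cached(payload, embed_fn):
    """Memoize embeddings keyed on a stable hash of the request payload.
    embed_fn stands in for the POST /embed/content call."""
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(payload)
    return _cache[key]
```

In production you would back this with Redis or similar rather than an in-process dict, and invalidate entries when the content itself is edited.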

Error Handling

Common Errors

Invalid Input:

{
  "error": {
    "code": "INVALID_INPUT",
    "message": "Text field is required",
    "field": "text"
  }
}

Model Not Found:

{
  "error": {
    "code": "MODEL_NOT_FOUND",
    "message": "Model 'custom-v2' not available",
    "availableModels": ["sentence-bert-v1", "foundation-model-v1"]
  }
}

Timeout:

{
  "error": {
    "code": "INFERENCE_TIMEOUT",
    "message": "Model inference took too long",
    "timeout_ms": 5000
  }
}


Usage Examples

TypeScript

import { VowsMLClient } from '@vows/ml-sdk';

const client = new VowsMLClient({
  apiKey: process.env.API_KEY
});

// Embed content
const contentEmbedding = await client.embed.content({
  text: "Modern wedding venue Melbourne",
  tags: ["venue", "modern"]
});

// Get user embedding
const userEmbedding = await client.embed.user({
  userId: "user-123",
  interactions: recentInteractions
});

// Compute similarity
const similarity = await client.embed.similarity(
  userEmbedding.embedding,
  contentEmbedding.embedding
);

console.log(`Similarity: ${similarity}`);

Python

import os

from vows_ml import VowsMLClient

client = VowsMLClient(api_key=os.environ['API_KEY'])

# Embed content
embedding = client.embed.content(
    text="Garden wedding photographer",
    tags=["photographer", "garden"]
)

# Get user embedding
user_emb = client.embed.user(
    user_id="user-123",
    interactions=recent_interactions
)

# Find similar users
similar = client.embed.find_similar(
    embedding=user_emb,
    collection="users",
    limit=10
)

cURL

# Embed content
curl -X POST https://ml-inference.vows.social/embed/content \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Modern wedding venue", "tags": ["venue"]}'

# Get user embedding
curl -X POST https://ml-inference.vows.social/embed/user \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "user-123",
    "interactions": [...]
  }'

# Compute similarity
curl -X POST https://ml-inference.vows.social/embed/similarity \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "embedding1": [0.1, -0.2, ...],
    "embedding2": [0.3, -0.4, ...],
    "metric": "cosine"
  }'


Resources