Overview

SMEGA (Sourcegraph Managed Embeddings Generation API) is a managed service that lets Cody users generate the embedding vectors used for Chat Context fetching.

All client-initiated calls to SMEGA mirror the existing Cody Embeddings integration for OpenAI and flow as follows:

Cody Gateway forwards the requests to a new API hosted in a dedicated GCP project, fronted by a GCP Load Balancer with a GCP-managed SSL certificate.
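Since the calls mirror the existing OpenAI-style Cody Embeddings integration, the forwarded request can be pictured as below. This is a minimal sketch: the endpoint URL, payload shape, and model name are assumptions for illustration, not the actual Cody Gateway contract.

```python
import json
import urllib.request

# Hypothetical address; in practice this is the SMEGA GCP Load Balancer URL.
SMEGA_URL = "https://smega.example.com/v1/embeddings"

def build_embeddings_request(texts, model="multi-qa-mpnet-base-dot-v1"):
    """Build an OpenAI-style embeddings request (assumed payload shape)."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        SMEGA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embeddings_request(["func main() {}", "How is context fetched?"])
print(req.method, req.get_header("Content-type"))
```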

The new API uses an upstream, industry-standard inference server, Triton, which exposes HTTP and gRPC endpoints for serving embedding (and other) models. Model configuration and conversion happen at build time, with the config, model weights, and tokenizer code embedded in the Docker image.
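The build-time model configuration baked into the image is a Triton `config.pbtxt`. A sketch of what it might look like for an ONNX transformer model is below; the model name, tensor names, and dimensions here are assumptions based on a typical Hugging Face export, not the actual SMEGA config.

```
name: "multi_qa_mpnet_base_dot_v1"
platform: "onnxruntime_onnx"
max_batch_size: 16
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]
```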

More context is available in the project doc.

Implementation

SMEGA is implemented using Nvidia Triton, hosting the Sentence Transformers multi-qa-mpnet-base-dot-v1 model converted to ONNX format. SMEGA is hosted on Kubernetes clusters managed by GKE (Standard) and exposed externally through Cody Gateway.
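Triton serves models over the KServe v2 inference protocol (`POST /v2/models/<model>/infer`), so a request to the hosted ONNX model could be built roughly as follows. This is a sketch only: the tensor names and the pre-tokenized input are assumptions, and real callers go through Cody Gateway rather than hitting Triton directly.

```python
import json

def build_triton_infer_payload(token_ids, attention_mask):
    """Build a KServe v2 inference payload for an ONNX transformer model.
    Tensor names ("input_ids", "attention_mask") are assumed from a
    typical Hugging Face ONNX export."""
    return {
        "inputs": [
            {"name": "input_ids", "shape": [1, len(token_ids)],
             "datatype": "INT64", "data": token_ids},
            {"name": "attention_mask", "shape": [1, len(attention_mask)],
             "datatype": "INT64", "data": attention_mask},
        ]
    }

# Token IDs here are placeholders; real ones come from the embedded tokenizer.
payload = build_triton_infer_payload([101, 2023, 102], [1, 1, 1])
print(json.dumps(payload)[:60])
```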

SMEGA is stateless: it holds no persistent data of its own.

Operations

Repository

Kubernetes manifests

Buildkite pipeline