Overview

SMEGA (Sourcegraph Managed Embeddings Generation API) is a managed service that lets Cody users generate the embedding vectors used for Chat Context fetching.

All client-initiated calls to SMEGA mirror the existing Cody Embeddings integration for OpenAI and flow as follows:

Cody Gateway forwards the requests to a new API hosted in a dedicated GCP project, fronted by a GCP Load Balancer with a GCP-managed SSL certificate.
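Since the calls mirror the existing OpenAI-style Cody Embeddings integration, the forwarded request can be pictured as below. This is a minimal sketch: the endpoint URL, payload shape, and model name are assumptions for illustration, not the actual Cody Gateway contract.

```python
import json
import urllib.request

# Hypothetical address; in practice this is the SMEGA GCP Load Balancer URL.
SMEGA_URL = "https://smega.example.com/v1/embeddings"

def build_embeddings_request(texts, model="multi-qa-mpnet-base-dot-v1"):
    """Build an OpenAI-style embeddings request (assumed payload shape)."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        SMEGA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embeddings_request(["func main() {}", "How is context fetched?"])
print(req.method, req.get_header("Content-type"))
```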

The new API uses an upstream, industry-standard inference server, Triton, which exposes HTTP and gRPC endpoints for serving embedding (and other) models. Model configuration and conversion happen at build time, with the config, model weights, and tokenizer code embedded in the Docker image.
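The build-time model configuration baked into the image is a Triton `config.pbtxt`. A sketch of what it might look like for an ONNX transformer model is below; the model name, tensor names, and dimensions here are assumptions based on a typical Hugging Face export, not the actual SMEGA config.

```
name: "multi_qa_mpnet_base_dot_v1"
platform: "onnxruntime_onnx"
max_batch_size: 16
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]
```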

More context is available in the project doc.

Implementation

SMEGA is implemented using Nvidia Triton, hosting the Sentence Transformers multi-qa-mpnet-base-dot-v1 model converted to ONNX format. SMEGA is hosted on Kubernetes clusters managed by GKE (Standard) and exposed externally through Cody Gateway.
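Triton serves models over the KServe v2 inference protocol (`POST /v2/models/<model>/infer`), so a request to the hosted ONNX model could be built roughly as follows. This is a sketch only: the tensor names and the pre-tokenized input are assumptions, and real callers go through Cody Gateway rather than hitting Triton directly.

```python
import json

def build_triton_infer_payload(token_ids, attention_mask):
    """Build a KServe v2 inference payload for an ONNX transformer model.
    Tensor names ("input_ids", "attention_mask") are assumed from a
    typical Hugging Face ONNX export."""
    return {
        "inputs": [
            {"name": "input_ids", "shape": [1, len(token_ids)],
             "datatype": "INT64", "data": token_ids},
            {"name": "attention_mask", "shape": [1, len(attention_mask)],
             "datatype": "INT64", "data": attention_mask},
        ]
    }

# Token IDs here are placeholders; real ones come from the embedded tokenizer.
payload = build_triton_infer_payload([101, 2023, 102], [1, 1, 1])
print(json.dumps(payload)[:60])
```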

SMEGA is stateless: it holds no persistent data of its own.

Operations

Repository

Kubernetes manifests

Buildkite pipeline