This page documents the incident response playbooks the Core Services team can use when issues arise in the Managed Services Platform (MSP) fleet and shared platform.
<aside> 👋 This page is fairly high-level - looking for something more specific?
If an MSP service outage occurs, you should declare an incident (see Incidents), which in practice means using the /incident command to create one. Assess the impact of the outage and configure the incident as appropriate:
- Use the owners field in the service specification to infer what channels and stakeholders need to be notified.
- List the impacted services in the Affected Services field of the incident creation template.

Quick links and a brief summary are below - for more details, refer to the more generalized guidance.
Request mspServiceEditor or mspServiceReader on the service's folder:
- category: prod services: Entitle: mspServiceEditor on the Managed Services folder
- category: internal services: Entitle: mspServiceEditor on the Internal Services folder
- category: test services: All engineers should have access by default (test services are placed in the Engineering Projects folder)

mspServiceEditor and mspServiceReader are available for convenience, and are configured in gcp/org/customer-roles/msp.tf in the infrastructure repo. Additional roles can be requested directly via Entitle.

- Managed Services Platform Operators can be used in case a non-Core-Services teammate needs access, or if there is some other issue accessing the workspace.
- owners can be used for escalated access to Sourcegraph's entire Terraform Cloud account. Use with care!
- sourcegraph-secrets GSM access: needed for sg msp tfc commands and for using terraform apply
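
The per-category folder access rules above can be summarized as a small lookup. This is a minimal sketch for illustration only: the category names, role name, and folder names are taken from this page, while the `AccessRequest` type, the `ACCESS_BY_CATEGORY` table, and the `access_for` function are hypothetical and not part of any MSP tooling.

```python
from typing import NamedTuple, Optional


class AccessRequest(NamedTuple):
    """Hypothetical record describing what to request for a service category."""
    role: Optional[str]  # role to request via Entitle, or None if no request is needed
    folder: str          # folder the service is placed in
    note: str            # how access is obtained


# Mapping of service category to access guidance, as listed on this page.
ACCESS_BY_CATEGORY = {
    "prod": AccessRequest("mspServiceEditor", "Managed Services", "Request via Entitle"),
    "internal": AccessRequest("mspServiceEditor", "Internal Services", "Request via Entitle"),
    "test": AccessRequest(None, "Engineering Projects", "All engineers should have access by default"),
}


def access_for(category: str) -> AccessRequest:
    """Return the access guidance for a service category, or raise ValueError."""
    try:
        return ACCESS_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"unknown service category: {category!r}") from None


if __name__ == "__main__":
    for category in ("prod", "internal", "test"):
        req = access_for(category)
        print(f"{category}: role={req.role}, folder={req.folder} ({req.note})")
```

During an incident, looking up the affected service's category this way tells a responder whether an Entitle request is needed at all, and on which folder.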
Service-specific guidance is generated in Managed Services infrastructure pages.
→ See ‣