This page documents the incident response playbooks the Core Services team can use when issues arise in the Managed Services Platform (MSP) fleet and shared platform.
<aside> 💡 For service-specific or MSP-operator-oriented guidance, refer to the Managed Services infrastructure pages instead.
</aside>
If an MSP service outage occurs, declare an incident, which in practice means using the `/incident` command to create one. Assess the impact of the outage and configure the incident as appropriate:

- Use the `owners` field in the service specification to infer what channels and stakeholders need to be notified.
- Fill in the `Affected Services` field of the incident creation template.

Quick links and a brief summary are below. For more details, refer to the more generalized guidance.

- Request `mspServiceEditor` or `mspServiceReader` on the service's folder:
  - `category: prod` services: Entitle: `mspServiceEditor` on the `Managed Services` folder
  - `category: internal` services: Entitle: `mspServiceEditor` on the `Internal Services` folder
  - `category: test` services: All engineers should have access by default (test services are placed in the `Engineering Projects` folder)
- `mspServiceEditor` and `mspServiceReader` are available for convenience, and are configured in `gcp/org/customer-roles/msp.tf` in the infrastructure repo. Additional roles can be requested directly via Entitle.
- `Managed Services Platform Operators` can be used in case a non-Core-Services teammate needs access, or if there is some other issue accessing the workspace.
- `owners` can be used for escalated access to Sourcegraph's entire Terraform Cloud account. Use with care!

Service-specific guidance is generated in Managed Services infrastructure pages.
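
The category-to-access rules above can be summarized as a small lookup. The following is only an illustrative sketch: `AccessRequest`, `ACCESS_BY_CATEGORY`, and `access_for` are hypothetical names invented for this example and are not part of `sg msp`, Entitle, or the infrastructure repo. The folder and role names are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessRequest:
    """Where a service lives and what to request to access it (hypothetical type)."""
    folder: str
    entitle_role: Optional[str]  # None means no Entitle request is needed


# Values come from the list above; the data structure itself is an assumption.
ACCESS_BY_CATEGORY = {
    "prod": AccessRequest("Managed Services", "mspServiceEditor"),
    "internal": AccessRequest("Internal Services", "mspServiceEditor"),
    # Test services sit in a folder all engineers can access by default.
    "test": AccessRequest("Engineering Projects", None),
}


def access_for(category: str) -> AccessRequest:
    """Return the folder and Entitle role to request for a service category."""
    try:
        return ACCESS_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"unknown service category: {category!r}") from None


if __name__ == "__main__":
    for category in ("prod", "internal", "test"):
        req = access_for(category)
        role = req.entitle_role or "no request needed"
        print(f"{category}: {role} on the {req.folder} folder")
```

Raising on an unknown category, rather than falling back to a default, keeps a misspelled `category` value from silently mapping to the wrong folder during an incident.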
Custom Terraform (`*.tf`) can be added to relevant environment workspaces in the `managed-services` repository to quickly provision and manage custom infrastructure using Terraform Cloud during an incident, without needing to make significant changes to `sg msp` to introduce a new resource.
In peacetime, all service workspaces are left in "VCS mode", where the remote `managed-services` repository is used when running Terraform plan and apply in Terraform Cloud. Changes to the repository automatically trigger a plan as part of repository CI, and merging to `main` automatically deploys the workspaces.