Satellite - simple Golang library to provide cloud agnosticity

The content of this page hasn't been updated for years and might refer to discontinued products and projects.

Satellite is a Golang library and RESTful API that determines a host’s cloud provider with a simple HTTP call. Behind the scenes, it relies on the sysfs file system and on provider metadata servers to identify the cloud provider.

When we started to work on Pipeline and the Banzai Cloud Pipeline Platform Operators, we soon realized how frequently we would need to find out which cloud provider the service was actually running on.

Note that Pipeline supports six different cloud providers.

While all cloud providers have something called a metadata server, from which one can query information about the cloud itself, in some cases it is not clear how to interpret that information.

In Pipeline, all the services we provide are cloud agnostic but cloud aware, and they partly rely on our Satellite service to detect the cloud provider seamlessly.

Satellite uses sysfs and cloud provider specific metadata server calls to fetch and parse the required information, and makes it available through a REST API.

In this blog post we’ll introduce Satellite and, for those of you interested in the technical details, we’ll also discuss the provider services the project relies on. As stated earlier, Satellite is part of the Banzai Cloud Pipeline Platform, our feature-rich, enterprise-grade application platform built for containers on top of Kubernetes.

Satellite

We created this project to have an easy way of determining which cloud provider a cluster is running on. We use Satellite in our PVC-operator, which handles Storage Class creation and PVC binding in a cloud-agnostic way. We decided to make the project publicly available, since it can also be used to build other tools that require provider information. Here is an example call:

curl http://satellite-service:8888/satellite
{"name": "amazon"}

When gathering information about a cloud, Satellite employs two distinct approaches. First, it tries to determine the provider from sysfs; if that does not provide enough information, it falls back to querying the provider's metadata server. Determining the provider through a metadata server is more precise, but significantly slower, because it requires network calls.
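The two-stage strategy can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not Satellite's actual code: the Detector type, the identify function and errUnknown are hypothetical names, and each detector stands in for one sysfs or metadata-server check.

```go
package main

import (
	"errors"
	"fmt"
)

// Detector is a hypothetical type: it returns a provider name, or an error
// when it cannot decide.
type Detector func() (string, error)

var errUnknown = errors.New("provider not identified")

// identify runs every cheap sysfs detector first and only falls back to the
// slower metadata-server detectors when none of them succeeds.
func identify(sysfs, metadata []Detector) (string, error) {
	for _, stage := range [][]Detector{sysfs, metadata} {
		for _, detect := range stage {
			if name, err := detect(); err == nil && name != "" {
				return name, nil
			}
		}
	}
	return "", errUnknown
}

func main() {
	// Simulated detectors: sysfs is inconclusive, the metadata server answers.
	sysfs := []Detector{func() (string, error) { return "", errUnknown }}
	metadata := []Detector{func() (string, error) { return "amazon", nil }}
	name, err := identify(sysfs, metadata)
	fmt.Println(name, err)
}
```

Ordering the stages this way means a host that sysfs can already identify never pays for a network round trip.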

Satellite currently supports the following providers:

Amazon
Google
Azure
Alibaba
Oracle
DigitalOcean

To ease the deployment process, we also created a Helm chart for Satellite. If you are using Helm, add our chart repository to your Helm client, then install the chart with:

helm install banzaicloud-stable/satellite

Sysfs

Sysfs is a pseudo file system provided by the Linux kernel. It exports information about various hardware devices, kernel subsystems and associated device drivers to user space through virtual files. Satellite looks for unambiguous vendor or product files that can identify the cloud provider.
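A minimal sketch of such a sysfs check, assuming the DMI files under /sys/class/dmi/id (sys_vendor, product_name, product_uuid, chassis_asset_tag) are the ones consulted. The vendor strings matched below are commonly observed values, not a list taken from Satellite, and readDMI and providerFromDMI are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// dmiDir is where the Linux kernel exposes DMI/SMBIOS data through sysfs.
const dmiDir = "/sys/class/dmi/id"

// readDMI reads the named DMI files; missing or unreadable files (product_uuid
// is usually root-only) are simply skipped.
func readDMI(names ...string) map[string]string {
	out := make(map[string]string)
	for _, n := range names {
		if b, err := os.ReadFile(filepath.Join(dmiDir, n)); err == nil {
			out[n] = strings.TrimSpace(string(b))
		}
	}
	return out
}

// providerFromDMI matches DMI values against vendor strings. These rules are
// assumptions for illustration; an empty result means "ask the metadata server".
func providerFromDMI(dmi map[string]string) string {
	switch {
	case strings.HasPrefix(strings.ToLower(dmi["product_uuid"]), "ec2"),
		dmi["sys_vendor"] == "Amazon EC2":
		return "amazon"
	case strings.Contains(dmi["product_name"], "Google"):
		return "google"
	case dmi["chassis_asset_tag"] == "7783-7084-3265-9085-8269-3286-77":
		return "azure" // Azure's fixed asset tag; sys_vendor alone also matches Hyper-V
	case strings.Contains(dmi["sys_vendor"], "Alibaba"):
		return "alibaba"
	case dmi["chassis_asset_tag"] == "OracleCloud.com":
		return "oracle"
	case dmi["sys_vendor"] == "DigitalOcean":
		return "digitalocean"
	}
	return ""
}

func main() {
	dmi := readDMI("sys_vendor", "product_name", "product_uuid", "chassis_asset_tag")
	fmt.Printf("provider: %q\n", providerFromDMI(dmi))
}
```

Keeping the matching in a pure function over a map makes the heuristics easy to test without a real cloud host.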

Using metadata servers to fetch cloud-specific info

If the sysfs approach fails to yield reliable results, we fall back on a slower, network-based, trial-and-error method.

Amazon

To identify Amazon via the metadata service, we use the following API call:

curl http://169.254.169.254/latest/dynamic/instance-identity/document

{
    "devpayProductCodes" : null,
    "marketplaceProductCodes" : [ "1abc2defghijklm3nopqrs4tu" ],
    "availabilityZone" : "us-west-2b",
    "privateIp" : "10.158.112.84",
    "version" : "2017-09-30",
    "instanceId" : "i-1234567890abcdef0",
    "billingProducts" : null,
    "instanceType" : "t2.micro",
    "accountId" : "123456789012",
    "imageId" : "ami-5fb8c835",
    "pendingTime" : "2016-11-19T16:32:11Z",
    "architecture" : "x86_64",
    "kernelId" : null,
    "ramdiskId" : null,
    "region" : "us-west-2"
}

We parse only two fields from the response, imageId and instanceId, since these values are enough to identify Amazon unambiguously.
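To illustrate, the Go sketch below decodes only those two fields with encoding/json and ignores the rest of the document. Treating the ami- and i- prefixes as the identifying signal is an assumption of this example, and isAmazon is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// identityDocument holds the only two fields we look at; encoding/json
// silently ignores every other key in the response.
type identityDocument struct {
	ImageID    string `json:"imageId"`
	InstanceID string `json:"instanceId"`
}

// isAmazon reports whether body looks like an EC2 instance identity document.
// The prefix checks are an assumption of this sketch.
func isAmazon(body []byte) bool {
	var doc identityDocument
	if err := json.Unmarshal(body, &doc); err != nil {
		return false
	}
	return strings.HasPrefix(doc.ImageID, "ami-") && strings.HasPrefix(doc.InstanceID, "i-")
}

func main() {
	sample := []byte(`{"instanceId": "i-1234567890abcdef0", "imageId": "ami-5fb8c835", "region": "us-west-2"}`)
	fmt.Println(isAmazon(sample)) // true
}
```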

Google

In Google’s case, it’s easier to use the metadata server, because we do not need to parse anything from the response. The metadata call we use here is as follows:

curl -H Metadata-Flavor:Google http://metadata.google.internal/computeMetadata/v1/instance/tags
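The sketch below shows one way to make this call from Go. It assumes a short client timeout so that hosts outside Google fail fast and, as an extra check not described above, it also looks for the Metadata-Flavor: Google header that Google's metadata server sets on its responses. newGoogleRequest, isGoogleResponse and isGoogle are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

const googleTagsURL = "http://metadata.google.internal/computeMetadata/v1/instance/tags"

// newGoogleRequest builds the metadata call; Google's metadata server
// rejects requests that lack the Metadata-Flavor header.
func newGoogleRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	return req, nil
}

// isGoogleResponse accepts a 200 OK carrying the Metadata-Flavor: Google
// response header; the body (the instance tags) is never parsed.
func isGoogleResponse(status int, flavor string) bool {
	return status == http.StatusOK && flavor == "Google"
}

// isGoogle performs the probe; any network error (for example, the
// metadata.google.internal name not resolving) means "not Google".
func isGoogle(client *http.Client) bool {
	req, err := newGoogleRequest(googleTagsURL)
	if err != nil {
		return false
	}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return isGoogleResponse(resp.StatusCode, resp.Header.Get("Metadata-Flavor"))
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	fmt.Println("google:", isGoogle(client))
}
```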

Azure

Azure’s metadata service uses the same IP address as Amazon’s. Unfortunately, its response contains little that helps us determine the identity of the provider, so we simply send an Azure-specific request (the Metadata: true header plus the mandatory api-version query parameter). If it succeeds, we assume that we’re running on Azure.

curl -H Metadata:true http://169.254.169.254/metadata/instance?api-version=2017-04-02
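A possible Go version of this success-only probe is sketched below. It builds the query with net/url, sends the Metadata: true header, and treats any 200 OK as a positive answer; azureInstanceURL, isAzure and the two-second timeout are assumptions of the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// azureInstanceURL builds the instance metadata URL, including the
// api-version query parameter that Azure requires.
func azureInstanceURL(host, apiVersion string) string {
	u := url.URL{
		Scheme:   "http",
		Host:     host,
		Path:     "/metadata/instance",
		RawQuery: url.Values{"api-version": {apiVersion}}.Encode(),
	}
	return u.String()
}

// isAzure sends the request with the "Metadata: true" header and treats a
// 200 OK as proof of Azure; the response body is not inspected.
func isAzure(client *http.Client, endpoint string) bool {
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return false
	}
	req.Header.Set("Metadata", "true")
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	endpoint := azureInstanceURL("169.254.169.254", "2017-04-02")
	fmt.Println(endpoint) // http://169.254.169.254/metadata/instance?api-version=2017-04-02
	fmt.Println("azure:", isAzure(client, endpoint))
}
```

Because Amazon answers on the same address, the header and query string are what make the probe Azure-specific.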

Alibaba

Alibaba runs its metadata server on a different IP address, 100.100.100.200, but we can use the same approach as with Azure. Fortunately, this call also returns usable information, in this case the instance type:

curl http://100.100.100.200/latest/meta-data/instance/instance-type
ecs.sn1.large
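Because this answer is plain text rather than JSON, a sketch only needs to check the status code and trim the body. The helpers below are hypothetical, and treating any non-empty 200 response as a match is an assumption that mirrors the Azure approach.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

const alibabaInstanceTypeURL = "http://100.100.100.200/latest/meta-data/instance/instance-type"

// instanceTypeFrom accepts a successful, non-empty plain-text answer such as
// "ecs.sn1.large" and returns it trimmed.
func instanceTypeFrom(status int, body string) (string, bool) {
	t := strings.TrimSpace(body)
	if status != http.StatusOK || t == "" {
		return "", false
	}
	return t, true
}

// alibabaInstanceType queries the Alibaba metadata server; a usable answer
// both identifies the provider and reports the instance type.
func alibabaInstanceType(client *http.Client) (string, bool) {
	resp, err := client.Get(alibabaInstanceTypeURL)
	if err != nil {
		return "", false
	}
	defer resp.Body.Close()
	// The answer is a short string, so cap how much we read.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if err != nil {
		return "", false
	}
	return instanceTypeFrom(resp.StatusCode, string(body))
}

func main() {
	t, ok := alibabaInstanceType(&http.Client{Timeout: 2 * time.Second})
	fmt.Println("alibaba:", ok, t)
}
```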

Oracle

To identify Oracle via the metadata service, we use the following call:

curl http://169.254.169.254/opc/v1/instance/metadata/
{
  "oke-subnet-label" : "s6zwh2ejmba",
  "oke-tm" : "oke",
  "oke-k8version" : "v1.10.3",
  "oke-slot" : "0",
  "oke-pool-id" : "ocid1.nodepool.oc1.eu-frankfurt-1.aaaaaaaaaezwgy3egyydgnlchfqwmmtdg44wemzume4tqn3ghnrdimtbga2g",
  "oke-cluster-label" : "crtamtfgvrg",
  "oke-image-name" : "Oracle-Linux-7.5",
  "oke-initial-node-labels" : "pipeline-nodepool-name=pool1",
  "oke-pool-label" : "nrdimtbga2g",
  "oke-compartment-name" : "k8stest",
  "oke-ad" : "sAkO:EU-FRANKFURT-1-AD-1",
  "oke-cluster-id" : "ocid1.cluster.oc1.eu-frankfurt-1.aaaaaaaaaezdsyrthaygiyrqg5qtgobtmy4dmmddmu2tsobsmcrtamtfgvrg"
}

We only need to parse one field from the response: oke-tm, which is enough to identify Oracle.
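The check can be sketched in Go by decoding just that key into a struct, so other metadata values never affect the result. Requiring the value to equal "oke", as in the sample response above, is an assumption of this example, and isOracle is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// okeMetadata maps only the oke-tm key; all other keys are ignored.
type okeMetadata struct {
	OkeTM string `json:"oke-tm"`
}

// isOracle reports whether the metadata response carries oke-tm = "oke".
func isOracle(body []byte) bool {
	var md okeMetadata
	if err := json.Unmarshal(body, &md); err != nil {
		return false
	}
	return md.OkeTM == "oke"
}

func main() {
	sample := []byte(`{"oke-tm": "oke", "oke-image-name": "Oracle-Linux-7.5"}`)
	fmt.Println(isOracle(sample)) // true
}
```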

DigitalOcean

DigitalOcean also uses the “standard” metadata server IP address, and it returns a JSON object with a wealth of information about the droplet.

curl http://169.254.169.254/metadata/v1.json

{
  "droplet_id":2756294,
  "hostname":"sample-droplet",
  "vendor_data":"#cloud-config\ndisable_root: false\nmanage_etc_hosts: true\n\ncloud_config_modules:\n - ssh\n - set_hostname\n - [ update_etc_hosts, once-per-instance ]\n\ncloud_final_modules:\n - scripts-vendor\n - scripts-per-once\n - scripts-per-boot\n - scripts-per-instance\n - scripts-user\n",
  "region":"nyc3",
  ...
}

The droplet_id field is unique to DigitalOcean, so we use it to identify DigitalOcean clusters.
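A minimal Go sketch of that check follows. A pointer field distinguishes a missing droplet_id from a zero value; the isDigitalOcean helper and the positive-ID rule are assumptions of the example. The sample response above is elided, so the demo uses a shortened but valid JSON object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropletMetadata maps only droplet_id; a nil pointer means the key was
// absent or null.
type dropletMetadata struct {
	DropletID *int64 `json:"droplet_id"`
}

// isDigitalOcean reports whether the response contains a positive droplet_id.
func isDigitalOcean(body []byte) bool {
	var md dropletMetadata
	if err := json.Unmarshal(body, &md); err != nil {
		return false
	}
	return md.DropletID != nil && *md.DropletID > 0
}

func main() {
	sample := []byte(`{"droplet_id": 2756294, "hostname": "sample-droplet", "region": "nyc3"}`)
	fmt.Println(isDigitalOcean(sample)) // true
}
```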

Future plans

We are planning to add support for more cloud providers in the coming months.

If you’d like to learn more about Banzai Cloud, check out our other posts on this blog, the Pipeline, Hollowtrees and Bank-Vaults projects.
