In this blog post I’m going to try to give, in my own words, a high-level overview of what Metal3 is, the motivation behind it and some concepts related to a ‘baremetal operator’.
Let’s have some definitions!
Custom Resource Definition
The k8s API provides some out-of-the-box objects such as pods, services, etc. There are a few methods of extending the k8s API (such as aggregated API servers) but since a few releases back, the k8s API can be extended easily with custom resource definitions (CRDs). Basically, this means you can create virtually any type of object definition in k8s (well, only users with cluster-admin capabilities can) with a yaml such as:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: crontabs.stable.example.com
spec:
  # group name to use for REST API: /apis/<group>/<version>
  group: stable.example.com
  # list of versions supported by this CustomResourceDefinition
  versions:
    - name: v1
      # Each version can be enabled/disabled by Served flag.
      served: true
      # One and only one version must be marked as the storage version.
      storage: true
  # either Namespaced or Cluster
  scope: Namespaced
  names:
    # plural name to be used in the URL: /apis/<group>/<version>/<plural>
    plural: crontabs
    # singular name to be used as an alias on the CLI and for display
    singular: crontab
    # kind is normally the CamelCased singular type. Your resource manifests use this.
    kind: CronTab
    # shortNames allow shorter string to match your resource on the CLI
    shortNames:
      - ct
  preserveUnknownFields: false
  validation:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          properties:
            cronSpec:
              type: string
            image:
              type: string
            replicas:
              type: integer
And after kubectl apply -f, you can kubectl get crontabs.
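With the CRD in place, you can create objects of the new kind. A minimal CronTab object, similar to the one in the k8s documentation (values are just illustrative), would look like:

apiVersion: "stable.example.com/v1"
kind: CronTab
metadata:
  name: my-new-cron-object
spec:
  cronSpec: "* * * * */5"
  image: my-awesome-cron-image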
There is plenty of information about CRDs out there, such as the official k8s documentation.
The CRD by itself is not useful per se, as nobody will take care of it (that’s why I said definition). It requires a controller to watch for those new objects and react to the different events affecting them.
Controller
A controller is basically a loop that watches the current status of an object and, if it differs from the desired status, fixes it (reconciliation). This is why k8s is ‘declarative’: you specify the desired status of the object instead of ‘how to do it’ (imperative).
Again, there is tons of documentation (and examples) around the controller pattern, which is basically at the root of k8s, so I’ll let your google-fu take care of it :)
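You can see the declarative part in action with any built-in controller: you declare a new desired state and the corresponding controller reconciles it. A quick illustration (the deployment name and label are made up):

$ kubectl scale deployment/my-app --replicas=5   # declare the new desired state
$ kubectl get pods -l app=my-app --watch         # watch the controllers create the missing pods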
Operator
An Operator (in k8s slang) is an application running in your k8s cluster that deploys, manages and maintains (so, operates) a k8s application.
This k8s application (the one the operator manages) can be as simple as a containerized ‘hello world’ deployed in your k8s cluster, or something much more complex, such as a database cluster.
The ‘operator’ is like a containerized ‘expert sysadmin’ that takes care of your application.
Bear in mind that the ‘expert’ tag (meaning the automation behind the operator) depends on the operator implementation… so there can be basic operators that only deploy your application, or complex operators that handle day-2 operations such as upgrades, failovers, backup/restore, etc.
See the CoreOS operator definition for more information.
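To make it more concrete, the CoreOS etcd operator exposes an EtcdCluster custom resource, so asking for a three-member etcd cluster is just a matter of applying a manifest similar to this one (name and version are illustrative):

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3
  version: "3.2.13"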
Cloud Controller Manager
k8s is smart enough to leverage the underlying infrastructure where the cluster is running, such as creating ‘LoadBalancer’ services or understanding the cluster topology based on the cloud provider AZs where the nodes are running (for scheduling purposes), etc.
This task of ‘talking to the cloud provider’ is performed by the Cloud Controller Manager (CCM); for more information you can take a look at the official k8s documentation regarding its architecture and administration (also, if you are brave enough, you can create your own cloud controller manager).
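For instance, a Service of type LoadBalancer such as this one (names and ports are made up) only gets a real cloud load balancer because the CCM talks to the cloud provider API behind the scenes:

apiVersion: v1
kind: Service
metadata:
  name: my-frontend
spec:
  type: LoadBalancer
  selector:
    app: my-frontend
  ports:
    - port: 80
      targetPort: 8080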
Cluster API
The Cluster API implementation is a WIP ‘framework’ that allows a k8s cluster to manage itself, including the ability to create new clusters, add more nodes, etc. in a ‘k8s way’ (declarative, controllers, CRDs, etc.), so there are objects such as Cluster that can be expressed as k8s objects:
apiVersion: "cluster.k8s.io/v1alpha1"
kind: Cluster
metadata:
name: mycluster
spec:
clusterNetwork:
services:
cidrBlocks: ["10.96.0.0/12"]
pods:
cidrBlocks: ["192.168.0.0/16"]
serviceDomain: "cluster.local"
providerSpec:
...
but also Machine objects (the machines backing the cluster nodes).
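As a sketch (the name is made up, and the interesting provider-specific bits live under providerSpec), a Machine can be expressed like:

apiVersion: "cluster.k8s.io/v1alpha1"
kind: Machine
metadata:
  name: mycluster-worker-0
spec:
  providerSpec:
    ...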
There are some provider implementations in the wild such as the AWS, Azure, GCP, OpenStack, vSphere, etc. ones and the Cluster API project is driven by the SIG Cluster Lifecycle.
Please review the official Cluster API repository for more information.
Actuator
The actuator is a Cluster API interface that reacts to changes to Machine objects, reconciling the Machine status.
The actuator code is tightly coupled with the provider (that’s why it is an interface), such as the AWS one.
MachineSet vs Machine
To simplify, let’s say that MachineSets are to Machines what ReplicaSets are to Pods. So you can scale the Machines in your cluster just by changing the number of replicas of a MachineSet.
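For instance, assuming a MachineSet named my-machineset in the openshift-machine-api namespace (both names are illustrative) and that it implements the scale subresource (the OpenShift machine-api one does), scaling the workers is a one-liner:

$ kubectl scale machineset my-machineset --replicas=3 -n openshift-machine-api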
Cluster API vs Cloud Providers
As we have seen, the Cluster API leverages its providers to manage the k8s infrastructure itself (clusters and nodes), while the CCM and the cloud provider integration leverage the cloud provider to give k8s supporting infrastructure (load balancers, topology information, etc.).
Let’s say the Cluster API is for the k8s administrators and the CCM is for the k8s users :)
Machine API
The OpenShift 4 Machine API is a combination of some of the upstream Cluster API with custom OpenShift resources and it is designed to work in conjunction with the Cluster Version Operator.
OpenShift’s Machine API Operator
The machine-api-operator is an operator that manages the Machine API objects in an OpenShift 4 cluster.
The operator is capable of creating machines in AWS and libvirt (more providers coming soon) via the Machine Controller, and it is included out of the box with OCP 4 (it can also be deployed in a vanilla k8s).
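A quick way to see the objects the operator manages (the namespace below is the OCP 4 one for the machine-api components):

$ kubectl get machinesets,machines -n openshift-machine-api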
Baremetal
A baremetal server (or bare-metal) is just a computer server.
In the last years, terms such as virtualization, containers, serverless, etc. have been popular, but at the end of the day, all the code running on top of a SaaS, PaaS or IaaS is actually running on a real physical server stored in a datacenter, wired to routers, switches and power. That server is a ‘baremetal’ server.
If you are used to cloud providers and instances, you probably don’t know the pains of baremetal management… including things such as connecting to the virtual console (usually requiring an old java version) to debug issues, configuring PXE for provisioning baremetal hosts (or attaching ISOs via the virtual console… or physically inserting a CD/DVD into the drive tray if you are ‘lucky’ enough…), configuring VLANs for traffic isolation, etc.
That kind of operation is not ‘cloud ready’, and there are tools that provide baremetal management, such as maas or ironic.
Ironic
OpenStack bare metal provisioning (or ironic) is an open source project (or even better, a number of open source projects) to manage baremetal hosts. Ironic saves the administrator from dealing with PXE configuration, manual deployments, etc. and provides a well-defined API and a series of plugins to interact with different baremetal models and vendors.
Ironic is used in OpenStack to provide baremetal objects, but there are some projects (such as bifrost) to use Ironic ‘standalone’ (so, no OpenStack required).
Metal3
Metal3 is a project aimed at providing a baremetal operator that implements the Cluster API framework required to be able to manage baremetal in a k8s way (easy peasy!). It uses ironic under the hood to avoid reinventing the wheel, but consider it as an implementation detail that may change.
The Metal3 baremetal operator watches for BareMetalHost (CRD) objects defined as:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: my-worker-0
spec:
  online: true
  bootMACAddress: 00:11:22:33:44:55
  bmc:
    address: ipmi://my-worker-0.ipmi.example.com
    credentialsName: my-worker-0-bmc-secret
There are a few more fields in the BareMetalHost object, such as the image, hardware profile, etc.
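The credentialsName above references a regular k8s Secret in the same namespace holding the BMC credentials. A minimal sketch (the values are just base64-encoded placeholders) would be:

apiVersion: v1
kind: Secret
metadata:
  name: my-worker-0-bmc-secret
type: Opaque
data:
  username: YWRtaW4=      # 'admin', base64-encoded (placeholder)
  password: cGFzc3dvcmQ=  # 'password', base64-encoded (placeholder)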
The Metal3 project is actually divided into two different components:
baremetal-operator
The Metal3 baremetal-operator is the component that manages baremetal hosts. It exposes a new BareMetalHost custom resource in the k8s API that lets you manage hosts in a declarative way.
cluster-api-provider-baremetal
The Metal3 cluster-api-provider-baremetal includes the integration with the Cluster API project. This provider currently includes a Machine actuator that acts as a client of the BareMetalHost custom resources.
BareMetalHost vs Machine vs Node
- BareMetalHost is a Metal3 object
- Machine is a Cluster API object
- Node is where the pods run :)
Those three concepts are linked in a 1:1:1 relationship, meaning: a BareMetalHost created with Metal3 maps to a Machine object and, once the installation procedure finishes, a new kubernetes node is added to the cluster.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
my-node-0.example.com Ready master 25h v1.14.0
$ kubectl get machines --all-namespaces
NAMESPACE NAME INSTANCE STATE TYPE REGION ZONE AGE
openshift-machine-api my-node-0 25h
$ kubectl get baremetalhosts --all-namespaces
NAMESPACE NAME STATUS PROVISIONING STATUS MACHINE BMC HARDWARE PROFILE ONLINE ERROR
openshift-machine-api my-node-0 OK provisioned my-node-0.example.com ipmi://1.2.3.4 unknown true
The 1:1 relationship between the BareMetalHost and the Machine is stored in the machineRef field of the BareMetalHost object:
$ kubectl get baremetalhost/my-node-0 -n openshift-machine-api -o jsonpath='{.spec.machineRef}'
map[name:my-node-0 namespace:openshift-machine-api]
And in a Machine annotation:
$ kubectl get machines my-node-0 -n openshift-machine-api -o jsonpath='{.metadata.annotations}'
map[metal3.io/BareMetalHost:openshift-machine-api/my-node-0]
The 1:1 relationship between the Machine and the Node currently requires executing the link-machine-and-node.sh script to modify the Machine object’s .status field:
./link-machine-and-node.sh MACHINE NODE
Then, the reference is stored in the .status.nodeRef.name field of the Machine object:
$ kubectl get machine my-node-0 -o jsonpath='{.status.nodeRef.name}'
my-node-0.example.com
Recap
Being able to ‘just scale a node’ in k8s involves a lot of underlying concepts and technologies behind the scenes :)
Resources/links
- https://dzone.com/articles/introducing-the-kubernetes-cluster-api-project-2
- https://blogs.vmware.com/cloudnative/2019/03/14/what-and-why-of-cluster-api/
- https://kubernetes-sigs.github.io/cluster-api/
- https://github.com/kubernetes-sigs/cluster-api-provider-aws
- https://itnext.io/deep-dive-to-cluster-api-a0b4e792d57d
- https://www.linux.com/blog/event/kubecon/2018/4/extending-kubernetes-cluster-api