After an install, I only have the postgres-0 pod!

Issue

After installing your software, I only have a postgres-0 pod in the element-onprem namespace:

[user@element element-enterprise-installer-1.0.0]$ kubectl get pods -n element-onprem
NAME         READY   STATUS    RESTARTS   AGE
postgres-0   1/1     Running   0          3m33s

The calico-kube-controllers pod in the kube-system namespace is throwing this error:

[FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Environment

Element Enterprise Installer 1.0.0

Red Hat Enterprise Linux 8.5.0

Resolution

On Ubuntu, edit /etc/modules and add a new line containing:

br_netfilter

On Red Hat Enterprise Linux, edit /etc/modules-load.d/snap.microk8s.conf and add a new line containing:

br_netfilter

Run:

microk8s stop

Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode line completely.

Run: sudo modprobe br_netfilter

Then run: microk8s start

After this, wait a little while for all of the pods to be created and for the rest of the stack to come up.
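
The two file edits above can be made repeatable with a pair of small shell helpers. This is a minimal sketch, not part of the installer or of microk8s: the function names `ensure_module_line` and `remove_proxy_mode_line` are hypothetical, and it assumes the kube-proxy args file holds one argument per line, as the "remove the --proxy-mode line" step implies.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Append MODULE as its own line to FILE unless an identical line already exists.
ensure_module_line() {
    file=$1
    module=$2
    touch "$file"
    if ! grep -qxF -e "$module" "$file"; then
        # Make sure the existing content ends with a newline before appending.
        if [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file"
        fi
        printf '%s\n' "$module" >> "$file"
    fi
}

# Delete every line that starts with --proxy-mode from FILE, keeping a .bak copy.
remove_proxy_mode_line() {
    file=$1
    cp "$file" "$file.bak"
    # grep -v exits non-zero when no lines remain; that is not an error here.
    grep -v -e '^--proxy-mode' "$file.bak" > "$file" || true
}

# Example for Red Hat Enterprise Linux, run as root between
# "microk8s stop" and "microk8s start":
#   ensure_module_line /etc/modules-load.d/snap.microk8s.conf br_netfilter
#   remove_proxy_mode_line /var/snap/microk8s/current/args/kube-proxy
#   modprobe br_netfilter
```

Both helpers are safe to run more than once: the module line is only added if it is missing, and removing an already-absent --proxy-mode line leaves the file unchanged.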

Root Cause

Looking at all my pods, there are several errors:

[user@element element-enterprise-installer-1.0.0]$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS             RESTARTS   AGE
kube-system          coredns-7f9c69c78c-9g5xf                     0/1     Running            0          8m3s
kube-system          calico-node-l8xmn                            1/1     Running            0          11m
container-registry   registry-9b57d9df8-xjcf5                     0/1     Pending            0          2m8s
kube-system          coredns-ddd489c4d-bhwq5                      0/1     Running            0          2m8s
kube-system          dashboard-metrics-scraper-78d7698477-pcpbg   1/1     Running            0          2m8s
kube-system          hostpath-provisioner-566686b959-bvgr5        1/1     Running            0          2m8s
kube-system          calico-kube-controllers-f7868dd95-dqd6b      0/1     CrashLoopBackOff   10         11m
element-onprem       postgres-0                                   1/1     Running            0          2m9s
kube-system          kubernetes-dashboard-85fd7f45cb-m7lkb        1/1     Running            2          2m8s
ingress              nginx-ingress-microk8s-controller-tlrqk      0/1     Running            3          2m9s
operator-onprem      osdk-controller-manager-644775db9d-jzqnb     1/2     Running            2          2m8s
kube-system          metrics-server-8bbfb4bdb-tlnzk               1/1     Running            2          2m8s
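
To pick the unhealthy pods out of a listing like the one above, the READY column can be compared against itself. This is a minimal sketch, assuming the default `kubectl get pods -A` column order shown above (NAMESPACE, NAME, READY, STATUS, ...); the function name `not_ready_pods` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and prints
# namespace/name, READY, and STATUS for each pod whose ready count differs
# from its container total.
not_ready_pods() {
    awk 'NR > 1 {
        split($3, ready, "/")
        if (ready[1] != ready[2]) print $1 "/" $2, $3, $4
    }'
}

# Usage: kubectl get pods -A | not_ready_pods
```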

Looking at the logs for calico-kube-controllers in the kube-system namespace:

[user@element ~]$ kubectl logs -n kube-system calico-kube-controllers-f7868dd95-swpst 
2022-05-09 15:18:10.856 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
W0509 15:18:10.857670       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2022-05-09 15:18:10.858 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2022-05-09 15:18:20.859 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2022-05-09 15:18:20.859 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

The reason this is happening is that, under certain scenarios, microk8s fails to load the br_netfilter kernel module. This allows the calico networking to fall back to user space routing, which fails to work in this environment and causes the calico-kube-controllers pod to not start, which cascades into the rest of the stack not coming up. More on this specific issue can be seen here: https://github.com/canonical/microk8s/issues/3085. The microk8s team does expect to release a fix and we will work to incorporate it in the future.
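
Whether the br_netfilter module is actually loaded can be checked before and after applying the fix. This is a minimal sketch, assuming a Linux host where /proc/modules lists loaded modules one per line with the module name as the first field; the function name `module_loaded` and its optional second argument are hypothetical, added so the check can be pointed at a different file.

```shell
# Hypothetical helper: succeeds if MODULE appears as the first field of a line
# in the modules list (defaults to /proc/modules).
module_loaded() {
    module=$1
    list=${2:-/proc/modules}
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$list"
}

# Usage:
#   if module_loaded br_netfilter; then
#       echo "br_netfilter is loaded"
#   else
#       echo "br_netfilter is missing"
#   fi
```

Note that a module compiled directly into the kernel does not appear in /proc/modules, so a "missing" result is only meaningful on kernels that ship br_netfilter as a loadable module.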