After an install, I only have the postgres-0 pod!

Issue

After installing your software, I only have a postgres-0 pod in the element-onprem namespace:

[user@element element-enterprise-installer-1.0.0]$ kubectl get pods -n element-onprem
NAME         READY   STATUS    RESTARTS   AGE
postgres-0   1/1     Running   0          3m33s

The calico-kube-controllers pod in the kube-system namespace is throwing this error:

[FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Environment

Element Enterprise Installer 1.0.0
Red Hat Enterprise Linux 8.5.0

Resolution

On Ubuntu, edit /etc/modules and add a new line:

br_netfilter

On Red Hat Enterprise Linux, edit /etc/modules-load.d/snap.microk8s.conf and add a new line:

br_netfilter

Run:

microk8s stop

Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode line completely.

Run:

sudo modprobe br_netfilter

Then run:

microk8s start

After this, wait a little while for all of the pods to finish creating and for the rest of the stack to come up.
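
The two file edits in the steps above can also be scripted. The following is a minimal sketch, not part of the installer: the function names ensure_module_line and remove_proxy_mode are hypothetical, and it assumes the --proxy-mode setting sits on its own line in the kube-proxy args file.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of microk8s or the installer.

# Append a module name to a modules file unless that exact line is already present.
ensure_module_line() {
    local module="$1" file="$2"
    touch "$file"
    if grep -qxF -- "$module" "$file"; then
        return 0
    fi
    # Terminate an unterminated last line first so the new entry gets its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$module" >> "$file"
}

# Remove every line that sets --proxy-mode from a kube-proxy args file,
# rewriting the file in place so its ownership and permissions are kept.
remove_proxy_mode() {
    local file="$1" tmp
    tmp="$(mktemp)"
    sed '/^[[:space:]]*--proxy-mode/d' "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Intended order when run as root on Red Hat Enterprise Linux (left commented out on purpose):
#   ensure_module_line br_netfilter /etc/modules-load.d/snap.microk8s.conf
#   microk8s stop
#   remove_proxy_mode /var/snap/microk8s/current/args/kube-proxy
#   modprobe br_netfilter
#   microk8s start
```

Both helpers are safe to run more than once: the module line is only added if it is missing, and removing an already absent --proxy-mode line leaves the file unchanged. On Ubuntu, the first call would target /etc/modules instead.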

Root Cause

Listing the pods in all namespaces shows several that are not ready or are failing:

[user@element element-enterprise-installer-1.0.0]$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS             RESTARTS   AGE
kube-system          coredns-7f9c69c78c-9g5xf                     0/1     Running            0          8m3s
kube-system          calico-node-l8xmn                            1/1     Running            0          11m
container-registry   registry-9b57d9df8-xjcf5                     0/1     Pending            0          2m8s
kube-system          coredns-ddd489c4d-bhwq5                      0/1     Running            0          2m8s
kube-system          dashboard-metrics-scraper-78d7698477-pcpbg   1/1     Running            0          2m8s
kube-system          hostpath-provisioner-566686b959-bvgr5        1/1     Running            0          2m8s
kube-system          calico-kube-controllers-f7868dd95-dqd6b      0/1     CrashLoopBackOff   10         11m
element-onprem       postgres-0                                   1/1     Running            0          2m9s
kube-system          kubernetes-dashboard-85fd7f45cb-m7lkb        1/1     Running            2          2m8s
ingress              nginx-ingress-microk8s-controller-tlrqk      0/1     Running            3          2m9s
operator-onprem      osdk-controller-manager-644775db9d-jzqnb     1/2     Running            2          2m8s
kube-system          metrics-server-8bbfb4bdb-tlnzk               1/1     Running            2          2m8s
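
In a listing like the one above, the unhealthy pods are the ones whose READY column shows fewer ready containers than expected. As a small sketch (the function name not_ready_pods is hypothetical), they can be filtered out of the kubectl output like this:

```shell
#!/usr/bin/env bash
# Sketch only: not_ready_pods is a hypothetical helper name.

# Read `kubectl get pods -A` output on stdin and print "namespace/name"
# for every pod whose READY column (ready/total) is not fully ready.
not_ready_pods() {
    awk 'NR > 1 { split($3, r, "/"); if (r[1] != r[2]) print $1 "/" $2 }'
}

# Example (commented out so the sketch has no side effects):
#   kubectl get pods -A | not_ready_pods
```

The first line of input is skipped because it is the column header.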

Looking at the logs for calico-kube-controllers in the kube-system namespace:

[user@element ~]$ kubectl logs -n kube-system calico-kube-controllers-f7868dd95-swpst 
2022-05-09 15:18:10.856 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
W0509 15:18:10.857670       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2022-05-09 15:18:10.858 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2022-05-09 15:18:20.859 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2022-05-09 15:18:20.859 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

This happens because, in certain scenarios, microk8s fails to load the br_netfilter kernel module. Without the module, the calico networking falls back to user space routing, which does not work in this environment. As a result, the calico-kube-controllers pod fails to start, and the rest of the stack does not come up. More on this specific issue can be found here: https://github.com/canonical/microk8s/issues/3085. The microk8s team expects to release a fix, and we will work to incorporate it in the future.
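
To check whether a host is affected, you can look for br_netfilter among the loaded kernel modules. The sketch below is one way to do that, assuming the module is built as a loadable module and therefore appears in /proc/modules when loaded; the function name module_loaded is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: module_loaded is a hypothetical helper name.

# Return 0 if the named module is listed in a /proc/modules-style file, 1 otherwise.
# The second argument defaults to /proc/modules and exists mainly to make testing easy.
module_loaded() {
    local module="$1" listing="${2:-/proc/modules}"
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Example (commented out so the sketch has no side effects):
#   if module_loaded br_netfilter; then
#       echo "br_netfilter is loaded"
#   else
#       echo "br_netfilter is NOT loaded"
#   fi
```

Only the first column is compared, so a line where br_netfilter appears merely as a dependent of another module (for example bridge) does not count as a match.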