Troubleshooting
Introduction to Troubleshooting
Troubleshooting the Element Installer comes down to knowing a little about Kubernetes and how to check the status of its various resources. This guide walks you through some of the initial steps to take when things go wrong.
install.sh problems
Sometimes the ansible-playbook portion of the installer fails. When this happens, you can increase the verbosity of Ansible logging by editing .ansible.rc in the installer directory and setting:
export ANSIBLE_DEBUG=true
export ANSIBLE_VERBOSITY=4
and then re-running the installer. The output will be very verbose, but it typically helps pinpoint the actual problem with the installer.
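If you switch these settings on often, a small helper can add them to .ansible.rc without duplicating lines. The sketch below is a hypothetical shell function (enable_ansible_debug is not part of the installer). It assumes .ansible.rc is a plain shell file of export lines, as the settings above suggest.

```shell
# Hypothetical helper (not part of the installer): add the Ansible debug
# settings to an rc file only if they are not already there.
enable_ansible_debug() {
  rc_file="${1:-.ansible.rc}"   # defaults to .ansible.rc in the current directory
  touch "$rc_file"
  # If the file does not end with a newline, add one so the first appended
  # export is not glued onto the last existing line.
  if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
    echo >> "$rc_file"
  fi
  for setting in 'export ANSIBLE_DEBUG=true' 'export ANSIBLE_VERBOSITY=4'; do
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF "$setting" "$rc_file" || printf '%s\n' "$setting" >> "$rc_file"
  done
}
```

Run enable_ansible_debug .ansible.rc from the installer directory, then re-run the installer. Running it twice leaves the file unchanged.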
Problems post-installation
Checking Pod Status and Getting Logs
- At a minimum, a well-functioning Element stack has the following containers (pods, in Kubernetes terms) running:
[user@element2 ~]$ kubectl get pods -n element-onprem
NAME                                        READY   STATUS    RESTARTS      AGE
instance-synapse-main-0                     1/1     Running   4 (27h ago)   6d21h
postgres-0                                  1/1     Running   2 (27h ago)   6d21h
app-element-web-688489b777-v7l2m            1/1     Running   6 (27h ago)   6d22h
server-well-known-55bdb6b66-m8px6           1/1     Running   2 (27h ago)   6d21h
instance-synapse-haproxy-554bd57975-z2ppv   1/1     Running   3 (27h ago)   6d21h
The command above, kubectl get pods -n element-onprem, is the first place to start. Notice that all of the pods above are in the Running status, which indicates that all should be well. If a pod's status is anything other than Running or ContainerCreating (shown while a pod's containers are still starting), grab the logs for that pod by running:
kubectl logs -n element-onprem <pod name>
replacing <pod name> with the actual pod name. For example, to get the logs from Synapse, run:
kubectl logs -n element-onprem instance-synapse-main-0
This generates logs similar to:
2022-05-03 17:46:33,333 - synapse.util.caches.lrucache - 154 - INFO - LruCache._expire_old_entries-2887 - Dropped 0 items from caches
2022-05-03 17:46:33,375 - synapse.storage.databases.main.metrics - 471 - INFO - generate_user_daily_visits-289 - Calling _generate_user_daily_visits
2022-05-03 17:46:58,424 - synapse.metrics._gc - 118 - INFO - sentinel - Collecting gc 1
2022-05-03 17:47:03,334 - synapse.util.caches.lrucache - 154 - INFO - LruCache._expire_old_entries-2888 - Dropped 0 items from caches
2022-05-03 17:47:33,333 - synapse.util.caches.lrucache - 154 - INFO - LruCache._expire_old_entries-2889 - Dropped 0 items from caches
2022-05-03 17:48:03,333 - synapse.util.caches.lrucache - 154 - INFO - LruCache._expire_old_entries-2890 - Dropped 0 items from caches
- Again, for every pod not in the Running or ContainerCreating status, use the procedure above to collect its logs for Element to review.
- If the command above shows no pods in the element-onprem namespace, run:
[user@element2 ~]$ kubectl get pods -A
NAMESPACE           NAME                                        READY   STATUS    RESTARTS       AGE
container-registry  registry-5f697bb7df-dbzpq                   1/1     Running   6 (27h ago)    6d22h
kube-system         dashboard-metrics-scraper-69d9497b54-hdrdq  1/1     Running   6 (27h ago)    6d22h
kube-system         hostpath-provisioner-7764447d7c-jckkc       1/1     Running   11 (17h ago)   6d22h
element-onprem      instance-synapse-main-0                     1/1     Running   4 (27h ago)    6d22h
element-onprem      postgres-0                                  1/1     Running   2 (27h ago)    6d22h
element-onprem      app-element-web-688489b777-v7l2m            1/1     Running   6 (27h ago)    6d22h
element-onprem      server-well-known-55bdb6b66-m8px6           1/1     Running   2 (27h ago)    6d21h
kube-system         calico-kube-controllers-6966456d6b-x4scn    1/1     Running   6 (27h ago)    6d22h
element-onprem      instance-synapse-haproxy-554bd57975-z2ppv   1/1     Running   3 (27h ago)    6d21h
kube-system         calico-node-l28tp                           1/1     Running   6 (27h ago)    6d22h
kube-system         coredns-64c6478b6c-h5jp4                    1/1     Running   6 (27h ago)    6d22h
ingress             nginx-ingress-microk8s-controller-n6wmk     1/1     Running   6 (27h ago)    6d22h
operator-onprem     osdk-controller-manager-5f9d86f765-t2kn9    2/2     Running   9 (17h ago)    6d22h
kube-system         metrics-server-679c5f986d-msfc5             1/1     Running   6 (27h ago)    6d22h
kube-system         kubernetes-dashboard-585bdb5648-vrn42       1/1     Running   10 (17h ago)   6d22h
- This is the output from a healthy system. If any of these pods are not in the Running or ContainerCreating state, gather their logs using the following syntax:
kubectl logs -n <namespace> <pod name>
- For example, to gather logs for the Kubernetes ingress controller, run:
kubectl logs -n ingress nginx-ingress-microk8s-controller-n6wmk
You will see logs similar to:
I0502 14:15:08.467258 6 leaderelection.go:248] attempting to acquire leader lease ingress/ingress-controller-leader...
I0502 14:15:08.467587 6 controller.go:155] "Configuration changes detected, backend reload required"
I0502 14:15:08.481539 6 leaderelection.go:258] successfully acquired lease ingress/ingress-controller-leader
I0502 14:15:08.481656 6 status.go:84] "New leader elected" identity="nginx-ingress-microk8s-controller-n6wmk"
I0502 14:15:08.515623 6 controller.go:172] "Backend successfully reloaded"
I0502 14:15:08.515681 6 controller.go:183] "Initial sync, sleeping for 1 second"
I0502 14:15:08.515705 6 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress", Name:"nginx-ingress-microk8s-controller-n6wmk", UID:"548d9478-094e-4a19-ba61-284b60152b85", APIVersion:"v1", ResourceVersion:"524688", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
Again, for all pods not in the Running or ContainerCreating state, use the method above to collect log data to send to Element.
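The checks above can be scripted. The following is a minimal sketch that assumes a POSIX shell with kubectl on the PATH. The function names are hypothetical. It treats Running and ContainerCreating as healthy, mirroring the guidance above, so pods from finished jobs (status Completed) would also be listed and can be ignored.

```shell
# Hypothetical helpers; assume kubectl is on the PATH and configured for the cluster.

# Print "<namespace> <pod>" for each pod whose STATUS is neither Running nor
# ContainerCreating. Expects `kubectl get pods -A` output on stdin, where
# STATUS is the 4th column (NAMESPACE, NAME, READY, STATUS, ...).
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "ContainerCreating" { print $1, $2 }'
}

# Write the logs of every unhealthy pod to <namespace>_<pod>.log in the
# current directory. --all-containers also covers multi-container pods.
collect_unhealthy_logs() {
  kubectl get pods -A | unhealthy_pods | while read -r ns pod; do
    kubectl logs -n "$ns" "$pod" --all-containers > "${ns}_${pod}.log" 2>&1
  done
}
```

Running collect_unhealthy_logs in an empty directory leaves one .log file per problem pod, ready to send to Element. The --all-containers flag matters for pods such as osdk-controller-manager, which runs two containers (READY 2/2).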
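When a log is long, it can help to look at warnings and errors first. This sketch assumes the Synapse log format shown above (timestamp - logger - line - LEVEL - request - message) and the klog format used by the ingress controller, where each line begins with a severity letter (I, W, E or F) followed by the month and day. Both filters are hypothetical helpers for a first look only. Send the full logs to Element rather than the filtered output.

```shell
# Keep only WARNING/ERROR/CRITICAL lines from Synapse logs. The level is the
# 4th " - "-separated field: timestamp - logger - line - LEVEL - request - message.
synapse_problems() {
  grep -E ' - (WARNING|ERROR|CRITICAL) - '
}

# Keep only warning, error and fatal lines from klog-formatted logs such as
# the nginx ingress controller's: a severity letter, then MMDD, then a space.
ingress_problems() {
  grep -E '^[WEF][0-9]{4} '
}
```

For example: kubectl logs -n element-onprem instance-synapse-main-0 | synapse_problems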
Other Commands of Interest
Other commands that can yield useful data while troubleshooting include:
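- Show all persistent volumes and the claims bound to them:
kubectl get pv
Persistent volumes are cluster-scoped, so this lists volumes for every namespace (note the container-registry claim in the output), and adding -n element-onprem does not narrow the list. To list only the persistent volume claims in the element-onprem namespace, run kubectl get pvc -n element-onprem.
This will give you output similar to: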
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS        REASON   AGE
pvc-9fc3bc29-2e5d-4b88-a9cd-a4c855352404   20Gi       RWX            Delete           Bound    container-registry/registry-claim   microk8s-hostpath            55d
synapse-media                              50Gi       RWO            Delete           Bound    element-onprem/synapse-media        microk8s-hostpath            7d
postgres                                   5Gi        RWO            Delete           Bound    element-onprem/postgres             microk8s-hostpath            7d
- Show the Synapse configuration:
kubectl describe cm -n element-onprem instance-synapse-shared
This will return output similar to the following excerpt:
send_federation: True
start_pushers: True
turn_allow_guests: true
turn_shared_secret: n0t4ctuAllymatr1Xd0TorgSshar3d5ecret4obvIousreAsons
turn_uris:
  - turns:turn.matrix.org?transport=udp
  - turns:turn.matrix.org?transport=tcp
turn_user_lifetime: 86400000
- Show the Element Web configuration:
kubectl describe cm -n element-onprem app-element-web
This will return output similar to:
config.json:
----
{
  "default_server_config": {
    "m.homeserver": {
      "base_url": "https://synapse2.local",
      "server_name": "local"
    }
  },
  "dummy_end": "placeholder",
  "integrations_jitsi_widget_url": "https://dimension.element2.local/widgets/jitsi",
  "integrations_rest_url": "https://dimension.element2.local/api/v1/scalar",
  "integrations_ui_url": "https://dimension.element2.local/element",
  "integrations_widgets_urls": [
    "https://dimension.element2.local/widgets"
  ]
}
- Show the nginx configuration for Element Web (if you use nginx as your ingress controller in production or are using the PoC installer):
kubectl describe cm -n element-onprem app-element-web-nginx
This will return output similar to:
server {
    listen 8080;
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Content-Security-Policy "frame-ancestors 'self'";
    add_header X-Robots-Tag "noindex, nofollow, noarchive, noimageindex";
    location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
        charset utf-8;
    }
}
- Check the list of active Kubernetes events:
kubectl get events -A
You will see a list of events or the message No resources found.
- Show the state of services in the element-onprem namespace:
kubectl get services -n element-onprem
This should return output similar to:
NAME                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                    AGE
postgres                         ClusterIP   10.152.183.47    <none>        5432/TCP                   6d23h
app-element-web                  ClusterIP   10.152.183.60    <none>        80/TCP                     6d23h
server-well-known                ClusterIP   10.152.183.185   <none>        80/TCP                     6d23h
instance-synapse-main-headless   ClusterIP   None             <none>        80/TCP                     6d23h
instance-synapse-main-0          ClusterIP   10.152.183.105   <none>        80/TCP,9093/TCP,9001/TCP   6d23h
instance-synapse-haproxy         ClusterIP   10.152.183.78    <none>        80/TCP                     6d23h
- Show the status of the stateful sets in the element-onprem namespace:
kubectl get sts -n element-onprem
This should return output similar to:
NAME                    READY   AGE
postgres                1/1     6d23h
instance-synapse-main   1/1     6d23h
- Show deployments in the element-onprem namespace:
kubectl get deploy -n element-onprem
This will return output similar to:
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
app-element-web            1/1     1            1           6d23h
server-well-known          1/1     1            1           6d23h
instance-synapse-haproxy   1/1     1            1           6d23h
- Show the status of all namespaces:
kubectl get namespaces
which will return output similar to:
NAME                 STATUS   AGE
kube-system          Active   20d
kube-public          Active   20d
kube-node-lease      Active   20d
default              Active   20d
ingress              Active   6d23h
container-registry   Active   6d23h
operator-onprem      Active   6d23h
element-onprem       Active   6d23h
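Rather than reading the READY column by eye, a small awk filter can list anything that is not fully ready. This is a hypothetical helper that assumes READY is the second column, which holds for the stateful set, deployment and namespaced pod listings above. It does not apply to kubectl get pods -A, where NAMESPACE comes first.

```shell
# Hypothetical helper: print the NAME of every row whose READY column ("x/y")
# shows fewer ready replicas than desired. Expects `kubectl get sts`,
# `kubectl get deploy` or namespaced `kubectl get pods` output on stdin.
not_ready() {
  # Split "x/y" on "/" and compare the two halves numerically.
  awk 'NR > 1 { split($2, r, "/"); if (r[1] + 0 != r[2] + 0) print $1 }'
}
```

For example, kubectl get sts -n element-onprem | not_ready prints nothing on a healthy system.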
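The volume and namespace listings can be checked the same way. In this sketch, unbound_pvs and inactive_namespaces are hypothetical names. The first assumes the kubectl get pv column order shown above. The header there contains multi-word names (ACCESS MODES, RECLAIM POLICY), but each data row's value is a single word, so STATUS is the fifth field of every data row.

```shell
# List persistent volumes whose STATUS is not Bound (e.g. Released, Failed).
# Expects `kubectl get pv` output on stdin; STATUS is field 5 of data rows.
unbound_pvs() {
  awk 'NR > 1 && $5 != "Bound" { print $1, $5 }'
}

# List namespaces whose STATUS is not Active (e.g. stuck in Terminating).
# Expects `kubectl get namespaces` output on stdin.
inactive_namespaces() {
  awk 'NR > 1 && $2 != "Active" { print $1, $2 }'
}
```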
Destroy the microk8s setup
If you wish to start over, you can reset the microk8s setup by running:
microk8s.reset --destroy-storage
WARNING: This will destroy all of your microk8s containers and storage. Use with caution.
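Because the reset is irreversible, you may want a guard against running it by accident. The function below is a hypothetical wrapper, not part of microk8s. It only calls microk8s.reset --destroy-storage, the command given above, after you type the exact word yes.

```shell
# Hypothetical wrapper: require an explicit "yes" before the destructive reset.
confirm_microk8s_reset() {
  printf 'This destroys all microk8s containers and storage. Type "yes" to continue: '
  read -r answer || answer=""   # treat end of input as "no"
  if [ "$answer" != "yes" ]; then
    echo "Aborted; nothing was reset."
    return 1
  fi
  microk8s.reset --destroy-storage
}
```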