EMS Knowledge Base

The knowledge base for all Element Matrix Services provided products.

I can't upload files after updating to 0.6.1

Issue

Environment

Resolution

To resolve this issue, recursively change the permissions of the directory configured in parameters.yml as media_host_data_path. For this example, in parameters.yml, we have:

media_host_data_path: "/mnt/data"

and a quick ls on this path shows the 991 ownership:

$ ls -l /mnt/
total 4
drwxr-xr-x 3 991 991 4096 Apr 27 13:20 data

To fix this, run:

sudo chown 10991:991 -R /mnt/data

afterwards, ls should show the 10991 ownership:

$ ls -l /mnt/
total 4
drwxr-xr-x 3 10991 991 4096 Apr 27 13:20 data

and now you should be able to upload files again.

Root Cause

In this case, the installation started with 0.5.3, and in 0.6.0 we changed the UID that synapse runs as in order to avoid conflicting with any potential system UID. Previously, the UID was 991; we moved to 10991. This change breaks the permissions on the existing synapse_media directory.
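
If you want to confirm the UID that synapse is now running as, one way (reusing the pod name from above; adjust it if yours differs) is to run id inside the pod:

kubectl exec -n element-onprem instance-synapse-main-0 -- id

On 0.6.0 and later this should report uid=10991.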

You may see an error similar to this one in your synapse logs, which can be obtained by running kubectl logs -n element-onprem instance-synapse-main-0:

2022-04-27 13:28:02,521 - synapse.http.server - 100 - ERROR - POST-59388 - Failed handle request via 'UploadResource': <XForwardedForRequest at 0x7f9aa49f9e20 method='POST' uri='/_matrix/media/r0/upload' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 269, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 297, in _async_render
    callback_return = await raw_callback_return
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/upload_resource.py", line 96, in _async_render_POST
    content_uri = await self.media_repo.create_content(
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_repository.py", line 178, in create_content
    fname = await self.media_storage.store_file(content, file_info)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 92, in store_file
    with self.store_into_file(file_info) as (f, fname, finish_cb):
  File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 135, in store_into_file
    os.makedirs(dirname, exist_ok=True)
  File "/usr/local/lib/python3.9/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/local/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/media/media_store/local_content/PQ'

synapse-haproxy container in CrashLoopBackOff state

Issue

We are seeing

[karl1@element ~]$ kubectl get pods -n element-onprem
NAME                                        READY   STATUS             RESTARTS   AGE
server-well-known-8c6bd8447-fts78           1/1     Running            2          39h
app-element-web-c5bd87777-745gh             1/1     Running            2          39h
postgres-0                                  1/1     Running            2          39h
instance-synapse-haproxy-5b4b55fc9c-jv7pp   0/1     CrashLoopBackOff   40         39h
instance-synapse-main-0                     1/1     Running            6          39h

and the synapse-haproxy container never leaves the CrashLoopBackOff state.

Environment

Resolution

Add the following lines to /etc/security/limits.conf:

*              soft    nofile  100000
*              hard    nofile  100000

and reboot the box. After a reboot, the microk8s environment will come back up and the synapse-haproxy container should run without error.
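
After the reboot, you can sanity-check that the higher limit is in effect for your login session:

# should print 100000 after the change to /etc/security/limits.conf
ulimit -n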

Root Cause

Check the logs of synapse-haproxy with this command:

kubectl logs -n element-onprem instance-synapse-haproxy-5b4b55fc9c-jv7pp

You will want to replace the instance name with your specific instance. See if you have this message:

'[haproxy.main()] Cannot raise FD limit to 80034, limit 65536.'

If so, you have run out of open file descriptors and as such the container cannot start.

Can't connect to local registry 127.0.0.1:32000

Issue

    "msg": "non-zero return code",
    "rc": 1,
    "start": "2022-05-26 10:37:08.441849",
    "stderr": "Error: Get \"https://localhost:32000/v2/\": dial tcp [::1]:32000: connect: connection refused; Get \"http://localhost:32000/v2/\": dial tcp [::1]:32000: connect: connection refused",
    "stderr_lines": [
        "Error: Get \"https://localhost:32000/v2/\": dial tcp [::1]:32000: connect: connection refused; Get \"http://localhost:32000/v2/\": dial tcp [::1]:32000: connect: connection refused"
    ],

Environment

Resolution

First, remove any remnants of the old image that may be left in containerd. We need the name of the image to do this; looking at this error:

"cdkbot/registry-amd64:2.6": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/cdkbot/registry-amd64:2.6": failed to extract layer 

the image is named docker.io/cdkbot/registry-amd64:2.6. So we will now run:

microk8s.ctr images rm docker.io/cdkbot/registry-amd64:2.6

Next, unmount the offending volume identified in the kubectl describe pod output:

sudo umount /var/snap/microk8s/common/var/lib/containerd/tmpmounts/containerd-mount490181863

If this succeeds, then you can issue:

microk8s.ctr images pull docker.io/cdkbot/registry-amd64:2.6

and if this succeeds, you can then run:

kubectl delete pod -n container-registry registry

and watch the registry come back up.
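
One way to watch the registry pod return to the Running state is:

kubectl get pods -n container-registry -w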

If you cannot get the mounted volume to unmount, you may need to reboot to completely clear the issue.
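
Before rebooting, it can also be worth checking which processes are keeping the mount busy. One way, using the mount path from the error above, is:

# lists processes with files open on the mounted filesystem
sudo fuser -vm /var/snap/microk8s/common/var/lib/containerd/tmpmounts/containerd-mount490181863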

Root Cause

The root cause is that the registry container will not start:

$ kubectl get pods -A
NAMESPACE            NAME                                         READY   STATUS             RESTARTS   AGE
kube-system          hostpath-provisioner-566686b959-jl2b4        1/1     Running            1          69m
...
container-registry   registry-9b57d9df8-kmks4                     0/1     ImagePullBackOff   0          44m

To figure out why this won't start, we need to run kubectl describe pod -n container-registry registry:

$ kubectl describe pod -n container-registry registry
Name:         registry-9b57d9df8-k7v2r
Namespace:    container-registry
Priority:     0
Node:         mynode/192.168.122.1
Start Time:   Thu, 26 May 2022 11:33:04 -0700
Labels:       app=registry
              pod-template-hash=9b57dea58
...
  Normal   BackOff           5m41s (x4 over 7m36s)  kubelet            Back-off pulling image "cdkbot/registry-amd64:2.6"
  Warning  Failed            5m41s (x4 over 7m36s)  kubelet            Error: ImagePullBackOff
  Warning  Failed            2m58s                  kubelet            Failed to pull image "cdkbot/registry-amd64:2.6": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/cdkbot/registry-amd64:2.6": failed to extract layer sha256:8aa4fcad5eeb286fe9696898d988dc85503c6392d1a2bd9023911fb0d6d27081: failed to unmount /var/snap/microk8s/common/var/lib/containerd/tmpmounts/containerd-mount490181863: failed to unmount target /var/snap/microk8s/common/var/lib/containerd/tmpmounts/containerd-mount490181863: device or resource busy: unknown

Looking at the above, we can see that /var/snap/microk8s/common/var/lib/containerd/tmpmounts/containerd-mount490181863 is busy and failing to unmount, thus causing our problem.

We've also noticed in this case that bits of an old image download can be left in containerd and we've updated the resolution to handle this as well.

Installer fails with AnsibleUnsafeText object has no attribute 'addons'

Issue

TASK [microk8s : convert from list to dict] ***************************************************************************************************************************************************************************************
task path: /home/user/element-enterprise-installer-2022-05.06/ansible/roles/microk8s/tasks/addons.yml:12
fatal: [localhost]: FAILED! => {
    "msg": "'ansible.utils.unsafe_proxy.AnsibleUnsafeText object' has no attribute 'addons'"
}

Environment

Resolution

Run:

microk8s.start

and then restart the installer.
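
If you want to confirm that microk8s has actually come up before re-running the installer, you can wait for it to report ready:

microk8s status --wait-ready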

Root Cause

Situations exist where the installer can get into a state where microk8s has not started, but the installer thinks microk8s is running.

How to Setup Local Host Resolution Without DNS

Overview

In an Element Enterprise On-Premise environment, hostnames must resolve to the appropriate IP addresses. If you have a proper DNS server with records for these hostnames in place, then you will be good to go.

In the event that a DNS server is not available for proper hostname resolution, you may use /etc/hosts and host_aliases. This article will walk you through that.

If you choose to use this method, do note that federation outside of your local environment will not be possible. Further, using the mobile applications will not be possible as they must be able to access your environment, which typically requires DNS.

Further, this assumes that you are using the single node installer based on microk8s.

Steps

For single node installations with microk8s, if we were setting up synapse and element and these ran on the local domain with the IP of 192.168.122.39, we would set the following entries in /etc/hosts:

192.168.122.39 element.local 
192.168.122.39 synapse.local
192.168.122.39 local

and the following in host_aliases in the parameters.yml file found in your configuration directory:

host_aliases:
  - ip: "192.168.122.39"
    hostnames:
      - "element.local"
      - "synapse.local"
      - "local"

How to Upgrade microk8s for Single Node Installations

microk8s in Single Node Installations

For Element On-Premise and Element Enterprise On-Premise, we offer a multi-node installer and a single-node installer. In our single-node installations, we install Canonical's microk8s, a lightweight distribution of kubernetes. We then use this installation to deploy our software via our kubernetes operator. All of this is managed by our installer.

That said, we do not handle the upgrading of existing microk8s installations with the installer. This document details how to upgrade microk8s when needed. If you have any questions, please do not hesitate to contact Element Support.

Upgrading microk8s

The first step in upgrading microk8s to the latest version deployed by the installer is to remove the existing microk8s installation. Given that all of microk8s is managed by a snap, we can do this without worrying about our Element Enterprise On-Premise installation. The important data for your installation is all stored outside of the snap space and will not be impacted by removing microk8s. Start by running:

sudo snap list

and confirm that microk8s is installed:

[user@element2 element-enterprise-installer-2022-05.06]$ sudo snap list
Name      Version    Rev    Tracking     Publisher   Notes
core      16-2.55.5  13250  -            canonical✓  core
core18    20220428   2409   -            canonical✓  base
microk8s  v1.21.13    3410   1.21/stable  canonical✓  classic

Once you've made sure that microk8s is installed, remove it by running:

sudo snap remove microk8s

Now at this point, you should be able to verify that microk8s is no longer installed by running:

sudo snap list

and getting output similar to:

[user@element2 element-enterprise-installer-2022-05.06]$ sudo snap list
Name      Version    Rev    Tracking     Publisher   Notes
core      16-2.55.5  13250  -            canonical✓  core
core18    20220428   2409   -            canonical✓  base

Now that you no longer have microk8s installed, you are ready to run the latest installer. Once you run the latest installer, it will install the latest version of microk8s.

When the installer finishes, running sudo snap list should show an upgraded version of microk8s, with output similar to:

Name      Version   Rev    Tracking       Publisher   Notes
core18    20220706  2538   latest/stable  canonical✓  base
microk8s  v1.24.3   3597   1.24/stable    canonical✓  classic
snapd     2.56.2    16292  latest/stable  canonical✓  snapd

At this point, you will need to reboot the server to restore proper networking into the microk8s cluster. After the reboot, wait for your pods to start; your Element Enterprise On-Premise installation will then be running the newer version of microk8s.
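
After the reboot, a couple of quick checks can confirm that microk8s and your pods are healthy again, for example:

microk8s status --wait-ready
kubectl get pods -n element-onprem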

After upgrading to 1.0.0, postgres-0 is in CrashLoopBackOff state

Issue

Environment

Resolution

To fix this issue, first read the Root Cause and Issue sections and double check that this is your issue. The resolution is to delete the sts, pvc, and pv for postgres, along with the empty data directory, and then re-run the installer. These steps WILL destroy any existing PostgreSQL data, which in the ephemeral case (that this issue describes) is none.

To find where the data directory is, run:

kubectl describe pv postgres | grep -i path

This will show output similar to:

StorageClass:      microk8s-hostpath
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/data/synapse-postgres
    HostPathType:  

From here, we can see that /mnt/data/synapse-postgres is where postgres is trying to initialize the database. Let's take a look at that directory:

[user@element2 element-enterprise-installer-1.0.0]$ sudo ls -l /mnt/data/synapse-postgres/
total 0
drwx------. 2 systemd-coredump input 6 Apr 26 15:13 data
[user@element2 element-enterprise-installer-1.0.0]$ sudo ls -l /mnt/data/synapse-postgres/data
total 0

As you can see, we have the data directory and it is empty. Make a note of this directory for later.

Now we need to remove the pvc and the pv. If you really do have just an empty data directory, there is no need to make a backup. If you have more than just an empty data directory in your postgres pv path, you will want to STOP AND MAKE A BACKUP OF THAT PATH'S CONTENTS.

Now, to delete the PVC, you will need two terminals. In one terminal, you will run:

kubectl delete pvc -n element-onprem postgres

You will notice that this command just sits there waiting once run. In another terminal, run this command:

kubectl delete pod -n element-onprem postgres-0

As soon as the pod is deleted, you should notice that the kubectl delete pvc command also completes. At this point, we need to now delete the pv:

kubectl delete pv -n element-onprem postgres

Now it is time to remove the sts for postgres:

kubectl delete sts -n element-onprem postgres

Remove the data directory:

sudo rm -r /mnt/data/synapse-postgres/data
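
Before re-running the installer, you can double check that the postgres objects are really gone, for example:

kubectl get sts,pvc -n element-onprem
kubectl get pv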

Now re-run the installer. Once the installer is re-run, you should have a working postgresql. You should notice a running pod in kubectl get pods -n element-onprem:

postgres-0                                  1/1     Running   0              2m11s

and your /mnt/data/synapse-postgres directory should have entries similar to:

drwx------. 6 systemd-coredump input    54 May  6 10:14 base
drwx------. 2 systemd-coredump input  4096 May  6 10:15 global
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_commit_ts
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_dynshmem
-rw-------. 1 systemd-coredump input  4782 May  6 10:14 pg_hba.conf
-rw-------. 1 systemd-coredump input  1636 May  6 10:14 pg_ident.conf
drwx------. 4 systemd-coredump input    68 May  6 10:14 pg_logical
drwx------. 4 systemd-coredump input    36 May  6 10:14 pg_multixact
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_notify
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_replslot
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_serial
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_snapshots
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_stat
drwx------. 2 systemd-coredump input    63 May  6 10:15 pg_stat_tmp
drwx------. 2 systemd-coredump input    18 May  6 10:14 pg_subtrans
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_tblspc
drwx------. 2 systemd-coredump input     6 May  6 10:14 pg_twophase
-rw-------. 1 systemd-coredump input     3 May  6 10:14 PG_VERSION
drwx------. 3 systemd-coredump input    60 May  6 10:14 pg_wal
drwx------. 2 systemd-coredump input    18 May  6 10:14 pg_xact
-rw-------. 1 systemd-coredump input    88 May  6 10:14 postgresql.auto.conf
-rw-------. 1 systemd-coredump input 28156 May  6 10:14 postgresql.conf
-rw-------. 1 systemd-coredump input    36 May  6 10:14 postmaster.opts
-rw-------. 1 systemd-coredump input    94 May  6 10:14 postmaster.pid

Finally, restart the synapse pod by doing:

kubectl delete pod -n element-onprem instance-synapse-main-0

Wait for that pod to restart and be completely running again. Verify with kubectl get pods -n element-onprem that you have a line similar to:

instance-synapse-main-0                     1/1     Running   0              2m36s

Root Cause

In 0.6.1, we had a bug which caused the included PostgreSQL database to not be written to disk, and thus it did not survive restarts. The bug has been fixed in 1.0.0; however, prior versions of the installer did get as far as writing a data directory into the postgresql storage set up by microk8s. As such, postgres finds this directory on start up and fails to initialize a new database, with the specific log mentioned in the Issue section.

If you do not have this specific error, please do not run the steps in the Resolution section of this knowledge base solution.

Integrator fails with Unable to initialise application. Failed to validate config: data/jitsi_domain must match format "hostname"...

Issue

Environment

Resolution

Root Cause

There is a bug in this installer that prepends https:// to the default domain, and this causes the error. This will be fixed in a future installer.

After an install, I only have the postgres-0 pod!

Issue

Environment

Resolution

Root Cause

Under certain scenarios, microk8s fails to load the br_netfilter kernel module. This allows the calico networking to fall back to user-space routing, which does not work in this environment and causes the calico-kube-controllers pod to not start; that, in turn, cascades into the rest of the stack not coming up. More on this specific issue can be seen here: https://github.com/canonical/microk8s/issues/3085. The microk8s team does expect to release a fix, and we will work to incorporate it in the future.

Using Self-Signed Certificates with mkcert

Overview

We do not recommend using self-signed certificates with Element Enterprise On-Premise, however, we recognize that there are times when self-signed certificates can be the fastest way forward for demo or PoC purposes. It is in this spirit that these directions are provided.

Steps

The following instructions will enable you to use a tool called mkcert to generate self-signed certificates. Element does not ship this tool and so these directions are provided as one example of how to get self-signed certificates.

Ubuntu:

sudo apt-get install wget libnss3-tools

EL:

sudo yum install wget nss-tools -y

Both EL and Ubuntu:

wget https://github.com/FiloSottile/mkcert/releases/download/v1.4.3/mkcert-v1.4.3-linux-amd64
sudo mv mkcert-v1.4.3-linux-amd64 /usr/bin/mkcert
sudo chmod +x /usr/bin/mkcert

Once you have the mkcert executable, you can run:

mkcert -install
The local CA is now installed in the system trust store! ⚡️

Now, you can verify the CA Root by doing:

mkcert -CAROOT
/home/element-demo/.local/share/mkcert

Your output may not be exactly the same, but it should be similar. Once we’ve done this, we need to generate self-signed certificates for our hostnames. The following is an example of how to do it for element.local. You will need to do this for all of the aforementioned hostnames, including the fqdn.tld.

The run for the element fqdn looks like this:

mkcert element.local element 192.168.122.39 127.0.0.1

Created a new certificate valid for the following names
- "element.local"
- "element"
- "192.168.122.39"
- "127.0.0.1"

The certificate is at "./element.local+3.pem" and the key at
"./element.local+3-key.pem" ✅

It will expire on 1 May 2024

Once you have self-signed certificates, you need to copy them into the certs directory under the config directory. Certificates in the certs directory must take the form of fqdn.crt and fqdn.key.

Using our above example, these are the commands we would need to run from the installer directory (we ran mkcert in that directory as well):

mkdir ~/.element-onpremise-config/certs
cp element.local+3.pem  ~/.element-onpremise-config/certs/element.local.crt
cp element.local+3-key.pem  ~/.element-onpremise-config/certs/element.local.key
cp synapse.local+3.pem  ~/.element-onpremise-config/certs/synapse.local.crt
cp synapse.local+3-key.pem  ~/.element-onpremise-config/certs/synapse.local.key
cp dimension.local+3.pem  ~/.element-onpremise-config/certs/dimension.local.crt
cp dimension.local+3-key.pem  ~/.element-onpremise-config/certs/dimension.local.key
cp hookshot.local+3.pem  ~/.element-onpremise-config/certs/hookshot.local.crt
cp hookshot.local+3-key.pem  ~/.element-onpremise-config/certs/hookshot.local.key
cp local+2.pem  ~/.element-onpremise-config/certs/local.crt
cp local+2-key.pem  ~/.element-onpremise-config/certs/local.key
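
A quick listing of the certs directory can confirm that each hostname has a matching .crt and .key pair before you run the installer:

ls -l ~/.element-onpremise-config/certs/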

Installer fails on firewalld, but firewalld is not installed

Issue

I'm seeing this with the installer, but I don't have firewalld installed:

2022-08-15 15:52:20,258 p=33 u=element n=ansible | TASK [microk8s : Check that firewalld is started if installed] ******************************************************************************************************************************************************************************
2022-08-15 15:52:20,299 p=33 u=element n=ansible | fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "Firewalld is installed. Please start it for the installer to successfully configure it.\n"
}

Environment

Resolution

Upgrade to Element Enterprise Installer 2022-08.02, which has the fixes for this.

Root Cause

On Ubuntu, systemd will report data for firewalld if it has been installed and then uninstalled. Our installer did not account for this scenario; upon finding it, we modified our checks. Those fixes went into 2022-08.02.

Getting a 502 Bad Gateway Error When Accessing Element Web

Issue

Environment

Resolution

Run the following commands to allow HTTP and HTTPS traffic and to enable masquerading through firewalld, then reload the firewall:

sudo firewall-cmd --add-service={http,https} --permanent
sudo firewall-cmd --add-masquerade --permanent
sudo firewall-cmd --reload
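
To confirm the new settings are active, you can list the firewall configuration and check that http, https, and masquerade all appear:

sudo firewall-cmd --list-all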

Root Cause

By default, firewalld does not allow masquerading (Network Address Translation, NAT) through the firewall. This blocks the NAT required to reach pods in microk8s and results in the 502 errors.

Configuring a microk8s Single Node Instance to Use a Network Proxy

Overview

If you are using the microk8s Single Node Installer and your site requires proxy access to get to the internet, making a few quick changes to your operating system configuration will enable our installer to access the resources it needs over the internet. This document discusses these changes.

Steps

If your site requires a proxy to reach the internet, make sure that the following environment variables are set for your operating system:

Ubuntu Specific Directions

If your company's proxy is http://corporate.proxy:3128, you would edit /etc/environment and add the following lines:

HTTPS_PROXY=http://corporate.proxy:3128
HTTP_PROXY=http://corporate.proxy:3128
https_proxy=http://corporate.proxy:3128
http_proxy=http://corporate.proxy:3128
NO_PROXY=10.1.0.0/16,10.152.183.0/24,127.0.0.1
no_proxy=10.1.0.0/16,10.152.183.0/24,127.0.0.1

The IP Ranges specified to NO_PROXY and no_proxy are specific to the microk8s cluster and prevent microk8s traffic from going over the proxy.

EL Specific Directions

Using the same example of having a company proxy at http://corporate.proxy:3128, you would edit /etc/profile.d/http_proxy.sh and add the following lines:

export HTTP_PROXY=http://corporate.proxy:3128
export HTTPS_PROXY=http://corporate.proxy:3128
export http_proxy=http://corporate.proxy:3128
export https_proxy=http://corporate.proxy:3128
export NO_PROXY=10.1.0.0/16,10.152.183.0/24,127.0.0.1
export no_proxy=10.1.0.0/16,10.152.183.0/24,127.0.0.1

The IP Ranges specified to NO_PROXY and no_proxy are specific to the microk8s cluster and prevent microk8s traffic from going over the proxy.

In Conclusion

You will need to log out and back in for the environment variables to be re-read after setting them. If you already have microk8s running, you will need to issue:

microk8s.stop
microk8s.start

to have it reload the new environment variables.
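
After logging back in, you can verify that the proxy variables are visible in your environment with something like:

env | grep -i proxy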

If you need to use an authenticated proxy, then the URL schema for both EL and Ubuntu is as follows:

protocol://user:password@host:port

So if your proxy is corporate.proxy, listens on port 3128 without SSL, and requires a username of bob and a password of inmye1em3nt, then your URL would be formatted:

http://bob:inmye1em3nt@corporate.proxy:3128

For further help with proxies, we suggest that you contact your proxy administrator or operating system vendor.

Installer 2022-08.01 fails to pull element web into the cluster

Issue

Environment

Resolution

It is necessary to uncomment the following variables in secrets.yml:

dockerhub_username: 
dockerhub_token: 

If you have a dockerhub_username and dockerhub_token, please define them in secrets.yml. If not, then please leave them blank but uncommented.

Root Cause

Version 2022-08.01 uses an element web image hosted in ems-image-store. A defect appeared during the migration of the image, and the installer looks for the variables dockerhub_username and dockerhub_token to know whether it has to configure Docker secrets in the cluster.

url.js:354 error starting dimension

Issue

Starting matrix-dimension
url.js:354
      this.auth = decodeURIComponent(rest.slice(0, atSign));
                  ^

URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at Url.parse (url.js:354:19)
    at Object.urlParse [as parse] (url.js:157:13)
    at new Sequelize (/home/node/matrix-dimension/node_modules/sequelize/dist/lib/sequelize.js:1:1292)
    at new Sequelize (/home/node/matrix-dimension/node_modules/sequelize-typescript/dist/sequelize/sequelize/sequelize.js:16:9)
    at new _DimensionStore (/home/node/matrix-dimension/build/app/db/DimensionStore.js:42:30)
    at Object.<anonymous> (/home/node/matrix-dimension/build/app/db/DimensionStore.js:106:26)
    at Module._compile (internal/modules/cjs/loader.js:1072:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1101:10)
    at Module.load (internal/modules/cjs/loader.js:937:32)

Environment

Resolution

Ensure that you do not have any % characters in your PostgreSQL password. Once you have removed any % characters from your PostgreSQL password, please update your configuration files and re-run the installer.

Root Cause

Dimension does not properly encode the % character in its PostgreSQL connection URL, and this triggers the above error.

Installer fails on enabling addons

Issue

The installer states that it has failed and I'm seeing messages like:

skipping: [localhost] => (item=host-access) 
changed: [localhost] => (item=ingress)
FAILED - RETRYING: [localhost]: enable addons (3 retries left).
FAILED - RETRYING: [localhost]: enable addons (2 retries left).
FAILED - RETRYING: [localhost]: enable addons (1 retries left).
failed: [localhost] (item=metrics-server) => {"ansible_loop_var": "item", "attempts": 3, "changed": true, "cmd": ["/snap/bin/microk8s.enable", "metrics-server"], "delta": "0:00:09.568390", "end": "2022-04-13 12:08:41.833858", "item": {"enabled": true, "name": "metrics-server"}, "msg": "non-zero return code", "rc": -15, "start": "2022-04-13 12:08:32.265468", "stderr": "Warning: apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService", "stderr_lines": ["Warning: apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService"], "stdout": "Enabling Metrics-Server\nclusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged\nclusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged\nrolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged\napiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged\nserviceaccount/metrics-server unchanged\ndeployment.apps/metrics-server unchanged\nservice/metrics-server unchanged\nclusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged\nclusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged\nclusterrolebinding.rbac.authorization.k8s.io/microk8s-admin unchanged", "stdout_lines": ["Enabling Metrics-Server", "clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged", "clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged", "rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged", "apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged", "serviceaccount/metrics-server unchanged", "deployment.apps/metrics-server unchanged", "service/metrics-server unchanged", "clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged", "clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged", "clusterrolebinding.rbac.authorization.k8s.io/microk8s-admin unchanged"]}
skipping: [localhost] => (item=rbac) 
changed: [localhost] => (item=registry)

Environment

Resolution

Re-run the installer until these errors clear and all of the microk8s addons are enabled.

Root Cause

There is a microk8s timing issue that we have not quite figured out.