Skip to main content

Synapse database troubleshooting

Room Retention policy enabled causes Synapse database to consume a lot of disk space

Procedure (Kubernetes/Self Hosted Postgresql)
  1. Run the following command against synapse postgres database : \d+
  2. Check the space taken by the table state_groups_state. For example, here it's consuming 540 GB : public | state_groups_state | table | synapse_user | permanent | 244 GB |
  3. If you have Room retention policy enabled, there's a bug which causes some state groups to be orphaned, and as a consequence they are not cleaned up from the database automatically.
  4. Follow the instruction from the page synapse-find-unreferenced-state-groups. The tool is available for download in the following link rust-synapse-find-unreferenced-state-groups.
  5. On Standalone and Installer-managed postgresql database, you can use the following script to do it automatically :
#!/bin/sh
set -e

echo "Stopping Operator..."
kubectl scale deploy/element-operator-controller-manager -n operator-onprem --replicas=0
echo "Stopping Synapse..."
kubectl delete synapse/first-element-deployment -n element-onprem

while kubectl get statefulsets -n $NAMESPACE --no-headers -o custom-columns=":metadata.name" | grep -q "synapse"; do
  echo "Waiting for synapse StatefulSets to be deleted..."
  sleep 2
done

kubectl port-forward pods/synapse-postgres-0 -n element-onprem   15432:5432 &
port_forward_pid=$!
sleep 1s

POSTGRES_PASSWORD=`echo $(kubectl get secrets/first-element-deployment-synapse-secrets -n element-onprem -o yaml | grep postgresPassword | cut -d ':' -f2) | base64 -d`
POSTGRES_USER=synapse_user
POSTGRES_DB=synapse

./rust-synapse-find-unreferenced-state-groups -p postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@localhost:15432/$POSTGRES_DB -o ./sgs.txt
kill -9 $port_forward_pid

kubectl cp ./sgs.txt -n element-onprem synapse-postgres-0:/tmp/sgs.txt
kubectl exec -it pods/synapse-postgres-0 -n element-onprem -- psql "dbname=$POSTGRES_DB user=$POSTGRES_USER password=$POSTGRES_PASSWORD" -c "CREATE TEMPORARY TABLE unreffed(id BIGINT PRIMARY KEY); COPY unreffed FROM '/tmp/sgs.txt' WITH (FORMAT 'csv'); DELETE FROM state_groups_state WHERE state_group IN (SELECT id FROM unreffed); DELETE FROM state_group_edges WHERE state_group IN (SELECT id FROM unreffed); DELETE FROM state_groups WHERE id IN (SELECT id FROM unreffed);"
echo "Starting Operator..."
kubectl scale deploy/element-operator-controller-manager -n operator-onprem --replicas=1