Symptoms
You might see one or more of the following symptoms:
1. Run the following command on the standby BIG-IQ to examine the restjavad log:
$ grep 'Re-starting restjavad' /var/log/daemon*
You might see several restjavad service restarts logged. For example:
.... logger[10602]: Re-starting restjavad
.... logger[7385]: Re-starting restjavad
2. The BIG-IQ GUI might halt with a message similar to:
'Waiting for BIG-IQ services to become available'
3. Run the following command to view the database replication log on the standby BIG-IQ:
$ tail -f /var/log/ha_pg_basebackup.log
When database replication starts, you see the following message:
...Setup_slave] Configuring pg_basebackup
waiting for checkpoint
If the log shows that the backup has not reached 100%, you might see errors similar to the following:
538210/545848 kB (98%), 0/1 tablespace
pg_basebackup: could not create directory "/var/lib/pgsql/data/base/1": File exists
pg_basebackup: removing data directory "/var/lib/pgsql/data"
could not remove file or directory "/var/lib/pgsql/data": Directory not empty
pg_basebackup: failed to remove data directory
Under normal circumstances, you see a success message similar to:
..Setup_slave] Completed pg_basebackup successfully.
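The log checks for symptoms 1 and 3 can be scripted. The following is a minimal sketch, not an F5-provided tool: the function names `count_restjavad_restarts` and `basebackup_status` are hypothetical, and the matched strings are taken from the sample log lines above, so they may need adjusting if your version logs different text.

```shell
# Count 'Re-starting restjavad' entries across the given log files.
# cat merges the files so that grep -c prints one total instead of per-file counts.
count_restjavad_restarts() {
    cat "$@" 2>/dev/null | grep -c 'Re-starting restjavad' || true
}

# Classify a pg_basebackup log by its most recent outcome message.
# Prints "success", "failed", or "incomplete" (no outcome message found yet).
basebackup_status() {
    bb_last=$(grep -E 'Completed pg_basebackup successfully|pg_basebackup: failed to remove data directory|pg_basebackup: could not create directory' "$1" 2>/dev/null | tail -n 1 || true)
    case "$bb_last" in
        *'Completed pg_basebackup successfully'*) echo "success" ;;
        '') echo "incomplete" ;;
        *) echo "failed" ;;
    esac
}
```

For example, `count_restjavad_restarts /var/log/daemon*` prints the total number of restarts, and `basebackup_status /var/log/ha_pg_basebackup.log` reports the latest replication outcome. A result of "incomplete" means only that neither a success nor a failure message was found; the backup might still be running or might not have started.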
Impact
BIG-IQ configurations with a large database or limited network bandwidth might be unable to form a BIG-IQ high availability (HA) configuration.
Conditions
When creating a BIG-IQ high availability configuration, the standby BIG-IQ pulls the database from the active BIG-IQ.
If this transfer takes longer than 15 minutes to complete, the high availability (HA) configuration fails. This can happen on low-bandwidth networks or when the database is very large.
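To judge whether a given database is likely to fit within the 15-minute window, a rough transfer-time estimate can help. The sketch below is illustrative only: the function names are hypothetical, the link rate is something you must measure yourself, and the calculation ignores protocol overhead, checkpoint wait time, and disk speed, so the real transfer will take longer than the estimate.

```shell
# Estimate the seconds needed to copy size_kb kilobytes at rate_kbps kilobits per second.
estimate_transfer_seconds() {
    size_kb="$1"
    rate_kbps="$2"
    # 1 kB is 8 kilobits; adding (rate - 1) before dividing rounds the result up.
    echo $(( (size_kb * 8 + rate_kbps - 1) / rate_kbps ))
}

# Succeed (exit status 0) if the estimate fits within the 15-minute (900 s) window.
fits_ha_window() {
    [ "$(estimate_transfer_seconds "$1" "$2")" -le 900 ]
}
```

Using the 545848 kB total from the sample log above, `estimate_transfer_seconds 545848 10000` (a 10 Mbit/s link) prints 437, which fits the window, whereas at 1000 kbit/s the estimate is 4367 seconds, well beyond 900.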
Workaround
The following procedure only recovers the standby BIG-IQ.
1. Browse to System::THIS DEVICE::BIG-IQ high availability (HA) on the active BIG-IQ.
2. DO NOT click the 'Repair Standby Database' button.
3. Log in to the standby BIG-IQ device from the command line.
4. If messages about restjavad service restarts keep appearing, you might be unable to type commands. Type the following command to stop the service:
$ bigstart stop restjavad
5. Type the following command to reset the database on the standby BIG-IQ:
$ pgsh -i -f
6. After the standby BIG-IQ has recovered, click 'Revert to standalone' on the active BIG-IQ.
If these steps don't work, you can reset the database on the standby BIG-IQ manually: (i) stop the postgres service, (ii) delete the /var/lib/pgsql/data directory, and (iii) restart the postgres service.
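Because the manual reset deletes the database directory, it can be worth printing the planned commands for review before running anything. The following dry-run sketch only prints; it executes nothing. The function name `plan_manual_reset` is hypothetical, and the postgres service name is deliberately left as a required argument because this article does not state it. The sketch also assumes that the service is controlled with `bigstart`, as restjavad is in step 4; confirm both on your system.

```shell
# Print (do not run) the manual reset steps for review.
# $1: postgres service name (an assumption you must supply; not given in this article)
# $2: data directory, defaulting to the path named in this article
plan_manual_reset() {
    pg_service="${1:-}"
    pg_data_dir="${2:-/var/lib/pgsql/data}"
    if [ -z "$pg_service" ]; then
        echo "usage: plan_manual_reset <postgres-service-name> [data-dir]" >&2
        return 1
    fi
    printf 'bigstart stop %s\n' "$pg_service"     # (i) stop the postgres service
    printf 'rm -rf %s\n' "$pg_data_dir"           # (ii) delete the data directory
    printf 'bigstart start %s\n' "$pg_service"    # (iii) start the service again
}
```

Run it on the standby BIG-IQ only, and check the printed `rm -rf` path carefully before typing any of the commands yourself.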