Symptoms
vRealize Log Insight cluster nodes are not responding.
The vRealize Log Insight UI is inaccessible.
The Cassandra service crashes with log messages similar to the following in /var/log/loginsight/cassandra.log:
INFO [main] 2021-03-31 07:00:23,028 CassandraDaemon.java:556 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
ERROR [HintsDispatcher:1] 2021-03-31 07:00:24,378 HintsDispatchExecutor.java:243 - Failed to dispatch hints file 42f3e399-73fa-4c09-8562-ab495fb8e827-1617037340527-1.hints: file is corrupted ({})
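To confirm that a node shows this symptom, the log can be searched for the error text. The following is a minimal sketch, not part of the original procedure: the function name count_corrupt_hints is hypothetical, and the search pattern is taken from the sample message above, so it may need adjusting if the wording differs in other versions.

```shell
#!/bin/sh
# Sketch: count "hints file is corrupted" errors in a Cassandra log.
# count_corrupt_hints is a hypothetical helper name; the default log path
# is the one quoted in this article.
count_corrupt_hints() {
    log_file="${1:-/var/log/loginsight/cassandra.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function from failing then.
    grep -c 'Failed to dispatch hints file .*file is corrupted' "$log_file" || true
}

# Usage (run on each node):
#   count_corrupt_hints
#   count_corrupt_hints /path/to/another/cassandra.log
```

A count greater than zero suggests that the node is affected by corrupted hints files.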
Cause
One or more Cassandra hints files, which reside under /usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints, are corrupted.
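Before changing anything, it can be useful to see which hints files are present on a node. The sketch below only lists files and does not modify them. The function name list_hints_files is hypothetical, and the *.hints suffix is an assumption based on the file name in the sample log message.

```shell
#!/bin/sh
# Sketch: list Cassandra hints files without modifying them.
# list_hints_files is a hypothetical helper name. The "*.hints" suffix is an
# assumption taken from the file name in the sample log message.
list_hints_files() {
    # With no arguments, fall back to the directory pattern from this article.
    [ "$#" -gt 0 ] || set -- /usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints
    for dir in "$@"; do
        # Skip anything that is not a directory (for example an unmatched glob).
        [ -d "$dir" ] || continue
        find "$dir" -type f -name '*.hints' -exec ls -l {} +
    done
}

# Usage (run on each node):
#   list_hints_files
```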
Impact / Risks
The vRealize Log Insight UI remains inaccessible until this issue is fixed.
Resolution
To resolve this issue, follow the steps below to clear out the Cassandra hints files.
Note: Take a non-memory snapshot of all nodes in the cluster before proceeding.
Stop the Log Insight service on all nodes
service loginsight stop
Remove the hints files on all nodes
rm -rf /usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints/*
Force start the Cassandra service on all nodes
/usr/lib/loginsight/application/sbin/li-cassandra.sh --startnow --force
Run the following command on each node and confirm that the output shows UN for every node in the cluster
/usr/lib/loginsight/application/lib/apache-cassandra-*/bin/nodetool-no-pass status
Note: In the nodetool status output, UN indicates that a node is Up and in the Normal state.
Force stop the Cassandra service on all nodes
/usr/lib/loginsight/application/sbin/li-cassandra.sh --stopnow --force
Start the Log Insight service on all nodes
service loginsight start
After a few minutes, the Log Insight UI will be accessible.
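The UN check in the nodetool status step above can also be scripted rather than read by eye. The following is a minimal sketch under stated assumptions: the function name all_nodes_un is hypothetical, and it assumes the usual nodetool status layout in which each node line starts with a two-letter status code (U or D, followed by N, L, J, or M).

```shell
#!/bin/sh
# Sketch: succeed only if every node line in "nodetool status" output is UN.
# all_nodes_un is a hypothetical helper name. It assumes each node line
# starts with a two-letter code: U/D (up/down) then N/L/J/M (state).
all_nodes_un() {
    awk '
        # Count node lines, and count those whose status is not UN.
        /^[UD][NLJM][[:space:]]/ { total++; if ($1 != "UN") bad++ }
        END {
            # Fail when no node lines were seen or any node is not UN.
            if (total > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Usage (run on each node while Cassandra is started):
#   /usr/lib/loginsight/application/lib/apache-cassandra-*/bin/nodetool-no-pass status \
#       | all_nodes_un && echo "all nodes report UN"
```

The function treats empty input as a failure, so a nodetool command that produces no node lines is not mistaken for a healthy cluster.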