...
SourceOne journaling slows down dramatically when one of several archive servers runs at high CPU for sustained periods (for example, 98%-100% CPU utilization).
The SourceOne worker machines use a round-robin approach to distribute incoming data across all available SourceOne servers with the 'Archive' role. When an archive server is processing without error but is doing so slowly, it slows down the whole archiving process.

For example, suppose there are 5 Archive servers and 10 Workers. If the Workers are processing 5 messages per second each, each Archive server receives 10 messages per second.

As the Workers distribute messages across the Archive servers, if one Archive server takes 30 seconds to process each message while the other Archive servers take 0.5 seconds, every fifth message processed takes 30 seconds. Effectively, 5 messages take 32 seconds to process: 30s + (0.5s x 4) = 32s
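The arithmetic above can be checked with a short sketch. It assumes a single worker thread that sends one message at a time to each Archive server in turn and waits for it to finish; the function and variable names are illustrative and are not part of SourceOne.

```python
def round_robin_cycle_seconds(per_message_seconds):
    """Time for one pass that sends one message to each archive server in turn.

    Assumes the messages are processed one after another, so the
    per-server times simply add up.
    """
    return sum(per_message_seconds)


# Five healthy archive servers, 0.5 seconds per message each.
healthy = [0.5] * 5

# Same farm, but one server needs 30 seconds per message.
degraded = [30.0] + [0.5] * 4

healthy_cycle = round_robin_cycle_seconds(healthy)
degraded_cycle = round_robin_cycle_seconds(degraded)

print(f"healthy:  5 messages in {healthy_cycle} s")
print(f"degraded: 5 messages in {degraded_cycle} s")
print(f"slowdown: {degraded_cycle / healthy_cycle:.1f}x")
```

Under these assumptions, a single slow server out of five makes each pass take 32 seconds instead of 2.5 seconds, even though the other four servers are healthy.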
Use ES1LogViewer.exe to open the ExJbJournal.exe.log on the journaling worker machines.

If the log is not in verbose mode, enable verbose mode by setting the following registry value on a worker machine:

    [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\EMC\SourceOne\TraceLogs\ExJBJournal.exe]
    "TraceVerbosity"=dword:00000004

Look for this type of entry in the verbose ExJbJournal.exe.log on any worker machine:

    ***WARNING*** Message Size: 28983 Time taken (in ms): ID calculation: 184 Rules/EMCMF: 228 IngestMsg: 46706 DeleteMsg: 46906 ProcessTime: 46894 MsgId: 5E31E3AC68AFB726AC41C28C8D49ECEBC059DC394222ACxxx Subject: RE: Slow Processing

The key information is "***WARNING***". That message is logged when an item takes a long time to process.

Highlight the record containing that text, right-click the TID column for that entry, and choose 'Column Filter'. This filters the log by the thread ID (TID) of the thread that reported the warning message.

Now look back in the log for an entry that begins with "AFTER RPC Msg". That entry contains the name of the archive server that triggered the slow-processing warning. For example:

    AFTER RPC Msg(5E31E3AC68AFB726AC41C28C8D49ECEBC059DC394222ACxxx) to Server(ArchiveServer03) SUCCEEDED for Try(1) nServers(7) Index: 0 Folder[\JournalMail] Error: 0x00000000

In this example, the problem server appears to be 'ArchiveServer03'. Check whether this server shows up in warnings from other worker machines as well. If it does, check whether the machine is running at 100% CPU and, if so, try to determine what is causing the sustained high CPU.

To confirm that the server is causing the problem, temporarily remove the SourceOne archive role from that machine or stop the services on it. The archive role can be removed by following these steps:

1. Open the SourceOne Console application.
2. Expand 'Native Archive'.
3. Select 'Server Configuration'.
4. Right-click the suspect server.
5. Choose Properties.
6. On the Archive tab, deselect the 'Enabled' checkbox.
7. Click OK.
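When many warnings have to be checked across several worker machines, the manual lookup can be approximated with a script. The following is a minimal sketch, not a SourceOne tool. It assumes the log has been exported to plain text with one entry per line, and that the entries look like the two samples shown above. Instead of filtering on the TID column, it matches each warning to the preceding "AFTER RPC Msg" entry by message ID, which is a substitution for the viewer-based procedure. The function name count_slow_servers is hypothetical.

```python
import re
from collections import Counter

WARNING_MARK = "***WARNING***"

# Patterns inferred from the two sample entries in this article (assumption:
# real entries follow the same layout).
MSG_ID_RE = re.compile(r"MsgId:\s*(\S+)")
RPC_RE = re.compile(
    r"AFTER RPC Msg\((?P<msg_id>[^)]+)\) to Server\((?P<server>[^)]+)\)"
)


def count_slow_servers(lines):
    """Count slow-processing warnings per archive server.

    Remembers the archive server named in each "AFTER RPC Msg" entry,
    keyed by message ID, then attributes each later warning to the
    server recorded for its MsgId.
    """
    server_for_msg = {}
    slow = Counter()
    for line in lines:
        rpc = RPC_RE.search(line)
        if rpc:
            server_for_msg[rpc.group("msg_id")] = rpc.group("server")
            continue
        if WARNING_MARK in line:
            msg = MSG_ID_RE.search(line)
            server = server_for_msg.get(msg.group(1)) if msg else None
            if server:
                slow[server] += 1
    return slow


# The sample entries from this article, in the order they would appear.
sample = [
    "AFTER RPC Msg(5E31E3AC68AFB726AC41C28C8D49ECEBC059DC394222ACxxx) "
    "to Server(ArchiveServer03) SUCCEEDED for Try(1) nServers(7) Index: 0 "
    "Folder[\\JournalMail] Error: 0x00000000",
    "***WARNING*** Message Size: 28983 Time taken (in ms): ID calculation: 184 "
    "Rules/EMCMF: 228 IngestMsg: 46706 DeleteMsg: 46906 ProcessTime: 46894 "
    "MsgId: 5E31E3AC68AFB726AC41C28C8D49ECEBC059DC394222ACxxx "
    "Subject: RE: Slow Processing",
]

for server, hits in count_slow_servers(sample).most_common():
    print(server, hits)
```

Running the same function over exported logs from each worker machine and adding the counters together gives a quick view of whether one archive server dominates the warnings.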