...
Description of problem:

1] ipa-healthcheck is using a retired server.

2] If a server is removed, but for some reason not all of its pieces were removed from IdM, then ipa-healthcheck should report on that, instead of just failing because the server it picked from its server list was not removed properly.

3] Since it fails without a stack trace, it may be difficult to reproduce and/or track down exactly where the problem is.

Note: As per the discussion with our Engineering team, a bug is required to be opened.

Version-Release number of selected component (if applicable):
ipa-server-4.9.8-7.module+el8.6.0+14337+19b76db2.x86_64
rhel8

How reproducible:

Steps to Reproduce:

ipa-healthcheck --debug --failures-only
Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
Loading StateFile from '/var/lib/ipa/sysrestore/sysrestore.state'
keyctl_search: Required key not available
Enter password for :
Internal server error HTTPSConnectionPool(host='test1.example.com', port=443): Max retries exceeded with url: /ca/rest/certs/search?size=3 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f4097949e48>: Failed to establish a new connection: [Errno -2] Name or service not known',))
[
  {
    "source": "pki.server.healthcheck.clones.connectivity_and_data",
    "check": "ClonesConnectivyAndDataCheck",
    "result": "ERROR",
    "uuid": "72ad2788-e0b7-4f5e-9eeb-*******",
    "when": "20210707180422Z",
    "duration": "37.131043",
    "kw": {
      "status": "ERROR: pki-tomcat : Internal error testing CA clone.  Host: test1.example.com  Port: 443"
    }
  },
  {
    "source": "ipahealthcheck.ds.dse",
    "check": "DSECheck",
    "result": "ERROR",
    "uuid": "61862033-4c45-***-",
    "when": "20220707180424Z",
    "duration": "0.021794",
    "kw": {
      "key": "",
      "items": [
        "Replication",
        "dc=idm,dc=rccad,dc=net",
        "Time Skew",
        "Skew: 21 hours, 32 minutes, 38 seconds"
      ],
      "msg": ""
    }
  },
  {
    "source": "ipahealthcheck.ds.dse",
    "check": "DSECheck",
    "result": "ERROR",
    "uuid": "91908b4c-218d-433c-bf5d-************",
    "when": "20220707180424Z",
    "duration": "0.021849",
    "kw": {
      "key": "",
      "items": [
        "Replication",
        "o=ipaca",
        "Time Skew",
        "Skew: 12 hours, 57 minutes, 57 seconds"
      ],
      "msg": "The time skew is over 12 hours. If this time skew continues to increase\nto 24 hours then replication can potentially stop working. Please continue to\nmonitor the time skew offsets for increasing values. Setting nsslapd-ignore-time-skew\nto \"on\" on each replica will allow replication to continue, but if the time skew\ncontinues to increase other more serious replication problems can occur."
    }
  },
  {
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "WARNING",
    "uuid": "dc74ec0e-1a72-493e-beb7-c43338d1810a",
    "when": "20210707180426Z",
    "duration": "1.219723",
    "kw": {
      "key": "DSREPLLE0002",
      "items": [
        "Replication",
        "Conflict Entries"
      ],
      "msg": "There were 8 conflict entries found under the replication suffix \"o=ipaca\"."
    }
  },
  {
    "source": "ipahealthcheck.ipa.dna",
    "check": "IPADNARangeCheck",
    "result": "WARNING",
    "uuid": "433da2f7-434f-4a2c-862f-***",
    "when": "20220707180430Z",
    "duration": "0.200231",
    "kw": {
      "range_start": 0,
      "range_max": 0,
      "next_start": 0,
      "next_max": 0,
      "msg": "No DNA range defined. If no masters define a range then users and groups cannot be created."
    }
  }
]

Actual results:
ipa-healthcheck fails against the retired node (IPA replica) without pointing to the exact point of failure and without a stack trace.

Expected results:
ipa-healthcheck should report the exact point of failure. That would be a great help from the customer's perspective.
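Until the tool itself pinpoints the stale server, the JSON report above can at least be condensed so the failing check and the host it mentions stand out. A minimal sketch in Python; the `report` string stands in for the tool's JSON output (abbreviated here to two of the entries above):

```python
import json

# Abbreviated stand-in for the ipa-healthcheck JSON output shown above.
report = json.loads("""
[
  {"source": "pki.server.healthcheck.clones.connectivity_and_data",
   "check": "ClonesConnectivyAndDataCheck", "result": "ERROR",
   "kw": {"status": "ERROR: pki-tomcat : Internal error testing CA clone.  Host: test1.example.com  Port: 443"}},
  {"source": "ipahealthcheck.ipa.dna", "check": "IPADNARangeCheck",
   "result": "WARNING",
   "kw": {"msg": "No DNA range defined. If no masters define a range then users and groups cannot be created."}}
]
""")

for entry in report:
    # Some checks put their detail in kw.msg, others (like the PKI clone
    # check) in kw.status; fall back from one to the other.
    detail = entry["kw"].get("msg") or entry["kw"].get("status", "")
    print(f"{entry['result']:7} {entry['source']}.{entry['check']}: {detail}")
```

One line per failure makes the retired host (test1.example.com) immediately visible in the first entry.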
Additional info: Somewhere there is a record of this now-removed machine, and it could cause runtime issues at some point. The best way forward, if the host has been removed, is to examine LDAP for where this hostname is referenced, but this is not helpful from the customer's perspective. Though this is open source and we can modify the script.
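For the "examine LDAP" workaround, a small helper can generate the ldapsearch commands to run against each suffix. This is only a sketch: the hostname and suffixes are taken from the report above, and the filter attributes (fqdn, cn, nsds5replicahost) are illustrative, not an exhaustive list of places a replica can be referenced:

```python
# Sketch: build ldapsearch commands that look for leftover references to a
# removed replica. Hostname and suffixes come from the report above; the
# attributes in the filter are examples, not a complete set.
removed_host = "test1.example.com"
suffixes = ["dc=idm,dc=rccad,dc=net", "o=ipaca"]

def stale_host_searches(host, suffixes):
    """Return one ldapsearch command per suffix searching for the host."""
    filt = f"(|(fqdn={host})(cn={host})(nsds5replicahost={host}))"
    return [
        f"ldapsearch -x -D 'cn=Directory Manager' -W -b '{sfx}' '{filt}' dn"
        for sfx in suffixes
    ]

for cmd in stale_host_searches(removed_host, suffixes):
    print(cmd)
```

Any DN returned by these searches is a candidate leftover record that the failed server removal left behind.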
Duplicate