Details
Type: Bug
Resolution: Duplicate
Priority: Medium
Description
This issue was created by maloo for Andreas Dilger <adilger@dilger.ca>
This issue relates to the following test suite run, but the same test has been failing regularly for weeks at least:
https://testing.whamcloud.com/test_sets/1fdb9023-7de9-420c-ba79-94635cea52b8
lustre-initialization failed with the following errors in the client console log:
LNet: Added LNI fd33:3981:3213:f020:0:5254:3d:1567@tcp [8/256/0/180] :
Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock fd33:3981:3213:f020:0:5254:e7:dfe9@tcp:/lustre /mnt/lustre
LustreError: 13050:0:(mgc_request.c:1459:mgc_apply_recover_logs()) mgc: cannot find UUID by nid 'fd33:3981:3213:f020:0:5254:c8:10ad@tcp': rc = -2
Lustre: 13050:0:(mgc_request.c:1638:mgc_process_recover_log()) MGCfd33:3981:3213:f020:0:5254:e7:dfe9@tcp: error processing lustre-cliir log recovery: rc = -2
Lustre: 13050:0:(mgc_request.c:1910:mgc_process_log()) MGCfd33:3981:3213:f020:0:5254:e7:dfe9@tcp: IR log lustre-cliir failed, not fatal: rc = -2
LustreError: lustre-MDT0000-mdc-ff208f8a85978800: operation mds_connect to node fd33:3981:3213:f020:0:5254:e7:dfe9@tcp failed: rc = -11
LNet: 1 local NIs in recovery (showing 1): fd33:3981:3213:f020:0:5254:3d:1567@tcp
LustreError: lustre-MDT0003-mdc-ff208f8a85978800: operation mds_connect to node fd33:3981:3213:f020:0:5254:c8:10ad@tcp failed: rc = -11
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/122764 - 5.14.0-503.40.1.el9_5.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/122764 - 5.14.0-503.40.1_lustre.el9.x86_64
The "mgc: cannot find UUID by nid" and "local NIs in recovery (showing 1)" messages appear in the console logs of most of the nodes. All of the NIDs involved are IPv6 addresses, so this might be a general problem with IPv6 networking on the test clusters?
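To check how widespread these messages are, something like the following could be run over each node's console log. This is only a rough sketch: the regular expressions are copied from the messages quoted in this ticket, while the `summarize` helper, the pattern names, and the `SAMPLE` text are made up for illustration and are not part of Lustre or Maloo.

```python
import re
from collections import Counter

# Message patterns taken from the console log quoted in this ticket.
# The dictionary keys are illustrative names, not Lustre identifiers.
PATTERNS = {
    "uuid_by_nid": re.compile(r"cannot find UUID by nid '([^']+)'"),
    "mds_connect": re.compile(
        r"operation mds_connect to node (\S+) failed: rc = (-?\d+)"
    ),
    "ni_recovery": re.compile(r"(\d+) local NIs in recovery"),
}


def summarize(log_text):
    """Count each known message and collect the peer NIDs named in them."""
    counts = Counter()
    nids = set()
    for name, pattern in PATTERNS.items():
        # finditer scans the whole text, so this also works when the log
        # has been flattened onto a single line.
        for match in pattern.finditer(log_text):
            counts[name] += 1
            if name != "ni_recovery":
                nids.add(match.group(1))
    return counts, nids


# Shortened sample built from the messages in this ticket.
SAMPLE = (
    "LustreError: 13050:0:(mgc_request.c:1459:mgc_apply_recover_logs()) "
    "mgc: cannot find UUID by nid 'fd33:3981:3213:f020:0:5254:c8:10ad@tcp': rc = -2\n"
    "LustreError: lustre-MDT0000-mdc-ff208f8a85978800: operation mds_connect "
    "to node fd33:3981:3213:f020:0:5254:e7:dfe9@tcp failed: rc = -11\n"
    "LNet: 1 local NIs in recovery (showing 1): fd33:3981:3213:f020:0:5254:3d:1567@tcp\n"
    "LustreError: lustre-MDT0003-mdc-ff208f8a85978800: operation mds_connect "
    "to node fd33:3981:3213:f020:0:5254:c8:10ad@tcp failed: rc = -11\n"
)

counts, nids = summarize(SAMPLE)
for name in PATTERNS:
    print(name, counts[name])
print(sorted(nids))
```

Comparing the collected NIDs across nodes would show whether the failures are tied to particular servers or affect every IPv6 peer.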
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
lustre-initialization lustre-initialization - "lustre-initialization timed out"