Lustre / LU-20043

lustre-initialization IPv6: mgc_apply_recover_logs() mgc: cannot find UUID by nid 'fd33:3981:3213:f020:0:5254:c8:10ad@tcp'


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Medium

      This issue was created by maloo for Andreas Dilger <adilger@dilger.ca>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/1fdb9023-7de9-420c-ba79-94635cea52b8

      but the same failure has been occurring regularly for at least several weeks.

      lustre-initialization failed with the following errors in the client console log:

      LNet: Added LNI fd33:3981:3213:f020:0:5254:3d:1567@tcp [8/256/0/180]
      :
      Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock fd33:3981:3213:f020:0:5254:e7:dfe9@tcp:/lustre /mnt/lustre
      LustreError: 13050:0:(mgc_request.c:1459:mgc_apply_recover_logs()) mgc: cannot find UUID by nid 'fd33:3981:3213:f020:0:5254:c8:10ad@tcp': rc = -2
      Lustre: 13050:0:(mgc_request.c:1638:mgc_process_recover_log()) MGCfd33:3981:3213:f020:0:5254:e7:dfe9@tcp: error processing lustre-cliir log recovery: rc = -2
      Lustre: 13050:0:(mgc_request.c:1910:mgc_process_log()) MGCfd33:3981:3213:f020:0:5254:e7:dfe9@tcp: IR log lustre-cliir failed, not fatal: rc = -2
      LustreError: lustre-MDT0000-mdc-ff208f8a85978800: operation mds_connect to node fd33:3981:3213:f020:0:5254:e7:dfe9@tcp failed: rc = -11
      LNet: 1 local NIs in recovery (showing 1): fd33:3981:3213:f020:0:5254:3d:1567@tcp
      LustreError: lustre-MDT0003-mdc-ff208f8a85978800: operation mds_connect to node fd33:3981:3213:f020:0:5254:c8:10ad@tcp failed: rc = -11
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/122764 - 5.14.0-503.40.1.el9_5.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/122764 - 5.14.0-503.40.1_lustre.el9.x86_64

      The "mgc: cannot find UUID by nid" and "local NIs in recovery (showing 1)" messages appear to be present on most of the nodes. This might be a general problem with IPv6 networking on the test clusters.
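
      To check how widespread the two messages quoted above are, a console log can be scanned for them and the affected NIDs tallied. The following is only a minimal sketch for triage, not part of Maloo or the Lustre test framework: the function name `summarize` and the NID regular expression are assumptions made for illustration, and the message strings are copied from the console log above.

```python
import re
from collections import Counter

# Assumed NID shape for this report: "<IPv6 address>@<LNet network>",
# e.g. "fd33:3981:3213:f020:0:5254:c8:10ad@tcp". IPv4 NIDs are not handled.
NID_RE = re.compile(r"([0-9a-fA-F:]+:[0-9a-fA-F:]+)@(\w+)")

# Message fragments copied from the console log in this ticket.
UUID_ERR = "cannot find UUID by nid"
NI_RECOVERY = "local NIs in recovery"


def summarize(lines):
    """Tally the NIDs named in each of the two suspect messages.

    Returns (uuid_misses, recovering), both Counters keyed by NID string.
    """
    uuid_misses = Counter()
    recovering = Counter()
    for line in lines:
        if UUID_ERR in line:
            # Look only at the text after the message so that the
            # "pid:0:(file:line:func())" prefix cannot be matched as a NID.
            tail = line.split(UUID_ERR, 1)[1]
            match = NID_RE.search(tail)
            if match:
                uuid_misses[match.group(0)] += 1
        elif NI_RECOVERY in line:
            tail = line.split(NI_RECOVERY, 1)[1]
            for match in NID_RE.finditer(tail):
                recovering[match.group(0)] += 1
    return uuid_misses, recovering


if __name__ == "__main__":
    import sys

    # Usage: python summarize_nids.py < client-console.log
    misses, in_recovery = summarize(sys.stdin)
    for nid, count in misses.most_common():
        print(f"UUID lookup failed {count}x for {nid}")
    for nid, count in in_recovery.most_common():
        print(f"local NI reported in recovery {count}x: {nid}")
```

      Running this over the console logs of each node in a session would show whether the failing UUID lookups always name a server NID (as with MDT0003's node in the log above) and whether every client reports its own local NI in recovery, which would help separate a cluster-wide IPv6 problem from a Lustre-side one.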

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      lustre-initialization lustre-initialization - "lustre-initialization timed out"

            Assignee: WC Triage (wc-triage)
            Reporter: Maloo (maloo)
            Votes: 0
            Watchers: 1