  1. Lustre
  2. LU-20043

lustre-initialization IPv6: mgc_apply_recover_logs() mgc: cannot find UUID by nid 'fd33:3981:3213:f020:0:5254:c8:10ad@tcp'

Details

    • Bug
    • Resolution: Duplicate
    • Medium

    Description

      This issue was created by maloo for Andreas Dilger <adilger@dilger.ca>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/1fdb9023-7de9-420c-ba79-94635cea52b8

      but has been failing regularly for weeks at least.

      lustre-initialization failed with the following errors in the client console log:

      LNet: Added LNI fd33:3981:3213:f020:0:5254:3d:1567@tcp [8/256/0/180]
      :
      Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock fd33:3981:3213:f020:0:5254:e7:dfe9@tcp:/lustre /mnt/lustre
      LustreError: 13050:0:(mgc_request.c:1459:mgc_apply_recover_logs()) mgc: cannot find UUID by nid 'fd33:3981:3213:f020:0:5254:c8:10ad@tcp': rc = -2
      Lustre: 13050:0:(mgc_request.c:1638:mgc_process_recover_log()) MGCfd33:3981:3213:f020:0:5254:e7:dfe9@tcp: error processing lustre-cliir log recovery: rc = -2
      Lustre: 13050:0:(mgc_request.c:1910:mgc_process_log()) MGCfd33:3981:3213:f020:0:5254:e7:dfe9@tcp: IR log lustre-cliir failed, not fatal: rc = -2
      LustreError: lustre-MDT0000-mdc-ff208f8a85978800: operation mds_connect to node fd33:3981:3213:f020:0:5254:e7:dfe9@tcp failed: rc = -11
      LNet: 1 local NIs in recovery (showing 1): fd33:3981:3213:f020:0:5254:3d:1567@tcp
      LustreError: lustre-MDT0003-mdc-ff208f8a85978800: operation mds_connect to node fd33:3981:3213:f020:0:5254:c8:10ad@tcp failed: rc = -11
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/122764 - 5.14.0-503.40.1.el9_5.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/122764 - 5.14.0-503.40.1_lustre.el9.x86_64

      The "mgc: cannot find UUID by nid" and "local NIs in recovery (showing 1)" messages appear on most of the nodes, which may point to a general problem with IPv6 networking on the test clusters.
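
      To check how widespread the two messages are, each node's console log could be scanned and the affected NIDs tallied. The following is only a minimal sketch: the function name `summarize_console` is hypothetical, the patterns are copied from the messages quoted above, and the comma-separated split for multiple NIDs is an assumption, since this ticket only shows the single-NID form.

```python
import re
from collections import Counter

# Patterns taken from the console messages quoted in this ticket.
UUID_RE = re.compile(r"mgc: cannot find UUID by nid '([^']+)'")
RECOVERY_RE = re.compile(r"LNet: \d+ local NIs in recovery \(showing \d+\): (.+)")


def summarize_console(lines):
    """Count the NIDs named in the two messages of interest (hypothetical helper)."""
    missing_uuid = Counter()
    in_recovery = Counter()
    for line in lines:
        match = UUID_RE.search(line)
        if match:
            missing_uuid[match.group(1)] += 1
            continue
        match = RECOVERY_RE.search(line)
        if match:
            # Assumption: when several NIDs are listed they are comma-separated.
            for nid in match.group(1).split(","):
                in_recovery[nid.strip()] += 1
    return missing_uuid, in_recovery


# Two lines copied from the client console log in this ticket.
sample = [
    "LustreError: 13050:0:(mgc_request.c:1459:mgc_apply_recover_logs()) mgc: "
    "cannot find UUID by nid 'fd33:3981:3213:f020:0:5254:c8:10ad@tcp': rc = -2",
    "LNet: 1 local NIs in recovery (showing 1): "
    "fd33:3981:3213:f020:0:5254:3d:1567@tcp",
]

missing, recovering = summarize_console(sample)
print("no UUID for nid:", dict(missing))
print("local NIs in recovery:", dict(recovering))
```

      Running this over the console logs of every node in the session would show whether the same NIDs recur everywhere or whether each node reports a different one.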

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      lustre-initialization lustre-initialization - "lustre-initialization timed out"

            People

              wc-triage WC Triage
              maloo Maloo