Lustre / LU-17882

failover-part-* lustre-initialization: mkfs.lustre: Invalid NID string 'trevis-70vm7:trevis-70vm8'


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.16.0
    • Fix Version/s: Lustre 2.16.0
    • Severity: 3

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c9145dc4-956b-416d-acf3-29340ca918bd

      lustre-initialization failed with the following error because two colon-separated MGS hostnames were passed to mkfs.lustre --mgsnode without an "@tcp" network type (nettype):

      mkfs.lustre --mgsnode=trevis-70vm7:trevis-70vm8 --fsname=lustre --ost --index=0 --failnode=trevis-70vm6@tcp --param=sys.timeout=20 --backfstype=ldiskfs --device-size=8388608 --mkfsoptions=\"-b 4096\" --reformat /dev/lvm-Role_OSS/P1
      mkfs.lustre: Invalid NID string 'trevis-70vm7:trevis-70vm8'
      mkfs.lustre: Can't parse NID 'trevis-70vm7:trevis-70vm8'
      mkfs.lustre: exiting with 1 (Operation not permitted)
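To make the expected shape of the --mgsnode value concrete, here is a minimal Python sketch that appends a default network type to every NID lacking one. It assumes the conventional syntax (':' separating failover MGS nodes, ',' separating the NIDs of a single node) and assumes "tcp" is the intended network for a bare hostname. The helper normalize_mgsnode is hypothetical; it is not part of Lustre and is not how mkfs.lustre parses NIDs. It also ignores IPv6 addresses, which contain ':' themselves and would need a smarter split.

```python
def normalize_mgsnode(spec: str, default_net: str = "tcp") -> str:
    """Append '@<default_net>' to every NID in an --mgsnode value that lacks one.

    Hypothetical helper. Assumed syntax: ':' separates failover MGS nodes,
    ',' separates the NIDs of a single node. IPv6 NIDs are not handled.
    """
    nodes = []
    for node in spec.split(":"):
        nids = []
        for nid in node.split(","):
            nid = nid.strip()
            if not nid:
                raise ValueError(f"empty NID in {spec!r}")
            # Leave NIDs that already carry a network type untouched.
            nids.append(nid if "@" in nid else f"{nid}@{default_net}")
        nodes.append(",".join(nids))
    return ":".join(nodes)


print(normalize_mgsnode("trevis-70vm7:trevis-70vm8"))
# prints trevis-70vm7@tcp:trevis-70vm8@tcp
```

Under these assumptions, the failing command would instead be given --mgsnode=trevis-70vm7@tcp:trevis-70vm8@tcp, i.e. the same two nodes with the "@tcp" nettype spelled out explicitly.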
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4530 - 5.14.0-362.18.1.el9_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-master/4530 - 5.14.0-362.18.1_lustre.el9.x86_64

      It looks like this has been failing continuously since patch https://review.whamcloud.com/50362 "LU-10391 obdclass: handle large NIDs for mount strings" landed on 2023-08-24.

      https://testing.whamcloud.com/search?client_branch_type_id=24a6947e-04a9-11e1-bb5f-52540025f9af&test_groups%5B%5D=failover&test_groups%5B%5D=failover-part-1&test_groups%5B%5D=failover-part-2&test_groups%5B%5D=failover-part-3&test_groups%5B%5D=failover-zfs-part-1&test_groups%5B%5D=failover-zfs-part-2&test_groups%5B%5D=failover-zfs-part-3&test_set_script_id=5e9346a2-09e0-11e9-a2cc-52540065bddc&start_date=2023-08-01&end_date=2023-10-01&source=sub_tests#redirect

      Since the failover-part-* sessions are only run during "full" testing, this was not visible during patch review testing.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      lustre-initialization lustre-initialization - "lustre-initialization timed out"

    People

      Assignee: James A Simmons (simmonsja)
      Reporter: Maloo (maloo)
      Votes: 0
      Watchers: 3