Lustre / LU-12925

interop: conf-sanity test 62 fails with “Restart of mds1 failed!”

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.4
    • Affects Version/s: Lustre 2.13.0
    • Environment: master (2.13) servers with 2.12.3 clients
    • Severity: 3

    Description

      conf-sanity test_62 fails in interop testing with master servers and b2_12 clients. This test and others started failing on 24 OCT 2019 for master 2.12.58.171 servers with 2.12.3 clients. The test last passed on 17 OCT 2019 with 2.12.58.155 build #3964 servers and 2.12.2 build #18 clients.

      Looking at the suite_log for the failure at https://testing.whamcloud.com/test_sets/2d201dd4-f9cc-11e9-be86-52540065bddc, we see that the restart of mds1 fails because the mount returns error 22 (Invalid argument):

      CMD: trevis-6vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre   /dev/mapper/mds1_flakey /mnt/lustre-mds1
      trevis-6vm7: mount.lustre: mount /dev/mapper/mds1_flakey at /mnt/lustre-mds1 failed: Invalid argument
      trevis-6vm7: This may have multiple causes.
      trevis-6vm7: Are the mount options correct?
      trevis-6vm7: Check the syslog for more info.
      Start of /dev/mapper/mds1_flakey on mds1 failed 22
       conf-sanity test_62: @@@@@@ FAIL: Restart of mds1 failed! 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5864:error()
        = /usr/lib64/lustre/tests/test-framework.sh:1586:mount_facets()
        = /usr/lib64/lustre/tests/test-framework.sh:3361:facet_failover()
        = /usr/lib64/lustre/tests/test-framework.sh:3455:fail()
        = /usr/lib64/lustre/tests/test-framework.sh:4182:stopall()
        = /usr/lib64/lustre/tests/test-framework.sh:4455:formatall()
        = /usr/lib64/lustre/tests/conf-sanity.sh:108:reformat()
        = /usr/lib64/lustre/tests/conf-sanity.sh:90:reformat_and_config()
        = /usr/lib64/lustre/tests/conf-sanity.sh:4603:test_62()
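
The status 22 reported by "Start of /dev/mapper/mds1_flakey on mds1 failed 22" and the -22 in the kernel messages are the same errno value, which matches the "Invalid argument" text printed by mount.lustre. The following is a minimal sketch for decoding such values, not part of the Lustre test framework; the helper name describe_status is hypothetical.

```python
import errno
import os


def describe_status(code: int) -> str:
    """Return 'NAME (message)' for a positive or negative errno value.

    Hypothetical helper for reading test logs: shell exit statuses are
    positive (22) while kernel messages use the negated form (-22).
    """
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name} ({os.strerror(num)})"


if __name__ == "__main__":
    # Both forms seen in this ticket decode to the same errno.
    print(describe_status(22))
    print(describe_status(-22))
```

On Linux this identifies the value as EINVAL, which is consistent with the mount being rejected rather than the device being missing or busy.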
      

      Looking at the MDS (vm7) console log, we see that the device is mounted without a journal and the OSD setup then fails with -22:

      [38024.851590] Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
      [38025.161866] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
      [38025.494661] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre   /dev/mapper/mds1_flakey /mnt/lustre-mds1
      [38025.715937] LDISKFS-fs (dm-3): mounted filesystem without journal. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      [38025.717858] LustreError: 19847:0:(osd_handler.c:7696:osd_mount()) lustre-MDT0000-osd: device /dev/mapper/mds1_flakey is mounted w/o journal
      [38025.719942] LustreError: 19847:0:(obd_config.c:575:class_setup()) setup lustre-MDT0000-osd failed (-22)
      [38025.721511] LustreError: 19847:0:(obd_mount.c:205:lustre_start_simple()) lustre-MDT0000-osd setup error -22
      [38025.723385] LustreError: 19847:0:(obd_mount_server.c:1977:server_fill_super()) Unable to start osd on /dev/mapper/mds1_flakey: -22
      [38025.725326] LustreError: 19847:0:(obd_mount.c:1669:lustre_fill_super()) Unable to mount  (-22)
      [38025.955972] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_62: @@@@@@ FAIL: Restart of mds1 failed! 
      [38026.141053] Lustre: DEBUG MARKER: conf-sanity test_62: @@@@@@ FAIL: Restart of mds1 failed!
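
When triaging console logs like the one above, it can help to pull out only the LustreError lines in order, since the earliest one (here, the osd_mount() complaint about the missing journal) is often the most informative. The following is a rough sketch that assumes the line layout shown in this log; the names LustreErrorLine and lustre_errors are hypothetical and the pattern may need adjusting for other message formats.

```python
import re
from typing import Iterator, NamedTuple


class LustreErrorLine(NamedTuple):
    timestamp: float
    source: str
    line: int
    function: str
    message: str


# Assumed layout, taken from the console log in this ticket:
# [time] LustreError: pid:n:(file.c:line:function()) message
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+\d+:\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+(?P<msg>.*)"
)


def lustre_errors(log_text: str) -> Iterator[LustreErrorLine]:
    """Yield parsed LustreError entries in the order they appear."""
    for m in PATTERN.finditer(log_text):
        yield LustreErrorLine(
            float(m["ts"]), m["file"], int(m["line"]), m["func"], m["msg"].strip()
        )


# Three lines copied from the MDS console log above.
SAMPLE = """\
[38025.715937] LDISKFS-fs (dm-3): mounted filesystem without journal. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[38025.717858] LustreError: 19847:0:(osd_handler.c:7696:osd_mount()) lustre-MDT0000-osd: device /dev/mapper/mds1_flakey is mounted w/o journal
[38025.719942] LustreError: 19847:0:(obd_config.c:575:class_setup()) setup lustre-MDT0000-osd failed (-22)
"""

if __name__ == "__main__":
    for err in lustre_errors(SAMPLE):
        print(f"{err.timestamp:.6f} {err.source}:{err.line} {err.function}: {err.message}")
```

Lines that are not LustreError messages, such as the LDISKFS-fs notice, are skipped by the pattern.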
      

      When conf-sanity test 62 fails, we also see tests 64, 65, 66, 68 and 69 fail. Tests 63 and 67 do not fail.

      We’ve seen these tests fail only once before:
      https://testing.whamcloud.com/test_sets/ee1e3636-f75d-11e9-a197-52540065bddc

          Activity

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36660/
            Subject: LU-12925 test: assign right initial value for test_61
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 6ab32eedff99715b9fcbc3d3f750906a658bbd7a

            gerrit Gerrit Updater added a comment -

            Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36660
            Subject: LU-12925 test: assign right initial value for test_61
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 00a274c7af2d6b7c7da377b0968919f7fbef23f6

            adilger Andreas Dilger added a comment -

            This may be an incompatibility between the b2_12 and master versions of the test-framework.sh or conf-sanity.sh scripts (e.g. in how the mount options are passed, or an option that is no longer being parsed correctly). The kernel debug logs may show what the problem is.

            pjones Peter Jones added a comment -

            Yang Sheng

            Could you please investigate?

            Thanks

            Peter

            People

              Assignee: ys Yang Sheng
              Reporter: jamesanunez James Nunez (Inactive)