[LU-9437] sanity-lfsck test_33: only 0 of 4 MDTs are in completed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3
    • Environment: trevis-50, full, DNE+ZFS
        EL7, master branch, v2.9.56.11, b3565
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      https://testing.hpdd.intel.com/test_sessions/30cc75b6-594f-4255-accf-24fe11bdd565

      Just before the failure occurred:

      From test_log:

      Started LFSCK on the device lustre-MDT0000: scrub namespace
      CMD: trevis-50vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000 -w |
      		      awk '/^namespace_mdts_completed/ { print \$2 }'
      CMD: trevis-50vm7 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000
      namespace_mdts_init: 0
      namespace_mdts_scanning-phase1: 0
      namespace_mdts_scanning-phase2: 0
      namespace_mdts_completed: 0
      namespace_mdts_failed: 3
      namespace_mdts_stopped: 0
      namespace_mdts_paused: 0
      namespace_mdts_crashed: 0
      namespace_mdts_partial: 0
      namespace_mdts_co-failed: 1
      namespace_mdts_co-stopped: 0
      namespace_mdts_co-paused: 0
      namespace_mdts_unknown: 0
      

      The same set of counters was then printed for namespace_osts, all with value 0. Then the failure was seen:

       sanity-lfsck test_33: @@@@@@ FAIL: (5) only 0 of 4 MDTs are in completed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4931:error()
        = /usr/lib64/lustre/tests/sanity-lfsck.sh:142:wait_all_targets_blocked()
        = /usr/lib64/lustre/tests/sanity-lfsck.sh:5046:test_33()
        = /usr/lib64/lustre/tests/test-framework.sh:5207:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5246:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5093:run_test()
        = /usr/lib64/lustre/tests/sanity-lfsck.sh:5052:main()
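
      The check that fails here can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in sanity-lfsck.sh: the helper name lfsck_counter is hypothetical, and the sample input is copied from the test_log excerpt above instead of being read from a live lctl lfsck_query call.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of sanity-lfsck.sh):
# reads "name: value" counter lines on stdin, in the format shown in the
# test_log excerpt, and prints the value of the requested counter.
# Prints 0 if the counter is not present in the input.
lfsck_counter() {
	local name=$1
	awk -v key="${name}:" '
		$1 == key { print $2; found = 1 }
		END { if (!found) print 0 }'
}

# Sample input copied from the test_log excerpt (subset of the counters).
sample_output='namespace_mdts_completed: 0
namespace_mdts_failed: 3
namespace_mdts_co-failed: 1'

expected=4	# number of MDTs in this test session
completed=$(printf '%s\n' "$sample_output" |
	lfsck_counter namespace_mdts_completed)
failed=$(printf '%s\n' "$sample_output" |
	lfsck_counter namespace_mdts_failed)

# The test expects every MDT to reach "completed"; here none did.
if [ "$completed" -ne "$expected" ]; then
	echo "only $completed of $expected MDTs are in completed ($failed failed)"
fi
```

      With the sample above, the counters show 3 MDTs in failed and 1 in co-failed, so none of the 4 reaches completed and the comparison reports the mismatch.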
      

          Activity

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31518/
            Subject: LU-9437 lfsck: handle LMV EA for migrating directory
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 251cce39e896dc668c1338ea15fa145a6b262362

            gerrit Gerrit Updater added a comment - Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31518
            Subject: LU-9437 lfsck: handle LMV EA for migrating directory
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 17527a603219098225c31a9f9887ddca3e72aeb5

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31266/
            Subject: LU-9437 lfsck: handle LMV EA for migrating directory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 05cfe91c2714a77f5ad3de4a7e58e20b6df17b83

            gerrit Gerrit Updater added a comment - Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/31266
            Subject: LU-9437 lfsck: handle LMV EA for migrating directory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 19f5edf2f0a52686e58c420168f5a88559812115

            yujian Jian Yu added a comment - This failure occurred more than 50 times in one week, which is affecting patch testing on master branch:
            https://testing.hpdd.intel.com/test_sets/8844fac6-0ce4-11e8-a6ad-52540065bddc
            https://testing.hpdd.intel.com/test_sets/d13951c6-0ceb-11e8-a6ad-52540065bddc
            https://testing.hpdd.intel.com/test_sets/5d7a4c72-0cdc-11e8-a7cd-52540065bddc

            emoly.liu Emoly Liu added a comment - +1 on master: https://testing.hpdd.intel.com/test_sets/11e161d6-0ca7-11e8-a10a-52540065bddc

            mdiep Minh Diep added a comment - we have started dne-zfs-part-2 and hit this bug https://testing.hpdd.intel.com/test_sets/1e81caf6-0b54-11e8-a7cd-52540065bddc

            jamesanunez James Nunez (Inactive) added a comment - This test continues to fail in full test sessions for DNE with ZFS, but there are also some hangs during this test and right after the test fails. I can’t find much information about the hang in the logs, but here are a few links to recent test_33 hangs:
            https://testing.hpdd.intel.com/test_sets/14b87b72-004c-11e8-bd00-52540065bddc
            https://testing.hpdd.intel.com/test_sets/ea7624b0-fd83-11e7-a7cd-52540065bddc
            https://testing.hpdd.intel.com/sub_tests/b2d23068-f710-11e7-a6ad-52540065bddc

            I can open a new ticket if the hang is a separate issue from the test failure.

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: jcasper James Casper (Inactive)
              Votes: 0
              Watchers: 10