[LU-12747] sanity: test 811 fail with "MDD orphan cleanup thread not quit"

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0

    Description

      This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/83dd0b3a-d3ea-11e9-9fc9-52540065bddc

      onyx-33vm4: == rpc test complete, duration -o sec ================================================================ 16:37:21 (1568133441)
      onyx-33vm4: onyx-33vm4.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      CMD: onyx-33vm4 e2label /dev/mapper/mds1_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: onyx-33vm4 e2label /dev/mapper/mds1_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: onyx-33vm4 e2label /dev/mapper/mds1_flakey 2>/dev/null
      Started lustre-MDT0000
      CMD: onyx-33vm4 pgrep orph_.*-MDD
       sanity test_811: @@@@@@ FAIL: MDD orphan cleanup thread not quit 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6115:error()
        = /usr/lib64/lustre/tests/sanity.sh:21633:test_811()
        = /usr/lib64/lustre/tests/test-framework.sh:6417:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:6456:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:6302:run_test()
        = /usr/lib64/lustre/tests/sanity.sh:21635:main()
      

          Activity

            pjones Peter Jones added a comment - Landed for 2.14

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37395/
            Subject: LU-12747 tests: wait properly for orhpan thread stop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e5346a494fcb54b7f9fbc7ed4fb93003a8489231

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37395
            Subject: LU-12747 tests: wait properly for orhpan thread stop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c5073903984d040bc5006a75e49d13dc0d7e54a1

            adilger Andreas Dilger added a comment -

            [ 8541.797642] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
            [ 8542.067572] Lustre: DEBUG MARKER: pgrep orph_.*-MDD
            [ 8542.196104] Lustre: lustre-MDT0000: Recovery over after 0:01, of 2 clients 2 recovered and 0 were evicted.
            [ 8542.253891] LustreError: 27822:0:(osd_handler.c:278:osd_idc_find_or_init()) can't lookup: rc = -2
            [ 8542.255510] Lustre: 27822:0:(mdd_orphans.c:340:mdd_orphan_destroy()) lustre-MDD0000: orphan 0x200006991:0xd:0x0 [0x200006991:0xd:0x0] doesn't exist
            [ 8542.706917] Lustre: DEBUG MARKER: sanity test_811: @@@@@@ FAIL: MDD orphan cleanup thread not quit

            The pgrep runs shortly before mdd_orphan_destroy() has finished; a slightly longer wait would fix this.
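The "slightly longer wait" suggested above could be sketched as a polling loop rather than a single pgrep. This is only an illustration under stated assumptions, not the code from the landed patch: the helper name wait_until_gone and the 20-second timeout are hypothetical, and the real test runs pgrep on the MDS node through the test framework, which this sketch leaves out.

```shell
# Hypothetical helper (not from the LU-12747 patch): poll a check command
# until it fails, meaning nothing matched, or until the timeout expires.
wait_until_gone() {
	local timeout=$1	# maximum number of seconds to wait
	shift
	local waited=0

	# "$@" is the check command, e.g. pgrep 'orph_.*-MDD'.
	# pgrep exits 0 for as long as at least one process still matches.
	while "$@" > /dev/null 2>&1; do
		if [ "$waited" -ge "$timeout" ]; then
			return 1	# still present after the timeout
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0	# check command failed: the process is gone
}

# Possible use, with an assumed 20-second limit:
#   wait_until_gone 20 pgrep 'orph_.*-MDD' ||
#       echo "MDD orphan cleanup thread not quit"
```

Polling returns as soon as the thread exits, so the common case stays fast, while a thread that needs a few hundred extra milliseconds, as in the log above, no longer fails the test.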

            adilger Andreas Dilger added a comment - This seems to fail intermittently, but could be made more robust.
            adilger Andreas Dilger added a comment - +1 on master https://testing.whamcloud.com/test_sets/64398a3e-4243-11ea-b083-52540065bddc

            People

              adilger Andreas Dilger
              maloo Maloo