[LU-8502] replay-vbr: umount hangs waiting for mgs_ir_fini_fs

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
    • Labels: None
    • Severity: 3

    Description

      This issue was created by maloo for Kit Westneat <kit.westneat@gmail.com>

      Test timed out with a lot of stack traces related to mgs_ir_fini_fs:

      [17640.500244]  [<ffffffff8163ba29>] schedule+0x29/0x70
      [17640.500244]  [<ffffffff81639719>] schedule_timeout+0x209/0x2d0
      [17640.500244]  [<ffffffff811bfad9>] ? discard_slab+0x39/0x50
      [17640.500244]  [<ffffffff81632d4d>] ? __slab_free+0x253/0x277
      [17640.500244]  [<ffffffff8163bdf6>] wait_for_completion+0x116/0x170
      [17640.500244]  [<ffffffff810b88c0>] ? wake_up_state+0x20/0x20
      [17640.500244]  [<ffffffffa0cb3a0e>] mgs_ir_fini_fs+0x27e/0x2ec [mgs]
      [17640.500244]  [<ffffffffa0ca0361>] mgs_free_fsdb+0x41/0x8e0 [mgs]
      [17640.500244]  [<ffffffffa0ca97d2>] mgs_cleanup_fsdb_list+0x52/0x70 [mgs]
      [17640.500244]  [<ffffffffa0c8fa87>] mgs_device_fini+0x97/0x5b0 [mgs]
      [17640.500244]  [<ffffffffa07d088c>] class_cleanup+0x94c/0xd80 [obdclass]
      [17640.500244]  [<ffffffffa07d3606>] class_process_config+0x2226/0x2f60 [obdclass]
      [17640.500244]  [<ffffffff811c2483>] ? __kmalloc+0x1f3/0x230
      [17640.500244]  [<ffffffffa07cd6cb>] ? lustre_cfg_new+0x8b/0x400 [obdclass]
      [17640.500244]  [<ffffffffa07d442f>] class_manual_cleanup+0xef/0x810 [obdclass]
      [17640.500244]  [<ffffffffa0802560>] server_put_super+0xb20/0xcd0 [obdclass]
      [17640.500244]  [<ffffffff811e1096>] generic_shutdown_super+0x56/0xe0
      [17640.500244]  [<ffffffff811e1472>] kill_anon_super+0x12/0x20
      [17640.500244]  [<ffffffffa07d7c92>] lustre_kill_super+0x32/0x50 [obdclass]
      [17640.500244]  [<ffffffff811e1829>] deactivate_locked_super+0x49/0x60
      [17640.500244]  [<ffffffff811e1e26>] deactivate_super+0x46/0x60
      [17640.500244]  [<ffffffff811fed95>] mntput_no_expire+0xc5/0x120
      [17640.500244]  [<ffffffff811ffecf>] SyS_umount+0x9f/0x3c0
      
      [17640.500244] mgs_lustre_noti S ffff88004c88dc00     0 21703      2 0x00000080
      [17640.500244]  ffff88004c83fbb0 0000000000000046 ffff88004c88dc00 ffff88004c83ffd8
      [17640.500244]  ffff88004c83ffd8 ffff88004c83ffd8 ffff88004c88dc00 ffff88004f27a800
      [17640.500244]  ffff88004c88dc00 0000000000000000 ffffffffa09d4b90 ffff88004c88dc00
      [17640.500244] Call Trace:
      [17640.500244]  [<ffffffffa09d4b90>] ? ldlm_completion_ast_async+0x300/0x300 [ptlrpc]
      [17640.500244]  [<ffffffff8163ba29>] schedule+0x29/0x70
      [17640.500244]  [<ffffffffa09d540d>] ldlm_completion_ast+0x62d/0x910 [ptlrpc]
      [17640.500244]  [<ffffffff810b88c0>] ? wake_up_state+0x20/0x20
      [17640.500244]  [<ffffffffa0c8e8f1>] mgs_completion_ast_generic+0xb1/0x1d0 [mgs]
      [17640.500244]  [<ffffffffa0c8ea23>] mgs_completion_ast_ir+0x13/0x20 [mgs]
      [17640.500244]  [<ffffffffa09d7ab0>] ldlm_cli_enqueue_local+0x230/0x860 [ptlrpc]
      [17640.500244]  [<ffffffffa0c8ea10>] ? mgs_completion_ast_generic+0x1d0/0x1d0 [mgs]
      [17640.500244]  [<ffffffffa09da820>] ? ldlm_blocking_ast_nocheck+0x310/0x310 [ptlrpc]
      [17640.500244]  [<ffffffffa0c93ddc>] mgs_revoke_lock+0x1ec/0x370 [mgs]
      [17640.500244]  [<ffffffffa09da820>] ? ldlm_blocking_ast_nocheck+0x310/0x310 [ptlrpc]
      [17640.500244]  [<ffffffffa0c8ea10>] ? mgs_completion_ast_generic+0x1d0/0x1d0 [mgs]
      [17640.500244]  [<ffffffffa0cb0462>] mgs_ir_notify+0x142/0x2a0 [mgs]
      [17640.500244]  [<ffffffff810b88c0>] ? wake_up_state+0x20/0x20
      [17640.500244]  [<ffffffffa0cb0320>] ? lprocfs_ir_set_state+0x170/0x170 [mgs]
      [17640.500244]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
      [17640.500244]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      [17640.500244]  [<ffffffff816469d8>] ret_from_fork+0x58/0x90
      [17640.500244]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
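
When there are many traces like the two above, it can help to pull out just the frames that belong to one module. The following is a minimal sketch, not part of the original report: the names `Frame`, `parse_frame`, and `frames_in_module` are hypothetical, and the regular expression assumes the `[timestamp] [<address>] function+0xoffset/0xsize [module]` layout seen in this log. Frames carrying a `?` are treated as unreliable, since that marker means the address was found on the stack but was not confirmed as part of the call chain.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    address: str
    function: str
    offset: int             # offset of the return address inside the function
    size: int               # total size of the function
    module: Optional[str]   # None for symbols in the core kernel
    reliable: bool          # False when the line carries the "?" marker


FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+"               # [17640.500244] timestamp
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"     # [<ffffffffa0cb3a0e>]
    r"(?P<maybe>\?\s+)?"                # optional "? " marker
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"         # optional [module]
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one stack-trace line; return None for any other log line."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        address=m["addr"],
        function=m["func"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["mod"],
        reliable=m["maybe"] is None,
    )


def frames_in_module(lines: List[str], module: str) -> List[str]:
    """Return the reliable function names that belong to the given module."""
    frames = (parse_frame(line) for line in lines)
    return [f.function for f in frames if f and f.reliable and f.module == module]


# A few lines copied from the umount trace above.
SAMPLE = [
    "[17640.500244]  [<ffffffff8163bdf6>] wait_for_completion+0x116/0x170",
    "[17640.500244]  [<ffffffff810b88c0>] ? wake_up_state+0x20/0x20",
    "[17640.500244]  [<ffffffffa0cb3a0e>] mgs_ir_fini_fs+0x27e/0x2ec [mgs]",
    "[17640.500244]  [<ffffffffa0ca0361>] mgs_free_fsdb+0x41/0x8e0 [mgs]",
    "[17640.500244] Call Trace:",
]

if __name__ == "__main__":
    print(frames_in_module(SAMPLE, "mgs"))
```

Filtering both traces on `mgs` in this way would leave the umount path (`mgs_ir_fini_fs`, `mgs_free_fsdb`, ...) and the notify-thread path (`mgs_revoke_lock`, `mgs_ir_notify`, ...) side by side, which makes it easier to see which thread is waiting on which.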
      

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/89fac3d8-634d-11e6-906c-5254006e85c2.
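
One reading of the two traces is that the umount thread sits in `wait_for_completion` (called from `mgs_ir_fini_fs`) until the `mgs_lustre_noti` thread exits, while that thread is itself blocked in `ldlm_completion_ast` under `mgs_revoke_lock`. This is an interpretation of the traces, not something the report states. The user-space model below sketches that pattern under this assumption; `simulate_umount` and both events are hypothetical stand-ins, and the timeout exists only so the model can report a hang instead of blocking forever as the kernel wait does.

```python
import threading


def simulate_umount(revoke_completes: bool, timeout: float = 0.2) -> bool:
    """Return True if the modeled umount finishes, False if it would hang."""
    # Stands in for the lock enqueue issued from mgs_revoke_lock completing.
    lock_granted = threading.Event()
    # Stands in for the completion that the umount path waits on.
    notify_done = threading.Event()

    def notify_thread() -> None:
        # Blocked here, like ldlm_completion_ast in the second trace.
        lock_granted.wait()
        # Only after the lock wait returns can the thread signal its exit.
        notify_done.set()

    worker = threading.Thread(target=notify_thread, daemon=True)
    worker.start()

    if revoke_completes:
        lock_granted.set()

    # The kernel's wait_for_completion has no timeout; the model adds one
    # purely to detect the hang.
    finished = notify_done.wait(timeout)

    # Release the worker so the model always cleans up after itself.
    lock_granted.set()
    worker.join()
    return finished


if __name__ == "__main__":
    print("lock wait returns:", simulate_umount(True))
    print("lock wait never returns:", simulate_umount(False))
```

In the model, the shutdown path completes only when the worker's lock wait returns, which matches a umount that passes all subtests and then never finishes.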

          Activity

            jamesanunez James Nunez (Inactive) added a comment -

            I've reviewed the last month of replay-vbr test 1b hangs, and all of them occurred during interop testing with master (future 2.11) clients and servers running previous versions of Lustre, including the b2_8 and b2_9 branches. Thus, this issue looks like it is fixed.

            gerrit Gerrit Updater added a comment -

            James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/27256
            Subject: LU-8502 test: Run LU-7372 patch against failing tests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0f498ea376a184aadfb502fde9fee0b79319f6ea

            gerrit Gerrit Updater added a comment -

            James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/27255
            Subject: LU-8502 test: Baseline test failure rates
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bc450568faa8fb98d9650d9b834a83f8f5e2efb8

            jay Jinshan Xiong (Inactive) added a comment -

            I suspect this is the same issue as LU-7372; the patch is located at: https://review.whamcloud.com/17853

            pjones Peter Jones added a comment -

            Jinshan

            Could you please advise on this one?

            Thanks

            Peter


            jcasper James Casper (Inactive) added a comment -

            In this case, replay-dual was the last test set run:

            https://testing.hpdd.intel.com/test_sessions/2862ae67-6628-41c8-9561-e586502f1a13

            It passed all subtests, and then hung on the umount. The test set was then marked TIMEOUT. trevis-41vm7.log

            jcasper James Casper (Inactive) added a comment -

            Seeing a lot of test set PASSes followed by test set test_0a TIMEOUTs:

            replay-dual PASS, replay-dual TIMEOUT
            https://testing.hpdd.intel.com/test_sessions/279f69ac-eda3-4fd2-a1e9-f9135f7c0d66

            replay-ost-single PASS, replay-dual TIMEOUT
            https://testing.hpdd.intel.com/test_sessions/4a7fd4bc-e055-44c6-afac-97569c944b02

            replay-dual PASS, replay-single TIMEOUT
            https://testing.hpdd.intel.com/test_sessions/bce01895-216d-4460-b513-24c7b02ef25e

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: maloo Maloo