Lustre / LU-6140

replay-single test 81h: umount -d /mnt/mds2 hung


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • Lustre 2.5.4
    • MDSCOUNT=4
    • 3
    • 17115

    Description

      While verifying patch http://review.whamcloud.com/13433 on the Lustre b2_5 branch in DNE mode, replay-single test 81h hung as follows:

      == replay-single test 81h: DNE: unlink remote dir, drop request reply, fail 2 MDTs == 04:25:26 (1421555126)
      CMD: shadow-16vm8 lctl set_param fail_loc=0x119
      fail_loc=0x119
      Failing mds1 on shadow-16vm12
      CMD: shadow-16vm12 grep -c /mnt/mds1' ' /proc/mounts
      Stopping /mnt/mds1 (opts:) on shadow-16vm12
      CMD: shadow-16vm12 umount -d /mnt/mds1
      CMD: shadow-16vm12 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      Failing mds2 on shadow-16vm8
      CMD: shadow-16vm8 grep -c /mnt/mds2' ' /proc/mounts
      Stopping /mnt/mds2 (opts:) on shadow-16vm8
      CMD: shadow-16vm8 umount -d /mnt/mds2
      

      Dmesg on MDS 2, MDS 3, MDS 4 (shadow-16vm8):

      Lustre: DEBUG MARKER: == replay-single test 81h: DNE: unlink remote dir, drop request reply, fail 2 MDTs == 04:25:26 (1421555126)
      Lustre: DEBUG MARKER: lctl set_param fail_loc=0x119
      Lustre: DEBUG MARKER: grep -c /mnt/mds2' ' /proc/mounts
      Lustre: DEBUG MARKER: umount -d /mnt/mds2
      INFO: task jbd2/dm-1-8:2042 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.29.2.el6_lustre.g36cd22b.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      jbd2/dm-1-8   D 0000000000000001     0  2042      2 0x00000080
       ffff880061e1bd20 0000000000000046 0000000000000000 ffffffff8109afb6
       ffff880079f5d5c0 ffff8800023168e8 0000000000000bd6 ffff8800796b8040
       ffff8800796b85f8 ffff880061e1bfd8 000000000000fbc8 ffff8800796b85f8
      Call Trace:
       [<ffffffff8109afb6>] ? autoremove_wake_function+0x16/0x40
       [<ffffffffa03df91f>] jbd2_journal_commit_transaction+0x19f/0x15a0 [jbd2]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff81083e1c>] ? lock_timer_base+0x3c/0x70
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa03e5c18>] kjournald2+0xb8/0x220 [jbd2]
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa03e5b60>] ? kjournald2+0x0/0x220 [jbd2]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      INFO: task umount:2642 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.29.2.el6_lustre.g36cd22b.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      umount        D 0000000000000000     0  2642   2641 0x00000080
       ffff88006a39f9c8 0000000000000086 0000000000000000 0000000000000018
       ffff88006a39fa98 ffffffffa04946c3 0000000054bb35bf ffff88006a39f9c8
       ffff88007adc3058 ffff88006a39ffd8 000000000000fbc8 ffff88007adc3058
      Call Trace:
       [<ffffffffa04946c3>] ? libcfs_debug_vmsg2+0x5d3/0xbd0 [libcfs]
       [<ffffffffa03de18a>] start_this_handle+0x27a/0x4a0 [jbd2]
       [<ffffffff8116eeeb>] ? cache_alloc_refill+0x15b/0x240
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa03de5b0>] jbd2_journal_start+0xd0/0x110 [jbd2]
       [<ffffffffa0494d01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa03de603>] jbd2_journal_force_commit+0x13/0x30 [jbd2]
       [<ffffffffa0435327>] ldiskfs_force_commit+0x27/0x40 [ldiskfs]
       [<ffffffffa0c7fd75>] osd_sync+0xf5/0x100 [osd_ldiskfs]
       [<ffffffffa0d2abe5>] mdt_device_sync+0x35/0xd0 [mdt]
       [<ffffffffa0d39ce7>] mdt_iocontrol+0x217/0x870 [mdt]
       [<ffffffffa05e3af6>] class_cleanup+0x836/0xd30 [obdclass]
       [<ffffffffa0494d01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa05b9096>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa05e555a>] class_process_config+0x156a/0x1ad0 [obdclass]
       [<ffffffffa05de6b3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
       [<ffffffffa05e5c39>] class_manual_cleanup+0x179/0x6f0 [obdclass]
       [<ffffffffa05b9096>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa062137c>] server_put_super+0x5ec/0xf60 [obdclass]
       [<ffffffff8118b63b>] generic_shutdown_super+0x5b/0xe0
       [<ffffffff8118b726>] kill_anon_super+0x16/0x60
       [<ffffffffa05e7ae6>] lustre_kill_super+0x36/0x60 [obdclass]
       [<ffffffff8118bec7>] deactivate_super+0x57/0x80
       [<ffffffff811ab8cf>] mntput_no_expire+0xbf/0x110
       [<ffffffff811ac41b>] sys_umount+0x7b/0x3a0
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      Maloo report: https://testing.hpdd.intel.com/test_sets/85944252-9f03-11e4-91b3-5254006e85c2
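
The two hung-task reports above can be condensed for triage with a small script. The following is a minimal sketch, not part of the Lustre test framework: the function name `parse_hung_tasks`, the regular expressions, and the `SAMPLE` excerpt (a few lines copied from the dmesg output in this ticket) are illustrative assumptions. It groups stack frames under each "blocked for more than N seconds" header and skips frames marked with `?`, which the kernel prints for entries it could not verify as part of the call chain.

```python
import re

# Header printed by the kernel hung-task detector, for example:
# "INFO: task umount:2642 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Stack frame, for example:
# "[<ffffffffa03de18a>] start_this_handle+0x27a/0x4a0 [jbd2]"
# A "?" before the symbol marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report: comm, pid, secs, frames."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only verified frames that belong to an open report.
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append((frame.group("sym"), frame.group("mod")))
    return reports


# Excerpt copied from the dmesg output in this ticket (shortened).
SAMPLE = """\
INFO: task jbd2/dm-1-8:2042 blocked for more than 120 seconds.
 [<ffffffff8109afb6>] ? autoremove_wake_function+0x16/0x40
 [<ffffffffa03df91f>] jbd2_journal_commit_transaction+0x19f/0x15a0 [jbd2]
 [<ffffffffa03e5c18>] kjournald2+0xb8/0x220 [jbd2]
INFO: task umount:2642 blocked for more than 120 seconds.
 [<ffffffffa04946c3>] ? libcfs_debug_vmsg2+0x5d3/0xbd0 [libcfs]
 [<ffffffffa03de18a>] start_this_handle+0x27a/0x4a0 [jbd2]
 [<ffffffffa03de5b0>] jbd2_journal_start+0xd0/0x110 [jbd2]
"""

for report in parse_hung_tasks(SAMPLE):
    # The first verified frame is where the task is currently blocked.
    sym, mod = report["frames"][0]
    print(report["comm"], report["pid"], "blocked in", sym, "module:", mod)
```

Run against the full dmesg text, the first verified frame of each report gives the blocking point: `jbd2_journal_commit_transaction` for the journal thread and `start_this_handle` for `umount`, both in the `jbd2` module.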

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Jian Yu (yujian)