Lustre / LU-8450

replay-single test 70c: mount MDS hung

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.9.0
    • Fix Version/s: Lustre 2.9.0
    • Labels:
      None
    • Severity:
      3

      Description

      replay-single test 70c hung while mounting MDS:

      Starting mds1:   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
      CMD: onyx-33vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
      

      Console log on MDS:

      Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
      LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
      LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.2.4.127@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      LustreError: Skipped 683 previous similar messages
      Lustre: 6963:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1469762588/real 1469762588]  req@ffff880051ebaa00 x1541153266605312/t0(0) o250->MGC10.2.4.126@tcp@0@lo:26/25 lens 520/544 e 0 to 1 dl 1469762613 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 6963:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
      Lustre: 29062:0:(service.c:1335:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
        req@ffff88004a8faa00 x1541153729978528/t0(0) o101->6a772ed4-43ff-dc51-4d04-2c0278989dc2@10.2.4.120@tcp:-1/-1 lens 872/3512 e 24 to 0 dl 1469763017 ref 2 fl Interpret:/0/0 rc 0/0
      Lustre: lustre-MDT0002: Client 6a772ed4-43ff-dc51-4d04-2c0278989dc2 (at 10.2.4.120@tcp) reconnecting
      Lustre: Skipped 1 previous similar message
      Lustre: lustre-MDT0002: Export ffff880057b24400 already connecting from 10.2.4.120@tcp
      Lustre: lustre-MDT0002: Export ffff880057b24400 already connecting from 10.2.4.120@tcp
      Lustre: lustre-MDT0002: Export ffff880057b24400 already connecting from 10.2.4.120@tcp
      Lustre: lustre-MDT0002: Export ffff880057b24400 already connecting from 10.2.4.120@tcp
      Lustre: Skipped 1 previous similar message
      Lustre: lustre-MDT0002: Export ffff880057b24400 already connecting from 10.2.4.120@tcp
      Lustre: Skipped 3 previous similar messages
      Lustre: lustre-MDT0002: Export ffff880057b24400 already connecting from 10.2.4.120@tcp
      Lustre: Skipped 6 previous similar messages
      Lustre: lustre-MDT0002: Export ffff880057b24400 already connecting from 10.2.4.120@tcp
      Lustre: Skipped 12 previous similar messages
      LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.2.4.127@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      LustreError: Skipped 1909 previous similar messages
      Lustre: 6963:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1469763188/real 1469763188]  req@ffff8800546a1200 x1541153266628608/t0(0) o250->MGC10.2.4.126@tcp@0@lo:26/25 lens 520/544 e 0 to 1 dl 1469763213 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 6963:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
      INFO: task mdt00_002:29063 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mdt00_002       D ffffffffa0b1d108     0 29063      2 0x00000080 
       ffff88004f7b3aa0 0000000000000046 ffff88004bc15c00 ffff88004f7b3fd8
       ffff88004f7b3fd8 ffff88004f7b3fd8 ffff88004bc15c00 ffffffffa0b1d100
       ffffffffa0b1d104 ffff88004bc15c00 00000000ffffffff ffffffffa0b1d108
      Call Trace:
       [<ffffffff8163cb09>] schedule_preempt_disabled+0x29/0x70
       [<ffffffff8163a805>] __mutex_lock_slowpath+0xc5/0x1c0
       [<ffffffff81639c6f>] mutex_lock+0x1f/0x2f
       [<ffffffffa0a8e024>] nodemap_add_member+0x34/0x1b0 [ptlrpc]
       [<ffffffffa0dbf161>] mdt_obd_reconnect+0x81/0x1d0 [mdt]
       [<ffffffffa09d1e6f>] target_handle_connect+0x1c4f/0x2e30 [ptlrpc]
       [<ffffffffa0a6f5f2>] tgt_request_handle+0x3f2/0x1320 [ptlrpc]
       [<ffffffffa0a1bccb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
       [<ffffffffa0a19888>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
       [<ffffffff810b88d2>] ? default_wake_function+0x12/0x20
       [<ffffffff810af038>] ? __wake_up_common+0x58/0x90
       [<ffffffffa0a1fd80>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
       [<ffffffffa0a1f2e0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
       [<ffffffff810a5aef>] kthread+0xcf/0xe0
       [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff816469d8>] ret_from_fork+0x58/0x90 
       [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
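
To make the hung-task report above quicker to read, the call trace can be reduced to its chain of function names. This is only a minimal sketch and not part of the original report: the helper name `parse_call_trace`, the regular expression, and the `"kernel"` placeholder for frames without a module tag are assumptions for illustration. The sample lines are copied from the console log above.

```python
import re

# Matches frames such as:
#  [<ffffffffa0a8e024>] nodemap_add_member+0x34/0x1b0 [ptlrpc]
#  [<ffffffffa0a19888>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
# A leading "?" marks a frame the kernel unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(text):
    """Return (function, module, reliable) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((
                m.group("func"),
                m.group("module") or "kernel",  # placeholder label (assumption)
                m.group("unreliable") is None,
            ))
    return frames


SAMPLE = """\
 [<ffffffff81639c6f>] mutex_lock+0x1f/0x2f
 [<ffffffffa0a8e024>] nodemap_add_member+0x34/0x1b0 [ptlrpc]
 [<ffffffffa0dbf161>] mdt_obd_reconnect+0x81/0x1d0 [mdt]
 [<ffffffffa0a19888>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
"""

if __name__ == "__main__":
    # Print only the frames the unwinder reported as reliable.
    for func, module, reliable in parse_call_trace(SAMPLE):
        if reliable:
            print(f"{func} [{module}]")
```

Applied to the trace above, the reliable frames show the mdt00_002 thread blocked in mutex_lock, called from nodemap_add_member during mdt_obd_reconnect.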
      

      Maloo reports:
      https://testing.hpdd.intel.com/test_sets/3f6a9a0e-557a-11e6-906c-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/cecb3c06-54af-11e6-a39e-5254006e85c2

              People

              • Assignee:
                Kit Westneat (kit.westneat)
              • Reporter:
                Jian Yu (yujian)
