Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: Lustre 2.4.1
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
      Our Lustre source tree is at:
      https://github.com/jlan/lustre-nas
    • Severity:
      3
    • Rank (Obsolete):
      12492

      Description

      MDT threads hung. We had to force a reboot of the MDS on 2 separate occasions.

      Uploading the following to the FTP site:
      lustre-log.1391239242.7851.txt.gz
      vmcore-dmesg.txt.gz

      Lustre: MGS: haven't heard from client c546719d-1bcc-571f-a4e3-17f67dc35b50 (at 10.151.31.4@o2ib) in 199 seconds. I think it's dead, and I am evicting it. exp ffff880fd0587800, cur 1391236411 expire 1391236261 last 1391236212
      LNet: Service thread pid 7851 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 7851, comm: mdt01_055
      
      Call Trace:
       [<ffffffff815404c2>] schedule_timeout+0x192/0x2e0
       [<ffffffff81080610>] ? process_timeout+0x0/0x10
       [<ffffffffa04156d1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
       [<ffffffffa06d201d>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
       [<ffffffffa06cd790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
       [<ffffffff81063be0>] ? default_wake_function+0x0/0x20
       [<ffffffffa06d1758>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
       [<ffffffffa06d1b30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
       [<ffffffffa0dd7a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
       [<ffffffffa0dddc0c>] mdt_object_lock0+0x28c/0xaf0 [mdt]
       [<ffffffffa0dd7a90>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
       [<ffffffffa06d1b30>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
       [<ffffffffa0dde534>] mdt_object_lock+0x14/0x20 [mdt]
       [<ffffffffa0dde5a1>] mdt_object_find_lock+0x61/0x170 [mdt]
       [<ffffffffa0e0c80c>] mdt_reint_open+0x8cc/0x20e0 [mdt]
       [<ffffffffa043185e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
       [<ffffffffa06fadcc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
       [<ffffffffa05921b0>] ? lu_ucred+0x20/0x30 [obdclass]
       [<ffffffffa0dd7015>] ? mdt_ucred+0x15/0x20 [mdt]
       [<ffffffffa0df31cc>] ? mdt_root_squash+0x2c/0x410 [mdt]
       [<ffffffffa0df7981>] mdt_reint_rec+0x41/0xe0 [mdt]
       [<ffffffffa0ddcb03>] mdt_reint_internal+0x4c3/0x780 [mdt]
       [<ffffffffa0ddd090>] mdt_intent_reint+0x1f0/0x530 [mdt]
       [<ffffffffa0ddaf3e>] mdt_intent_policy+0x39e/0x720 [mdt]
       [<ffffffffa06b2831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
       [<ffffffffa06d91ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
       [<ffffffffa0ddb3c6>] mdt_enqueue+0x46/0xe0 [mdt]
       [<ffffffffa0de1ad7>] mdt_handle_common+0x647/0x16d0 [mdt]
       [<ffffffffa0e1b615>] mds_regular_handle+0x15/0x20 [mdt]
       [<ffffffffa070b3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
       [<ffffffffa04155de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa0426d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
       [<ffffffffa0702729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81055813>] ? __wake_up+0x53/0x70
       [<ffffffffa070c75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
       [<ffffffffa070bc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffffa070bc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffffa070bc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      LustreError: dumping log to /tmp/lustre-log.1391239242.7851
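
      A minimal sketch for triaging a dump like the one above, written for this ticket and not part of Lustre. It assumes the `cur`/`expire`/`last` fields in the eviction message are epoch seconds, which is consistent with the reported 199 seconds, and it relies on the kernel convention that frames marked with `?` are ones the unwinder could not confirm. The helper names `silence_seconds` and `reliable_frames` are hypothetical.

```python
import re

# Matches the trailing timestamps of the MGS eviction message.
EVICT_RE = re.compile(r"cur (\d+) expire (\d+) last (\d+)")

# Matches one stack frame: address, optional "? " marker, symbol+offset/size,
# and an optional [module] suffix.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def silence_seconds(line):
    """Return cur - last from an eviction line, or None if it does not match."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    cur, _expire, last = (int(g) for g in m.groups())
    return cur - last


def reliable_frames(trace_lines):
    """Return (symbol, module) for frames that are not marked with '?'."""
    frames = []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m and m.group(2) is None:
            # Frames without a [module] suffix belong to the core kernel.
            frames.append((m.group(3), m.group(4) or "kernel"))
    return frames


EVICTION_LINE = (
    "exp ffff880fd0587800, cur 1391236411 expire 1391236261 last 1391236212"
)
TRACE_SAMPLE = [
    " [<ffffffff815404c2>] schedule_timeout+0x192/0x2e0",
    " [<ffffffff81080610>] ? process_timeout+0x0/0x10",
    " [<ffffffffa04156d1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]",
    " [<ffffffffa06d201d>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]",
]

if __name__ == "__main__":
    print(silence_seconds(EVICTION_LINE))
    for symbol, module in reliable_frames(TRACE_SAMPLE):
        print(f"{module}: {symbol}")
```

      Filtering out the `?` frames leaves the confirmed call chain, which in this trace runs from `mdt_reint_open` through `mdt_object_lock` and `ldlm_cli_enqueue_local` into `ldlm_completion_ast`, where the thread is waiting in `schedule_timeout`.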
      

              People

              • Assignee:
                green Oleg Drokin
                Reporter:
                mhanafi Mahmoud Hanafi
              • Votes:
                0
                Watchers:
                8
