Lustre / LU-1520

client fails MDS connection, stuck threads on another client


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: Lustre 1.8.9
    • Affects Version/s: Lustre 1.8.7
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 7592

    Description

      A client (cluster1) failed its connection to the MDS and recovered, but then failed the connection again for unknown reasons.

      Jun 11 11:28:45 cluster1 kernel: Lustre: 30906:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1402727385081248 sent from lustre-MDT0000-mdc-ffff880c06249800 to NID 192.168.3.45@o2ib 995s ago has timed out (995s prior to deadline).
      Jun 11 11:28:45 cluster1 kernel:  req@ffff880293aaf800 x1402727385081248/t0 o101->lustre-MDT0000_UUID@192.168.3.45@o2ib:12/10 lens 560/1616 e 3 to 1 dl 1339381725 ref 1 fl Rpc:/0/0 rc 0/0
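
      For triage it helps to count how many requests the client is expiring and against which NIDs. Below is a minimal log-scraping sketch (a hypothetical helper, not a Lustre tool; it assumes the syslog lines keep the one-line ptlrpc_expire_one_request() format shown above):

      import re
      import sys

      # Matches e.g. "Request x1402727385081248 sent from
      # lustre-MDT0000-mdc-ffff880c06249800 to NID 192.168.3.45@o2ib
      # 995s ago has timed out"
      PAT = re.compile(r"Request (x\d+) sent from (\S+) to NID (\S+) "
                       r"(\d+)s ago has timed out")

      with open(sys.argv[1]) as log:
          for line in log:
              m = PAT.search(line)
              if m:
                  xid, source, nid, age = m.groups()
                  print("%s -> %s: request %s timed out after %ss"
                        % (source, nid, xid, age))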
      

      A few hours later, call traces showed up on another client (cluster3).

      Jun 11 15:03:10 cluster3 kernel: Call Trace:
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff814dbcd5>] schedule_timeout+0x215/0x2e0
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa086808d>] ? lustre_msg_early_size+0x6d/0x70 [ptlrpc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0996244>] ? mdc_intent_open_pack+0x364/0x530 [mdc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff8115a1ae>] ? cache_alloc_refill+0x9e/0x240
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff814dcbf2>] __down+0x72/0xb0
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff81093f61>] down+0x41/0x50
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0997173>] mdc_enqueue+0x283/0xa20 [mdc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa081fbef>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa081fbef>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa09987d2>] mdc_intent_lock+0x102/0x440 [mdc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0853e90>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a431a5>] ? ll_lookup_it+0x405/0x870 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a40490>] ? ll_mdc_blocking_ast+0x0/0x5f0 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a402ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a40490>] ? ll_mdc_blocking_ast+0x0/0x5f0 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa083f770>] ? ldlm_completion_ast+0x0/0x8a0 [ptlrpc]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a402ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a430b5>] ll_lookup_it+0x315/0x870 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a40490>] ? ll_mdc_blocking_ast+0x0/0x5f0 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa06f97c1>] ? cfs_alloc+0x91/0xf0 [libcfs]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a43ac8>] ll_lookup_nd+0x88/0x470 [lustre]
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff8118ad4e>] ? d_alloc+0x13e/0x1b0
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff81181c02>] __lookup_hash+0x102/0x160
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff81181d3a>] lookup_hash+0x3a/0x50
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff81182768>] do_filp_open+0x2c8/0xd90
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff8118f1e2>] ? alloc_fd+0x92/0x160
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff8116f989>] do_sys_open+0x69/0x140
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff8116faa0>] sys_open+0x20/0x30
      Jun 11 15:03:10 cluster3 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
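
      The thread is blocked in down() called from mdc_enqueue(); in 1.8 that is consistent with waiting on the per-import MDC RPC semaphore, which serializes intent RPCs to the MDT. A minimal sketch of that pattern (plain userspace Python for illustration, not the Lustre source):

      import threading
      import time

      rpc_lock = threading.Semaphore(1)  # stands in for the per-import MDC rpc_lock

      def mdc_enqueue_like(tid, rpc_hangs=False):
          # Analogous to the down() frame in the trace above: each intent
          # RPC must take the semaphore before it is sent.
          print("thread %d: waiting in down()" % tid)
          rpc_lock.acquire()
          try:
              print("thread %d: RPC in flight" % tid)
              if rpc_hangs:
                  time.sleep(30)  # an RPC that never gets its reply
          finally:
              rpc_lock.release()  # up(); not reached while the RPC hangs

      # One hung RPC pins the semaphore; every later open/lookup on the
      # same mount stacks up in acquire(), as seen on cluster3.
      threading.Thread(target=mdc_enqueue_like, args=(1, True)).start()
      for n in (2, 3):
          threading.Thread(target=mdc_enqueue_like, args=(n,)).start()

      If that reading is right, a single RPC to the MDT that never completes leaves all subsequent opens and lookups on the same mount queued in down(), matching the pile-up on cluster3.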
      

      I will upload all the log files soon.

Attachments

Issue Links

Activity

People

Assignee: Hongchao Zhang
Reporter: Shuichi Ihara (Inactive)
Votes: 0
Watchers: 7
