Details
- Bug
- Resolution: Won't Fix
- Minor
- Lustre 1.8.7
- None
- 3
- 7592
Description
A client (cluster1) lost its connection to the MDS and recovered, but then lost the connection again for some reason.
Jun 11 11:28:45 cluster1 kernel: Lustre: 30906:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1402727385081248 sent from lustre-MDT0000-mdc-ffff880c06249800 to NID 192.168.3.45@o2ib 995s ago has timed out (995s prior to deadline).
Jun 11 11:28:45 cluster1 kernel: req@ffff880293aaf800 x1402727385081248/t0 o101->lustre-MDT0000_UUID@192.168.3.45@o2ib:12/10 lens 560/1616 e 3 to 1 dl 1339381725 ref 1 fl Rpc:/0/0 rc 0/0
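As an aside for triaging these messages, below is a minimal Python sketch (an illustrative assumption, not part of the original report or an existing tool) that pulls the request xid, source import, target NID, and the two reported ages out of ptlrpc_expire_one_request() lines worded like the one above:

import re
import sys

# Matches the ptlrpc_expire_one_request() wording quoted above; assumes the
# import name (e.g. lustre-MDT0000-mdc-ffff880c06249800) contains no spaces.
TIMEOUT_RE = re.compile(
    r"Request x(?P<xid>\d+) sent from (?P<imp>\S+) to NID (?P<nid>\S+) "
    r"(?P<age>\d+)s ago has timed out \((?P<overdue>\d+)s prior to deadline\)"
)

def parse_timeouts(lines):
    # Yield (xid, import, nid, age_s, overdue_s) for each timed-out request.
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            yield (m.group("xid"), m.group("imp"), m.group("nid"),
                   int(m.group("age")), int(m.group("overdue")))

if __name__ == "__main__":
    for rec in parse_timeouts(sys.stdin):
        print(rec)

For the cluster1 line above this would report xid x1402727385081248, import lustre-MDT0000-mdc-ffff880c06249800, NID 192.168.3.45@o2ib, and 995s for both reported values.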
A few hours later, call traces showed up on another client (cluster3).
Jun 11 15:03:10 cluster3 kernel: Call Trace:
Jun 11 15:03:10 cluster3 kernel: [<ffffffff814dbcd5>] schedule_timeout+0x215/0x2e0
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa086808d>] ? lustre_msg_early_size+0x6d/0x70 [ptlrpc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0996244>] ? mdc_intent_open_pack+0x364/0x530 [mdc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffff8115a1ae>] ? cache_alloc_refill+0x9e/0x240
Jun 11 15:03:10 cluster3 kernel: [<ffffffff814dcbf2>] __down+0x72/0xb0
Jun 11 15:03:10 cluster3 kernel: [<ffffffff81093f61>] down+0x41/0x50
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0997173>] mdc_enqueue+0x283/0xa20 [mdc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa081fbef>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa081fbef>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa09987d2>] mdc_intent_lock+0x102/0x440 [mdc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0853e90>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a431a5>] ? ll_lookup_it+0x405/0x870 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a40490>] ? ll_mdc_blocking_ast+0x0/0x5f0 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a402ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a40490>] ? ll_mdc_blocking_ast+0x0/0x5f0 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa083f770>] ? ldlm_completion_ast+0x0/0x8a0 [ptlrpc]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a402ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a430b5>] ll_lookup_it+0x315/0x870 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a40490>] ? ll_mdc_blocking_ast+0x0/0x5f0 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa06f97c1>] ? cfs_alloc+0x91/0xf0 [libcfs]
Jun 11 15:03:10 cluster3 kernel: [<ffffffffa0a43ac8>] ll_lookup_nd+0x88/0x470 [lustre]
Jun 11 15:03:10 cluster3 kernel: [<ffffffff8118ad4e>] ? d_alloc+0x13e/0x1b0
Jun 11 15:03:10 cluster3 kernel: [<ffffffff81181c02>] __lookup_hash+0x102/0x160
Jun 11 15:03:10 cluster3 kernel: [<ffffffff81181d3a>] lookup_hash+0x3a/0x50
Jun 11 15:03:10 cluster3 kernel: [<ffffffff81182768>] do_filp_open+0x2c8/0xd90
Jun 11 15:03:10 cluster3 kernel: [<ffffffff8118f1e2>] ? alloc_fd+0x92/0x160
Jun 11 15:03:10 cluster3 kernel: [<ffffffff8116f989>] do_sys_open+0x69/0x140
Jun 11 15:03:10 cluster3 kernel: [<ffffffff8116faa0>] sys_open+0x20/0x30
Jun 11 15:03:10 cluster3 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
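The resolved frames in this trace show an open() on cluster3 sleeping in a semaphore wait (schedule_timeout via __down/down) called from mdc_enqueue(). To make such traces easier to scan, here is a minimal sketch, again an illustrative assumption rather than an existing tool, that prints only the non-speculative frames (those without a leading "?") in the order the kernel dumped them:

import re
import sys

# Matches frames like "[<ffffffffa0997173>] mdc_enqueue+0x283/0xa20 [mdc]";
# a leading "? " marks a speculative frame, which is skipped.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x")

def call_trace_frames(lines):
    # Return the resolved symbols from a dumped call trace, innermost first.
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append(m.group("sym"))
    return frames

if __name__ == "__main__":
    print(" <- ".join(call_trace_frames(sys.stdin)))

For the trace above this reduces to: schedule_timeout <- __down <- down <- mdc_enqueue <- mdc_intent_lock <- ll_lookup_it <- ll_lookup_nd <- __lookup_hash <- lookup_hash <- do_filp_open <- do_sys_open <- sys_open <- system_call_fastpath.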
I will upload all the log files soon.
Attachments
Issue Links
- is related to LU-6529 Server side lock limits to avoid unnecessary memory exhaustion (Closed)
Trackbacks
- Lustre 1.8.x known issues tracker: While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla (https://bugzilla.lustre.org/). In order to move away from relying on Bugzilla, we would create a JIRA