Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
Lustre 2.8.0
-
None
-
client and server: lustre-master build # 3094 RHEL7 DNE
-
3
-
9223372036854775807
Description
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/24729d7e-2640-11e5-8b33-5254006e85c2.
The sub-test test_1 failed with the following error:
test failed to respond and timed out
MDS dmesg
[ 1244.736431] Lustre: DEBUG MARKER: /usr/sbin/lctl mark copying 880 files from \/etc \/bin to \/mnt\/lustre\/d1.runtests\/etc \/bin at Wed Jul 8 20:30:42 UTC 2015 [ 1244.857091] Lustre: DEBUG MARKER: copying 880 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Wed Jul 8 20:30:42 UTC 2015 [ 1252.974179] Lustre: 2786:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436387443/real 1436387443] req@ffff880069257300 x1506161297721592/t0(0) o400->lustre-MDT0002-osp-MDT0000@10.1.5.252@tcp:24/4 lens 224/224 e 0 to 1 dl 1436387450 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1252.979486] Lustre: lustre-MDT0002-osp-MDT0000: Connection to lustre-MDT0002 (at 10.1.5.252@tcp) was lost; in progress operations using this service will wait for recovery to complete [ 1257.239137] Lustre: 2785:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436387448/real 1436387448] req@ffff880068c86700 x1506161297721644/t0(0) o400->lustre-MDT0002-osp-MDT0000@10.1.5.252@tcp:24/4 lens 224/224 e 0 to 1 dl 1436387455 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1257.243885] Lustre: 2785:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [ 1258.983125] Lustre: 2784:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436387450/real 1436387450] req@ffff88006912db00 x1506161297721660/t0(0) o38->lustre-MDT0003-osp-MDT0000@10.1.5.252@tcp:24/4 lens 520/544 e 0 to 1 dl 1436387456 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1258.988595] Lustre: 2784:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [ 1273.982152] Lustre: 2784:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436387460/real 1436387460] req@ffff880069d87300 x1506161297721744/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.1.5.252@tcp:24/4 lens 520/544 e 0 to 1 dl 1436387471 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1273.987550] Lustre: 2784:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [ 1285.839237] Lustre: 2815:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436387476/real 1436387476] req@ffff880069d87300 x1506161297721892/t0(0) o104->lustre-MDT0000@10.1.5.252@tcp:15/16 lens 296/224 e 0 to 1 dl 1436387483 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [ 1285.844932] Lustre: 2815:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [ 1292.205644] Lustre: lustre-MDT0000: haven't heard from client lustre-MDT0000-lwp-MDT0001_UUID (at 10.1.5.252@tcp) in 49 seconds. I think it's dead, and I am evicting it. exp ffff88007bc28400, cur 1436387490 expire 1436387460 last 1436387441 [ 1293.982194] Lustre: 2784:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436387475/real 1436387475] req@ffff880069d85b00 x1506161297721880/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.1.5.252@tcp:24/4 lens 520/544 e 0 to 1 dl 1436387491 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [ 1318.879183] LNet: Service thread pid 2815 was inactive for 40.04s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [ 1318.882589] Pid: 2815, comm: mdt00_001 [ 1318.883358] Call Trace: [ 1318.884177] [<ffffffff8160a409>] schedule+0x29/0x70 [ 1318.884703] [<ffffffff816082b5>] schedule_timeout+0x175/0x2d0 [ 1318.885134] [<ffffffff8107ee80>] ? process_timeout+0x0/0x10 [ 1318.885707] [<ffffffffa09a8320>] ? ptlrpc_interrupted_set+0x0/0x120 [ptlrpc] [ 1318.886241] [<ffffffffa09b2978>] ptlrpc_set_wait+0x4b8/0x9e0 [ptlrpc] [ 1318.886787] [<ffffffff810a9650>] ? default_wake_function+0x0/0x20 [ 1318.887253] [<ffffffffa09b2f1d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc] [ 1318.887764] [<ffffffffa0f433c5>] osp_remote_sync+0xd5/0x1b0 [osp] [ 1318.888210] [<ffffffffa0f2a25d>] osp_attr_get+0x41d/0x690 [osp] [ 1318.888728] [<ffffffffa0f25ab4>] osp_object_init+0x114/0x280 [osp] [ 1318.889220] [<ffffffffa076058f>] lu_object_alloc+0xdf/0x310 [obdclass] [ 1318.889785] [<ffffffffa076099c>] lu_object_find_try+0x17c/0x2c0 [obdclass] [ 1318.890295] [<ffffffffa0760b8c>] lu_object_find_at+0xac/0xe0 [obdclass] [ 1318.890940] [<ffffffffa0e6ebb5>] ? lod_index_lookup+0x25/0x30 [lod] [ 1318.891407] [<ffffffffa0ecc307>] ? __mdd_lookup.isra.17+0x317/0x440 [mdd] [ 1318.891993] [<ffffffffa0760bd6>] lu_object_find+0x16/0x20 [obdclass] [ 1318.892575] [<ffffffffa0d8a73b>] mdt_object_find+0x4b/0x170 [mdt] [ 1318.893146] [<ffffffffa0db0726>] mdt_reint_open+0x2776/0x2d50 [mdt] [ 1318.893624] [<ffffffffa07795f9>] ? upcall_cache_get_entry+0x3e9/0x8e0 [obdclass] [ 1318.894169] [<ffffffff812ddd72>] ? strlcpy+0x42/0x60 [ 1318.894571] [<ffffffffa0da46e0>] mdt_reint_rec+0x80/0x210 [mdt] [ 1318.894985] [<ffffffffa0d884e9>] mdt_reint_internal+0x5d9/0xb20 [mdt] [ 1318.895456] [<ffffffffa0d88b92>] mdt_intent_reint+0x162/0x410 [mdt] [ 1318.896056] [<ffffffffa0d921ea>] mdt_intent_policy+0x57a/0xb30 [mdt] [ 1318.896619] [<ffffffffa0970223>] ldlm_lock_enqueue+0x353/0x930 [ptlrpc] [ 1318.897117] [<ffffffffa0998da2>] ldlm_handle_enqueue0+0x4f2/0x16e0 [ptlrpc] [ 1318.897725] [<ffffffffa09c0360>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc] [ 1318.898274] [<ffffffffa0a1eed2>] tgt_enqueue+0x62/0x210 [ptlrpc] [ 1318.898762] [<ffffffffa0a2329b>] tgt_request_handle+0x88b/0x1100 [ptlrpc] [ 1318.899278] [<ffffffffa09cafbb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc] [ 1318.899926] [<ffffffffa09c8078>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc] [ 1318.900416] [<ffffffff810a9662>] ? default_wake_function+0x12/0x20 [ 1318.900956] [<ffffffff810a0898>] ? __wake_up_common+0x58/0x90 [ 1318.901397] [<ffffffffa09ce900>] ptlrpc_main+0xc00/0x1f60 [ptlrpc] [ 1318.901953] [<ffffffffa09cdd00>] ? ptlrpc_main+0x0/0x1f60 [ptlrpc] [ 1318.902407] [<ffffffff8109739f>] kthread+0xcf/0xe0 [ 1318.902851] [<ffffffff810972d0>] ? kthread+0x0/0xe0 [ 1318.903215] [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0 [ 1318.903640] [<ffffffff810972d0>] ? kthread+0x0/0xe0