Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
Lustre 2.8.0
-
None
-
client and server: lustre-master build #2967 zfs
-
3
-
9223372036854775807
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/83ac0620-d67e-11e4-8a24-5254006e85c2.
The sub-test test_failover_ost failed with the following error:
test_failover_ost returned 3
MDS console
Lustre: lustre-MDT0000: Recovery over after 0:09, of 2 clients 2 recovered and 0 were evicted. Lustre: 2817:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1427510311/real 1427510311] req@ffff88006b6859c0 x1496853024604548/t0(0) o8->lustre-OST0002-osc-MDT0000@10.2.4.226@tcp:28/4 lens 400/544 e 0 to 1 dl 1427510316 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: 2817:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1427510321/real 1427510321] req@ffff8800722326c0 x1496853024604620/t0(0) o8->lustre-OST0006-osc-MDT0000@10.2.4.226@tcp:28/4 lens 400/544 e 0 to 1 dl 1427510331 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Lustre: 2817:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages LNet: Service thread pid 3181 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Pid: 3181, comm: mdt00_003 Call Trace: [<ffffffff8152b162>] schedule_timeout+0x192/0x2e0 [<ffffffff810874f0>] ? process_timeout+0x0/0x10 [<ffffffffa1325951>] osp_precreate_reserve+0x3d1/0x810 [osp] [<ffffffffa131b740>] ? osp_object_free+0x2d0/0x4a0 [osp] [<ffffffff81064b90>] ? default_wake_function+0x0/0x20 [<ffffffffa131fbb6>] osp_declare_object_create+0x1a6/0x650 [osp] [<ffffffffa125d81a>] lod_qos_declare_object_on+0x12a/0x4f0 [lod] [<ffffffffa12603f3>] lod_alloc_rr.clone.2+0xbb3/0x1020 [lod] [<ffffffffa079dfb1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] [<ffffffffa1261442>] lod_qos_prep_create+0xbe2/0x19e0 [lod] [<ffffffffa1254d42>] lod_declare_striped_object+0x162/0x9b0 [lod] [<ffffffffa125c547>] lod_declare_object_create+0x2c7/0x460 [lod] [<ffffffffa12c4506>] mdd_declare_object_create_internal+0x116/0x340 [mdd] [<ffffffffa12bd79e>] mdd_create+0x68e/0x1730 [mdd] [<ffffffffa118f7b8>] mdo_create+0x18/0x50 [mdt] [<ffffffffa1196bbf>] mdt_reint_open+0x1f8f/0x2c70 [mdt] [<ffffffffa0906cbc>] ? upcall_cache_get_entry+0x29c/0x880 [obdclass] [<ffffffffa1180cad>] mdt_reint_rec+0x5d/0x200 [mdt] [<ffffffffa116511b>] mdt_reint_internal+0x4cb/0x7a0 [mdt] [<ffffffffa11655e6>] mdt_intent_reint+0x1f6/0x430 [mdt] [<ffffffffa1163bd4>] mdt_intent_policy+0x494/0xce0 [mdt] [<ffffffffa0abf4f9>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc] [<ffffffffa0aeb52b>] ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc] [<ffffffffa0b6bbb1>] tgt_enqueue+0x61/0x230 [ptlrpc] [<ffffffffa0b6c7fe>] tgt_request_handle+0x8be/0x1000 [ptlrpc] [<ffffffffa0b1c661>] ptlrpc_main+0xe41/0x1960 [ptlrpc] [<ffffffffa0b1b820>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [<ffffffff8109e66e>] kthread+0x9e/0xc0 [<ffffffff8100c20a>] child_rip+0xa/0x20 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0 [<ffffffff8100c200>] ? child_rip+0x0/0x20 LustreError: dumping log to /tmp/lustre-log.1427510345.3181 LNet: Service thread pid 3181 completed after 80.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration: 86400 Server failover period: 1200 seconds Exited after: 37 seconds