Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.9.0
-
None
-
autotest
-
3
-
9223372036854775807
Description
sanity-sec timesout on MDS umount. From the suite_stdout log, after all sanity-sec tests were run, we see:
13:18:19:CMD: onyx-36vm7 grep -c /mnt/lustre-mds1' ' /proc/mounts 13:18:19:Stopping /mnt/lustre-mds1 (opts:-f) on onyx-36vm7 13:18:19:CMD: onyx-36vm7 umount -d -f /mnt/lustre-mds1 14:17:19:********** Timeout by autotest system **********
Looking at the test_complete log for MDS1, vm7, doesn’t really show anything interesting. Yet, if we look at the console log for MDS1 for test_27 we see the LBUG:
13:20:13:[30936.817772] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1 13:20:13:[30942.959187] Lustre: 3708:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1467231499/real 1467231499] req@ffff88007b2bdb00 x1538472463395376/t0(0) o39->lustre-MDT0000-lwp-MDT0000@0@lo:12/10 lens 224/224 e 0 to 1 dl 1467231505 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 13:20:13:[30942.968919] Lustre: 3708:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 39 previous similar messages 13:20:13:[31024.097193] Lustre: 11462:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1467231575/real 1467231575] req@ffff88007b2bfc00 x1538472463395360/t0(0) o104->lustre-MDT0000@10.2.4.148@tcp:15/16 lens 296/224 e 0 to 1 dl 1467231586 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 13:20:13:[31024.108553] Lustre: 11462:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 7 previous similar messages 13:20:13:[31046.110142] LustreError: 11462:0:(ldlm_lockd.c:691:ldlm_handle_ast_error()) ### client (nid 10.2.4.148@tcp) failed to reply to blocking AST (req@ffff88007b2bfc00 x1538472463395360 status 0 rc -110), evict it ns: mdt-lustre-MDT0000_UUID lock: ffff88007a8ea200/0x40220881aef8307 lrc: 3/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13 rrc: 1 type: IBT flags: 0x60200400000020 nid: 10.2.4.148@tcp remote: 0x3f8b8faf93471726 expref: 5 pid: 11463 timeout: 4325802406 lvb_type: 0 13:20:13:[31046.122612] LustreError: 138-a: lustre-MDT0000: A client on nid 10.2.4.148@tcp was evicted due to a lock blocking callback time out: rc -110 13:20:13:[31046.135477] LustreError: 13044:0:(nodemap_member.c:50:nm_member_del()) ASSERTION( exp->u.eu_target_data.ted_nodemap == nodemap ) failed: 13:20:13:[31046.140099] LustreError: 13044:0:(nodemap_member.c:50:nm_member_del()) LBUG 13:20:13:[31046.142126] Pid: 13044, comm: mdt00_003 13:20:13:[31046.143802] 13:20:13:[31046.143802] Call Trace: 13:20:13:[31046.146737] [<ffffffffa06a87d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs] 13:20:13:[31046.148763] [<ffffffffa06a8d75>] lbug_with_loc+0x45/0xc0 [libcfs] 13:20:13:[31046.150703] [<ffffffffa0a8c79d>] nm_member_del+0x18d/0x190 [ptlrpc] 13:20:13:[31046.152683] [<ffffffffa0a8743f>] nodemap_del_member+0x5f/0x170 [ptlrpc] 13:20:13:[31046.154752] [<ffffffffa0db95a5>] mdt_obd_disconnect+0x155/0x640 [mdt] 13:20:13:[31046.156740] [<ffffffffa09cd15b>] target_handle_disconnect+0x10b/0x4a0 [ptlrpc] 13:20:13:[31046.158828] [<ffffffffa0a64e37>] tgt_disconnect+0x37/0x140 [ptlrpc] 13:20:13:[31046.160881] [<ffffffffa0a69595>] tgt_request_handle+0x915/0x1320 [ptlrpc] 13:20:13:[31046.162920] [<ffffffffa0a15b1b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc] 13:20:13:[31046.165045] [<ffffffffa0a136d8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc] 13:20:13:[31046.166963] [<ffffffff810b88b2>] ? default_wake_function+0x12/0x20 13:20:13:[31046.168920] [<ffffffff810af018>] ? __wake_up_common+0x58/0x90 13:20:13:[31046.170841] [<ffffffffa0a19bd0>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc] 13:20:13:[31046.172759] [<ffffffffa0a19130>] ? ptlrpc_main+0x0/0x1de0 [ptlrpc] 13:20:13:[31046.174784] [<ffffffff810a5acf>] kthread+0xcf/0xe0 13:20:13:[31046.176658] [<ffffffff810a5a00>] ? kthread+0x0/0xe0 13:20:13:[31046.178486] [<ffffffff81646318>] ret_from_fork+0x58/0x90 13:20:13:[31046.180329] [<ffffffff810a5a00>] ? kthread+0x0/0xe0 13:20:13:[31046.182177] 13:20:13:[31046.183948] Kernel panic - not syncing: LBUG 13:20:13:[31046.184689] CPU: 0 PID: 13044 Comm: mdt00_003 Tainted: G OE ------------ 3.10.0-327.18.2.el7_lustre.x86_64 #1 13:20:13:[31046.184689] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 13:20:13:[31046.184689] ffffffffa06c5def 000000006a616767 ffff88004311bb00 ffffffff81635c14 13:20:13:[31046.184689] ffff88004311bb80 ffffffff8162f48a ffffffff00000008 ffff88004311bb90 13:20:13:[31046.184689] ffff88004311bb30 000000006a616767 ffffffffa0aa54eb 0000000000000246
There are several instances of this failure. Logs are at
2016-06-29 - https://testing.hpdd.intel.com/test_sets/2cbf7ebc-3e44-11e6-80b9-5254006e85c2
2016-06-23 - https://testing.hpdd.intel.com/test_sets/a6a39816-39c2-11e6-acf3-5254006e85c2
Attachments
Issue Links
- is related to
-
LU-3291 IU UID/GID Mapping Feature
- Resolved