Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.7.0
-
OpenSFS cluster with two MDSs with one MDT each, three OSSs with two OSTs each and three clients. Lustre tag 2.6.92 build #2812
-
3
-
17107
Description
I ran a small version of the Stability test in the LFSCK test plan; created 5 subdirectories with 100 objects in each. One proc runs LFSCK namespace, one deletes existing files and all other procs create more directories and files.
LFSCK namespace completed on MDT01, but was stuck on MDT02 with
status: scanning-phase2
After a few minutes, I ran the following on MDS01:
lctl lfsck_stop -A -M scratch-MDT0000
This command did not and has not returned for over 30 minutes. So, I tried to stop LFSCK on MDS02:
lctl lfsck_stop -A -M scratch-MDT0001
and this command has not returned.
The dmesg logs complain about losing contact with a client, but, from the user perspective, all nodes look up and responsive when I get on them.
From MDS02, containing MDT02 that LFSCK namespace did not complete, dmesg contains:
LustreError: 11-0: scratch-MDT0000-osp-MDT0001: operation mds_connect to node 192.168.2.125@o2ib failed: rc = -11 Lustre: scratch-MDT0001: haven't heard from client d9f65e0e-5d98-7dfb-985b-592359f9067c (at 192.168.2.111@o2ib) in 234 seconds. I think it's dead, and I am evicting it. exp ffff880292852c00, cur 1421640751 expire 1421640601 last 1421640517 Lustre: cli-ctl-scratch-MDT0001: Allocated super-sequence [0x00000003c0000400-0x0000000400000400):1:mdt] LNet: Service thread pid 32420 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Pid: 32420, comm: mdt_out02_000 Call Trace: [<ffffffff81061d12>] ? default_wake_function+0x12/0x20 [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90 [<ffffffffa0ebd14c>] lfsck_stop+0x34c/0x3f0 [lfsck] [<ffffffff81061d00>] ? default_wake_function+0x0/0x20 [<ffffffffa091a842>] ? __req_capsule_get+0x162/0x6d0 [ptlrpc] [<ffffffffa0ebe9b0>] lfsck_in_notify+0x2a0/0x330 [lfsck] [<ffffffffa094d694>] tgt_handle_lfsck_notify+0x64/0x150 [ptlrpc] [<ffffffffa095597e>] tgt_request_handle+0x8be/0x1000 [ptlrpc] [<ffffffffa0905711>] ptlrpc_main+0xe41/0x1960 [ptlrpc] [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130 [<ffffffff81529466>] ? schedule+0x176/0x3b0 [<ffffffffa09048d0>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [<ffffffff8109abf6>] kthread+0x96/0xa0 [<ffffffff8100c20a>] child_rip+0xa/0x20 [<ffffffff8109ab60>] ? kthread+0x0/0xa0 [<ffffffff8100c200>] ? child_rip+0x0/0x20 LustreError: dumping log to /tmp/lustre-log.1421730002.32420 INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Lustre: 5459:0:(service.c:1335:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff880c5f8d46c0 x1490612292269532/t0(0) o1101->scratch-MDT0000-mdtlov_UUID@192.168.2.125@o2ib:471/0 lens 320/224 e 24 to 0 dl 1421730401 ref 2 fl Interpret:/0/0 rc 0/0 INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Lustre: scratch-MDT0001: Client scratch-MDT0000-mdtlov_UUID (at 192.168.2.125@o2ib) reconnecting INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b INFO: task lctl:5236 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.gefdacb7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. lctl D 000000000000000d 0 5236 8921 0x00000080 ffff88102b211b58 0000000000000086 0000000000000000 ffff88083fd304c0 ffff88102b211ae8 ffff88083fd304c0 0000000000000000 ffff88083fd304c0 ffff880c97827058 ffff88102b211fd8 000000000000fbc8 ffff880c97827058 Call Trace: [<ffffffff8152ae5e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff8152acfb>] mutex_lock+0x2b/0x50 [<ffffffffa0ebcf2f>] lfsck_stop+0x12f/0x3f0 [lfsck] [<ffffffffa1130c7b>] ? osp_key_init+0x6b/0x190 [osp] [<ffffffffa06cf30f>] ? keys_fill+0x6f/0x190 [obdclass] [<ffffffffa10c74cf>] mdd_iocontrol+0x1ff/0xb70 [mdd] [<ffffffffa06d3c23>] ? lu_context_init+0xa3/0x240 [obdclass] [<ffffffffa0f7f05e>] mdt_iocontrol+0x6be/0x8e0 [mdt] [<ffffffffa069fad6>] ? class_name2dev+0x56/0xe0 [obdclass] [<ffffffffa069b77c>] class_handle_ioctl+0x15fc/0x2180 [obdclass] [<ffffffff8119c3c8>] ? do_filp_open+0x798/0xd20 [<ffffffffa06822ab>] obd_class_ioctl+0x4b/0x190 [obdclass] [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0 [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580 [<ffffffff81196dd6>] ? final_putname+0x26/0x50 [<ffffffff8119f091>] sys_ioctl+0x81/0xa0 [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Lustre: scratch-MDT0001: Client scratch-MDT0000-mdtlov_UUID (at 192.168.2.125@o2ib) reconnecting
On MDS01, containing the MDS01 that LFSCK did complete, dmesg contains:
LustreError: 11-0: scratch-MDT0001-osp-MDT0000: operation mds_connect to node 192.168.2.126@o2ib failed: rc = -11 Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):0:ost Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000280000400-0x00000002c0000400):4:ost Lustre: Skipped 1 previous similar message Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000300000400-0x0000000340000400):3:ost Lustre: Skipped 1 previous similar message Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000340000400-0x0000000380000400):5:ost Lustre: MGS: haven't heard from client b55379e3-6789-caf6-3733-61f9e03b0716 (at 192.168.2.111@o2ib) in 234 seconds. I think it's dead, and I am evicting it. exp ffff8807fed38000, cur 1421640751 expire 1421640601 last 1421640517 Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x0000000380000400-0x00000003c0000400):0:mdt Lustre: ctl-scratch-MDT0000: super-sequence allocation rc = 0 [0x00000003c0000400-0x0000000400000400):1:mdt LustreError: 2296:0:(class_obd.c:397:class_handle_ioctl()) OBD ioctl : No Device -12066 LustreError: 2297:0:(class_obd.c:397:class_handle_ioctl()) OBD ioctl : No Device -12066 LustreError: 2300:0:(class_obd.c:397:class_handle_ioctl()) OBD ioctl : No Device -12066 Lustre: 2301:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1421729801/real 1421729801] req@ffff880796e3a6c0 x1490612292269532/t0(0) o1101->scratch-MDT0001-osp-MDT0000@192.168.2.126@o2ib:24/4 lens 320/224 e 24 to 1 dl 1421730557 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Lustre: scratch-MDT0001-osp-MDT0000: Connection to scratch-MDT0001 (at 192.168.2.126@o2ib) was lost; in progress operations using this service will wait for recovery to complete Lustre: scratch-MDT0001-osp-MDT0000: Connection restored to scratch-MDT0001 (at 192.168.2.126@o2ib) Lustre: 2301:0:(client.c:1942:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1421730557/real 1421730557] req@ffff880796e3a6c0 x1490612292269532/t0(0) o1101->scratch-MDT0001-osp-MDT0000@192.168.2.126@o2ib:24/4 lens 320/224 e 24 to 1 dl 1421731313 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Lustre: scratch-MDT0001-osp-MDT0000: Connection to scratch-MDT0001 (at 192.168.2.126@o2ib) was lost; in progress operations using this service will wait for recovery to complete Lustre: scratch-MDT0001-osp-MDT0000: Connection restored to scratch-MDT0001 (at 192.168.2.126@o2ib)
I will upload the log from /tmp/lustre-log.1421730002.32420 from MDS02.
I have not tried to reproduce this error.
Attachments
Issue Links
- is related to
-
LU-5791 LFSCK 5: use bottom object for consistency verification
- Resolved