[LU-10321] MDS - umount hangs during failback Created: 04/Dec/17 Updated: 02/Feb/18 Resolved: 04/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.2 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Cliff White (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak |
| Environment: |
Soak stress cluster, MLNX networking stack. |
| Attachments: |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
MDT 2 (soak-10) fails over to soak-11, with errors:

Dec 2 07:27:00 soak-11 kernel: LustreError: 2976:0:(llog_osd.c:960:llog_osd_next_block()) soaked-MDT0003-osp-MDT0002: missed desired record? 2 > 1
Dec 2 07:27:00 soak-11 kernel: LustreError: 2976:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0003-osp-MDT0002 getting update log failed: rc = -2
Dec 2 07:27:00 soak-11 kernel: LustreError: 2976:0:(lod_dev.c:419:lod_sub_recovery_thread()) Skipped 3 previous similar messages
Dec 2 07:27:01 soak-11 kernel: LustreError: 2381:0:(mdt_open.c:1167:mdt_cross_open()) soaked-MDT0002: [0x280002b4c:0xa44:0x0] doesn't exist!: rc = -14
Dec 2 07:27:02 soak-11 kernel: Lustre: 2977:0:(ldlm_lib.c:2059:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Dec 2 07:27:02 soak-11 kernel: Lustre: 2977:0:(ldlm_lib.c:2059:target_recovery_overseer()) Skipped 2 previous similar messages
Dec 2 07:27:02 soak-11 kernel: Lustre: soaked-MDT0002: disconnecting 31 stale clients

Soak attempts a umount, which hangs:

2017-12-02 07:27:16,430:fsmgmt.fsmgmt:INFO Unmounting soaked-MDT0002 on soak-11 ...

soak-11
Dec 2 07:30:16 soak-11 kernel: INFO: task umount:3039 blocked for more than 120 seconds.
Dec 2 07:30:16 soak-11 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 2 07:30:16 soak-11 kernel: umount D ffff8803c81f4008 0 3039 3037 0x00000080
Dec 2 07:30:16 soak-11 kernel: ffff8803ce3afa30 0000000000000086 ffff88081fa50000 ffff8803ce3affd8
Dec 2 07:30:16 soak-11 kernel: ffff8803ce3affd8 ffff8803ce3affd8 ffff88081fa50000 ffff8803c81f4000
Dec 2 07:30:16 soak-11 kernel: ffff8803c81f4004 ffff88081fa50000 00000000ffffffff ffff8803c81f4008
Dec 2 07:30:16 soak-11 kernel: Call Trace:
Dec 2 07:30:16 soak-11 kernel: [<ffffffff816aa489>] schedule_preempt_disabled+0x29/0x70
Dec 2 07:30:16 soak-11 kernel: [<ffffffff816a83b7>] __mutex_lock_slowpath+0xc7/0x1d0
Dec 2 07:30:16 soak-11 kernel: [<ffffffff816a77cf>] mutex_lock+0x1f/0x2f
Dec 2 07:30:16 soak-11 kernel: [<ffffffffc14560c7>] lfsck_stop+0x167/0x4e0 [lfsck]
Dec 2 07:30:16 soak-11 kernel: [<ffffffff810c4832>] ? default_wake_function+0x12/0x20
Dec 2 07:30:16 soak-11 kernel: [<ffffffff811e0593>] ? __kmalloc+0x1e3/0x230
Dec 2 07:30:16 soak-11 kernel: [<ffffffffc1625aa6>] mdd_iocontrol+0x96/0x16a0 [mdd]
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc0ec9619>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc1500fc1>] mdt_device_fini+0x71/0x920 [mdt]
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc0ed6911>] class_cleanup+0x971/0xcd0 [obdclass]
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc0ed8cad>] class_process_config+0x19cd/0x23b0 [obdclass]
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc0dc6bc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc0ed9856>] class_manual_cleanup+0x1c6/0x710 [obdclass]
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc0f07fee>] server_put_super+0x8de/0xcd0 [obdclass]
Dec 2 07:30:17 soak-11 kernel: [<ffffffff81203692>] generic_shutdown_super+0x72/0x100
Dec 2 07:30:17 soak-11 kernel: [<ffffffff81203a62>] kill_anon_super+0x12/0x20
Dec 2 07:30:17 soak-11 kernel: [<ffffffffc0edc152>] lustre_kill_super+0x32/0x50 [obdclass]
Dec 2 07:30:17 soak-11 kernel: [<ffffffff81203e19>] deactivate_locked_super+0x49/0x60
Dec 2 07:30:17 soak-11 kernel: [<ffffffff81204586>] deactivate_super+0x46/0x60
Dec 2 07:30:17 soak-11 kernel: [<ffffffff812217cf>] cleanup_mnt+0x3f/0x80
Dec 2 07:30:18 soak-11 kernel: [<ffffffff81221862>] __cleanup_mnt+0x12/0x20
Dec 2 07:30:18 soak-11 kernel: [<ffffffff810ad275>] task_work_run+0xc5/0xf0
Dec 2 07:30:18 soak-11 kernel: [<ffffffff8102ab62>] do_notify_resume+0x92/0xb0
Dec 2 07:30:18 soak-11 kernel: [<ffffffff816b533d>] int_signal+0x12/0x17
Dec 2 07:30:19 soak-11 kernel: LustreError: 11-0: soaked-OST0016-osc-MDT0002: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -114

This wedges soak: no further faults are attempted and jobs stop scheduling. |
| Comments |
| Comment by John Hammond [ 04/Dec/17 ] |
|
Maybe an lfsck start/recovery deadlock. umount is waiting in lfsck_stop() on li_mutex. tgt_recover_2 is in lfsck_start() holding li_mutex, possibly waiting on lfsck_layout (pid 3013), which is sending notifications but is stuck waiting for import recovery.

00100000:10000000:17.0:1512199622.506187:0:3013:0:(lfsck_engine.c:1576:lfsck_assistant_engine()) soaked-MDT0002-osd: lfsck_layout LFSCK assistant thread start
00000100:00080000:17.0:1512199622.506589:0:3013:0:(client.c:1563:ptlrpc_send_new_req()) @@@ req from PID 0 waiting for recovery: (FULL != CONNECTING) req@ffff8803ee66b900 x1585651536798656/t0(0) o1101->soaked-OST0016-osc-MDT0002@192.168.1.106@o2ib:28/4 lens 320/224 e 0 to 0 dl 0 ref 1 fl Rpc:W/0/ffffffff rc 0/-1

Relevant backtraces:

lfsck_namespace S ffff8803c81f4000 0 3012 2 0x00000080
[<---------------->] schedule+0x29/0x70
[<---------------->] lfsck_assistant_engine+0x118d/0x20c0 [lfsck]
[<---------------->] ? dequeue_task_fair+0x41e/0x660
[<---------------->] ? __schedule+0x39d/0x8b0
[<---------------->] ? wake_up_state+0x20/0x20
[<---------------->] ? lfsck_master_engine+0x12f0/0x12f0 [lfsck]
[<---------------->] kthread+0xcf/0xe0
[<---------------->] ? insert_kthread_work+0x40/0x40
[<---------------->] ret_from_fork+0x58/0x90
[<---------------->] ? insert_kthread_work+0x40/0x40

lfsck S ffff8803f6dc9800 0 3010 2 0x00000080
[<---------------->] schedule+0x29/0x70
[<---------------->] lfsck_start_assistant+0x1d5/0x320 [lfsck]
[<---------------->] ? dt_read+0x14/0x50 [obdclass]
[<---------------->] ? lfsck_layout_load_bitmap+0x4bb/0x710 [lfsck]
[<---------------->] ? wake_up_state+0x20/0x20
[<---------------->] lfsck_layout_master_prep+0xfe/0x250 [lfsck]
[<---------------->] lfsck_master_engine+0x176/0x12f0 [lfsck]
[<---------------->] ? __switch_to+0xd7/0x510
[<---------------->] ? __schedule+0x39d/0x8b0
[<---------------->] ? lfsck_master_oit_engine+0x1150/0x1150 [lfsck]
[<---------------->] kthread+0xcf/0xe0
[<---------------->] ? insert_kthread_work+0x40/0x40
[<---------------->] ret_from_fork+0x58/0x90
[<---------------->] ? insert_kthread_work+0x40/0x40

lfsck_layout S 000000000000f908 0 3013 2 0x00000080
[<---------------->] schedule+0x29/0x70
[<---------------->] schedule_timeout+0x174/0x2c0
[<---------------->] ? internal_add_timer+0x70/0x70
[<---------------->] ? ptlrpc_init_rq_pool+0x110/0x110 [ptlrpc]
[<---------------->] ptlrpc_set_wait+0x4c0/0x910 [ptlrpc]
[<---------------->] ? wake_up_state+0x20/0x20
[<---------------->] lfsck_assistant_notify_others+0x110f/0x1390 [lfsck]
[<---------------->] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[<---------------->] ? lfsck_master_engine+0x12f0/0x12f0 [lfsck]
[<---------------->] lfsck_assistant_engine+0x140/0x20c0 [lfsck]
[<---------------->] ? check_preempt_curr+0x78/0xa0
[<---------------->] ? dequeue_task_fair+0x41e/0x660
[<---------------->] ? __switch_to+0xd7/0x510
[<---------------->] ? __schedule+0x39d/0x8b0
[<---------------->] ? lfsck_master_engine+0x12f0/0x12f0 [lfsck]
[<---------------->] kthread+0xcf/0xe0
[<---------------->] ? insert_kthread_work+0x40/0x40
[<---------------->] ret_from_fork+0x58/0x90
[<---------------->] ? insert_kthread_work+0x40/0x40

tgt_recover_2 S ffff8808067d9fa0 0 2977 2 0x00000080
[<---------------->] schedule+0x29/0x70
[<---------------->] lfsck_start+0xf25/0x17c0 [lfsck]
[<---------------->] ? llog_key_init+0x20/0xd0 [obdclass]
[<---------------->] ? wake_up_state+0x20/0x20
[<---------------->] mdd_iocontrol+0x3d3/0x16a0 [mdd]
[<---------------->] ? osp_key_init+0x3b/0xd0 [osp]
[<---------------->] mdt_postrecov+0x148/0x1d0 [mdt]
[<---------------->] mdt_obd_postrecov+0x8d/0xb0 [mdt]
[<---------------->] target_recovery_thread+0xaf2/0x1360 [ptlrpc]
[<---------------->] ? __schedule+0x39d/0x8b0
[<---------------->] ? replay_request_or_update.isra.21+0x8c0/0x8c0 [ptlrpc]
[<---------------->] kthread+0xcf/0xe0
[<---------------->] ? insert_kthread_work+0x40/0x40
[<---------------->] ret_from_fork+0x58/0x90
[<---------------->] ? insert_kthread_work+0x40/0x40

umount D ffff8803c81f4008 0 3039 3037 0x00000080
[<---------------->] schedule_preempt_disabled+0x29/0x70
[<---------------->] __mutex_lock_slowpath+0xc7/0x1d0
[<---------------->] mutex_lock+0x1f/0x2f
[<---------------->] lfsck_stop+0x167/0x4e0 [lfsck]
[<---------------->] ? default_wake_function+0x12/0x20
[<---------------->] ? __kmalloc+0x1e3/0x230
[<---------------->] mdd_iocontrol+0x96/0x16a0 [mdd]
[<---------------->] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[<---------------->] mdt_device_fini+0x71/0x920 [mdt]
[<---------------->] class_cleanup+0x971/0xcd0 [obdclass]
[<---------------->] class_process_config+0x19cd/0x23b0 [obdclass]
[<---------------->] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[<---------------->] class_manual_cleanup+0x1c6/0x710 [obdclass]
[<---------------->] server_put_super+0x8de/0xcd0 [obdclass]
[<---------------->] generic_shutdown_super+0x72/0x100
[<---------------->] kill_anon_super+0x12/0x20
[<---------------->] lustre_kill_super+0x32/0x50 [obdclass]
[<---------------->] deactivate_locked_super+0x49/0x60
[<---------------->] deactivate_super+0x46/0x60
[<---------------->] cleanup_mnt+0x3f/0x80
[<---------------->] __cleanup_mnt+0x12/0x20
[<---------------->] task_work_run+0xc5/0xf0
[<---------------->] do_notify_resume+0x92/0xb0
[<---------------->] int_signal+0x12/0x17 |
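To make the blocking cycle concrete, here is a heavily reduced sketch of the two paths implied by the traces. Only the names li_mutex, lfsck_start(), lfsck_stop(), and their callers come from the backtraces; the function bodies, the completion, and the "_sketch" helpers are illustrative assumptions, not the actual Lustre code.

    #include <linux/mutex.h>
    #include <linux/completion.h>

    static DEFINE_MUTEX(li_mutex);              /* per-LFSCK-instance lock (from the traces) */
    static DECLARE_COMPLETION(assistant_ready); /* hypothetical: signalled once assistants start */

    /* tgt_recover_2 path: post-recovery hook restarts LFSCK
     * (target_recovery_thread -> mdt_postrecov -> mdd_iocontrol -> lfsck_start). */
    static int lfsck_start_sketch(void)
    {
            mutex_lock(&li_mutex);
            /*
             * Stuck here: the lfsck_layout assistant cannot finish its startup
             * notifications while the OST imports are still reconnecting, and
             * this wait is uninterruptible, so li_mutex is never released.
             */
            wait_for_completion(&assistant_ready);
            mutex_unlock(&li_mutex);
            return 0;
    }

    /* umount path: mdt_device_fini -> mdd_iocontrol -> lfsck_stop. */
    static int lfsck_stop_sketch(void)
    {
            mutex_lock(&li_mutex);  /* blocks in D state forever: the holder never returns */
            /* ... would set the stop flag and wake the engines ... */
            mutex_unlock(&li_mutex);
            return 0;
    }

So the cycle is: umount needs li_mutex; li_mutex is held by a thread waiting on the assistant; and the assistant is waiting on the very recovery that umount is trying to tear down.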
| Comment by John Hammond [ 04/Dec/17 ] |
|
Fan Yong, does this look familiar? |
| Comment by nasf (Inactive) [ 05/Dec/17 ] |
|
According to the current implementation (master), the LFSCK notify RPC is interruptible, so would you please try "kill -9 3013" to check whether the layout LFSCK assistant engine can be interrupted? If it can, then lfsck_start can go ahead (exiting with failure), and lfsck_stop will not be blocked. |
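For reference, this is the pattern being relied on: a wait that returns early when a signal is delivered, letting lfsck_start() unwind with an error and drop li_mutex. The helper below is a hypothetical sketch of a generic interruptible wait, not the real code path (which, per the lfsck_layout backtrace, sits inside ptlrpc_set_wait()).

    #include <linux/wait.h>

    /* Hypothetical helper: wait for the notify RPC reply, but give up on a signal. */
    static int wait_for_notify_reply(wait_queue_head_t *waitq, bool *done)
    {
            int rc;

            /* Returns -ERESTARTSYS if a signal such as SIGKILL arrives first. */
            rc = wait_event_interruptible(*waitq, *done);
            if (rc)
                    return -EINTR;  /* caller unwinds; lfsck_start() fails out */
            return 0;
    }

If the wait is coded this way, "kill -9 3013" should break the chain described above.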
| Comment by Cliff White (Inactive) [ 05/Dec/17 ] |
|
Okay, so you want lfsck_layout process killed if this happens again? |
| Comment by nasf (Inactive) [ 06/Dec/17 ] |
|
Yes, please kill the LFSCK assistant thread that is blocked on recovery, such as lfsck_layout or lfsck_namespace. That is expected to work; otherwise, we need to find another solution. |
| Comment by Cliff White (Inactive) [ 06/Dec/17 ] |
|
It does not work.

[root@soak-8 ~]# ps -ef |grep lfsck
root 6695 6693 0 04:35 ? 00:00:00 bash -c ??lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A ??if [[ $? != 0 ]]; then ???lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace,layout -A ???if [[ $? != 0 ]]; then ????lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace -A ???fi ??fi
root 6713 6695 0 04:35 ? 00:00:00 lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A
root 6714 2 0 04:35 ? 00:00:00 [lfsck]
root 6716 2 0 04:35 ? 00:00:00 [lfsck_layout]
root 53709 53676 0 16:09 pts/0 00:00:00 lctl lfsck_stop -M soaked-MDT0000
root 53748 53717 0 16:10 pts/1 00:00:00 grep --color=auto lfsck
[root@soak-8 ~]# kill -9 6716
[root@soak-8 ~]# kill -9 6716
[root@soak-8 ~]# ps -ef |grep lfsck
root 6695 6693 0 04:35 ? 00:00:00 bash -c ??lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A ??if [[ $? != 0 ]]; then ???lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace,layout -A ???if [[ $? != 0 ]]; then ????lctl lfsck_start -M soaked-MDT0000 -s 1000 -t namespace -A ???fi ??fi
root 6713 6695 0 04:35 ? 00:00:00 lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A
root 6714 2 0 04:35 ? 00:00:00 [lfsck]
root 6716 2 0 04:35 ? 00:00:00 [lfsck_layout]
root 53709 53676 0 16:09 pts/0 00:00:00 lctl lfsck_stop -M soaked-MDT0000
root 53754 53717 0 16:11 pts/1 00:00:00 grep --color=auto lfsck

The process is un-killable; soak-8 is wedged. I can leave it like this if you want to log in. |
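One plausible explanation for the kill having no effect (an assumption from general kernel behavior, not confirmed from this output alone): [lfsck_layout] is a kernel thread, and kthreads never receive signals unless they explicitly opt in and then poll for them, roughly:

    #include <linux/kthread.h>
    #include <linux/sched/signal.h>

    static int assistant_engine_sketch(void *data)
    {
            allow_signal(SIGKILL);  /* without this, kill -9 is silently discarded */

            while (!kthread_should_stop()) {
                    /* ... send notifications, wait for replies (interruptibly) ... */
                    if (signal_pending(current))
                            break;  /* only reached if every wait above can also be interrupted */
            }
            return 0;
    }

If any wait inside the loop is uninterruptible, the signal stays pending and the thread stays wedged, which matches the ps output above.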
| Comment by nasf (Inactive) [ 06/Dec/17 ] |
|
I just logged in to soak-8, but it seems to have been rebooted. |
| Comment by Cliff White (Inactive) [ 06/Dec/17 ] |
|
Sorry, I'll let you know when the next hang happens. I should have dumped stacks before the reboot. |
| Comment by Cliff White (Inactive) [ 06/Dec/17 ] |
|
I think I have reproduced the hang; I'll leave it like this until you have a look. |
| Comment by Gerrit Updater [ 07/Dec/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/30420 |
| Comment by Gerrit Updater [ 07/Dec/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/30422 |
| Comment by Andreas Dilger [ 07/Dec/17 ] |
|
I think there is already a "noscrub" mount option to prevent OI scrub from starting. However, it would also be useful if it were possible to force LFSCK to abort the running scan at startup, in case there is a problem like this at a customer site. Is "noscrub" along with "lctl lfsck_stop --reset" enough to kill the running LFSCK and erase the saved state? |
| Comment by nasf (Inactive) [ 08/Dec/17 ] |
|
In this case, the LFSCK was NOT auto-resumed from the last checkpoint; instead, it was started manually (or by scripts) after the MDT failover. So it is not related to the mount option. |
| Comment by Cliff White (Inactive) [ 08/Dec/17 ] |
|
Hit the problem again last night. The system completed a failback, MDTs are all mounted in the proper place, and lfsck is wedged. Will leave the system untouched if you wish to have a look. |
| Comment by Cliff White (Inactive) [ 08/Dec/17 ] |
|
Dumped lustre logs from all MDS, output in /scratch/results/soak/soak-X-hung.lfsck.Dec08.txt |
| Comment by Cliff White (Inactive) [ 11/Dec/17 ] |
|
Had a similar hang on soak-3 and soak-6 (OSS) during umount. Dumped Lustre logs, attached. Also dumped stack traces and crash-dumped both nodes; output available on Spirit. |
| Comment by nasf (Inactive) [ 12/Dec/17 ] |
|
cliffw, as you said on Skype, there are multiple issues in the current Spirit tests. The original hang happened on the MDT because of the blocked, uninterruptible LFSCK; patch 30420 is meant to resolve that trouble. But that does not mean Spirit will not hang after applying the patch, because other issues may also block the system. The new hang happened on the OST side, which is different from the original LFSCK hang. So would you please check whether the original LFSCK hang issue is resolved or not? Thanks! |
| Comment by Cliff White (Inactive) [ 12/Dec/17 ] |
|
We have not seen lfsck hangs with MDT failover after the patch. The two hangs occurred with OST failover. After the most recent hang, I was able to reboot/remount the system, and then I could start and stop lfsck without the hang. |
| Comment by nasf (Inactive) [ 13/Dec/17 ] |
|
According to the logs, there were new LFSCK instances started just after lfsck_stop during the MDT umount. Then nobody stops the newly triggered LFSCK, so the MDT cannot umount. I will make a patch to resolve the race condition. |
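One way such a start-after-stop race can be closed is to record a "stopping" state under li_mutex, so that any lfsck_start() arriving after lfsck_stop() is refused rather than silently outliving the umount. The structure and names below are assumptions for illustration, not necessarily what the actual patch does.

    #include <linux/mutex.h>
    #include <linux/errno.h>

    struct lfsck_instance_sketch {
            struct mutex    li_mutex;
            bool            li_stopping;    /* set once umount/cleanup begins */
    };

    static int lfsck_start_guarded(struct lfsck_instance_sketch *li)
    {
            int rc = 0;

            mutex_lock(&li->li_mutex);
            if (li->li_stopping) {
                    rc = -ESHUTDOWN;        /* refuse new scans during cleanup */
            } else {
                    /* ... launch the master and assistant engines ... */
            }
            mutex_unlock(&li->li_mutex);
            return rc;
    }

    static void lfsck_stop_guarded(struct lfsck_instance_sketch *li)
    {
            mutex_lock(&li->li_mutex);
            li->li_stopping = true;         /* any start that races in after us sees this */
            /* ... signal running engines to exit and wait for them ... */
            mutex_unlock(&li->li_mutex);
    }

With this ordering, a start that loses the race returns an error instead of leaving an engine running that nothing will ever stop.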
| Comment by Gerrit Updater [ 13/Dec/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/30513 |
| Comment by Gerrit Updater [ 13/Dec/17 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/30514 |
| Comment by Cliff White (Inactive) [ 21/Dec/17 ] |
|
We just hit this again on master, so we need a version of the patch for master. |
| Comment by Cliff White (Inactive) [ 21/Dec/17 ] |
|
Okay, we have an IB build so we will test this. |
| Comment by Cliff White (Inactive) [ 26/Dec/17 ] |
|
We are still hitting this issue with the patch. Soak-10 just hit it; crash dump, vmcore-dmesg, and lustre-log attached. |
| Comment by Gerrit Updater [ 04/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30420/ |
| Comment by Gerrit Updater [ 04/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30513/ |
| Comment by nasf (Inactive) [ 04/Jan/18 ] |
|
Patches have been landed to master; other LFSCK start trouble will be handled via |
| Comment by Gerrit Updater [ 02/Feb/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30422/ |
| Comment by Gerrit Updater [ 02/Feb/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30514/ |