[LU-10800] Mount hangs on clients. Created: 09/Mar/18 Updated: 19/Mar/18 Resolved: 19/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Cliff White (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak |
| Environment: |
Soak stress cluster, lustre-master-next-ib build 1 |
| Attachments: |
|
| Issue Links: |
|
| Description |
|
Mounts frequently hang on clients.

Mar 9 18:14:41 soak-36 kernel: INFO: task mount.lustre:2807 blocked for more than 120 seconds.
Mar 9 18:14:41 soak-36 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 9 18:14:41 soak-36 kernel: mount.lustre D ffff88085b7a0000 0 2807 2806 0x00000080
Mar 9 18:14:41 soak-36 kernel: Call Trace:
Mar 9 18:14:41 soak-36 kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70
Mar 9 18:14:41 soak-36 kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0
Mar 9 18:14:41 soak-36 kernel: [<ffffffff81050b4c>] ? native_smp_send_reschedule+0x4c/0x70
Mar 9 18:14:41 soak-36 kernel: [<ffffffff810c2358>] ? resched_curr+0xa8/0xc0
Mar 9 18:14:41 soak-36 kernel: [<ffffffff810c30d8>] ? check_preempt_curr+0x78/0xa0
Mar 9 18:14:41 soak-36 kernel: [<ffffffff810c3119>] ? ttwu_do_wakeup+0x19/0xd0
Mar 9 18:14:41 soak-36 kernel: [<ffffffff816abc5d>] wait_for_completion+0xfd/0x140
Mar 9 18:14:42 soak-36 kernel: [<ffffffff810c6620>] ? wake_up_state+0x20/0x20
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0b28854>] llog_process_or_fork+0x244/0x450 [obdclass]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0b28a74>] llog_process+0x14/0x20 [obdclass]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0b5b1c5>] class_config_parse_llog+0x125/0x350 [obdclass]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc06501c8>] mgc_process_cfg_log+0x788/0xc40 [mgc]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0652243>] mgc_process_log+0x3d3/0x8b0 [mgc]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0b63240>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0652968>] ? do_config_log_add+0x248/0x580 [mgc]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0653840>] mgc_process_config+0x890/0x13f0 [mgc]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0b66c85>] lustre_process_log+0x2d5/0xae0 [obdclass]
Mar 9 18:14:42 soak-36 kernel: [<ffffffffc0855e27>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Mar 9 18:14:43 soak-36 kernel: [<ffffffffc0f3e3bb>] ll_fill_super+0x45b/0x1100 [lustre]
Mar 9 18:14:43 soak-36 kernel: [<ffffffffc0b6caa6>] lustre_fill_super+0x286/0x910 [obdclass]
Mar 9 18:14:43 soak-36 kernel: [<ffffffffc0b6c820>] ? lustre_common_put_super+0x270/0x270 [obdclass]
Mar 9 18:14:43 soak-36 kernel: [<ffffffff81206abd>] mount_nodev+0x4d/0xb0
Mar 9 18:14:43 soak-36 kernel: [<ffffffffc0b64ab8>] lustre_mount+0x38/0x60 [obdclass]
Mar 9 18:14:43 soak-36 kernel: [<ffffffff81207549>] mount_fs+0x39/0x1b0
Mar 9 18:14:43 soak-36 kernel: [<ffffffff81224177>] vfs_kern_mount+0x67/0x110
Mar 9 18:14:43 soak-36 kernel: [<ffffffff81226683>] do_mount+0x233/0xaf0
Mar 9 18:14:43 soak-36 kernel: [<ffffffff811894ee>] ? __get_free_pages+0xe/0x40
Mar 9 18:14:43 soak-36 kernel: [<ffffffff812272c6>] SyS_mount+0x96/0xf0
Mar 9 18:14:43 soak-36 kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Mar 9 18:16:43 soak-36 kernel: INFO: task mount.lustre:2807 blocked for more than 120 seconds.
Mar 9 18:16:43 soak-36 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I dumped the lustre log during the hang, attached. I also crash-dumped the client, files available on soak |
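For reference, a minimal sketch of the kind of commands one can use to capture this state during such a hang (the output path is illustrative, and the PID is simply the one from the trace above):

  # Dump the Lustre kernel debug buffer to a file.
  lctl dk /tmp/lustre-debug.$(hostname).log
  # Record the stack of the hung mount.lustre process (2807 is the PID shown in the trace above).
  cat /proc/2807/stack
  # Optionally ask the kernel to log all task stacks to dmesg/console.
  echo t > /proc/sysrq-trigger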
| Comments |
| Comment by John Hammond [ 09/Mar/18 ] |
|
Lustre: Build Version: 2.10.58_140_ge103776 |
| Comment by Cliff White (Inactive) [ 09/Mar/18 ] |
|
Tried lustre-master-ib build 58, per Peter - same issue.

version=2.10.58_118_gcf6b8f5

[ 842.767852] INFO: task mount.lustre:2760 blocked for more than 120 seconds.
[ 842.791128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 842.817070] mount.lustre D ffff88105acb0000 0 2760 2759 0x00000080
[ 842.840464] Call Trace:
[ 842.848565] [<ffffffff810cae08>] ? __enqueue_entity+0x78/0x80
[ 842.867884] [<ffffffff810d17fc>] ? enqueue_entity+0x26c/0xb60
[ 842.887205] [<ffffffff816ab8a9>] schedule+0x29/0x70
[ 842.903644] [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0
[ 842.922960] [<ffffffff810c95d5>] ? sched_clock_cpu+0x85/0xc0
[ 842.941987] [<ffffffff810c30d8>] ? check_preempt_curr+0x78/0xa0
[ 842.961880] [<ffffffff810c3119>] ? ttwu_do_wakeup+0x19/0xd0
[ 842.980620] [<ffffffff816abc5d>] wait_for_completion+0xfd/0x140
[ 843.000516] [<ffffffff810c6620>] ? wake_up_state+0x20/0x20
[ 843.019150] [<ffffffffc0a24854>] llog_process_or_fork+0x244/0x450 [obdclass]
[ 843.042950] [<ffffffffc0a24a74>] llog_process+0x14/0x20 [obdclass]
[ 843.063850] [<ffffffffc0a56da5>] class_config_parse_llog+0x125/0x350 [obdclass]
[ 843.088364] [<ffffffffc05a51c8>] mgc_process_cfg_log+0x788/0xc40 [mgc]
[ 843.110283] [<ffffffffc05a7243>] mgc_process_log+0x3d3/0x8b0 [mgc]
[ 843.131181] [<ffffffffc0a5ee20>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
[ 843.156774] [<ffffffffc05a7968>] ? do_config_log_add+0x248/0x580 [mgc]
[ 843.178634] [<ffffffffc05a8840>] mgc_process_config+0x890/0x13f0 [mgc]
[ 843.200574] [<ffffffffc0a62865>] lustre_process_log+0x2d5/0xae0 [obdclass]
[ 843.223581] [<ffffffffc0939e27>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 843.245497] [<ffffffffc0ea994b>] ll_fill_super+0x45b/0x1100 [lustre]
[ 843.266861] [<ffffffffc0a68686>] lustre_fill_super+0x286/0x910 [obdclass]
[ 843.289656] [<ffffffffc0a68400>] ? lustre_common_put_super+0x270/0x270 [obdclass]
[ 843.314667] [<ffffffff81206abd>] mount_nodev+0x4d/0xb0
[ 843.331977] [<ffffffffc0a60698>] lustre_mount+0x38/0x60 [obdclass]
[ 843.352676] [<ffffffff81207549>] mount_fs+0x39/0x1b0
[ 843.369355] [<ffffffff81224177>] vfs_kern_mount+0x67/0x110
[ 843.387756] [<ffffffff81226683>] do_mount+0x233/0xaf0
[ 843.404720] [<ffffffff811894ee>] ? __get_free_pages+0xe/0x40
[ 843.423693] [<ffffffff812272c6>] SyS_mount+0x96/0xf0
[ 843.440374] [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b |
| Comment by John Hammond [ 12/Mar/18 ] |
|
I got onto soak-34 while it was stuck in mount and found the following stack trace for the llog processing thread:

13498 llog_process_th
[<ffffffffc099d61b>] lnet_discover_peer_locked+0x10b/0x380 [lnet]
[<ffffffffc099d900>] LNetPrimaryNID+0x70/0x1a0 [lnet]
[<ffffffffc0c1937e>] ptlrpc_connection_get+0x3e/0x450 [ptlrpc]
[<ffffffffc0c0da1c>] ptlrpc_uuid_to_connection+0xec/0x1a0 [ptlrpc]
[<ffffffffc0bf21b2>] import_set_conn+0xb2/0x7a0 [ptlrpc]
[<ffffffffc0bf35f9>] client_obd_setup+0xd39/0x1450 [ptlrpc]
[<ffffffffc0d83c0b>] osc_setup_common+0x5b/0x330 [osc]
[<ffffffffc0d83f13>] osc_setup+0x33/0x2f0 [osc]
[<ffffffffc0d8f7c5>] osc_device_alloc+0xa5/0x240 [osc]
[<ffffffffc0a2be2a>] obd_setup+0x11a/0x2b0 [obdclass]
[<ffffffffc0a2d188>] class_setup+0x2a8/0x840 [obdclass]
[<ffffffffc0a30662>] class_process_config+0x1ae2/0x27b0 [obdclass]
[<ffffffffc0a32e94>] class_config_llog_handler+0x914/0x1330 [obdclass]
[<ffffffffc09f69b9>] llog_process_thread+0x839/0x1560 [obdclass]
[<ffffffffc09f80ef>] llog_process_thread_daemonize+0x9f/0xe0 [obdclass]
[<ffffffff810b4031>] kthread+0xd1/0xe0
[<ffffffff816c0577>] ret_from_fork+0x77/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

13473 mount.lustre
[<ffffffffc09f7924>] llog_process_or_fork+0x244/0x450 [obdclass]
[<ffffffffc09f7b44>] llog_process+0x14/0x20 [obdclass]
[<ffffffffc0a2a495>] class_config_parse_llog+0x125/0x350 [obdclass]
[<ffffffffc0652038>] mgc_process_cfg_log+0x788/0xc40 [mgc]
[<ffffffffc0655233>] mgc_process_log+0x3d3/0x8b0 [mgc]
[<ffffffffc0656830>] mgc_process_config+0x890/0x13f0 [mgc]
[<ffffffffc0a35fd8>] lustre_process_log+0x2d8/0xaf0 [obdclass]
[<ffffffffc0e835eb>] ll_fill_super+0x45b/0x1100 [lustre]
[<ffffffffc0a3be3c>] lustre_fill_super+0x28c/0x920 [obdclass]
[<ffffffff8120948f>] mount_nodev+0x4f/0xb0
[<ffffffffc0a33df8>] lustre_mount+0x38/0x60 [obdclass]
[<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
[<ffffffff81226d57>] vfs_kern_mount+0x67/0x110
[<ffffffff81229263>] do_mount+0x233/0xaf0
[<ffffffff81229ea6>] SyS_mount+0x96/0xf0
[<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
[<ffffffffffffffff>] 0xffffffffffffffff

This is at the schedule() call in the for (;;) loop of lnet_discover_peer_locked():

[root@soak-34 ~]# nm /lib/modules/3.10.0-693.21.1.el7.x86_64/extra/lustre-client/net/lnet.ko | grep lnet_discover_peer_locked
0000000000035510 T lnet_discover_peer_locked
[root@soak-34 ~]# printf '%#x\n' $((0x0000000000035510 + 0x10b))
0x3561b
[root@soak-34 ~]# addr2line -e /lib/modules/3.10.0-693.21.1.el7.x86_64/extra/lustre-client/net/lnet.ko --functions --inlines 0x3561b
lnet_discover_peer_locked
/usr/src/debug/lustre-2.10.58_139_g630cd49/lnet/lnet/peer.c:2064

The last lines in the debug buffer for the processing thread were:

00000020:01000000:17.0:1520891497.612056:0:13498:0:(obd_config.c:1599:class_config_llog_handler()) Marker, inst_flg=0x2 mark_flg=0x2
00000020:00000080:17.0:1520891497.612058:0:13498:0:(obd_config.c:1135:class_process_config()) processing cmd: cf010
00000020:00000080:17.0:1520891497.612059:0:13498:0:(obd_config.c:1205:class_process_config()) marker 79 (0x2) soaked-OST0017 add osc |
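For reference, the symbol+offset resolution shown above can be wrapped in a small helper. A minimal sketch, assuming the same nm/addr2line flags as in the transcript (the script name resolve-frame.sh is hypothetical):

  #!/bin/bash
  # Resolve a "symbol+0xoffset" stack frame in a kernel module to a source line.
  # Usage: resolve-frame.sh <module.ko> <symbol> <offset>, e.g.
  #   resolve-frame.sh /lib/modules/3.10.0-693.21.1.el7.x86_64/extra/lustre-client/net/lnet.ko \
  #     lnet_discover_peer_locked 0x10b
  ko=$1; sym=$2; off=$3
  # Start address of the symbol inside the module, taken from its symbol table.
  base=$(nm "$ko" | awk -v s="$sym" '$3 == s { print $1 }')
  # Absolute address of the frame within the module text.
  addr=$(printf '%#x\n' $(( 0x$base + off )))
  # Map that address back to function and file:line via the module's debuginfo.
  addr2line -e "$ko" --functions --inlines "$addr"

Run with the lnet.ko path, lnet_discover_peer_locked and 0x10b from above, it should reproduce the peer.c:2064 result.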
| Comment by Amir Shehata (Inactive) [ 15/Mar/18 ] |
|
It appears that the issue is that, with peer discovery enabled, some nodes are not responding to the discovery ping (the REPLY to the GET). Discovery is initiated in ptlrpc_connection_get() by calling LNetPrimaryNID(). This function blocks until discovery completes. However, discovery is not getting a reply, and the timeout for the reply is 3 minutes. In some of the logs I have seen, two OSTs are timing out, so that adds 6 minutes to mount time. What we see is that after this long period the mount completes. So there appear to be two problems:

1. Some nodes are not responding to the discovery ping.
2. The mount blocks for the full reply timeout (3 minutes per unresponsive peer) before it can complete.

For 1, I have run a simple test where I modprobe lustre on all the nodes and then run lctl ping <ost nid> on all the nodes (a sketch of that test is shown below). What I found is that some would fail the ping. If I run the ping again, the nodes which failed to ping succeed the second time around. This behavior is unrelated to discovery. I'll need to look at it in more detail tomorrow. To solve 2, there are two options to consider:
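Referring back to the test for problem 1 above, a minimal sketch of that ping loop (the NID list is illustrative; substitute the cluster's actual OST NIDs):

  #!/bin/bash
  # Ping every OST NID twice to show the "fails first, succeeds on retry" behavior.
  nids="192.168.1.101@o2ib 192.168.1.102@o2ib"   # placeholder NIDs
  modprobe lustre
  for nid in $nids; do
      if ! lctl ping "$nid"; then
          echo "first ping to $nid failed, retrying"
          lctl ping "$nid"
      fi
  done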
|
| Comment by Gerrit Updater [ 15/Mar/18 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/31663 |
| Comment by Gerrit Updater [ 16/Mar/18 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/31675 |
| Comment by Amir Shehata (Inactive) [ 16/Mar/18 ] |
|
John found the reason the pings were being dropped: a patch had been introduced that removed handling of early rxs from the o2iblnd. I believe we are hitting this case, which triggered the discovery ping timeout. We tested the revert on soak and it works properly. I believe we should land both the revert and the timeout change: the revert eliminates the dropped-ping issue, and the timeout change deals with cases where there is a legitimate communication failure, such as trying to mount while some of the OSTs are down. |
| Comment by Gerrit Updater [ 17/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31663/ |
| Comment by Gerrit Updater [ 19/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31675/ |
| Comment by Peter Jones [ 19/Mar/18 ] |
|
Landed for 2.11 |