[LU-14661] Provide kernel API for adding peer/peer NI Created: 30/Apr/21 Updated: 15/Apr/23 Resolved: 24/Jan/23 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Provide a kernel API for adding peers and peer NIs. Implement LNetAddPeer() and LNetAddPeerNI() APIs to allow other kernel users to add peers and peer NIs. Peers created via these APIs are not marked as having been configured. |
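As a rough illustration of the intended use (a minimal sketch only; the LNetAddPeer() signature is taken from the call quoted later in this ticket, while the header names, NIDs and caller are assumptions):

/* Sketch only: hand a list of NIDs belonging to one peer to LNet.
 * The first NID is treated as the primary NID; peers added this way
 * are not flagged as "configured", so discovery may still update them. */
#include <libcfs/libcfs.h>
#include <lnet/lib-lnet.h>

static int example_add_peer(void)
{
	lnet_nid_t nids[2];
	int rc;

	/* Hypothetical NIDs, for illustration only. */
	nids[0] = libcfs_str2nid("172.16.0.79@o2ib");
	nids[1] = libcfs_str2nid("172.16.0.80@o2ib");

	rc = LNetAddPeer(nids, 2);
	if (rc < 0)
		CERROR("failed to add peer: rc = %d\n", rc);

	return rc;
}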
| Comments |
| Comment by Gerrit Updater [ 30/Apr/21 ] |
|
Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/43508 |
| Comment by Gerrit Updater [ 30/Apr/21 ] |
|
Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/43509 |
| Comment by Gerrit Updater [ 30/Apr/21 ] |
|
Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/43510 |
| Comment by Gerrit Updater [ 08/Jun/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43508/ |
| Comment by Peter Jones [ 09/Jun/21 ] |
|
Landed for 2.15 |
| Comment by Chris Horn [ 09/Jun/21 ] |
|
pjones there are still two outstanding patches for this ticket, so I am reopening it. |
| Comment by Gerrit Updater [ 18/Aug/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43509/ |
| Comment by Gerrit Updater [ 18/Aug/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43510/ |
| Comment by Peter Jones [ 18/Aug/21 ] |
|
Everything seems to have landed now |
| Comment by Alex Zhuravlev [ 01/Dec/22 ] |
|
We met a problem with this patch: the config log can contain unavailable NIDs, which results in a sequence like this:
00000040:00001000:12.0:1669888833.903403:0:82546:0:(llog.c:605:llog_process_thread()) processing rec 0x00000000453477cb type 0x10620000
00000040:00001000:12.0:1669888833.903403:0:82546:0:(llog.c:611:llog_process_thread()) after swabbing, type=0x10620000 idx=26
00000040:00001000:12.0:1669888833.903404:0:82546:0:(llog.c:713:llog_process_thread()) lrh_index: 26 lrh_len: 120 (4768 remains)
00000040:00001000:12.0:1669888833.903404:0:82546:0:(llog.c:730:llog_process_thread()) index: 26, lh_last_idx: 254 synced_idx: 0 lgh_last_idx: 254
00000020:00000001:12.0:1669888833.903405:0:82546:0:(obd_config.c:1807:class_config_llog_handler()) Process entered
00000020:00000010:12.0:1669888833.903405:0:82546:0:(obd_config.c:2023:class_config_llog_handler()) kmalloced '(lcfg_new)': 96 at 00000000c5e5bbaf.
00000020:00000080:12.0:1669888833.903406:0:82546:0:(obd_config.c:1399:class_process_config()) processing cmd: cf00b
00000020:00000001:12.0:1669888833.903408:0:82546:0:(obd_config.c:965:class_add_conn()) Process entered
00000020:00000001:12.0:1669888833.903408:0:82546:0:(obd_class.h:776:obd_add_conn()) Process entered
00010000:00000001:12.0:1669888833.903409:0:82546:0:(ldlm_lib.c:67:import_set_conn()) Process entered
00000100:00000200:12.0:1669888833.903410:0:82546:0:(events.c:580:ptlrpc_uuid_to_peer()) 172.16.0.80@o2ib->12345-172.16.0.80@o2ib
00000100:00000001:12.0:1669888833.903410:0:82546:0:(connection.c:84:ptlrpc_connection_get()) Process entered
00000400:00000200:12.0:1669888833.903411:0:82546:0:(api-ni.c:1405:lnet_nid_cpt_hash()) Match nid 172.16.0.80@o2ib to cpt 3
00000400:00000200:12.0:1669888833.903415:0:82546:0:(peer.c:2052:lnet_peer_queue_for_discovery()) Queue peer 172.16.0.80@o2ib: 0
00000400:00000200:12.0:1669888833.903416:0:82546:0:(peer.c:2370:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:8.0:1669888867.664662:0:82546:0:(peer.c:2420:lnet_discover_peer_locked()) peer 172.16.0.80@o2ib NID 172.16.0.80@o2ib: -110. discovery complete
00000400:00000200:8.0:1669888867.664664:0:82546:0:(peer.c:1320:LNetPrimaryNID()) NID 172.16.0.80@o2ib primary NID 172.16.0.80@o2ib rc -110
Notice the 30+ second delay. A log may contain a number of unavailable NIDs, so overall config llog processing can take hundreds of seconds. The problem gets worse when a few services need to start (in case of failover, for example) - they are all serialized on cl_mgc_mutex. Sometimes mounting takes long enough that HA decides it is making no progress and initiates another failover. |
| Comment by Chris Horn [ 01/Dec/22 ] |
|
bzzz Do you have a reproducer? If so, can you see if https://review.whamcloud.com/#/c/fs/lustre-release/+/47322/ helps? |
| Comment by Chris Horn [ 01/Dec/22 ] |
|
Also, it isn't clear to me, based on the output you provided, that this problem is related to my patch. Peers are discovered as part of llog processing with or without my patch. |
| Comment by Alex Zhuravlev [ 01/Dec/22 ] |
Yes, we do have a reproducer, and will try that patch.
I'm not really familiar with the LNet code, but the patch adds this:
+ rc = LNetAddPeer(entry->un_nids, entry->un_nid_count);
and I've got one successful pass with that patch reverted. We'll do more testing, of course. |
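For context, a simplified sketch of where a call like this sits (only the quoted LNetAddPeer() line and its argument names come from the patch; the surrounding structure and function are approximations):

/* Sketch only: when config-llog processing records a uuid->NID mapping,
 * the accumulated NIDs for that uuid are also handed to LNet so it can
 * group them into a single peer. Field names follow the quoted line;
 * the real obdclass structure has more members. */
struct uuid_nid_data {
	lnet_nid_t	un_nids[8];	/* size is illustrative */
	int		un_nid_count;
};

static int sketch_register_uuid_nids(struct uuid_nid_data *entry)
{
	/* The line quoted above from the patch: */
	return LNetAddPeer(entry->un_nids, entry->un_nid_count);
}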
| Comment by Alex Zhuravlev [ 01/Dec/22 ] |
|
We've tested with the patch reverted on two different systems, successfully. |
| Comment by Chris Horn [ 01/Dec/22 ] |
Yes, but LNetAddPeer() does not directly result in discovery. Discovery is triggered by the call to LNetPrimaryNID() in ptlrpc_connection_get(), and those calls exist with or without my patch. So we'll need to understand why the behavior is different with the patch. |
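A simplified sketch of the path being described, to make the blocking point explicit (this is not the actual ptlrpc code; only the LNetPrimaryNID()-in-ptlrpc_connection_get() relationship is taken from the comment above, and the function name here is hypothetical):

/* Sketch only: llog processing -> import_set_conn() ->
 * ptlrpc_connection_get() -> LNetPrimaryNID(), which may block on
 * peer discovery. */
static void sketch_connection_setup(struct lnet_process_id peer)
{
	lnet_nid_t primary;

	/* LNetPrimaryNID() queues the peer for discovery if needed and
	 * waits for the result; with an unreachable NID this is where
	 * the 30+ second per-peer delays accumulate. */
	primary = LNetPrimaryNID(peer.nid);

	/* ... the ptlrpc connection is then looked up/created using
	 * the primary NID ... */
	(void)primary;
}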
| Comment by Alex Zhuravlev [ 01/Dec/22 ] |
|
hornc please check the lines I posted 2 hours ago - this is a case where llog processing took very long.
00000020:00000080:4.0:1669847133.980952:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf00b
00010000:00080000:4.0:1669847133.980956:0:19526:0:(ldlm_lib.c:99:import_set_conn()) imp ffff9a54fb49b000@testfs-OST0001-osc-MDT0002: found existing conn 172.16.0.75@o2ib
00000020:00000080:4.0:1669847133.980957:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf005
00000020:00000080:4.0:1669847133.980957:0:19526:0:(obd_config.c:1401:class_process_config()) adding mapping from uuid 172.16.0.77@o2ib to nid 0x50000ac10004d (172.16.0.77@o2ib)
00000020:00000080:4.0:1669847133.980958:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf005
00000020:00000080:4.0:1669847133.980959:0:19526:0:(obd_config.c:1401:class_process_config()) adding mapping from uuid 172.16.0.77@o2ib to nid 0x50000ac10004e (172.16.0.78@o2ib)
00000020:00000080:4.0:1669847133.980960:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf00b
00010000:00080000:4.0:1669847133.980961:0:19526:0:(ldlm_lib.c:116:import_set_conn()) imp ffff9a54fb49b000@testfs-OST0001-osc-MDT0002: add connection 172.16.0.77@o2ib at tail
00000020:00000080:4.0:1669847133.980962:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf005
00000020:00000080:4.0:1669847133.980963:0:19526:0:(obd_config.c:1401:class_process_config()) adding mapping from uuid 172.16.0.79@o2ib to nid 0x50000ac10004f (172.16.0.79@o2ib)
00000020:00000080:4.0:1669847133.980964:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf005
00000020:00000080:4.0:1669847133.980964:0:19526:0:(obd_config.c:1401:class_process_config()) adding mapping from uuid 172.16.0.79@o2ib to nid 0x50000ac100050 (172.16.0.80@o2ib)
00000020:00000080:4.0:1669847133.980965:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf00b
00010000:00080000:4.0:1669847133.980967:0:19526:0:(ldlm_lib.c:116:import_set_conn()) imp ffff9a54fb49b000@testfs-OST0001-osc-MDT0002: add connection 172.16.0.79@o2ib at tail
00000020:00000080:4.0:1669847133.980968:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf005
00000020:00000080:4.0:1669847133.980969:0:19526:0:(obd_config.c:1401:class_process_config()) adding mapping from uuid 172.16.0.81@o2ib to nid 0x50000ac100051 (172.16.0.81@o2ib)
00000020:00000080:4.0:1669847133.980970:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf005
00000020:00000080:4.0:1669847133.980970:0:19526:0:(obd_config.c:1401:class_process_config()) adding mapping from uuid 172.16.0.81@o2ib to nid 0x50000ac100052 (172.16.0.82@o2ib)
00000020:00000080:4.0:1669847133.980971:0:19526:0:(obd_config.c:1389:class_process_config()) processing cmd: cf00b |
| Comment by Chris Horn [ 01/Dec/22 ] |
|
The log for the good case doesn't seem to show the same log processing, e.g. there are no calls to ptlrpc_uuid_to_peer()/ptlrpc_connection_get(). Do you see those calls anywhere in the log? Is "net" tracing enabled in the debug mask in the log for the good case? Can you compare the calls to LNetPrimaryNID() in the good/bad cases, e.g. count the number of LNetPrimaryNID() calls and compare? |
| Comment by Alex Zhuravlev [ 02/Dec/22 ] |
|
hornc please check the log I've just attached. This was gathered using pre-patch code. |
| Comment by Chris Horn [ 02/Dec/22 ] |
|
bzzz Can you please attach the full log for the bad case as well? |
| Comment by Chris Horn [ 02/Dec/22 ] |
|
And can you provide the exact Lustre hash(es) being tested? |
| Comment by Alex Zhuravlev [ 02/Dec/22 ] |
just attached |
| Comment by Chris Horn [ 02/Dec/22 ] |
|
Is LNet peer discovery disabled in the good case but enabled in the bad case? |
| Comment by Alex Zhuravlev [ 02/Dec/22 ] |
Not quite ready to answer; I will have to check with a colleague. |
| Comment by Alex Zhuravlev [ 07/Dec/22 ] |
according to the testing, no, it doesn't help. |
| Comment by Serguei Smirnov [ 09/Dec/22 ] |
|
I was experimenting with the mount command which lists NIDs as X1,X2:Y1,Y2. In my experiment all NIDs were fake, so all attempts to connect were expected to fail; I was just checking which NIDs LNet would try to connect to. Basically, it looks like the behaviour differs before and after the patch. |
| Comment by Alex Zhuravlev [ 09/Dec/22 ] |
|
The problem is that mount takes too long in such a scenario; HA gives up and initiates a failover, which can hit the same problem. |
| Comment by Gaurang Tapase [ 09/Dec/22 ] |
|
We can definitely improve the HA agents to wait a bit longer (right now they wait for 450s before giving up). I tried doubling the timeout (900s) and still saw the problem. We need a definite timeout before which the mount would succeed. |
| Comment by Alex Zhuravlev [ 09/Dec/22 ] |
|
IMO, increasing the timeout isn't the optimal way - mount still takes very long (and so does failover), users have a bad experience, etc. |
| Comment by Gaurang Tapase [ 09/Dec/22 ] |
Agree. |
| Comment by Alex Zhuravlev [ 14/Dec/22 ] |
|
hornc can we discover new NIDs asynchronously so that the thread processing llog doesn't block? |
| Comment by Chris Horn [ 17/Jan/23 ] |
|
Couple points:
To be clear, a client mount attempt performs exactly the same with or without the patch.
With the patch:
[root@ct7-mds1 lustre-filesystem]# mount -t lustre 1.1.1.1@tcp,1.1.1.2@tcp:2.2.2.1@tcp,2.2.2.2@tcp:/lustre /mnt/lustre
mount.lustre: mount 1.1.1.1@tcp,1.1.1.2@tcp:2.2.2.1@tcp,2.2.2.2@tcp:/lustre at /mnt/lustre failed: Input/output error
Is the MGS running?
[root@ct7-mds1 lustre-filesystem]# lctl dk > /tmp/dk.log
[root@ct7-mds1 lustre-filesystem]# grep -e lnet_health_check -e TRACE -e lnet_discover_peer_locked /tmp/dk.log | egrep -e 1.1.1. -e 2.2.2. -e lnet_discover_peer_locked
00000400:00000200:0.0:1673986593.772495:0:5947:0:(peer.c:2528:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:0.0:1673986593.772524:0:5509:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.11@tcp(10.73.20.11@tcp:<?>) -> 1.1.1.1@tcp(1.1.1.1@tcp:1.1.1.1@tcp) <?> : GET try# 0
00000400:00000200:0.0:1673986652.199830:0:5506:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.11@tcp->1.1.1.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673986652.199877:0:5510:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.12@tcp(10.73.20.12@tcp:<?>) -> 1.1.1.2@tcp(1.1.1.1@tcp:1.1.1.2@tcp) <?> : GET try# 1
00000400:00000200:0.0:1673986703.226629:0:5506:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.12@tcp->1.1.1.2@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673986703.226700:0:5510:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.11@tcp(10.73.20.11@tcp:<?>) -> 1.1.1.1@tcp(1.1.1.2@tcp:1.1.1.1@tcp) <?> : GET try# 2
00000400:00000200:0.0:1673986721.024165:0:5502:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.11@tcp->1.1.1.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673986721.024196:0:5947:0:(peer.c:2578:lnet_discover_peer_locked()) peer 1.1.1.1@tcp NID 1.1.1.1@tcp: -110. discovery complete
00000400:00000200:0.0:1673986721.027992:0:5947:0:(peer.c:2528:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:0.0:1673986721.028022:0:5509:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.12@tcp(10.73.20.12@tcp:<?>) -> 2.2.2.1@tcp(2.2.2.1@tcp:2.2.2.1@tcp) <?> : GET try# 0
00000400:00000200:0.0:1673986778.248439:0:5506:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.12@tcp->2.2.2.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673986778.248497:0:5510:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.11@tcp(10.73.20.11@tcp:<?>) -> 2.2.2.2@tcp(2.2.2.1@tcp:2.2.2.2@tcp) <?> : GET try# 1
00000400:00000200:0.0:1673986829.257114:0:5506:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.11@tcp->2.2.2.2@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673986829.257158:0:5510:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.12@tcp(10.73.20.12@tcp:<?>) -> 2.2.2.1@tcp(2.2.2.2@tcp:2.2.2.1@tcp) <?> : GET try# 2
00000400:00000200:0.0:1673986848.256819:0:5505:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.12@tcp->2.2.2.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673986848.256871:0:5947:0:(peer.c:2578:lnet_discover_peer_locked()) peer 2.2.2.1@tcp NID 2.2.2.1@tcp: -110. discovery complete
...
[root@ct7-mds1 lustre-filesystem]# echo 1673986848.256871 - 1673986593.772495 | bc
254.484376
[root@ct7-mds1 lustre-filesystem]#
Without the patch:
[root@ct7-mds1 lustre-filesystem]# mount -t lustre 1.1.1.1@tcp,1.1.1.2@tcp:2.2.2.1@tcp,2.2.2.2@tcp:/lustre /mnt/lustre
mount.lustre: mount 1.1.1.1@tcp,1.1.1.2@tcp:2.2.2.1@tcp,2.2.2.2@tcp:/lustre at /mnt/lustre failed: Input/output error
Is the MGS running?
[root@ct7-mds1 lustre-filesystem]# lctl dk > /tmp/dk.log2
[root@ct7-mds1 lustre-filesystem]# grep -e lnet_health_check -e TRACE -e lnet_discover_peer_locked /tmp/dk.log2 | egrep -e 1.1.1. -e 2.2.2. -e lnet_discover_peer_locked
00000400:00000200:0.0:1673988188.818846:0:7434:0:(peer.c:2528:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:0.0:1673988188.818874:0:7032:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.11@tcp(10.73.20.11@tcp:<?>) -> 1.1.1.1@tcp(1.1.1.1@tcp:1.1.1.1@tcp) <?> : GET try# 0
00000400:00000200:0.0:1673988249.480663:0:7029:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.11@tcp->1.1.1.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673988249.480727:0:7033:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.12@tcp(10.73.20.12@tcp:<?>) -> 1.1.1.1@tcp(1.1.1.1@tcp:1.1.1.1@tcp) <?> : GET try# 1
00000400:00000200:0.0:1673988300.480228:0:7029:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.12@tcp->1.1.1.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673988300.480289:0:7033:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.11@tcp(10.73.20.11@tcp:<?>) -> 1.1.1.1@tcp(1.1.1.1@tcp:1.1.1.1@tcp) <?> : GET try# 2
00000400:00000200:0.0:1673988316.033279:0:7025:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.11@tcp->1.1.1.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673988316.033351:0:7434:0:(peer.c:2578:lnet_discover_peer_locked()) peer 1.1.1.1@tcp NID 1.1.1.1@tcp: -110. discovery complete
00000400:00000200:0.0:1673988316.034632:0:7434:0:(peer.c:2528:lnet_discover_peer_locked()) Discovery attempt # 1
00000400:00000200:0.0:1673988316.034665:0:7032:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.12@tcp(10.73.20.12@tcp:<?>) -> 2.2.2.1@tcp(2.2.2.1@tcp:2.2.2.1@tcp) <?> : GET try# 0
00000400:00000200:0.0:1673988375.486277:0:7029:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.12@tcp->2.2.2.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673988375.486332:0:7033:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.11@tcp(10.73.20.11@tcp:<?>) -> 2.2.2.1@tcp(2.2.2.1@tcp:2.2.2.1@tcp) <?> : GET try# 1
00000400:00000200:0.0:1673988426.497680:0:7029:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.11@tcp->2.2.2.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673988426.497744:0:7033:0:(lib-move.c:2009:lnet_handle_send()) TRACE: 10.73.20.12@tcp(10.73.20.12@tcp:<?>) -> 2.2.2.1@tcp(2.2.2.2@tcp:2.2.2.1@tcp) <?> : GET try# 2
00000400:00000200:0.0:1673988443.264923:0:7028:0:(lib-msg.c:811:lnet_health_check()) health check: 10.73.20.12@tcp->2.2.2.1@tcp: GET: LOCAL_TIMEOUT
00000400:00000200:0.0:1673988443.264993:0:7434:0:(peer.c:2578:lnet_discover_peer_locked()) peer 2.2.2.1@tcp NID 2.2.2.1@tcp: -110. discovery complete
...
[root@ct7-mds1 lustre-filesystem]# echo 1673988443.264993 - 1673988188.818846 | bc
254.446147
[root@ct7-mds1 lustre-filesystem]#
We can see the amount of time spent in blocking discovery calls is identical with or without the patch. Again, the patch itself does not generate any LNet traffic. All traffic is driven by Lustre - either by calling LNetPrimaryNID() while processing the llog or by trying to send RPCs.
Yes, if you attempt to mount the MGS/MDT0 and you have a lot of unreachable NIDs in the llog, then this can cause delays in starting MGS/MDT0, but it is not clear to me how the patch makes that any worse.
I reviewed the attached log files but they do not seem to show an identical experiment with/without the patch.
hornc@C02V50B9HTDG Downloads % grep lustre_fill_super exa61_lustre_debug_mgs_logs_successful_failover
00000020:00000001:2.0:1669879252.327292:0:1363:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:2.0:1669879252.327293:0:1363:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb ffff99d65f79c000
00000020:01000004:2.0:1669879252.327519:0:1363:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdl
00000020:00000001:14.0:1669879252.330511:0:1373:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:14.0:1669879252.330512:0:1373:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb ffff99d67ce4e000
00000020:01000004:14.0:1669879252.330803:0:1373:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdh
00000020:00000001:7.0:1669879252.333040:0:1364:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:7.0:1669879252.333041:0:1364:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb ffff99d65c0e4000
00000020:01000004:7.0:1669879252.333251:0:1364:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdi
00000020:00000001:1.0:1669879252.336037:0:1368:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:1.0:1669879252.336037:0:1368:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb ffff99b235271800
00000020:01000004:1.0:1669879252.336229:0:1368:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdm
00000020:00000001:18.0:1669879252.839585:0:2016:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:18.0:1669879252.839587:0:2016:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb ffff99d67cca0800
00000020:01000004:18.0:1669879252.839789:0:2016:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mdt0003_testfs-mdt0003
00000020:00000001:3.0:1669879252.898029:0:2180:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:3.0:1669879252.898031:0:2180:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb ffff99d650fd0800
00000020:01000004:3.0:1669879252.898227:0:2180:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mdt0001_testfs-mdt0001
00000020:00000001:3.0:1669879294.415448:0:1373:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:3.0:1669879294.415449:0:1373:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdh complete
00000020:00000001:4.0:1669879294.459274:0:1364:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:4.0:1669879294.459275:0:1364:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdi complete
00000020:00000001:4.0:1669879294.504129:0:1363:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:4.0:1669879294.504129:0:1363:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdl complete
00000020:00000001:7.0:1669879294.547133:0:1368:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:7.0:1669879294.547134:0:1368:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdm complete
00000020:00000001:9.0:1669879434.214316:0:2016:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:9.0:1669879434.214317:0:2016:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/mapper/vg_mdt0003_testfs-mdt0003 complete
00000020:00000001:6.0:1669879434.306276:0:2180:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:6.0:1669879434.306277:0:2180:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/mapper/vg_mdt0001_testfs-mdt0001 complete
hornc@C02V50B9HTDG Downloads %
We can see MDT0003/MDT0001 are started on this server. Here's the "failure" case:
00000020:01200004:18.0:1669887849.684402:0:26184:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 000000006da5383a
00000020:01000004:18.0:1669887849.684416:0:26184:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mgs-mgs
00000020:01200004:4.0F:1669887851.305182:0:26399:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 0000000083d05345
00000020:01000004:4.0:1669887851.305239:0:26399:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mdt0000_testfs-mdt0000
00000020:01200004:18.0:1669887896.681846:0:26695:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 0000000011e1fd8a
00000020:01000004:18.0:1669887896.681899:0:26695:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdf
00000020:01200004:8.0:1669887896.689944:0:26694:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 00000000fd561f91
00000020:01000004:8.0:1669887896.689984:0:26694:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdg
00000020:01200004:14.0:1669888000.593093:0:34933:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 00000000015e2fa5
00000020:01000004:14.0:1669888000.593106:0:34933:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mgs-mgs
00000020:01200004:4.0:1669888003.512820:0:35427:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 00000000b90f4eca
00000020:01000004:4.0:1669888003.512868:0:35427:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mdt0000_testfs-mdt0000
00000020:01200004:17.0:1669888003.512942:0:35428:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 000000005d688404
00000020:01000004:17.0:1669888003.512993:0:35428:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdf
00000020:01200004:15.0:1669888158.886882:0:50631:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 00000000fc1a622b
00000020:01000004:15.0:1669888158.886894:0:50631:0:(obd_mount.c:1632:lustre_fill_super()) Mounting client testfs-client
00000020:00000001:12.0:1669888790.461036:0:79837:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:12.0:1669888790.461037:0:79837:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 000000003e0cdb4c
00000020:01000004:12.0:1669888790.461270:0:79837:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdl
00000020:00000001:8.0:1669888790.462048:0:79841:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:8.0:1669888790.462049:0:79841:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 00000000d89b60bc
00000020:01000004:8.0:1669888790.462268:0:79841:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdh
00000020:00000001:14.0:1669888790.465089:0:79842:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:14.0:1669888790.465090:0:79842:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 000000000a7dd246
00000020:01000004:14.0:1669888790.465316:0:79842:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdm
00000020:00000001:13.0:1669888790.466087:0:79843:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:13.0:1669888790.466088:0:79843:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 0000000076093839
00000020:01000004:13.0:1669888790.466360:0:79843:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/sdi
00000020:00000001:18.0:1669888790.967385:0:80366:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:18.0:1669888790.967386:0:80366:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 00000000602ad4f3
00000020:01000004:18.0:1669888790.967614:0:80366:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mdt0003_testfs-mdt0003
00000020:00000001:16.0:1669888790.993980:0:80397:0:(obd_mount.c:1603:lustre_fill_super()) Process entered
00000020:01200004:16.0:1669888790.993981:0:80397:0:(obd_mount.c:1605:lustre_fill_super()) VFS Op: sb 000000003510bf98
00000020:01000004:16.0:1669888790.994203:0:80397:0:(obd_mount.c:1659:lustre_fill_super()) Mounting server from /dev/mapper/vg_mdt0001_testfs-mdt0001
00000020:00000001:14.0:1669888833.193485:0:79841:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:14.0:1669888833.193486:0:79841:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdh complete
00000020:00000001:18.0:1669888833.236872:0:79842:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:18.0:1669888833.236873:0:79842:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdm complete
00000020:00000001:3.0:1669888833.289210:0:79837:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:3.0:1669888833.289211:0:79837:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdl complete
00000020:00000001:13.0:1669888833.341585:0:79843:0:(obd_mount.c:1677:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000004:13.0:1669888833.341586:0:79843:0:(obd_mount.c:1684:lustre_fill_super()) Mount /dev/sdi complete
Here we see the MGS, MDT0000, a client mount, MDT0003 and MDT0001 are started on this node. If we look at just the portion of both logs that covers the mount of MDT0003 and MDT0001 then it appears they spend about the same amount of time. So, again, it is not clear why the patch would be responsible for the delay.
I suspect that you are actually seeing a delay in mounting the MGS/MDT0, and maybe the blocking discovery could be taken out of that path.
This is the idea in https://jira.whamcloud.com/browse/LU-14668 but those patches have not landed. |
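Purely as an illustration of the idea (this is not the LU-14668 implementation; the helper below and its behaviour are hypothetical):

/* Hypothetical sketch: kick off discovery without waiting for it, so the
 * config-llog processing thread is not blocked on unreachable NIDs.  A
 * later connect attempt would pick up the discovered primary NID. */
static lnet_nid_t sketch_primary_nid_nonblocking(lnet_nid_t nid)
{
	/* Discovery work would be queued to the LNet discovery thread
	 * here instead of being waited on synchronously. */
	return nid;	/* fall back to the NID we already know */
}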
| Comment by Serguei Smirnov [ 17/Jan/23 ] |
|
Hi Chris, I think there's a difference in the examples you posted: the "With" trace shows the client trying to reach 1.1.1.1@tcp, 1.1.1.2@tcp, 2.2.2.1@tcp and 2.2.2.2@tcp, while the "Without" trace shows the client trying to discover 1.1.1.1@tcp, then 2.2.2.1@tcp, and that's it. I thought that this difference in behaviour was responsible for the difference in the time it took for the mount to fail, because in the "with" case there were more NIDs being discovered sequentially, but your example demonstrates it is not so, as it looks like the same number of discovery attempts is made per peer regardless of destination NID.
Either way, like I said before, I don't see anything wrong with the patch itself.
Thanks, Serguei. |
| Comment by Chris Horn [ 18/Jan/23 ] |
|
Yes, the NIDs being discovered are different, but my point was that in both cases you have three blocking discovery calls for each "MGS", and those calls contribute the same amount of time to the failed mount with or without the patch. +1 on reviving LU-14668. |
| Comment by Alex Zhuravlev [ 19/Jan/23 ] |
|
Well, there is still the fact that reverting that patch fixed the problem. I guess Gaurang can confirm. |
| Comment by Chris Horn [ 23/Jan/23 ] |
|
bzzz I think this ticket should be closed and a new ticket opened for this issue. I am not saying there is no problem, but the logs that are provided so far do not show the problem (AFAICT). |
| Comment by Alex Zhuravlev [ 24/Jan/23 ] |
I'm fine with another ticket; either way, that's totally fine. But I'm not sure why you are saying the above. The logs I pasted above show that in one case config log processing took 30 seconds, and in the good case (with the patch reverted) it took less than a second. Isn't this a problem? |
| Comment by Chris Horn [ 24/Jan/23 ] |
|
Please open a new ticket and we can continue discussion there. |