[LU-14802] MGS configuration problems - cannot add new OST, change parameters, hanging Created: 01/Jul/21 Updated: 27/Aug/22 Resolved: 31/Jan/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Matt Rásó-Barnett (Inactive) | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Server Environment
# Lustre 2.12.5
[root@rds-mds7 ~]# lfs --version
lfs 2.12.5
# Kernel
[root@rds-mds7 ~]# uname -r
3.10.0-1127.8.2.el7_lustre.x86_64
# MOFED 4.9
[root@rds-mds7 ~]# ofed_info -n
4.9-0.1.7.0.202008231437
# Filesystem
4 MDTs, rds-d6-MDT000[0-3]
64 OSTs initially - attempting to expand this with additional 16 OSTs
Client Environment
# Lustre 2.12.6 almost everywhere
[root@gpu-e-80 ~]# lfs --version
lfs 2.12.6
# Mixture of RHEL 7.9 and RHEL 8.3 clients
[root@gpu-e-80 ~]# uname -r
3.10.0-1160.31.1.el7.csd3.x86_64
# Mixture of MOFED 4.9 and MOFED 5.1
[root@gpu-e-80 ~]# ofed_info -n
4.9-2.2.4.0 |
| Attachments: | |
| Issue Links: | |
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hello, We've had a number of length periods of filesystem unavailability due to this, since the weekend, mostly from when I've tried to unmount/remount the MGS which is colocated on an MDS server with some of the MDTs in the filesystem, and the act of unmounting has caused the MDS load to spike and access to those MDTs to hang from all clients. The filesystem is currently up for existing clients, but rebooted clients are failing to mount the filesystem again, hence the severity here. Here are the MDS and OSS server kernel logs at the time the first OSTs were added on Sunday: MDS Server logs Sun 27th June Jun 27 22:28:18 rds-mds7 kernel: Lustre: ctl-rds-d6-MDT0000: super-sequence allocation rc = 0 [0x0000001300000400-0x0000001340000400]:40:ost Jun 27 22:28:24 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.44.163.219@o2ib2: 0 seconds Jun 27 22:28:24 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 177 previous similar messages Jun 27 22:30:05 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.44.163.219@o2ib2: 0 seconds Jun 27 22:30:05 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 2 previous similar messages Jun 27 22:30:37 rds-mds7 kernel: Lustre: rds-d6-MDT0002: Connection restored to 0bb09763-547b-56c8-0a7a-c2613e26070a (at 10.47.2.157@o2ib1) Jun 27 22:30:37 rds-mds7 kernel: Lustre: Skipped 216 previous similar messages Jun 27 22:31:55 rds-mds7 kernel: Lustre: rds-d6-MDT0000: Client 472702b7-5951-4007-482a-a3ef0f9d65fb (at 10.44.161.6@o2ib2) reconnecting Jun 27 22:31:55 rds-mds7 kernel: Lustre: Skipped 186 previous similar messages Jun 27 22:32:35 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.44.163.219@o2ib2: 0 seconds Jun 27 22:32:35 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 4 previous similar messages Jun 27 22:32:52 rds-mds7 kernel: LustreError: 166-1: MGC10.44.241.1@o2ib2: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Jun 27 22:32:52 rds-mds7 kernel: LustreError: 8092:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1624829272, 300s ago), entering recovery for MGS@10.44.241.1@o2ib2 ns: MGC10.44.241.1@o2ib2 lock: ffff975ed48c3840/0x184fda0e60b72a0e lrc: 4/1,0 m Jun 27 22:33:09 rds-mds7 kernel: LustreError: 8162:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1624829289, 300s ago); not entering recovery in server code, just going back to sleep ns: MGS lock: ffff975c83dbd100/0x184fda0e60b67c50 lrc: 3/0,1 Jun 27 22:33:09 rds-mds7 kernel: LustreError: 8162:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) Skipped 43 previous similar messages Jun 27 22:33:37 rds-mds7 kernel: LNet: Service thread pid 12985 was inactive for 200.33s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 27 22:33:37 rds-mds7 kernel: LNet: Skipped 3 previous similar messages Jun 27 22:33:37 rds-mds7 kernel: Pid: 12985, comm: ll_mgs_0005 3.10.0-1127.8.2.el7_lustre.x86_64 #1 SMP Sun Aug 23 13:52:28 UTC 2020 Jun 27 22:33:37 rds-mds7 kernel: Call Trace: Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc1313c90>] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a076c>] mgs_completion_ast_generic+0x5c/0x200 [mgs] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a0983>] mgs_completion_ast_config+0x13/0x20 [mgs] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc13147b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a53b4>] mgs_revoke_lock+0x104/0x380 [mgs] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a5ad2>] mgs_target_reg+0x4a2/0x1320 [mgs] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc13b29da>] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc135748b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc135adf4>] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 27 22:33:37 rds-mds7 kernel: [<ffffffff964c6691>] kthread+0xd1/0xe0 Jun 27 22:33:37 rds-mds7 kernel: [<ffffffff96b92d1d>] ret_from_fork_nospec_begin+0x7/0x21 Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffffffffff>] 0xffffffffffffffff Jun 27 22:33:37 rds-mds7 kernel: LustreError: dumping log to /tmp/lustre-log.1624829616.12985 Jun 27 22:34:53 rds-mds7 kernel: Lustre: MGS: Received new LWP connection from 10.47.1.153@o2ib1, removing former export from same NID Jun 27 22:34:53 rds-mds7 kernel: Lustre: Skipped 1998 previous similar messages Jun 27 22:35:24 rds-mds7 kernel: LustreError: 12985:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1624829424, 300s ago); not entering recovery in server code, just going back to sleep ns: MGS lock: ffff97351fb81200/0x184fda0e60c14203 lrc: 3/0, Jun 27 22:35:25 rds-mds7 kernel: LustreError: 140-5: Server rds-d6-OST0041 requested index 65, but that index is already in use. Use --writeconf to force Jun 27 22:35:25 rds-mds7 kernel: LustreError: 13001:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST0041 log (-98) Jun 27 22:35:26 rds-mds7 kernel: LustreError: 140-5: Server rds-d6-OST0041 requested index 65, but that index is already in use. Use --writeconf to force Jun 27 22:35:26 rds-mds7 kernel: LustreError: 13001:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST0041 log (-98) Jun 27 22:37:52 rds-mds7 kernel: LustreError: 166-1: MGC10.44.241.1@o2ib2: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Jun 27 22:37:52 rds-mds7 kernel: LustreError: 8092:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1624829572, 300s ago), entering recovery for MGS@10.44.241.1@o2ib2 ns: MGC10.44.241.1@o2ib2 lock: ffff9745624b8240/0x184fda0e60cb24e5 lrc: 4/1,0 m Jun 27 22:37:52 rds-mds7 kernel: LustreError: 205913:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.44.241.1@o2ib2: namespace resource [0x36642d736472:0x0:0x0].0x0 (ffff973ee9f4cc00) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Jun 27 22:37:52 rds-mds7 kernel: LustreError: 8092:0:(mgc_request.c:599:do_requeue()) failed processing log: -5 Jun 27 22:39:41 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.44.163.219@o2ib2: 0 seconds Jun 27 22:39:41 rds-mds7 kernel: LNet: 2968:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 3 previous similar messages Jun 27 22:40:11 rds-mds7 kernel: Lustre: 13006:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff974d89171850 x1703756897980928/t0(0) o253->dec8c782-f0e7-81a2-d397-7bf60e27e1ff@10.44.241.25@o2ib2:556/0 lens 4768/4768 e 20 to 0 dl 1624830016 ref 2 fl Interpret:/0/0 rc 0/0 Jun 27 22:40:11 rds-mds7 kernel: Lustre: 13006:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 131 previous similar messages Jun 27 22:40:49 rds-mds7 kernel: Lustre: rds-d6-MDT0003: Connection restored to 2f43bc44-4d4e-e6e8-8ca0-dd0bfa20fa04 (at 10.47.2.114@o2ib1) Jun 27 22:40:49 rds-mds7 kernel: Lustre: Skipped 4151 previous similar messages Jun 27 22:41:59 rds-mds7 kernel: Lustre: rds-d6-MDT0002: Client 03c49b2b-134c-390d-9059-dc732d34be0b (at 10.47.1.253@o2ib1) reconnecting Jun 27 22:41:59 rds-mds7 kernel: Lustre: Skipped 179 previous similar messages Jun 27 22:42:52 rds-mds7 kernel: LustreError: 166-1: MGC10.44.241.1@o2ib2: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail Jun 27 22:42:52 rds-mds7 kernel: LustreError: 8092:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1624829872, 300s ago), entering recovery for MGS@10.44.241.1@o2ib2 ns: MGC10.44.241.1@o2ib2 lock: ffff9738f01418c0/0x184fda0e60faac19 lrc: 4/1,0 m Jun 27 22:42:52 rds-mds7 kernel: LustreError: 212692:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.44.241.1@o2ib2: namespace resource [0x36642d736472:0x0:0x0].0x0 (ffff9724c6f7a6c0) refcount nonzero (1) after lock cleanup; forcing cleanup. Jun 27 22:42:52 rds-mds7 kernel: LustreError: 8092:0:(mgc_request.c:599:do_requeue()) failed processing log: -5 OSS Server logs Jun 27 22:26:34 rds-oss59 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Jun 27 22:26:34 rds-oss59 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: errors=remount-ro Jun 27 22:26:35 rds-oss59 kernel: Lustre: Lustre: Build Version: 2.12.5 Jun 27 22:26:35 rds-oss59 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Jun 27 22:26:35 rds-oss59 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc Jun 27 22:27:24 rds-oss59 kernel: LustreError: 137-5: rds-d6-OST0040_UUID: not available for connect from 10.47.0.2@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 27 22:27:24 rds-oss59 kernel: LustreError: Skipped 135 previous similar messages Jun 27 22:27:24 rds-oss59 kernel: LustreError: 137-5: rds-d6-OST0040_UUID: not available for connect from 10.44.163.107@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 27 22:27:24 rds-oss59 kernel: LustreError: Skipped 1088 previous similar messages Jun 27 22:27:25 rds-oss59 kernel: LustreError: 137-5: rds-d6-OST0040_UUID: not available for connect from 10.44.161.43@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 27 22:27:25 rds-oss59 kernel: LustreError: Skipped 588 previous similar messages Jun 27 22:27:27 rds-oss59 kernel: LustreError: 137-5: rds-d6-OST0040_UUID: not available for connect from 10.44.18.2@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 27 22:27:27 rds-oss59 kernel: LustreError: Skipped 55 previous similar messages Jun 27 22:27:42 rds-oss59 kernel: Lustre: rds-d6-OST0040: new disk, initializing Jun 27 22:27:42 rds-oss59 kernel: Lustre: rds-d6-OST0040: Not available for connect from 10.44.161.6@o2ib2 (not set up) Jun 27 22:27:43 rds-oss59 kernel: Lustre: srv-rds-d6-OST0040: No data found on store. Initialize space Jun 27 22:27:43 rds-oss59 kernel: Lustre: rds-d6-OST0040: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Jun 27 22:28:13 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.47.2.15@o2ib1) Jun 27 22:28:13 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.47.1.172@o2ib1) Jun 27 22:28:14 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.47.21.88@o2ib1) Jun 27 22:28:14 rds-oss59 kernel: Lustre: Skipped 8 previous similar messages Jun 27 22:28:15 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.44.74.52@o2ib2) Jun 27 22:28:15 rds-oss59 kernel: Lustre: Skipped 1456 previous similar messages Jun 27 22:28:17 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.43.12.31@tcp2) Jun 27 22:28:17 rds-oss59 kernel: Lustre: Skipped 447 previous similar messages Jun 27 22:28:18 rds-oss59 kernel: Lustre: cli-rds-d6-OST0040-super: Allocated super-sequence [0x0000001300000400-0x0000001340000400]:40:ost] Jun 27 22:28:32 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.44.161.6@o2ib2) Jun 27 22:28:32 rds-oss59 kernel: Lustre: Skipped 45 previous similar messages Jun 27 22:29:34 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.43.240.201@tcp2) Jun 27 22:29:34 rds-oss59 kernel: Lustre: Skipped 1 previous similar message Jun 27 22:29:42 rds-oss59 kernel: Lustre: rds-d6-OST0040: Client 472702b7-5951-4007-482a-a3ef0f9d65fb (at 10.44.161.6@o2ib2) reconnecting Jun 27 22:30:14 rds-oss59 kernel: LDISKFS-fs (dm-7): file extents enabled, maximum tree depth=5 Jun 27 22:30:14 rds-oss59 kernel: LDISKFS-fs (dm-7): mounted filesystem with ordered data mode. Opts: errors=remount-ro Jun 27 22:30:14 rds-oss59 kernel: LDISKFS-fs (dm-7): file extents enabled, maximum tree depth=5 Jun 27 22:30:15 rds-oss59 kernel: LDISKFS-fs (dm-7): mounted filesystem with ordered data mode. 
Opts: errors=remount-ro,no_mbcache,nodelalloc Jun 27 22:30:35 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.44.6.58@o2ib2) Jun 27 22:30:35 rds-oss59 kernel: Lustre: Skipped 2 previous similar messages Jun 27 22:32:15 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.47.7.13@o2ib1) Jun 27 22:33:35 rds-oss59 kernel: Lustre: rds-d6-OST0040: Connection restored to (at 10.47.7.15@o2ib1) Jun 27 22:33:35 rds-oss59 kernel: Lustre: Skipped 1 previous similar message Jun 27 22:35:23 rds-oss59 kernel: LustreError: 166-1: MGC10.44.241.1@o2ib2: Connection to MGS (at 10.44.241.1@o2ib2) was lost; in progress operations using this service will fail Jun 27 22:35:23 rds-oss59 kernel: LustreError: 258659:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1624829423, 300s ago), entering recovery for MGS@10.44.241.1@o2ib2 ns: MGC10.44.241.1@o2ib2 lock: ffff9bc4d83c8b40/0xc34bf6a4691f74a9 lrc: 4/1, Jun 27 22:35:23 rds-oss59 kernel: LustreError: 263517:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.44.241.1@o2ib2: namespace resource [0x36642d736472:0x0:0x0].0x0 (ffff9bc4d59bf5c0) refcount nonzero (1) after lock cleanup; forcing cleanup. Jun 27 22:35:23 rds-oss59 kernel: LustreError: 258659:0:(mgc_request.c:599:do_requeue()) failed processing log: -5 Jun 27 22:35:25 rds-oss59 kernel: LustreError: 15f-b: rds-d6-OST0041: cannot register this server with the MGS: rc = -98. Is the MGS running? Jun 27 22:35:25 rds-oss59 kernel: LustreError: 262284:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -98 Jun 27 22:35:25 rds-oss59 kernel: LustreError: 262284:0:(obd_mount_server.c:1600:server_put_super()) no obd rds-d6-OST0041 Jun 27 22:35:25 rds-oss59 kernel: LustreError: 262284:0:(obd_mount_server.c:134:server_deregister_mount()) rds-d6-OST0041 not registered Jun 27 22:35:25 rds-oss59 kernel: Lustre: server umount rds-d6-OST0041 complete Jun 27 22:35:25 rds-oss59 kernel: LustreError: 262284:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-98) So from above it shows OST0040 being added successfully, but then OST0041 failing to be added, and all subsequent OSTs (they were started together), failing in the same way. The lustre internal log dump file mentioned above lustre-log.1624829616.12985 is attached. Since sunday, I have managed to mount some of the new OSTs successfully, through a mixture of restarting the MGS, and then needing to reformat the new OSTs with the '--replace' flag, to get around the error mentioned in the log: Jun 27 22:35:24 rds-mds7 kernel: LustreError: 12985:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1624829424, 300s ago); not entering recovery in server code, just going back to sleep ns: MGS lock: ffff97351fb81200/0x184fda0e60c14203 lrc: 3/0, Jun 27 22:35:25 rds-mds7 kernel: LustreError: 140-5: Server rds-d6-OST0041 requested index 65, but that index is already in use. Use --writeconf to force Jun 27 22:35:25 rds-mds7 kernel: LustreError: 13001:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST0041 log (-98) This worked for a couple of OSTs at a time, before the MGS would hang again refusing any further config. So far I have managed to mount about 8 of the 16 OSTs that I was intending to add on Sundary. 
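For reference, a '--replace' reformat of a new OST generally takes the following shape; the device, index and NIDs here are placeholders, and the exact command used on this system is shown in a later comment:
# illustrative only - substitute the real device, index and MGS/service NIDs
mkfs.lustre --ost --replace \
    --fsname rds-d6 --index <ost_index> \
    --mgsnode=<mgs_nid_1> --mgsnode=<mgs_nid_2> \
    --servicenode=<oss_nid_1> --servicenode=<oss_nid_2> \
    /dev/mapper/<ost_device>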
I've been looking around on Jira since Sunday for similar tickets and came across a related one. Following the discussion in that ticket, unfortunately this doesn't appear to have fixed the issue for us; this is me subsequently trying to add an OST after restarting the MGS onto this new version:
Jun 30 22:29:10 rds-mds8 kernel: LustreError: 142-7: The target rds-d6-OST004b has not registered yet. It must be started before failnids can be added.
Jun 30 22:29:10 rds-mds8 kernel: LustreError: 166987:0:(mgs_llog.c:4301:mgs_write_log_param()) err -2 on param 'failover.node=10.44.241.26@o2ib2:10.44.241.25@o2ib2'
Jun 30 22:29:10 rds-mds8 kernel: LustreError: 166987:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST004b log (-2)
I am also getting errors from rebooted clients that are failing to mount the filesystem:
Jul 01 10:28:14 lustre-transfer-02 kernel: LustreError: 22964:0:(mgc_request.c:249:do_config_log_add()) MGC10.44.241.1@o2ib2: failed processing log, type 1: rc = -5
Jul 01 10:28:14 lustre-transfer-02 kernel: LustreError: 22964:0:(mgc_request.c:249:do_config_log_add()) Skipped 2 previous similar messages
Jul 01 10:28:33 lustre-transfer-02 kernel: Lustre: rds-d6: root_squash is set to 99:99
Jul 01 10:28:33 lustre-transfer-02 kernel: LustreError: 22964:0:(llite_lib.c:516:client_common_fill_super()) cannot mds_connect: rc = -2
Jul 01 10:28:33 lustre-transfer-02 kernel: Lustre: rds-d6: nosquash_nids set to 10.44.241.[1-2]@o2ib2 10.44.161.[81-84]@o2ib2 10.44.71.[2-3]@o2ib2 10.43.240.[201-202]@tcp2 10.43.240.[198-199]@tcp2 10.144.9.[50-51]@o2ib 10.47.4.[210-213]@o2ib1
Jul 01 10:28:33 lustre-transfer-02 kernel: Lustre: Unmounted rds-d6-client
I'm not sure what to try next. Can I get some support in tracking down what is wrong with the MGS? Thanks,
|
| Comments |
| Comment by Matt Rásó-Barnett (Inactive) [ 01/Jul/21 ] |
|
Another thread on this problem - we have a second, identical filesystem that also shows it, again triggered by attempting to add OSTs to it on Sunday evening. I just now tried unmounting the MGS as clients were unable to mount the filesystem, and it generated the following stack trace and Lustre debug log as it hung for some time before unmounting. Just thought I would add it in case this particular trace helps.
rds-mds10 MGS log on unmount of the MGT:
Jul 01 11:27:11 rds-mds10 kernel: LNet: Service thread pid 204490 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jul 01 11:27:11 rds-mds10 kernel: Pid: 204490, comm: ll_mgs_0002 3.10.0-1127.8.2.el7_lustre.x86_64 #1 SMP Sun Aug 23 13:52:28 UTC 2020
Jul 01 11:27:11 rds-mds10 kernel: Call Trace:
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc108ac90>] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc141776c>] mgs_completion_ast_generic+0x5c/0x200 [mgs]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc1417983>] mgs_completion_ast_config+0x13/0x20 [mgs]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc108b7b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc141c3b4>] mgs_revoke_lock+0x104/0x380 [mgs]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc141cad2>] mgs_target_reg+0x4a2/0x1320 [mgs]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc11299da>] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc10ce48b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffc10d1df4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffff82ac6691>] kthread+0xd1/0xe0
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffff83192d1d>] ret_from_fork_nospec_begin+0x7/0x21
Jul 01 11:27:11 rds-mds10 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
Jul 01 11:27:11 rds-mds10 kernel: LustreError: dumping log to /tmp/lustre-log.1625135231.204490 |
| Comment by Peter Jones [ 01/Jul/21 ] |
|
Mike, could you please advise? Thanks, Peter |
| Comment by Matt Rásó-Barnett (Inactive) [ 05/Jul/21 ] |
|
Hi Mike, I tried to test this by only updating the MGS server to a 2.12.5 version with this patch backported, and nothing else. However looking more closely at the commit, is this actually a client patch? If so, should I update all the clients in our environment with this patch to see the effect? How about the servers, is there any reason for the servers to have this patch as well? Thanks for your time, |
| Comment by Mikhail Pershin [ 05/Jul/21 ] |
|
Matt, yes, the patch you are referring to is a client/server patch; both parts are needed. Meanwhile, I tend to think that we originally had an LNet problem here, and that caused the related problem with the MGS. Could you also collect the LNet configuration:
lnetctl global show
lnetctl net show -v 4
plus the configuration defined in ko2iblnd.conf and the values in /sys/module/lnet/parameters/ and /sys/module/ko2iblnd/parameters/, if any. |
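A minimal way to gather everything requested above in one pass (the output path is arbitrary; only the commands and files named above are used):
#!/bin/bash
# Collect the LNet / o2iblnd configuration into a single report file.
out=/tmp/lnet-config-$(hostname -s).txt
{
  echo "### lnetctl global show";        lnetctl global show
  echo "### lnetctl net show -v 4";      lnetctl net show -v 4
  echo "### ko2iblnd modprobe config";   cat /etc/modprobe.d/ko2iblnd.conf 2>/dev/null
  echo "### lnet module parameters";     grep . /sys/module/lnet/parameters/* 2>/dev/null
  echo "### ko2iblnd module parameters"; grep . /sys/module/ko2iblnd/parameters/* 2>/dev/null
} > "$out"
echo "wrote $out"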
| Comment by Matt Rásó-Barnett (Inactive) [ 06/Jul/21 ] |
|
Here is the LNET configuration for the servers: Server Config
[root@rds-mds8 ~]# lnetctl global show
global:
numa_range: 0
max_intf: 200
discovery: 1
drop_asym_route: 0
retry_count: 2
transaction_timeout: 50
health_sensitivity: 100
recovery_interval: 1
[root@rds-mds8 ~]# lnetctl net show -v 4
net:
- net type: lo
local NI(s):
- nid: 0@lo
status: up
statistics:
send_count: 38220
recv_count: 38220
drop_count: 0
sent_stats:
put: 38220
get: 0
reply: 0
ack: 0
hello: 0
received_stats:
put: 38220
get: 0
reply: 0
ack: 0
hello: 0
dropped_stats:
put: 0
get: 0
reply: 0
ack: 0
hello: 0
health stats:
health value: 0
interrupts: 0
dropped: 0
aborted: 0
no route: 0
timeouts: 0
error: 0
tunables:
peer_timeout: 0
peer_credits: 0
peer_buffer_credits: 0
credits: 0
dev cpt: 0
tcp bonding: 0
CPT: "[0,1]"
local NI(s):
- nid: 10.44.241.2@o2ib2
status: up
interfaces:
0: ib0
statistics:
send_count: 41253701
recv_count: 40228442
drop_count: 1038348
sent_stats:
put: 41184595
get: 69106
reply: 0
ack: 0
hello: 0
received_stats:
put: 40150520
get: 3158
reply: 65947
ack: 8817
hello: 0
dropped_stats:
put: 1038348
get: 0
reply: 0
ack: 0
hello: 0
health stats:
health value: 1000
interrupts: 0
dropped: 311
aborted: 0
no route: 0
timeouts: 102
error: 0
tunables:
peer_timeout: 0
peer_credits: 16
peer_buffer_credits: 0
credits: 2048
peercredits_hiw: 8
map_on_demand: 256
concurrent_sends: 16
fmr_pool_size: 512
fmr_flush_trigger: 384
fmr_cache: 1
ntx: 2048
conns_per_peer: 1
lnd tunables:
dev cpt: 0
tcp bonding: 0
CPT: "[0,1]"
[root@rds-mds8 ~]# cat /etc/modprobe.d/ko2iblnd.conf
# Ansible managed
# Currently it isn't possible to auto-tune the o2iblnd parameters optimally
# inside the kernel since the OFED API hides the details from us.
# Unfortunately, there isn't a single set of parameters that provide optimal
# performance on different HCA/HFI types. This file provides optimized
# tunables for the OPA cards.
#
# ** Please note that the below settings are the recommended settings only for
# OPA cards. If other IB cards are also present along with OPA cards then
# these settings will be applied across all the configured IB interfaces.
#
# Card detection and tunable selection is handled via /usr/sbin/ko2iblnd-probe
# at runtime when the ko2iblnd module is installed, either at boot or when
# Lustre is first mounted.
alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=32 peer_credits_hiw=16 credits=1024 concurrent_sends=64 ntx=2048 map_on_demand=256 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe
[root@rds-mds8 ~]# cat /etc/modprobe.d/lnet.conf
# Ansible managed
options lnet networks="o2ib2(ib0)" auto_down=1 avoid_asym_router_failure=1 check_routers_before_use=1 live_router_check_interval=60 dead_router_check_interval=60 router_ping_timeout=60
[root@rds-mds8 ~]# cat /etc/lnet.conf
# Ansible managed
---
# lnet.conf - configuration file for lnet routes to be imported by lnetctl
#
# This configuration file is formatted as YAML and can be imported
# by lnetctl.
net:
- net type: o2ib2
local NI(s):
- nid: 10.44.241.2@o2ib2
interfaces:
0: ib0
tunables:
peer_credits: 16
peer_credits_hiw: 8
concurrent_sends: 16
credits: 2048
lnd tunables:
map_on_demand: 256
ntx: 2048
route:
- net: o2ib0
gateway: 10.44.240.165@o2ib2
- net: o2ib0
gateway: 10.44.240.166@o2ib2
- net: o2ib0
gateway: 10.44.240.167@o2ib2
- net: o2ib0
gateway: 10.44.240.168@o2ib2
- net: o2ib1
gateway: 10.44.240.161@o2ib2
- net: o2ib1
gateway: 10.44.240.162@o2ib2
- net: o2ib1
gateway: 10.44.240.163@o2ib2
- net: o2ib1
gateway: 10.44.240.164@o2ib2
- net: o2ib1
gateway: 10.44.240.165@o2ib2
- net: o2ib1
gateway: 10.44.240.166@o2ib2
- net: o2ib1
gateway: 10.44.240.167@o2ib2
- net: o2ib1
gateway: 10.44.240.168@o2ib2
- net: tcp2
gateway: 10.44.240.161@o2ib2
- net: tcp2
gateway: 10.44.240.162@o2ib2
- net: tcp2
gateway: 10.44.240.163@o2ib2
- net: tcp2
gateway: 10.44.240.164@o2ib2
- net: tcp2
gateway: 10.44.240.165@o2ib2
- net: tcp2
gateway: 10.44.240.166@o2ib2
- net: tcp2
gateway: 10.44.240.167@o2ib2
- net: tcp2
gateway: 10.44.240.168@o2ib2
- net: tcp4
gateway: 10.44.240.161@o2ib2
- net: tcp4
gateway: 10.44.240.162@o2ib2
- net: tcp4
gateway: 10.44.240.163@o2ib2
- net: tcp4
gateway: 10.44.240.164@o2ib2
- net: tcp4
gateway: 10.44.240.165@o2ib2
- net: tcp4
gateway: 10.44.240.166@o2ib2
- net: tcp4
gateway: 10.44.240.167@o2ib2
- net: tcp4
gateway: 10.44.240.168@o2ib2
[root@rds-mds8 ~]# grep . /sys/module/lnet/parameters/*
/sys/module/lnet/parameters/accept:secure
/sys/module/lnet/parameters/accept_backlog:127
/sys/module/lnet/parameters/accept_port:988
/sys/module/lnet/parameters/accept_timeout:5
/sys/module/lnet/parameters/auto_down:1
/sys/module/lnet/parameters/avoid_asym_router_failure:1
/sys/module/lnet/parameters/check_routers_before_use:1
/sys/module/lnet/parameters/config_on_load:0
/sys/module/lnet/parameters/dead_router_check_interval:60
/sys/module/lnet/parameters/large_router_buffers:0
/sys/module/lnet/parameters/live_router_check_interval:60
/sys/module/lnet/parameters/lnet_drop_asym_route:0
/sys/module/lnet/parameters/lnet_health_sensitivity:100
/sys/module/lnet/parameters/lnet_interfaces_max:200
/sys/module/lnet/parameters/lnet_numa_range:0
/sys/module/lnet/parameters/lnet_peer_discovery_disabled:0
/sys/module/lnet/parameters/lnet_recovery_interval:1
/sys/module/lnet/parameters/lnet_retry_count:2
/sys/module/lnet/parameters/lnet_transaction_timeout:50
/sys/module/lnet/parameters/local_nid_dist_zero:1
/sys/module/lnet/parameters/networks:o2ib2(ib0)
/sys/module/lnet/parameters/peer_buffer_credits:0
/sys/module/lnet/parameters/portal_rotor:3
/sys/module/lnet/parameters/rnet_htable_size:128
/sys/module/lnet/parameters/router_ping_timeout:60
/sys/module/lnet/parameters/small_router_buffers:0
/sys/module/lnet/parameters/tiny_router_buffers:0
/sys/module/lnet/parameters/use_tcp_bonding:0
[root@rds-mds8 ~]# grep . /sys/module/ko2iblnd/parameters/*
/sys/module/ko2iblnd/parameters/cksum:0
/sys/module/ko2iblnd/parameters/concurrent_sends:0
/sys/module/ko2iblnd/parameters/conns_per_peer:1
/sys/module/ko2iblnd/parameters/credits:256
/sys/module/ko2iblnd/parameters/dev_failover:0
/sys/module/ko2iblnd/parameters/fmr_cache:1
/sys/module/ko2iblnd/parameters/fmr_flush_trigger:384
/sys/module/ko2iblnd/parameters/fmr_pool_size:512
/sys/module/ko2iblnd/parameters/ib_mtu:0
/sys/module/ko2iblnd/parameters/ipif_name:ib0
/sys/module/ko2iblnd/parameters/keepalive:100
/sys/module/ko2iblnd/parameters/map_on_demand:0
/sys/module/ko2iblnd/parameters/nscheds:0
/sys/module/ko2iblnd/parameters/ntx:512
/sys/module/ko2iblnd/parameters/peer_buffer_credits:0
/sys/module/ko2iblnd/parameters/peer_credits:8
/sys/module/ko2iblnd/parameters/peer_credits_hiw:0
/sys/module/ko2iblnd/parameters/peer_timeout:180
/sys/module/ko2iblnd/parameters/require_privileged_port:0
/sys/module/ko2iblnd/parameters/retry_count:5
/sys/module/ko2iblnd/parameters/rnr_retry_count:6
/sys/module/ko2iblnd/parameters/service:987
/sys/module/ko2iblnd/parameters/timeout:50
/sys/module/ko2iblnd/parameters/use_fastreg_gaps:0
/sys/module/ko2iblnd/parameters/use_privileged_port:1
/sys/module/ko2iblnd/parameters/wrq_sge:2
The servers here have Mellanox ConnectX-6 HDR cards, and we are currently still using MOFED 4.9 on all the servers:
[root@rds-mds8 ~]# lspci -vvv | grep Mellanox
5e:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Subsystem: Mellanox Technologies Device 0006
[root@rds-mds8 ~]# ibstatus
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:1c34:da03:0055:3a44
base lid: 0x24e
sm lid: 0x53
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (2X HDR)
link_layer: InfiniBand
[root@rds-mds8 ~]# ofed_info -n
4.9-0.1.7.0.202008231437
The client environment is much more diverse, so if the logs indicate any particular clients/NIDs involved here I can provide more details on those. We also have LNet routers in the environment, routing between IB, Intel OPA and Ethernet. The IB fabric is also relatively diverse, with generations of clients spanning FDR to EDR to HDR. Generally, though, these are all running up-to-date Lustre 2.12.6 on a mixture of RHEL 7.9 and RHEL 8.3, with MOFED 4.9 or 5.1.
|
| Comment by Mikhail Pershin [ 08/Jul/21 ] |
|
At the LNet level, the health stats for NID 10.44.241.2@o2ib2 show:
health stats:
health value: 1000
interrupts: 0
dropped: 311
aborted: 0
no route: 0
timeouts: 102
error: 0
it shows non-zero 'dropped' and 'timeouts' counts; though that is not the major problem you've seen, it is still worth tuning the LNet values to the following:
peer_credits = 32
peer_credits_hiw = 16
concurrent_sends = 64
lnet_transaction_timeout = 100
lnet_retry_count = 2
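For the two global LNet values a runtime change is possible with lnetctl (a sketch; it is not persistent across reboots, and the o2iblnd credit settings are module parameters that normally go into the ko2iblnd options, as discussed further below):
# runtime change of the global LNet values on each node
lnetctl set transaction_timeout 100
lnetctl set retry_count 2
lnetctl global show    # verify
# peer_credits / peer_credits_hiw / concurrent_sends are ko2iblnd module
# parameters, e.g. options ko2iblnd peer_credits=32 peer_credits_hiw=16 concurrent_sends=64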
As for the main MGS issue, the main patch change is in lustre_start_mgc(), which is used by both servers and clients to access the MGS, so I think ideally the patch should be added to all nodes. Can you apply it on all servers at least? |
| Comment by Matt Rásó-Barnett (Inactive) [ 12/Jul/21 ] |
|
Hi Mikhail, I can certainly adjust the LNET values to those recommended across our estate. Similarly with deploying the patch. We have a general maintenance coming up next week on the 21st, so we will aim to deploy these changes at this time. Thanks, Matt |
| Comment by Matt Rásó-Barnett (Inactive) [ 14/Jul/21 ] |
|
Hi Mikhail, a slight side-question here, but when setting the parameter lnet_transaction_timeout I don't see a way to set it in the /etc/lnet.conf file. It seems it can be configured dynamically via 'lnetctl set transaction_timeout', but I was hoping to find a configuration-file way to do it, something like e.g.:
# cat /etc/lnet.conf
...
global:
transaction_timeout = 100
retry_count = 2
We can obviously configure the setting dynamically via a script, but we've moved all our tunables into the /etc/lnet.conf file now, so having a facility to set it via this file would be desirable just for simple management. Otherwise we will roll out the patch and updated settings next week and will feed back after that on how it's looking. Thanks, Matt |
| Comment by Mikhail Pershin [ 16/Jul/21 ] |
|
ssmirnov, could you please help with LNET settings here? |
| Comment by Serguei Smirnov [ 16/Jul/21 ] |
|
Hi, you should be able to set the transaction timeout and retry count in lnet.conf by adding the following line:
options lnet lnet_retry_count=3 lnet_transaction_timeout=150
The o2ib LND parameters can be set like this:
options ko2iblnd peer_credits=32 peer_credits_hiw=16 concurrent_sends=64
Please make sure that the above is set uniformly across the cluster. Thanks, Serguei. |
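Put together, a sketch of what the persistent module configuration could look like with the values recommended earlier in this ticket (Serguei's example above uses 150/3; substitute whichever values you settle on, and note that the existing ko2iblnd.conf on these servers uses the ko2iblnd-opa alias/probe mechanism, so the options line would need to be merged into that scheme rather than dropped in verbatim):
# /etc/modprobe.d/lnet.conf
options lnet networks="o2ib2(ib0)" lnet_retry_count=2 lnet_transaction_timeout=100
# /etc/modprobe.d/ko2iblnd.conf
options ko2iblnd peer_credits=32 peer_credits_hiw=16 concurrent_sends=64
# after reloading the lnet/ko2iblnd modules (or rebooting), verify with:
lnetctl global show
lnetctl net show -v 4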
| Comment by Matt Rásó-Barnett (Inactive) [ 16/Jul/21 ] |
|
Thanks Serguei, I forgot about adding it to the module options - I've been recently using the dynamic yaml /etc/lnet.conf config so much that that was my instinctive first choice for adding the configuration. Cheers, |
| Comment by Etienne Aujames [ 21/Jul/21 ] |
|
Hello mrb, are you sure you see the same logs before and after applying the patch? The typical dmesg on MGC nodes is:
LustreError: 22397:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567976846, 300s ago), entering recovery for MGS@MGC10.0.10.51@o2ib7_0 ns: MGC10.0.10.51@o2ib7 lock: ffff90b36ed157c0/0x98816ce9d089ad9b lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce9d089ada9 expref: -99 pid: 22397 timeout: 0 lvb_type: 0
LustreError: 22397:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
LustreError: 5858:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff90b2677b6d80) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 5858:0:(ldlm_resource.c:1147:ldlm_resource_complain()) Skipped 1 previous similar message
MGCs (all the client nodes) keep trying to reconnect to the MGS indefinitely, because the MGS waits indefinitely for an MGC to release a lock: it cannot evict the client (the OBD_CONNECT_MDS_MDS flag is set by the client). So the following call trace can be observed on the MGS:
Jun 27 22:33:37 rds-mds7 kernel: Call Trace:
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc1313c90>] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a076c>] mgs_completion_ast_generic+0x5c/0x200 [mgs]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a0983>] mgs_completion_ast_config+0x13/0x20 [mgs]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc13147b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a53b4>] mgs_revoke_lock+0x104/0x380 [mgs]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc16a5ad2>] mgs_target_reg+0x4a2/0x1320 [mgs]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc13b29da>] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc135748b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffc135adf4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffff964c6691>] kthread+0xd1/0xe0
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffff96b92d1d>] ret_from_fork_nospec_begin+0x7/0x21
Jun 27 22:33:37 rds-mds7 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
After some hours/days (depending on the number of clients), we observe the load increasing on the MGS node because res->lr_waiting becomes too large. The node becomes unresponsive and the HA kills it. The known workaround for this issue is to umount/mount the MGT, but it might crash the MGS. The main difference between our issue and this one is the resource.
xxd -p -r <<< "36642d736472" && echo
6d-sdr (i.e. rds-d6, reversed)
This issue could definitely be caused by network errors (an MGC becomes unresponsive while holding a read lock on the resource). |
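One way to see whether the MGS is accumulating waiters on the configuration lock is to watch the MGS ldlm namespace counters; a small sketch, assuming the namespace is exposed as 'ldlm.namespaces.MGS' on the MGS node:
# run on the MGS; if these keep climbing while targets retry registration,
# the config lock is most likely stuck as described above
lctl get_param ldlm.namespaces.MGS.lock_count
lctl get_param ldlm.namespaces.MGS.resource_count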
| Comment by Matt Rásó-Barnett (Inactive) [ 30/Jul/21 ] |
|
Hi all, I have an update now, post-upgrade of our servers and clients last week. I have now updated all servers and the majority of our clients to 2.12.7 + the https://jira.whamcloud.com/browse/LU-13356 patch. Since that update we haven't had the same issues in most cases, which is good, so it appears the patch has worked for us. I have been able to update the MGS configuration and add new OSTs for one of the new filesystems, which was the original trigger for this ticket. Unfortunately, however, I still have 6 OSTs on one of the other upgraded filesystems that are refusing to mount and register. This is the case even after I re-format the OSTs:
MGS logs - post format of OST0049
Jul 30 15:22:34 rds-mds8 kernel: LustreError: 140-5: Server rds-d6-OST0049 requested index 73, but that index is already in use. Use --writeconf to force
Jul 30 15:22:34 rds-mds8 kernel: LustreError: 239688:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST0049 log (-98)
Jul 30 15:22:34 rds-mds8 kernel: LustreError: 239688:0:(mgs_handler.c:526:mgs_target_reg()) Skipped 2 previous similar messages
Jul 30 15:22:36 rds-mds8 kernel: LustreError: 140-5: Server rds-d6-OST0049 requested index 73, but that index is already in use. Use --writeconf to force
Jul 30 15:22:36 rds-mds8 kernel: LustreError: 239688:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST0049 log (-98)
OSS logs - post format of OST0049
Jul 30 15:22:28 rds-oss60 kernel: LDISKFS-fs (dm-11): file extents enabled, maximum tree depth=5
Jul 30 15:22:29 rds-oss60 kernel: LDISKFS-fs (dm-11): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Jul 30 15:22:29 rds-oss60 kernel: LDISKFS-fs (dm-11): file extents enabled, maximum tree depth=5
Jul 30 15:22:29 rds-oss60 kernel: LDISKFS-fs (dm-11): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Jul 30 15:22:35 rds-oss60 kernel: LustreError: 15f-b: rds-d6-OST0049: cannot register this server with the MGS: rc = -98. Is the MGS running?
Jul 30 15:22:35 rds-oss60 kernel: LustreError: Skipped 1 previous similar message
Jul 30 15:22:35 rds-oss60 kernel: LustreError: 120315:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -98
Jul 30 15:22:35 rds-oss60 kernel: LustreError: 120315:0:(obd_mount_server.c:1600:server_put_super()) no obd rds-d6-OST0049
Jul 30 15:22:35 rds-oss60 kernel: LustreError: 120315:0:(obd_mount_server.c:134:server_deregister_mount()) rds-d6-OST0049 not registered
Jul 30 15:22:35 rds-oss60 kernel: LustreError: 120315:0:(obd_mount_server.c:134:server_deregister_mount()) Skipped 1 previous similar message
Jul 30 15:22:35 rds-oss60 kernel: Lustre: server umount rds-d6-OST0049 complete
Jul 30 15:22:35 rds-oss60 kernel: LustreError: 120315:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount (-98)
Ok so it's complaining that the requested index is in use, so I reformat again now with the '--replace' flag:
MGS logs - post format with --replace
Jul 30 15:27:26 rds-mds8 kernel: Lustre: Found index 73 for rds-d6-OST0049, updating log
Jul 30 15:27:26 rds-mds8 kernel: LustreError: 142-7: The target rds-d6-OST0049 has not registered yet. It must be started before failnids can be added.
Jul 30 15:27:26 rds-mds8 kernel: LustreError: Skipped 2 previous similar messages
Jul 30 15:27:26 rds-mds8 kernel: LustreError: 286637:0:(mgs_llog.c:4304:mgs_write_log_param()) err -2 on param 'failover.node=10.44.241.26@o2ib2:10.44.241.25@o2ib2'
Jul 30 15:27:26 rds-mds8 kernel: LustreError: 286637:0:(mgs_llog.c:4304:mgs_write_log_param()) Skipped 2 previous similar messages
Jul 30 15:27:26 rds-mds8 kernel: LustreError: 286637:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST0049 log (-2)
OSS logs - post format with --replace
Jul 30 15:27:20 rds-oss60 kernel: LDISKFS-fs (dm-11): file extents enabled, maximum tree depth=5
Jul 30 15:27:21 rds-oss60 kernel: LDISKFS-fs (dm-11): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Jul 30 15:27:21 rds-oss60 kernel: LDISKFS-fs (dm-11): file extents enabled, maximum tree depth=5
Jul 30 15:27:21 rds-oss60 kernel: LDISKFS-fs (dm-11): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Jul 30 15:27:32 rds-oss60 kernel: LustreError: 15f-b: rds-d6-OST0049: cannot register this server with the MGS: rc = -2. Is the MGS running?
Jul 30 15:27:33 rds-oss60 kernel: LustreError: 123310:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -2
Jul 30 15:27:33 rds-oss60 kernel: LustreError: 123310:0:(obd_mount_server.c:1600:server_put_super()) no obd rds-d6-OST0049
Jul 30 15:27:33 rds-oss60 kernel: LustreError: 123310:0:(obd_mount_server.c:134:server_deregister_mount()) rds-d6-OST0049 not registered
Jul 30 15:27:33 rds-oss60 kernel: Lustre: server umount rds-d6-OST0049 complete
Jul 30 15:27:33 rds-oss60 kernel: LustreError: 123310:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount (-2)
I don't see any log files on the MGT for the OSTs I'm trying to mount (for reference these are rds-d6-OST004[9abdef]):
[root@rds-mds8 ~]# lctl --device MGS llog_catlist | grep -E "rds-d6-OST004[9abdef]"
[root@rds-mds8 ~]#
I'm not really sure what else to try at this point. Is there any information in the above logs that would indicate the problem here? |
| Comment by Mikhail Pershin [ 30/Jul/21 ] |
|
Matt, could you show the exact commands you were using to format the OST? mgs_write_log_param() failed on the 'failover.node' parameter setting; probably it is just wrong syntax. Also, to get more details you could collect Lustre debug logs using the usual steps.
Before that, note the old debug level on the MGS (get it via 'lctl get_param debug' and save it) and set it back on the MGS afterwards; the maximum debug level can cause server slowdown, so it is better to use the default setting and only set the maximum level while debugging. |
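Mikhail's original step list is not preserved in this export; a typical Lustre debug-log collection sequence consistent with the note above would look roughly like the following (marker text and output paths are arbitrary):
# on the MGS (and the OSS), save the current debug mask
lctl get_param debug > /tmp/debug.mask.save
# raise the debug level and clear the buffer just before reproducing
lctl set_param debug=-1
lctl clear
lctl mark "START OST mount test"
# ... reproduce the failing OST mount here ...
# dump the kernel debug buffer to a file, then restore the saved mask
lctl dk > /tmp/lustre-debug.$(hostname -s).log
lctl set_param debug="<previously saved mask>"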
| Comment by Matt Rásó-Barnett (Inactive) [ 03/Aug/21 ] |
|
Here are the format commands used: /sbin/mkfs.lustre --ost --replace --fsname rds-d6 --index 73 --mgsnode='10.44.241.1@o2ib2' --mgsnode='10.44.241.2@o2ib2' --servicenode='10.44.241.26@o2ib2' --servicenode='10.44.241.25@o2ib2' --mkfsoptions '-i 2097152 -E stride=32,stripe-width=256' '/dev/mapper/v2-rds-ost-jb60' The only difference between the tests above is whether I include the '--replace' flag or not. I have taken logs from both the MGS and OSS during an attempted mount after running the above format command. I've set a debug marker just before this: 'DEBUG MARKER: START rds-OST0049 mount'. The OSS NID is: Unfortunately I think the MGS log is too big to upload to this ticket (even compressed it's 100M), do you have an sftp upload facility I can put the log files? Here is (I think), the relevant section from the MGS log: 20000000:00000001:25.0:1627998228.698165:0:182803:0:(mgs_handler.c:364:mgs_target_reg()) Process entered 20000000:00000001:25.0:1627998228.698167:0:182803:0:(mgs_llog.c:573:mgs_find_or_make_fsdb()) Process entered 20000000:00000001:25.0:1627998228.698167:0:182803:0:(mgs_llog.c:552:mgs_find_or_make_fsdb_nolock()) Process entered 20000000:00000001:25.0:1627998228.698168:0:182803:0:(mgs_llog.c:566:mgs_find_or_make_fsdb_nolock()) Process leaving (rc=0 : 0 : 0) 20000000:00000001:25.0:1627998228.698169:0:182803:0:(mgs_llog.c:579:mgs_find_or_make_fsdb()) Process leaving (rc=0 : 0 : 0) 20000000:01000000:25.0:1627998228.698169:0:182803:0:(mgs_handler.c:414:mgs_target_reg()) fs: rds-d6 index: 73 is registered to MGS. 20000000:00000001:25.0:1627998228.698170:0:182803:0:(mgs_llog.c:573:mgs_find_or_make_fsdb()) Process entered 20000000:00000001:25.0:1627998228.698170:0:182803:0:(mgs_llog.c:552:mgs_find_or_make_fsdb_nolock()) Process entered 20000000:00000001:25.0:1627998228.698171:0:182803:0:(mgs_llog.c:566:mgs_find_or_make_fsdb_nolock()) Process leaving (rc=0 : 0 : 0) 20000000:00000001:25.0:1627998228.698171:0:182803:0:(mgs_llog.c:579:mgs_find_or_make_fsdb()) Process leaving (rc=0 : 0 : 0) 20000000:01000000:25.0:1627998228.698172:0:182803:0:(mgs_handler.c:519:mgs_target_reg()) updating rds-d6-OST0049, index=73 20000000:00000001:25.0:1627998228.698172:0:182803:0:(mgs_llog.c:4315:mgs_write_log_target()) Process entered 20000000:00000001:25.0:1627998228.698173:0:182803:0:(mgs_llog.c:666:mgs_set_index()) Process entered 20000000:00000001:25.0:1627998228.698173:0:182803:0:(mgs_llog.c:573:mgs_find_or_make_fsdb()) Process entered 20000000:00000001:25.0:1627998228.698173:0:182803:0:(mgs_llog.c:552:mgs_find_or_make_fsdb_nolock()) Process entered 20000000:00000001:25.0:1627998228.698174:0:182803:0:(mgs_llog.c:566:mgs_find_or_make_fsdb_nolock()) Process leaving (rc=0 : 0 : 0) 20000000:00000001:25.0:1627998228.698174:0:182803:0:(mgs_llog.c:579:mgs_find_or_make_fsdb()) Process leaving (rc=0 : 0 : 0) 20000000:01000000:25.0:1627998228.698175:0:182803:0:(mgs_llog.c:710:mgs_set_index()) Server rds-d6-OST0049 updating index 73 20000000:00000001:25.0:1627998228.698175:0:182803:0:(mgs_llog.c:711:mgs_set_index()) Process leaving via out_up (rc=114 : 114 : 0x72) 20000000:02000400:25.0:1627998228.698176:0:182803:0:(mgs_llog.c:4324:mgs_write_log_target()) Found index 73 for rds-d6-OST0049, updating log 20000000:01000000:25.0:1627998228.698178:0:182803:0:(mgs_llog.c:4356:mgs_write_log_target()) Update params for rds-d6-OST0049 20000000:00000010:25.0:1627998228.698179:0:182803:0:(mgs_llog.c:4362:mgs_write_log_target()) kmalloced '(buf)': 97 at ffff96c084bf1700. 
20000000:01000000:25.0:1627998228.698180:0:182803:0:(mgs_llog.c:4376:mgs_write_log_target()) remaining string: ' failover.node=10.44.241.26@o2ib2:10.44.241.25@o2ib2 ', param: 'mgsnode=10.44.241.1@o2ib2:10.44.241.2@o2ib2' 20000000:00000001:25.0:1627998228.698181:0:182803:0:(mgs_llog.c:3886:mgs_write_log_param()) Process entered 20000000:01000000:25.0:1627998228.698181:0:182803:0:(mgs_llog.c:3890:mgs_write_log_param()) next param 'mgsnode=10.44.241.1@o2ib2:10.44.241.2@o2ib2' 20000000:00000001:25.0:1627998228.698182:0:182803:0:(mgs_llog.c:3897:mgs_write_log_param()) Process leaving via end (rc=0 : 0 : 0x0) 20000000:00000001:25.0:1627998228.698183:0:182803:0:(mgs_llog.c:4306:mgs_write_log_param()) Process leaving (rc=0 : 0 : 0) 20000000:01000000:25.0:1627998228.698183:0:182803:0:(mgs_llog.c:4376:mgs_write_log_target()) remaining string: ' ', param: 'failover.node=10.44.241.26@o2ib2:10.44.241.25@o2ib2' 20000000:00000001:25.0:1627998228.698184:0:182803:0:(mgs_llog.c:3886:mgs_write_log_param()) Process entered 20000000:01000000:25.0:1627998228.698184:0:182803:0:(mgs_llog.c:3890:mgs_write_log_param()) next param 'failover.node=10.44.241.26@o2ib2:10.44.241.25@o2ib2' 20000000:01000000:25.0:1627998228.698185:0:182803:0:(mgs_llog.c:3925:mgs_write_log_param()) Adding failnode 20000000:00000001:25.0:1627998228.698185:0:182803:0:(mgs_llog.c:3203:mgs_write_log_add_failnid()) Process entered 20000000:00000040:25.0:1627998228.698186:0:182803:0:(lustre_log.h:367:llog_ctxt_get()) GETting ctxt ffff96bf0f3a6600 : new refcount 2 00000040:00000001:25.0:1627998228.698187:0:182803:0:(llog.c:1280:llog_open()) Process entered 00000040:00000010:25.0:1627998228.698188:0:182803:0:(llog.c:61:llog_alloc_handle()) kmalloced '(loghandle)': 280 at ffff96c104bcce00. 00000040:00000001:25.0:1627998228.698190:0:182803:0:(llog_osd.c:1236:llog_osd_open()) Process entered 00000020:00000001:25.0:1627998228.698190:0:182803:0:(local_storage.c:146:ls_device_get()) Process entered 00000020:00000001:25.0:1627998228.698191:0:182803:0:(local_storage.c:151:ls_device_get()) Process leaving via out_ls (rc=18446628553290942976 : -115520418608640 : 0xffff96ef4e7c0a00) 00000020:00000001:25.0:1627998228.698192:0:182803:0:(local_storage.c:173:ls_device_get()) Process leaving (rc=18446628553290942976 : -115520418608640 : ffff96ef4e7c0a00) 00080000:00000001:25.0:1627998228.698194:0:182803:0:(osd_handler.c:7423:osd_index_ea_lookup()) Process entered 00080000:00000001:25.0:1627998228.698195:0:182803:0:(osd_handler.c:5972:osd_ea_lookup_rec()) Process entered 00080000:00000001:25.0:1627998228.698200:0:182803:0:(osd_handler.c:6047:osd_ea_lookup_rec()) Process leaving via out (rc=18446744073709551614 : -2 : 0xfffffffffffffffe) 00080000:00000001:25.0:1627998228.698202:0:182803:0:(osd_handler.c:7431:osd_index_ea_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe) 00000040:00000001:25.0:1627998228.698203:0:182803:0:(llog_osd.c:1298:llog_osd_open()) Process leaving via out (rc=18446744073709551614 : -2 : 0xfffffffffffffffe) 00000040:00000001:25.0:1627998228.698203:0:182803:0:(llog_osd.c:1354:llog_osd_open()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe) 00000040:00000010:25.0:1627998228.698204:0:182803:0:(llog.c:91:llog_free_handle()) kfreed 'loghandle': 280 at ffff96c104bcce00. 
00000040:00000001:25.0:1627998228.698205:0:182803:0:(llog.c:1306:llog_open()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe) 00000040:00000001:25.0:1627998228.698205:0:182803:0:(llog.c:1337:llog_is_empty()) Process leaving via out (rc=0 : 0 : 0x0) 20000000:00000040:25.0:1627998228.698206:0:182803:0:(lustre_log.h:377:llog_ctxt_put()) PUTting ctxt ffff96bf0f3a6600 : new refcount 1 20000000:02020000:25.0:1627998228.698206:0:182803:0:(mgs_llog.c:3214:mgs_write_log_add_failnid()) 142-7: The target rds-d6-OST0049 has not registered yet. It must be started before failnids can be added. 00000800:00000200:36.2F:1627998228.706154:0:0:0:(o2iblnd_cb.c:3769:kiblnd_cq_completion()) conn[ffff96c19bbf3a00] (36)++ 00000800:00000200:43.0:1627998228.706160:0:2905:0:(o2iblnd_cb.c:3891:kiblnd_scheduler()) conn[ffff96c19bbf3a00] (37)++ 00000800:00000200:43.0:1627998228.706161:0:2905:0:(o2iblnd_cb.c:343:kiblnd_handle_rx()) Received d1[1] from 10.44.240.162@o2ib2 00000400:00000200:43.0:1627998228.706163:0:2905:0:(lib-move.c:4196:lnet_parse()) TRACE: 10.44.241.2@o2ib2(10.44.241.2@o2ib2) <- 10.43.10.24@tcp2 : PUT - for me 00000800:00000200:45.0:1627998228.706164:0:2903:0:(o2iblnd_cb.c:3907:kiblnd_scheduler()) conn[ffff96c19bbf3a00] (38)-- 00000400:00000010:43.0:1627998228.706164:0:2905:0:(lib-lnet.h:490:lnet_msg_alloc()) alloc '(msg)': 440 at ffff96c10fb88e00 (tot 357391524). 00000400:00000200:43.0:1627998228.706165:0:2905:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-10.43.10.24@tcp2 of length 224 into portal 26 MB=0x60f8b5d916f80 00000400:00000200:43.0:1627998228.706167:0:2905:0:(lib-ptl.c:200:lnet_try_match_md()) Incoming put index 1a from 12345-10.43.10.24@tcp2 of length 224/224 into md 0x4a433cc9 [1] + 0 00000400:00000010:43.0:1627998228.706167:0:2905:0:(lib-lnet.h:295:lnet_me_free()) slab-freed 'me' at ffff96c0e179aa80. 00000400:00000200:43.0:1627998228.706168:0:2905:0:(lib-md.c:65:lnet_md_unlink()) Queueing unlink of md ffff96c0a439a990 00000400:00000200:43.0:1627998228.706169:0:2905:0:(lib-msg.c:918:lnet_is_health_check()) health check = 0, status = 0, hstatus = 0 00000100:00000001:43.0:1627998228.706169:0:2905:0:(events.c:300:request_in_callback()) Process entered 00000100:00000200:43.0:1627998228.706169:0:2905:0:(events.c:310:request_in_callback()) event type 2, status 0, service mgs 00000100:00000040:43.0:1627998228.706170:0:2905:0:(events.c:353:request_in_callback()) incoming req@ffff96bf0fa13850 x1705941104947072 msgsize 224 00000100:00100000:43.0:1627998228.706171:0:2905:0:(events.c:356:request_in_callback()) peer: 12345-10.43.10.24@tcp2 (source: 12345-10.43.10.24@tcp2) 00000100:00000040:43.0:1627998228.706172:0:2905:0:(events.c:365:request_in_callback()) Buffer complete: 62 buffers still posted 00000100:00000001:43.0:1627998228.706173:0:2905:0:(events.c:389:request_in_callback()) Process leaving 00000400:00000200:43.0:1627998228.706174:0:2905:0:(lib-md.c:69:lnet_md_unlink()) Unlinking md ffff96c0a439a990 00000400:00000010:43.0:1627998228.706174:0:2905:0:(lib-lnet.h:270:lnet_md_free()) slab-freed 'md' at ffff96c0a439a990. 00000400:00000010:43.0:1627998228.706175:0:2905:0:(lib-lnet.h:500:lnet_msg_free()) kfreed 'msg': 440 at ffff96c10fb88e00 (tot 357391084). But I'll provide full logs if I can upload them somewhere? |
| Comment by Matt Rásó-Barnett (Inactive) [ 19/Aug/21 ] |
|
Hi Mikhail, just wondering if you have any time to take another look at this for me? I'm still unable to add these OSTs to the filesystem and not sure what else to try right now. I can provide the full logs from my previous comment if there is an FTP server I can upload them to. |
| Comment by DELBARY Gael [ 13/Sep/21 ] |
|
Hi Matt, I don't know if you have figured out your issue yet. On a custom Lustre version (2.12.6 + some patches) we hit a strange issue: broken Lustre configuration files, and we were unable to start the fs. The root cause was a bad patch (outside LTS), but the behaviour was not clean (in ldiskfs the params files had a link count of 0, and Lustre complained about config corruption). Even after a clean writeconf there was no way to get a working fs. When I debugged a bit, I saw that a few clients (out of ~13000) were generating a lot of noise. Normally, in the writeconf procedure you have to unmount and remount all clients (plus targets), but I was surprised that the MDT/OST configs were in a "bad state". In the end, to fix it I had to stop all clients, and then everything became good: the writeconf procedure completed successfully, and after that clients mounted without any issue (and no further corruption in the config files). If you can, you could test this (just in case), but it is not a seamless solution... |
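For reference, the writeconf procedure mentioned here is the standard one; a condensed sketch, assuming ldiskfs targets and placeholder device paths (every client and every target must be unmounted before starting):
# 1. unmount all clients, then all OSTs and MDTs, and finally the MGT
# 2. regenerate the configuration logs on every target
tunefs.lustre --writeconf /dev/<mgt_device>    # on the MGS
tunefs.lustre --writeconf /dev/<mdt_device>    # on each MDS, for every MDT
tunefs.lustre --writeconf /dev/<ost_device>    # on each OSS, for every OST
# 3. remount in order: MGT first, then MDTs, then OSTs, then clients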
| Comment by Matt Rásó-Barnett (Inactive) [ 22/Sep/21 ] |
|
Hi Gael, that's interesting thanks for sharing. Unfortunately I am still stuck on this issue somewhat, although the impact has lessened a little bit. The current status is that I have a single OST that fails to mount and join the filesystem, the one mentioned above that should be index 73. The only new development since my last messages is that there were 5 other OSTs that weren't mounting previously, but have now mounted successfully and joined the filesystem - index 74 to 79. In the last month we had a lot of unrelated LNET routing related issues over in https://jira.whamcloud.com/browse/LU-14987 and as part of that, we did in fact restart a large number of both the clients and also all the servers in the filesystems. It was after these restarts that I noticed the above OSTs 74 to 79 that were finally able to mount and join the filesystem. Unfortunately that still leaves OST 73, which I see the exact same errors with as before. I will generate new debug logs for this OST on both OSS and MDS and try to share as before. |
| Comment by Matt Rásó-Barnett (Inactive) [ 22/Sep/21 ] |
|
Actually, I didn't notice this in my previous logs, but I now see the failing call to 'mgs_revoke_lock()' after attempting to start the missing OST 73:
Sep 22 15:46:53 rds-mds7 kernel: LustreError: 140-5: Server rds-d6-OST0049 requested index 73, but that index is already in use. Use --writeconf to force
Sep 22 15:46:53 rds-mds7 kernel: LustreError: 5636:0:(mgs_handler.c:526:mgs_target_reg()) Failed to write rds-d6-OST0049 log (-98)
Sep 22 15:47:04 rds-mds7 kernel: Lustre: 5636:0:(client.c:2169:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1632322013/real 1632322013] req@ffff962705810000 x1710915848718400/t0(0) o104->MGS@10.47.2.177@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1632322024 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Sep 22 15:47:04 rds-mds7 kernel: Lustre: 5636:0:(client.c:2169:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Sep 22 15:47:07 rds-mds7 kernel: Lustre: rds-d6-MDT0002: haven't heard from client df0f1789-b047-52d8-233e-98e43aab8799 (at 10.47.2.177@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff95f07ed38c00, cur 1632322027 expire 1632321877 last 1632321800
Sep 22 15:47:07 rds-mds7 kernel: Lustre: Skipped 4 previous similar messages
Sep 22 15:47:07 rds-mds7 kernel: LustreError: 5636:0:(mgs_handler.c:282:mgs_revoke_lock()) MGS: can't take cfg lock for 0x36642d736472/0x0 : rc = -11
Not sure if this last message is of any significance here? |
| Comment by Etienne Aujames [ 02/Oct/21 ] |
Server rds-d6-OST0049 requested index 73, but that index is already in use. Use --writeconf to force
This line is displayed when a newly formatted target is mounted for the first time but the requested target index is already allocated to another target in the MGS configuration. You can check the llog configuration on the MGS to verify whether you see your OST, with (for example):
lctl --device MGS llog_print rds-d6-client
Then you could try to clean your MGS configuration with "lctl clear_conf", and then try to replace the existing OST target in the configuration with your new one: "mkfs.lustre --replace".
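A sketch of that sequence as run on the MGS node; 'rds-d6' is this filesystem's name, the grep pattern is just for illustration, and the exact clear_conf semantics (MGS mounted, all other targets and clients stopped) should be checked against the manual for the running release:
# look for stale OST0049 records in the client configuration llog
lctl --device MGS llog_print rds-d6-client | grep -i OST0049
# with everything except the MGS stopped, drop the stale configuration records
lctl clear_conf rds-d6          # or a specific config log name, per the manual
# then reformat the new OST with --replace and the desired index, and mount it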
| Comment by Gerrit Updater [ 19/Nov/21 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45626 |
| Comment by Gerrit Updater [ 31/Jan/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45626/ |
| Comment by Peter Jones [ 31/Jan/22 ] |
|
Landed for 2.15 |