[LU-15000] MDS crashes with (osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed Created: 10/Sep/21 Updated: 01/May/23 Resolved: 30/May/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.7 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Stephane Thiell | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | CentOS 7.9 |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I previously reported some issues when adding OSTs in LU-14695, and I also guess we got hit by a related issue here. We have 300+ OSTs. After adding new OSTs, the first ones succeed, but after 3 or 4 new OSTs, all MDTs crash with the following backtrace (you can see the new OST being initialized on an OSS, and then an MDS crash):

Sep 09 07:59:33 oak-io6-s2 kernel: md/raid:md45: raid level 6 active with 10 out of 10 devices, algorithm 2
Sep 09 07:59:33 oak-io6-s2 kernel: md45: detected capacity change from 0 to 112003075014656
Sep 09 07:59:33 oak-io6-s2 kernel: LDISKFS-fs (md45): file extents enabled, maximum tree depth=5
Sep 09 07:59:34 oak-io6-s2 kernel: LDISKFS-fs (md45): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Sep 09 07:59:34 oak-io6-s2 kernel: LDISKFS-fs (md45): file extents enabled, maximum tree depth=5
Sep 09 07:59:34 oak-io6-s2 kernel: LDISKFS-fs (md45): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
Sep 09 07:59:38 oak-io6-s2 kernel: Lustre: oak-OST013d: new disk, initializing
Sep 09 07:59:38 oak-io6-s2 kernel: Lustre: srv-oak-OST013d: No data found on store. Initialize space
Sep 09 07:59:38 oak-io6-s2 kernel: Lustre: oak-OST013d: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
Sep 09 07:59:43 oak-md1-s1 kernel: LustreError: 24251:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed:
Sep 09 07:59:43 oak-md1-s1 kernel: LustreError: 24251:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
Sep 09 07:59:43 oak-md1-s1 kernel: Pid: 24251, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
Sep 09 07:59:43 oak-md1-s1 kernel: Call Trace:
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc0cac7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc0cac87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc1a2beb6>] osp_obd_connect+0x3c6/0x400 [osp]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc194b04e>] lod_add_device+0xa8e/0x19a0 [lod]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc1946895>] lod_process_config+0x13b5/0x1510 [lod]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc0ef18b2>] class_process_config+0x2142/0x2830 [obdclass]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc0ef3b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc0eb65af>] llog_process_thread+0x85f/0x1a10 [obdclass]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffc0eb8174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffa62c5c21>] kthread+0xd1/0xe0
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffa6994ddd>] ret_from_fork_nospec_begin+0x7/0x21
Sep 09 07:59:43 oak-md1-s1 kernel: [<ffffffffffffffff>] 0xffffffffffffffff

This is another example where we can see llog errors for OSTs that actually are added (they are properly registered on the MGS), before the MDS crash:

Sep 9 08:21:17 oak-md1-s1 kernel: Lustre: oak-MDT0001: Client 59a1cc5a-bdc6-102a-1ccb-b551d993bacc (at 10.51.3.32@o2ib3) reconnecting
Sep 9 08:21:18 oak-md1-s1 kernel: Lustre: oak-MDT0001: Client ea26edda-b4c7-1625-cd1a-c65d4487d437 (at 10.51.3.68@o2ib3) reconnecting
Sep 9 08:21:18 oak-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Sep 9 08:21:20 oak-md1-s1 kernel: Lustre: oak-MDT0001: Client 38d27508-3ea0-383f-8501-005bb0842ebd (at 10.51.4.12@o2ib3) reconnecting
Sep 9 08:21:20 oak-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Sep 9 08:21:35 oak-md1-s1 kernel: Lustre: 7555:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
Sep 9 08:21:35 oak-md1-s1 kernel: LustreError: 7555:0:(genops.c:556:class_register_device()) oak-OST0144-osc-MDT0001: already exists, won't add
Sep 9 08:21:35 oak-md1-s1 kernel: LustreError: 7555:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
Sep 9 08:21:35 oak-md1-s1 kernel: Lustre: cmd=cf001 0:oak-OST0144-osc-MDT0001 1:osp 2:oak-MDT0001-mdtlov_UUID
Sep 9 08:21:35 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
Sep 9 08:21:35 oak-md1-s1 kernel: Lustre: cmd=cf001 0:oak-OST0144-osc-MDT0002 1:osp 2:oak-MDT0002-mdtlov_UUID
Sep 9 08:21:44 oak-md1-s1 kernel: LustreError: 7569:0:(obd_config.c:461:class_setup()) Device 635 already setup (type osp)
Sep 9 08:21:44 oak-md1-s1 kernel: LustreError: 7569:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
Sep 9 08:21:44 oak-md1-s1 kernel: LustreError: 7569:0:(obd_config.c:1835:class_config_llog_handler()) Skipped 1 previous similar message
Sep 9 08:21:44 oak-md1-s1 kernel: Lustre: cmd=cf003 0:oak-OST0144-osc-MDT0001 1:oak-OST0144_UUID 2:10.0.2.103@o2ib5
Sep 9 08:21:44 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
Sep 9 08:21:44 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) Skipped 1 previous similar message
Sep 9 08:21:48 oak-md1-s1 kernel: LustreError: 7571:0:(obd_config.c:461:class_setup()) Device 316 already setup (type osp)
Sep 9 08:21:48 oak-md1-s1 kernel: LustreError: 7571:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
Sep 9 08:21:48 oak-md1-s1 kernel: Lustre: cmd=cf003 0:oak-OST0144-osc-MDT0002 1:oak-OST0144_UUID 2:10.0.2.103@o2ib5
Sep 9 08:21:48 oak-md1-s1 kernel: LustreError: 3775:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
Sep 9 08:21:55 oak-md1-s1 kernel: LustreError: 7574:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed:
Sep 9 08:21:55 oak-md1-s1 kernel: LustreError: 7574:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
Sep 9 08:21:55 oak-md1-s1 kernel: Pid: 7574, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
Sep 9 08:21:56 oak-md1-s1 kernel: Call Trace:
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0b057cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0b0587c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc18baeb6>] osp_obd_connect+0x3c6/0x400 [osp]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc17da04e>] lod_add_device+0xa8e/0x19a0 [lod]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc17d5895>] lod_process_config+0x13b5/0x1510 [lod]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d4f8b2>] class_process_config+0x2142/0x2830 [obdclass]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d51b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d145af>] llog_process_thread+0x85f/0x1a10 [obdclass]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffc0d16174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffff9dcc5c21>] kthread+0xd1/0xe0
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffff9e394ddd>] ret_from_fork_nospec_begin+0x7/0x21
Sep 9 08:21:56 oak-md1-s1 kernel: [<ffffffffffffffff>] 0xffffffffffffffff

In the trace below, we can see that the MDS crashes are happening at the same time:

oak-io6-s2: Sep 09 08:37:59 oak-io6-s2 kernel: md/raid:md53: device dm-494 operational as raid disk 1
oak-io6-s2: Sep 09 08:37:59 oak-io6-s2 kernel: md/raid:md53: raid level 6 active with 10 out of 10 devices, algorithm 2
oak-io6-s2: Sep 09 08:37:59 oak-io6-s2 kernel: md53: detected capacity change from 0 to 112003075014656
oak-io6-s2: Sep 09 08:38:00 oak-io6-s2 kernel: LDISKFS-fs (md53): file extents enabled, maximum tree depth=5
oak-io6-s2: Sep 09 08:38:00 oak-io6-s2 kernel: LDISKFS-fs (md53): mounted filesystem with ordered data mode. Opts: errors=remount-ro
oak-io6-s2: Sep 09 08:38:01 oak-io6-s2 kernel: LDISKFS-fs (md53): file extents enabled, maximum tree depth=5
oak-io6-s2: Sep 09 08:38:01 oak-io6-s2 kernel: LDISKFS-fs (md53): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
oak-io6-s2: Sep 09 08:38:06 oak-io6-s2 kernel: Lustre: oak-OST0145: new disk, initializing
oak-io6-s2: Sep 09 08:38:06 oak-io6-s2 kernel: Lustre: srv-oak-OST0145: No data found on store. Initialize space
oak-io6-s2: Sep 09 08:38:06 oak-io6-s2 kernel: Lustre: oak-OST0145: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: LustreError: 7744:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed:
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: LustreError: 7744:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: Pid: 7744, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: LustreError: 5232:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed:
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: LustreError: 5232:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: Pid: 5232, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: Call Trace:
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc097b7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: Call Trace:
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc097b87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc06667cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc066687c>] lbug_with_loc+0x4c/0xa0 [libcfs]
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc172deb6>] osp_obd_connect+0x3c6/0x400 [osp]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc1311eb6>] osp_obd_connect+0x3c6/0x400 [osp]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc1bd404e>] lod_add_device+0xa8e/0x19a0 [lod]
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc164d04e>] lod_add_device+0xa8e/0x19a0 [lod]
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc1648895>] lod_process_config+0x13b5/0x1510 [lod]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc1bcf895>] lod_process_config+0x13b5/0x1510 [lod]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc13df8b2>] class_process_config+0x2142/0x2830 [obdclass]
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc0c3a8b2>] class_process_config+0x2142/0x2830 [obdclass]
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc0c3cb79>] class_config_llog_handler+0x819/0x1520 [obdclass]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc13e1b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc13a45af>] llog_process_thread+0x85f/0x1a10 [obdclass]
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc0bff5af>] llog_process_thread+0x85f/0x1a10 [obdclass]
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffc0c01174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffc13a6174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffff9bec5c21>] kthread+0xd1/0xe0
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffbd8c5c21>] kthread+0xd1/0xe0
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffbdf94ddd>] ret_from_fork_nospec_begin+0x7/0x21
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffff9c594ddd>] ret_from_fork_nospec_begin+0x7/0x21
oak-md2-s2: Sep 09 08:38:11 oak-md2-s2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
oak-md1-s2: Sep 09 08:38:11 oak-md1-s2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: LustreError: 5224:0:(osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed:
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: LustreError: 5224:0:(osp_dev.c:1404:osp_obd_connect()) LBUG
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: Pid: 5224, comm: llog_process_th 3.10.0-1160.6.1.el7_lustre.pl1.x86_64 #1 SMP Mon Dec 14 21:25:04 PST 2020
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: Call Trace:
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc09f07cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc09f087c>] lbug_with_loc+0x4c/0xa0 [libcfs]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc1b38eb6>] osp_obd_connect+0x3c6/0x400 [osp]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc1a5804e>] lod_add_device+0xa8e/0x19a0 [lod]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc1a53895>] lod_process_config+0x13b5/0x1510 [lod]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc13228b2>] class_process_config+0x2142/0x2830 [obdclass]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc1324b79>] class_config_llog_handler+0x819/0x1520 [obdclass]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc12e75af>] llog_process_thread+0x85f/0x1a10 [obdclass]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffc12e9174>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffae4c5c21>] kthread+0xd1/0xe0
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffaeb94ddd>] ret_from_fork_nospec_begin+0x7/0x21
oak-md2-s1: Sep 09 08:38:12 oak-md2-s1 kernel: [<ffffffffffffffff>] 0xffffffffffffffff

After rebooting the crashed MDS, the last added OSTs are working fine (after MDS recovery, they allocate their "super-sequence" and are then operational). But every time I add more OSTs to the filesystem, all MDSes crash with this backtrace. This looks similar to an earlier report.

I'm attaching the output of lctl --device MGS llog_print <FILE> as <FILE>.llog.txt for FILE = oak-client and oak-MDT0000 to oak-MDT0005 |
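To make the failure mode above more concrete, here is a minimal sketch, assuming the OSP device keeps a connect counter that must be exactly 1 after the single expected connect from the LOD (an illustrative model with hypothetical names, not the actual Lustre osp_dev.c source): replaying the same "add osc" configuration record against a device that already exists drives a second connect, and the check fires just like the LBUG in the traces.

```c
/* Illustrative model only; not the actual osp_dev.c code. It mimics how a
 * connect counter that must be exactly 1 trips an assertion when the same
 * "add osc" configuration record is processed a second time. */
#include <assert.h>
#include <stdio.h>

struct osp_device_model {
	int opd_connects;	/* how many times the LOD has connected to this OSP */
};

/* stand-in for osp_obd_connect(): exactly one connection is expected */
static void model_osp_connect(struct osp_device_model *osp)
{
	osp->opd_connects++;
	/* the crash above shows the real code LBUGs when this is not 1 */
	assert(osp->opd_connects == 1);
}

int main(void)
{
	struct osp_device_model osp = { .opd_connects = 0 };

	model_osp_connect(&osp);	/* first "add osc" record: fine */
	printf("first connect ok, opd_connects=%d\n", osp.opd_connects);

	model_osp_connect(&osp);	/* replayed record: assertion fires */
	return 0;
}
```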
| Comments |
| Comment by Peter Jones [ 10/Sep/21 ] |
|
Mike

It looks like some work was started but not completed relating to this area under an earlier ticket.

Peter |
| Comment by Mikhail Pershin [ 14/Sep/21 ] |
|
I've just updated the patch under the related ticket. |
| Comment by Stephane Thiell [ 14/Sep/21 ] |
|
Thank you! |
| Comment by Stephane Thiell [ 23/Sep/21 ] |
|
Hi Mike, so I have some updates! This is the log from the MGS when we started the new OST index 331 (oak-OST014b):

00000020:00000400:7.0:1632410256.389569:0:127696:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x4)
00000020:00020000:19.0:1632410256.389727:0:127696:0:(genops.c:556:class_register_device()) oak-OST0141-osc-MDT0001: already exists, won't add
00000020:00020000:19.0:1632410256.402255:0:127696:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
00000020:02000400:19.0:1632410256.415459:0:127696:0:(obd_config.c:2068:class_config_dump_handler()) cmd=cf001 0:oak-OST0141-osc-MDT0001 1:osp 2:oak-MDT0001-mdtlov_UUID
10000000:00020000:9.0:1632410256.426947:0:3915:0:(mgc_request.c:599:do_requeue()) failed processing log: -17
00000020:00000400:14.0F:1632410256.437628:0:127698:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
00000020:00000400:14.0:1632410256.437631:0:127698:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x4)
00000020:00020000:14.0:1632410256.437734:0:127698:0:(genops.c:556:class_register_device()) oak-OST0141-osc-MDT0002: already exists, won't add
00000020:00020000:14.0:1632410256.437736:0:127698:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.0.2.51@o2ib5: cfg command failed: rc = -17
00000020:02000400:14.0:1632410256.437739:0:127698:0:(obd_config.c:2068:class_config_dump_handler()) cmd=cf001 0:oak-OST0141-osc-MDT0002 1:osp 2:oak-MDT0002-mdtlov_UUID
10000000:00020000:21.0:1632410256.449208:0:3915:0:(mgc_request.c:599:do_requeue()) failed processing log: -17

The new OST is not visible from the MDTs (no logs) and not shown by lctl dl:

# clush -w@mds 'lctl dl | grep oak-OST014b'
clush: oak-md2-s2: exited with exit code 1
clush: oak-md1-s1: exited with exit code 1
clush: oak-md2-s1: exited with exit code 1
clush: oak-md1-s2: exited with exit code 1

The new OST is visible from clients but not filling up (which makes sense):

[root@oak-h01v19 ~]# lfs df /oak | grep OST:331
oak-OST014b_UUID 108461852548 1868 107368054268 1% /oak[OST:331]

The system is still running fine, so there is that, no crash. But now the problem is that the config llogs seem corrupt, so I'm worried we will need to restore them from backups:

llog_reader oak-MDT0000
rec #1 type=10620000 len=224 offset 8192
rec #2 type=10620000 len=128 offset 8416
rec #3 type=10620000 len=176 offset 8544
...<skip>...
rec #2943 type=10620000 len=112 offset 435072
rec #2944 type=10620000 len=136 offset 435184
rec #2945 type=10620000 len=224 offset 435320
off 435544 skip 6824 to next chunk. Previous index is 2945, current 0, offset 435544
The log is corrupt (too big at 2826)
llog_reader: Could not pack buffer.: Invalid argument (22)

The same happens for CONFIGS/oak-MDT0001 to 5 on the MGS. If I run llog_print on the MGS, we can see the new OST (which explains why clients can see it):

[root@oak-md1-s1 CONFIGS]# lctl --device MGS llog_print oak-client | grep oak-OST014b
- { index: 2879, event: attach, device: oak-OST014b-osc, type: osc, UUID: oak-clilov_UUID }
- { index: 2880, event: setup, device: oak-OST014b-osc, UUID: oak-OST014b_UUID, node: 10.0.2.104@o2ib5 }
- { index: 2882, event: add_conn, device: oak-OST014b-osc, node: 10.0.2.103@o2ib5 }
- { index: 2883, event: add_osc, device: oak-clilov, ost: oak-OST014b_UUID, index: 331, gen: 1 }
But it seems to be in memory only, as it is not to be found in the MGS oak-client file:

[root@oak-md1-s1 CONFIGS]# llog_reader oak-client | grep 014b
[root@oak-md1-s1 CONFIGS]#

I'm attaching oak-MDT0000 before (OK but without OST 331) as oak-MDT0000.backup-20210923 |
| Comment by Stephane Thiell [ 23/Sep/21 ] |
|
Hi Mike, I dumped the config llog for oak-MDT0000 again, this time using debugfs on the MGT, and it actually reads fine:

[root@oak-md1-s1 ~]# debugfs -c -R 'dump CONFIGS/oak-MDT0000 /tmp/oak-MDT0000' /dev/mapper/md1-rbod1-mgt
debugfs 1.45.6.wc5 (09-Feb-2021)
/dev/mapper/md1-rbod1-mgt: catastrophic mode - not reading inode or group bitmaps
[root@oak-md1-s1 ~]# llog_reader /tmp/oak-MDT0000 | grep 014b
#2946 (224)marker 6183 (flags=0x01, v2.12.7.0) oak-OST014b 'add osc' Thu Sep 23 08:15:05 2021-
#2948 (128)attach 0:oak-OST014b-osc-MDT0000 1:osc 2:oak-MDT0000-mdtlov_UUID
#2949 (144)setup 0:oak-OST014b-osc-MDT0000 1:oak-OST014b_UUID 2:10.0.2.104@o2ib5
#2951 (112)add_conn 0:oak-OST014b-osc-MDT0000 1:10.0.2.103@o2ib5
#2952 (136)lov_modify_tgts add 0:oak-MDT0000-mdtlov 1:oak-OST014b_UUID 2:331 3:1
#2953 (224)END marker 6183 (flags=0x02, v2.12.7.0) oak-OST014b 'add osc' Thu Sep 23 08:15:05 2021-

And the OST has been properly added to it; this is a diff of before/after for oak-MDT0000:

--- /tmp/before 2021-09-23 12:59:43.058307733 -0700
+++ /tmp/after 2021-09-23 13:00:15.115469245 -0700
@@ -2825,9 +2825,17 @@
 rec #2943 type=10620000 len=112 offset 435072
 rec #2944 type=10620000 len=136 offset 435184
 rec #2945 type=10620000 len=224 offset 435320
+rec #2946 type=10620000 len=224 offset 435544
+rec #2947 type=10620000 len=88 offset 435768
+rec #2948 type=10620000 len=128 offset 435856
+rec #2949 type=10620000 len=144 offset 435984
+rec #2950 type=10620000 len=88 offset 436128
+rec #2951 type=10620000 len=112 offset 436216
+rec #2952 type=10620000 len=136 offset 436328
+rec #2953 type=10620000 len=224 offset 436464
 Header size : 8192
 llh_size : 80
 Time : Mon Feb 13 12:37:27 2017
-Number of records: 2827 cat_idx: 0 last_idx: 2945
+Number of records: 2835 cat_idx: 0 last_idx: 2953
 Target uuid : config_uuid
 -----------------------
 #01 (224)marker 2 (flags=0x01, v2.9.0.0) oak-MDT0000-mdtlov 'lov setup' Mon Feb 13 12:37:27 2017-
@@ -5658,3 +5666,11 @@
 #2943 (112)add_conn 0:oak-OST014a-osc-MDT0000 1:10.0.2.104@o2ib5
 #2944 (136)lov_modify_tgts add 0:oak-MDT0000-mdtlov 1:oak-OST014a_UUID 2:330 3:1
 #2945 (224)END marker 6175 (flags=0x02, v2.12.7.0) oak-OST014a 'add osc' Thu Sep 9 08:54:11 2021-
+#2946 (224)marker 6183 (flags=0x01, v2.12.7.0) oak-OST014b 'add osc' Thu Sep 23 08:15:05 2021-
+#2947 (088)add_uuid nid=10.0.2.104@o2ib5(0x500050a000268) 0: 1:10.0.2.104@o2ib5
+#2948 (128)attach 0:oak-OST014b-osc-MDT0000 1:osc 2:oak-MDT0000-mdtlov_UUID
+#2949 (144)setup 0:oak-OST014b-osc-MDT0000 1:oak-OST014b_UUID 2:10.0.2.104@o2ib5
+#2950 (088)add_uuid nid=10.0.2.103@o2ib5(0x500050a000267) 0: 1:10.0.2.103@o2ib5
+#2951 (112)add_conn 0:oak-OST014b-osc-MDT0000 1:10.0.2.103@o2ib5
+#2952 (136)lov_modify_tgts add 0:oak-MDT0000-mdtlov 1:oak-OST014b_UUID 2:331 3:1
+#2953 (224)END marker 6183 (flags=0x02, v2.12.7.0) oak-OST014b 'add osc' Thu Sep 23 08:15:05 2021-

So the main question remains: why are the MDTs not aware of this new OST? How can I troubleshoot that? Thanks, and sorry for the confusion. |
| Comment by Mikhail Pershin [ 27/Sep/21 ] |
|
Stephane, such a problem can be the result of a previously reported issue. |
| Comment by Etienne Aujames [ 27/Sep/21 ] |
|
Hello, |
| Comment by Stephane Thiell [ 27/Sep/21 ] |
|
Thank you both! We do have the patch from that ticket applied. |
| Comment by Stephane Thiell [ 11/Oct/21 ] |
|
Hi Mike and Etienne,

We still think that in our case we have another problem when adding new OSTs. With the patch from that ticket applied, this is what we see on the MDS nodes when a new OST is added:

--------------- oak-md1-s1 ---------------
Oct 04 00:53:44 oak-md1-s1 kernel: Lustre: 6206:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
Oct 04 00:53:44 oak-md1-s1 kernel: LustreError: 6206:0:(genops.c:556:class_register_device()) oak-OST0142-osc-MDT0001: already exists, won't add
--------------- oak-md1-s2 ---------------
Oct 04 00:53:42 oak-md1-s2 kernel: Lustre: 6253:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
Oct 04 00:53:42 oak-md1-s2 kernel: LustreError: 6253:0:(genops.c:556:class_register_device()) oak-OST0142-osc-MDT0003: already exists, won't add
--------------- oak-md2-s1 ---------------
Oct 04 00:53:41 oak-md2-s1 kernel: Lustre: 5413:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
Oct 04 00:53:41 oak-md2-s1 kernel: LustreError: 5413:0:(genops.c:556:class_register_device()) oak-OST0145-osc-MDT0004: already exists, won't add
--------------- oak-md2-s2 ---------------
Oct 04 00:33:18 oak-md2-s2 kernel: Lustre: 48951:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
Oct 04 00:33:18 oak-md2-s2 kernel: LustreError: 48951:0:(genops.c:556:class_register_device()) oak-OST0145-osc-MDT0005: already exists, won't add

Reminder: these OSTs are not the ones being added. When a new OST is added, the MGS is updated (and that is OK) and revokes the config lock, which triggers an update on clients (this is OK too) and also on the MDTs, and class_config_llog_handler(), where this message is produced, is called with cfg->cfg_instance == NULL. In my understanding, cfg_instance should be defined so that new llog records can be appended and the MDTs can then add the new OST at the end without re-processing the full config.

My current guess is that this works up to a certain config llog length (until OST0142 / OST0145 depending on the MDT; this is reproducible - do we reach a buffer limit here?), and when the second part of the config buffer is processed, we lose the instance (and thus also cfg_last_idx?), so records are processed from the beginning of the buffer, which here happens to be OST0142 / OST0145 depending on the MDT, as their config llogs are not exactly the same. I've spent some time looking for a possible defect but I haven't found anything yet.

Another thing I am suspecting is that it could have started when we removed a few OSTs: we previously disabled 12 OSTs on this system using the experimental del_ost, which behaves like llog_cancel on specific indexes, and I noticed they are marked as EXCLUDE and not SKIP. So even an lctl clear_conf wouldn't help. This is an issue too if we want to definitively get rid of them.

#28 (224)marker 323 (flags=0x01, v2.10.4.0) oak-MDT0001 'add osc(copied)' Thu Oct 18 11:45:30 2018-
#29 (224)EXCLUDE START marker 324 (flags=0x11, v2.10.4.0) oak-OST0001 'add osc' Thu Oct 18 11:45:30 2018-Fri Mar 19 15:29:22 2021
#30 (088)add_uuid nid=10.0.2.102@o2ib5(0x500050a000266) 0: 1:10.0.2.102@o2ib5
#33 (088)add_uuid nid=10.0.2.101@o2ib5(0x500050a000265) 0: 1:10.0.2.101@o2ib5
#36 (224)END EXCLUDE END marker 324 (flags=0x12, v2.10.4.0) oak-OST0001 'add osc' Thu Oct 18 11:45:30 2018-Fri Mar 19 15:29:22 2021

To come back to our 'Skip config outside markers' issue, perhaps there is an llog buffer issue when EXCLUDE markers were previously used. I will try to reproduce by creating a large config like this on a test system, but I wanted to share this with you in case you have some insights. Thanks! |
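To illustrate the suspected mechanism, here is a small self-contained sketch, assuming a consumer that remembers the index of the last configuration record it applied, the way cfg_last_idx is described above (hypothetical names and structures, not the Lustre implementation): when that index is lost or reset, records that were already applied get replayed and fail with -17 (EEXIST), matching the "already exists, won't add" messages.

```c
/* Toy model of incremental config-llog processing, assuming the consumer
 * stores the index of the last record it applied (like cfg_last_idx).
 * Hypothetical names; not the Lustre implementation. */
#include <stdio.h>
#include <string.h>

struct cfg_rec { int index; const char *device; };

/* pretend device registry: returns -17 (EEXIST) on duplicate registration */
static const char *registered[16];
static int nregistered;

static int register_device(const char *name)
{
	for (int i = 0; i < nregistered; i++)
		if (strcmp(registered[i], name) == 0)
			return -17;		/* "already exists, won't add" */
	registered[nregistered++] = name;
	return 0;
}

/* apply all records newer than *last_idx, then remember the new position */
static void process_log(const struct cfg_rec *log, int nrec, int *last_idx)
{
	for (int i = 0; i < nrec; i++) {
		if (log[i].index <= *last_idx)
			continue;		/* already applied in an earlier pass */
		int rc = register_device(log[i].device);
		printf("rec #%d %s: rc = %d\n", log[i].index, log[i].device, rc);
		*last_idx = log[i].index;
	}
}

int main(void)
{
	const struct cfg_rec log[] = {
		{ 101, "oak-OST0142-osc-MDT0001" },	/* illustrative indexes */
		{ 102, "oak-OST0143-osc-MDT0001" },
		{ 103, "oak-OST0144-osc-MDT0001" },
	};
	int last_idx = 0;

	process_log(log, 3, &last_idx);	/* first pass applies everything */
	last_idx = 0;			/* lost/stale index, as suspected above */
	process_log(log, 3, &last_idx);	/* replays old records -> rc = -17 */
	return 0;
}
```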
| Comment by Dominique Martinet [ 14/Feb/22 ] |
|
Hello, it's been a while but this is still a problem when adding new OSTs. I've taken a fresh look, and after some effort I was able to reproduce using the catalogs from the MGS on a dummy test filesystem. This should make it easy to get more information out of it.

So first, for the symptoms we were chasing, Stéphane was correct: the cld_cfg.cfg_last_idx for the oak-MDTxxxx catalogs are wrong on the MDS, so old events are being processed when they had already been applied. My reproducing occurrence had the following dmesg:

[ 3486.702628] Lustre: 4657:0:(obd_config.c:1641:class_config_llog_handler()) Skip config outside markers, (inst: 0000000000000000, uuid: , flags: 0x0)
[ 3486.710037] LustreError: 4657:0:(genops.c:556:class_register_device()) oak-OST0147-osc-MDT0002: already exists, won't add
[ 3486.714672] LustreError: 4657:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.42.17.51@tcp: cfg command failed: rc = -17
[ 3486.719065] Lustre: cmd=cf001 0:oak-OST0147-osc-MDT0002 1:osp 2:oak-MDT0002-mdtlov_UUID
[ 3486.722883] LustreError: 2907:0:(mgc_request.c:599:do_requeue()) failed processing log: -17

And with log levels enabled we can see this in dk:

mgc_request.c:2058:mgc_process_log()) Process log oak-MDT0002-0000000000000000 from 3203

With the catalog as follows on the MGS:

... last_idx: 3259 ...
#3203 (144)setup 0:oak-OST0147-osc-MDT0002 1:oak-OST0147_UUID
...
#3259 (last line)

So: the last_idx here was 3202 when it should have been 3259 to start processing from 3260, and the discrepancy made the MDS reprocess oak-OST0147-osc-MDT0002's setup, leading to the error message; but the real problem is that last_idx doesn't match. This can also be confirmed with crash for manual inspection (I assume it got +1'd parsing that erroneous line):
crash> mod -s mgc
crash> list -H config_llog_list -s config_llog_data.cld_cfg,cld_logname -l config_llog_data.cld_list_chain
...
ffff8bdcb6d3fa70
cld_cfg = {
cfg_instance = 0,
cfg_sb = 0xffff8bdc48767000,
cfg_uuid = {
uuid = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
},
cfg_callback = 0xffffffffc0949980 <class_config_llog_handler>,
cfg_last_idx = 3203,
cfg_flags = 2,
cfg_lwp_idx = 327,
cfg_sub_clds = 31
}
cld_logname = 0xffff8bdcb6d3fae5 "oak-MDT0002"
...
The previous fix was also targeting side effects, but what really needs fixing is the index, so we don't reprocess events. Now, why is that index off? Here's what it looks like on the MGT:

...
rec #3258 type=10620000 len=88 offset 508600
rec #3259 type=10620000 len=224 offset 508688
Header size : 8192
llh_size : 80
Time : Thu Sep 26 05:36:36 2019
Number of records: 3132 cat_idx: 0 last_idx: 3259
Target uuid : config_uuid
-----------------------
#01 (224)marker 2015 (flags=0x01, v2.10.8.0) oak-MDT0002-mdtlov 'lov setup' Thu Sep 26 05:36:36 2019-
#02 (128)attach 0:oak-MDT0002-mdtlov 1:lov 2:oak-MDT0002-mdtlov_UUID
#03 (176)lov_setup 0:oak-MDT0002-mdtlov 1:(struct lov_desc) uuid=oak-MDT0002-mdtlov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
...
#18 (224)marker 2018 (flags=0x01, v2.10.8.0) oak-MDT0002 'add osc(copied)' Thu Sep 26 05:36:36 2019-
#19 (224)EXCLUDE START marker 2019 (flags=0x11, v2.10.8.0) oak-OST0003 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:33 2021
#20 (088)add_uuid nid=10.0.2.102@o2ib5(0x500050a000266) 0: 1:10.42.17.114@tcp
#23 (088)add_uuid nid=10.0.2.101@o2ib5(0x500050a000265) 0: 1:10.42.17.114@tcp
#26 (224)END EXCLUDE END marker 2019 (flags=0x12, v2.10.8.0) oak-OST0003 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:33 2021
#27 (224)END marker 2019 (flags=0x02, v2.10.8.0) oak-MDT0002 'add osc(copied)' Thu Sep 26 05:36:36 2019-
#28 (224)marker 2020 (flags=0x01, v2.10.8.0) oak-MDT0002 'add osc(copied)' Thu Sep 26 05:36:36 2019-
#29 (224)EXCLUDE START marker 2021 (flags=0x11, v2.10.8.0) oak-OST0001 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:22 2021
...
#3172 (112)add_conn 0:oak-OST0142-osc-MDT0002 1:10.42.17.114@tcp
#3173 (136)lov_modify_tgts add 0:oak-MDT0002-mdtlov 1:oak-OST0142_UUID 2:322 3:1
#3174 (224)END marker 6129 (flags=0x02, v2.12.7.0) oak-OST0142 'add osc' Thu Sep 9 15:37:12 2021-
#3175 (224)marker 6137 (flags=0x01, v2.12.7.0) oak-OST0143 'add osc' Thu Sep 9 15:37:38 2021-
#3176 (088)add_uuid nid=10.0.2.104@o2ib5(0x500050a000268) 0: 1:10.42.17.114@tcp
#3177 (128)attach 0:oak-OST0143-osc-MDT0002 1:osc 2:oak-MDT0002-mdtlov_UUID
#3178 (144)setup 0:oak-OST0143-osc-MDT0002 1:oak-OST0143_UUID 2:10.42.17.114@tcp
#3179 (088)add_uuid nid=10.0.2.103@o2ib5(0x500050a000267) 0: 1:10.42.17.114@tcp
#3180 (112)add_conn 0:oak-OST0143-osc-MDT0002 1:10.42.17.114@tcp
#3181 (136)lov_modify_tgts add 0:oak-MDT0002-mdtlov 1:oak-OST0143_UUID 2:323 3:1
#3182 (224)END marker 6137 (flags=0x02, v2.12.7.0) oak-OST0143 'add osc' Thu Sep 9 15:37:38 2021-
#3183 (224)marker 6145 (flags=0x01, v2.12.7.0) oak-OST0145 'add osc' Thu Sep 9 15:38:03 2021-
#3184 (088)add_uuid nid=10.0.2.104@o2ib5(0x500050a000268) 0: 1:10.42.17.114@tcp
#3185 (128)attach 0:oak-OST0145-osc-MDT0002 1:osc 2:oak-MDT0002-mdtlov_UUID
#3186 (144)setup 0:oak-OST0145-osc-MDT0002 1:oak-OST0145_UUID 2:10.42.17.114@tcp
#3187 (088)add_uuid nid=10.0.2.103@o2ib5(0x500050a000267) 0: 1:10.42.17.114@tcp
#3188 (112)add_conn 0:oak-OST0145-osc-MDT0002 1:10.42.17.114@tcp
#3189 (136)lov_modify_tgts add 0:oak-MDT0002-mdtlov 1:oak-OST0145_UUID 2:325 3:1
#3190 (224)END marker 6145 (flags=0x02, v2.12.7.0) oak-OST0145 'add osc' Thu Sep 9 15:38:03 2021-
#3191 (224)marker 6153 (flags=0x01, v2.12.7.0) oak-OST0146 'add osc' Thu Sep 9 15:53:39 2021-
...
#3240 (224)marker 6201 (flags=0x01, v2.12.7.0) oak-OST014d 'add osc' Mon Oct 4 08:10:30 2021-
#3241 (088)add_uuid nid=10.0.2.104@o2ib5(0x500050a000268) 0: 1:10.42.17.114@tcp
#3242 (128)attach 0:oak-OST014d-osc-MDT0002 1:osc 2:oak-MDT0002-mdtlov_UUID
#3243 (144)setup 0:oak-OST014d-osc-MDT0002 1:oak-OST014d_UUID 2:10.42.17.114@tcp
#3244 (088)add_uuid nid=10.0.2.103@o2ib5(0x500050a000267) 0: 1:10.42.17.114@tcp
#3245 (112)add_conn 0:oak-OST014d-osc-MDT0002 1:10.42.17.114@tcp
#3246 (136)lov_modify_tgts add 0:oak-MDT0002-mdtlov 1:oak-OST014d_UUID 2:333 3:1
#3247 (224)END marker 6201 (flags=0x02, v2.12.7.0) oak-OST014d 'add osc' Mon Oct 4 08:10:30 2021-
#3248 (224)marker 6209 (flags=0x01, v2.12.7.0) oak-OST014e 'add osc' Mon Oct 4 08:13:45 2021-
#3249 (088)add_uuid nid=10.0.2.103@o2ib5(0x500050a000267) 0: 1:10.42.17.114@tcp
#3250 (128)attach 0:oak-OST014e-osc-MDT0002 1:osc 2:oak-MDT0002-mdtlov_UUID
#3251 (144)setup 0:oak-OST014e-osc-MDT0002 1:oak-OST014e_UUID 2:10.42.17.114@tcp
#3252 (088)add_uuid nid=10.0.2.104@o2ib5(0x500050a000268) 0: 1:10.42.17.114@tcp
#3254 (112)add_conn 0:oak-OST014e-osc-MDT0002 1:10.42.17.114@tcp
#3255 (136)lov_modify_tgts add 0:oak-MDT0002-mdtlov 1:oak-OST014e_UUID 2:334 3:1
#3256 (224)END marker 6209 (flags=0x02, v2.12.7.0) oak-OST014e 'add osc' Mon Oct 4 08:13:45 2021-
#3257 (224)marker 6362 (flags=0x01, v2.12.7.0) oak 'quota.mdt' Thu Jan 20 21:23:29 2022-
#3258 (088)param 0:oak 1:quota.mdt=gp
#3259 (224)END marker 6362 (flags=0x02, v2.12.7.0) oak 'quota.mdt' Thu Jan 20 21:23:29 2022-

And here's the MDT copy:

rec #3188 type=10620000 len=224 offset 500632
rec #3189 type=10620000 len=224 offset 500856
rec #3190 type=10620000 len=88 offset 501080
rec #3191 type=10620000 len=224 offset 501168
Header size : 8192
llh_size : 80
Time : Fri Jan 21 18:36:56 2022
Number of records: 3132 cat_idx: 0 last_idx: 3191
Target uuid :
-----------------------
#01 (224)marker 2015 (flags=0x01, v2.10.8.0) oak-MDT0002-mdtlov 'lov setup' Thu Sep 26 05:36:36 2019-
#02 (128)attach 0:oak-MDT0002-mdtlov 1:lov 2:oak-MDT0002-mdtlov_UUID
#03 (176)lov_setup 0:oak-MDT0002-mdtlov 1:(struct lov_desc) uuid=oak-MDT0002-mdtlov_UUID stripe:cnt=1 size=1048576 offset=18446744073709551615 pattern=0x1
...
#18 (224)marker 2018 (flags=0x01, v2.10.8.0) oak-MDT0002 'add osc(copied)' Thu Sep 26 05:36:36 2019-
#19 (224)EXCLUDE START marker 2019 (flags=0x11, v2.10.8.0) oak-OST0003 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:33 2021
#20 (088)add_uuid nid=10.0.2.102@o2ib5(0x500050a000266) 0: 1:10.42.17.114@tcp
#21 (088)add_uuid nid=10.0.2.101@o2ib5(0x500050a000265) 0: 1:10.42.17.114@tcp
#22 (224)END EXCLUDE END marker 2019 (flags=0x12, v2.10.8.0) oak-OST0003 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:33 2021
#23 (224)END marker 2019 (flags=0x02, v2.10.8.0) oak-MDT0002 'add osc(copied)' Thu Sep 26 05:36:36 2019-
#24 (224)marker 2020 (flags=0x01, v2.10.8.0) oak-MDT0002 'add osc(copied)' Thu Sep 26 05:36:36 2019-
#25 (224)EXCLUDE START marker 2021 (flags=0x11, v2.10.8.0) oak-OST0001 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:22 2021
...
#3172 (224)marker 6201 (flags=0x01, v2.12.7.0) oak-OST014d 'add osc' Mon Oct 4 08:10:30 2021-
#3173 (088)add_uuid nid=10.0.2.104@o2ib5(0x500050a000268) 0: 1:10.42.17.114@tcp
#3174 (128)attach 0:oak-OST014d-osc-MDT0002 1:osc 2:oak-MDT0002-mdtlov_UUID
#3175 (144)setup 0:oak-OST014d-osc-MDT0002 1:oak-OST014d_UUID 2:10.42.17.114@tcp
#3176 (088)add_uuid nid=10.0.2.103@o2ib5(0x500050a000267) 0: 1:10.42.17.114@tcp
#3177 (112)add_conn 0:oak-OST014d-osc-MDT0002 1:10.42.17.114@tcp
#3178 (136)lov_modify_tgts add 0:oak-MDT0002-mdtlov 1:oak-OST014d_UUID 2:333 3:1
#3179 (224)END marker 6201 (flags=0x02, v2.12.7.0) oak-OST014d 'add osc' Mon Oct 4 08:10:30 2021-
#3181 (224)marker 6209 (flags=0x01, v2.12.7.0) oak-OST014e 'add osc' Mon Oct 4 08:13:45 2021-
#3182 (088)add_uuid nid=10.0.2.103@o2ib5(0x500050a000267) 0: 1:10.42.17.114@tcp
#3183 (128)attach 0:oak-OST014e-osc-MDT0002 1:osc 2:oak-MDT0002-mdtlov_UUID
#3184 (144)setup 0:oak-OST014e-osc-MDT0002 1:oak-OST014e_UUID 2:10.42.17.114@tcp
#3185 (088)add_uuid nid=10.0.2.104@o2ib5(0x500050a000268) 0: 1:10.42.17.114@tcp
#3186 (112)add_conn 0:oak-OST014e-osc-MDT0002 1:10.42.17.114@tcp
#3187 (136)lov_modify_tgts add 0:oak-MDT0002-mdtlov 1:oak-OST014e_UUID 2:334 3:1
#3188 (224)END marker 6209 (flags=0x02, v2.12.7.0) oak-OST014e 'add osc' Mon Oct 4 08:13:45 2021-
#3189 (224)marker 6362 (flags=0x01, v2.12.7.0) oak 'quota.mdt' Thu Jan 20 21:23:29 2022-
#3190 (088)param 0:oak 1:quota.mdt=gp
#3191 (224)END marker 6362 (flags=0x02, v2.12.7.0) oak 'quota.mdt' Thu Jan 20 21:23:29 2022-

In particular:
From my understanding, the catalog should be identical on both nodes, as the local copy will be used during MDS start, so differences there will likely cause problems. It's also safe to say the difference is probably due to the EXCLUDE blocks, as suspected from the start... So we now have two ways forward: since exclude support has never been merged, I'd be tempted to sweep that under the rug and try to fix the catalogs, but unfortunately I can't think of any way of fixing them nicely without having to shut down the whole filesystem (e.g. nuking the MGS and having the targets re-register is most certainly not safe with thousands of clients mounted).

Thanks! |
| Comment by Etienne Aujames [ 16/Feb/22 ] |
|
Unmounting/mounting the MDT target should sync the local configuration with the MGS if the MDT successfully gets a lock on the llog config resource. If the MDT fails to sync the configuration, the following message should appear (with debug=+mgc):

Failed to get MGS log %s, using local copy for now, will try to update later.

You can get more information from: https://review.whamcloud.com/40448/

If the MDT local configuration is different from the MGS configuration after the copy, it can mean that there is a bug in llog_backup(). |
| Comment by Dominique Martinet [ 16/Feb/22 ] |
|
Hi Etienne! Thanks for the reply. Yes, in my case the copy was successful, so there definitely is a bug in the copy as I pointed out. |
| Comment by Dominique Martinet [ 17/Feb/22 ] |
|
Well, it looks like we can have EXCLUDE entries, so it's probably worth a check anyway...
Unfortunately, the code doesn't let us modify an unset index, so we'd need to keep track of the last written index, compare it with the one we want to write, and create fake records and cancel them to make the index artificially match?

FWIW I've kludged llog_reader to take a peek at what the "holes" are about, and they are cancelled llogs from the excluded block, as well as padding log entries (which are also present in the copy at different offsets when they are needed). Unfortunately llog_osd_pad is not exposed so we can't abuse it, but we could just write entries of LLOG_PAD_MAGIC type, empty or minimal size, and immediately cancel them to unset the header bit...? Looking back, I don't see any clean way of clearing the bit actually, so it might not be so easy :|

Anyway, ETIMEDOUT for today; I will keep digging. Please speak up if you have better ideas than what I'm rambling about. |
| Comment by Etienne Aujames [ 17/Feb/22 ] |
|
Hi Dominique,

I think the issue is that mgc_llog_local_copy()/llog_backup() do not reproduce the gaps from the original config llog. The copy is done via llog_process(), which ignores cancelled records (per the llog header bitmap).

So your original:

#23 (088)add_uuid nid=10.0.2.101@o2ib5(0x500050a000265) 0: 1:10.42.17.114@tcp
#26 (224)END EXCLUDE END marker 2019 (flags=0x12, v2.10.8.0) oak-OST0003 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:33 2021

is replaced in the local copy by:

#21 (088)add_uuid nid=10.0.2.101@o2ib5(0x500050a000265) 0: 1:10.42.17.114@tcp
#22 (224)END EXCLUDE END marker 2019 (flags=0x12, v2.10.8.0) oak-OST0003 'add osc' Thu Sep 26 05:36:36 2019-Fri Mar 19 22:29:33 2021

The copy doesn't keep the "holes" in the indexes created by "del_ost", which is why you do not have the same cfg_last_idx. So if you try to remove the EXCLUDE records, it will increase the index difference between the local copy and the original. This will not help.

I am working on a patch to reproduce the index gaps in the local copy. |
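A small standalone sketch of why a bitmap-driven copy collapses the indexes (toy structures under assumed semantics, not the actual llog_process()/llog_backup() code): records whose bit was cleared by a cancel are skipped, and the destination re-numbers the survivors sequentially, so every hole left by del_ost shrinks the copy's last index.

```c
/* Toy model of copying a record log by walking its header bitmap, assuming
 * cancelled records are skipped and the destination assigns fresh sequential
 * indexes. Hypothetical structures; not the actual llog code. */
#include <stdbool.h>
#include <stdio.h>

#define NREC 8

struct toy_llog {
	bool active[NREC + 1];	/* one bit per record index, 1..last_idx */
	int last_idx;
};

static void copy_active_records(const struct toy_llog *src, struct toy_llog *dst)
{
	dst->last_idx = 0;
	for (int i = 1; i <= src->last_idx; i++) {
		if (!src->active[i])
			continue;			/* cancelled: the hole is not reproduced */
		dst->active[++dst->last_idx] = true;	/* survivor gets the next index */
		printf("src #%d -> dst #%d\n", i, dst->last_idx);
	}
}

int main(void)
{
	struct toy_llog mgs = { .last_idx = NREC };
	struct toy_llog copy = { .last_idx = 0 };

	for (int i = 1; i <= NREC; i++)
		mgs.active[i] = true;
	mgs.active[4] = mgs.active[5] = false;	/* e.g. records cancelled by del_ost */

	copy_active_records(&mgs, &copy);
	printf("original last_idx=%d, local copy last_idx=%d\n",
	       mgs.last_idx, copy.last_idx);	/* 8 vs 6: the cfg_last_idx mismatch */
	return 0;
}
```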
| Comment by Gerrit Updater [ 17/Feb/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46545 |
| Comment by Etienne Aujames [ 17/Feb/22 ] |
|
The patch above implements what you have described (without LLOG_PAD_MAGIC). But as you said, this is not totally accurate because of the chunk padding. |
| Comment by Stephane Thiell [ 17/Feb/22 ] |
|
Hi Etienne, |
| Comment by Dominique Martinet [ 17/Feb/22 ] |
|
Hi Etienne, thanks for the patch! That's pretty much what I had in mind, yes.

This came back haunting me a bit in my sleep, and yes, it's not perfect because of padding: we can't control when each side will add its padding (because of length differences), so the last index written might be 1 too big, and we might have the opposite problem now (a bigger last index means we'll skip one record we should have processed). So I've been thinking it's safer to keep the holes and just append small dummy logs at the end until we get the right last index, to fake it up. It's safer because the dummy logs being small would reduce the need to add padding, but unfortunately it's still not perfect, because if padding is required at the last dummy log appended, we'll overrun the last index again. I can't think of any safe copy mechanism that preserves indices right now, as we don't know the size of skipped records nor the offset within the llog of the entry immediately after a hole, and unless we can reproduce that we won't be able to match the automatic padding. I plan to look further this way this weekend.

Stéphane, this is not a problem: clients don't have a copy of the MDT config llogs, so we can just restart the MDT and they'll get a new copy, which will fix all our problems. If we want to play safe we can also backport the tunefs --nolocallogs patch, but even without it the servers will try to get a new copy and throw away the old one if the copy worked, so it should work as far as I understand. That won't impact clients more than a normal failover, so if we can get this right, this is a good solution. |
| Comment by Dominique Martinet [ 17/Feb/22 ] |
|
Hi again, sorry for the spam.

In mgc_llog_local_copy the MDS will make a local copy first with llog_backup(env, obd, lctxt, lctxt, logname, temp_log); (note they use lctxt twice, which is LLOG_CONFIG_ORIG_CTXT). This process is identical and will get rid of any cancelled records, so if we just do the same on the MGS for any log that has holes when it starts, we won't need to worry about these anymore.

It's not perfect either, though: we still have the problem while the MGS is running immediately after cancel events, so I guess we need a better time to do that... asynchronously, a while after any cancel event? hmm...

Well, just a wild idea if we can't get the copy right. I still intend to look further this weekend. |
| Comment by Etienne Aujames [ 18/Feb/22 ] |
|
@Stephane Thiell the issue with correcting the MGS configs is that the target copies have last indexes lower than the original ones. Clients/targets are notified only of "new" records while in the mounted state, so they will ignore indexes between the original and the copy.

One solution could be to artificially generate records after correcting the MGS (with "lctl conf_param <param>" and "lctl conf_param -d <param>") until the corrected configs' index reaches the old configs' "last indexes". Then clients/targets should take into account new records (e.g. adding new OSTs). |
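A rough sketch of the arithmetic behind that catch-up idea, reusing the oak-MDT0002 numbers quoted earlier purely for illustration (a toy model; the actual workaround would go through lctl conf_param as described above, and the exact index values are assumptions):

```c
/* Toy illustration of the "append records until the index catches up" idea.
 * Hypothetical numbers based on the oak-MDT0002 example; the real workaround
 * would use lctl conf_param / lctl conf_param -d against the MGS. */
#include <stdio.h>

int main(void)
{
	int corrected_last_idx = 3191;	/* hole-free (corrected) config log */
	int processed_last_idx = 3259;	/* highest index already processed by mounted clients/targets */
	int dummy = 0;

	/* consumers ignore records whose index is not greater than what they
	 * already processed, so pad with throw-away set/delete records */
	while (corrected_last_idx < processed_last_idx) {
		corrected_last_idx++;	/* one dummy parameter record */
		dummy++;
	}
	printf("appended %d dummy records; the next real record gets index %d\n",
	       dummy, corrected_last_idx + 1);
	return 0;
}
```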
| Comment by Gerrit Updater [ 18/Feb/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46552 |
| Comment by Stephane Thiell [ 22/Feb/22 ] |
|
Hi Etienne, |
| Comment by Etienne Aujames [ 28/Feb/22 ] |
|
Hi Stephane, |
| Comment by Gerrit Updater [ 30/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46552/ |
| Comment by Peter Jones [ 30/May/22 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 02/Jun/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47515 |
| Comment by Stephane Thiell [ 24/Jun/22 ] |
|
@Peter Jones could you please consider this patch for inclusion in LTS releases (at least for 2.15)? |
| Comment by Gerrit Updater [ 17/Oct/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48898 |
| Comment by Gerrit Updater [ 19/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48898/ |