[LU-14059] Changelogs not working after 2.10 to 2.12 upgrade Created: 21/Oct/20 Updated: 23/Jan/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Stephane Thiell | Assignee: | Hongchao Zhang |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre server: 2.12.5_7.srcc (https://github.com/stanford-rc/lustre/commits/b2_12_5); changelog reader client: 2.12.5. Server OS: CentOS 7.6; client OS: CentOS 7.8 |
||
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Following our upgrade of Oak from 2.10 to 2.12, changelogs are no longer working on any of Oak's 4 MDTs. I tried to deregister and register a new reader (cl2) for each MDT, but the problem persists.
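For reference, a minimal sketch of the deregister/register cycle described above (the old reader ID cl1 is illustrative; the actual reader IDs differ per MDT):

```sh
# On the MDS hosting oak-MDT0001: drop the stale changelog reader,
# then register a fresh one (lctl prints the new reader ID, e.g. cl2).
lctl --device oak-MDT0001 changelog_deregister cl1
lctl --device oak-MDT0001 changelog_register
```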
Worth noting: when starting MDTs, we can see the following warning messages:

```
Oct 19 15:26:37 oak-md1-s1 kernel: Lustre: 15665:0:(mdd_device.c:545:mdd_changelog_llog_init()) oak-MDD0002 : orphan changelog records found, starting from index 0 to index 22264575363, being cleared now
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: 41861:0:(mdd_device.c:545:mdd_changelog_llog_init()) oak-MDD0001 : orphan changelog records found, starting from index 0 to index 9422682426, being cleared now
```
Config for oak-MDT0001 (as an example; it's the same on the 3 other MDTs):

```
[root@oak-md1-s1 ~]# lctl get_param mdd.oak-MDT0001.changelog_*
mdd.oak-MDT0001.changelog_deniednext=60
mdd.oak-MDT0001.changelog_gc=0
mdd.oak-MDT0001.changelog_max_idle_indexes=2097446912
mdd.oak-MDT0001.changelog_max_idle_time=2592000
mdd.oak-MDT0001.changelog_min_free_cat_entries=2
mdd.oak-MDT0001.changelog_min_gc_interval=3600
mdd.oak-MDT0001.changelog_size=4169832
mdd.oak-MDT0001.changelog_mask=
CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR HSM MTIME CTIME MIGRT FLRW RESYNC
mdd.oak-MDT0001.changelog_users=
current index: 9422682426
ID    index (idle seconds)
cl2   9422682426 (38908)
```

We notice that the "current index" is not increasing, but the idle time is. Note that there is activity on the filesystem (it's in production).

When trying to read changelogs from the client:

```
[root@oak-rbh01 robinhood]# lfs changelog oak-MDT0001
[root@oak-rbh01 robinhood]# echo $?
0
```

When I tried to restart oak-MDT0001 this morning, it did the following:

```
Oct 21 08:32:11 oak-md1-s1 kernel: Lustre: server umount oak-MDT0001 complete
Oct 21 08:32:11 oak-md1-s1 kernel: LustreError: 137-5: oak-MDT0001_UUID: not available for connect from 10.210.12.6@tcp1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 21 08:32:11 oak-md1-s1 kernel: LustreError: Skipped 28 previous similar messages
Oct 21 08:32:12 oak-md1-s1 kernel: LustreError: 137-5: oak-MDT0001_UUID: not available for connect from 10.49.18.28@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 21 08:32:12 oak-md1-s1 kernel: LustreError: Skipped 57 previous similar messages
Oct 21 08:32:12 oak-md1-s1 kernel: LustreError: 18064:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.50.10.71@o2ib2 arrived at 1603294332 with bad export cookie 14346833159486330259
Oct 21 08:32:14 oak-md1-s1 kernel: LustreError: 137-5: oak-MDT0001_UUID: not available for connect from 10.50.4.55@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 21 08:32:14 oak-md1-s1 kernel: LustreError: Skipped 126 previous similar messages
Oct 21 08:32:17 oak-md1-s1 kernel: LustreError: 24771:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.210.12.119@tcp1 arrived at 1603294337 with bad export cookie 14346833159486337952
Oct 21 08:32:18 oak-md1-s1 kernel: LustreError: 137-5: oak-MDT0001_UUID: not available for connect from 10.0.3.1@o2ib5 (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 21 08:32:18 oak-md1-s1 kernel: LustreError: Skipped 243 previous similar messages
Oct 21 08:32:22 oak-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Oct 21 08:32:22 oak-md1-s1 kernel: Lustre: oak-MDT0001: Not available for connect from 10.49.30.7@o2ib1 (not set up)
Oct 21 08:32:22 oak-md1-s1 kernel: Lustre: Skipped 42 previous similar messages
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDT0002: Connection restored to 2baf6034-0457-482e-32d4-2a55d4c43944 (at 0@lo)
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDD0001: changelog on
Oct 21 08:32:23 oak-md1-s1 kernel: LustreError: 15560:0:(mgs_handler.c:282:mgs_revoke_lock()) MGS: can't take cfg lock for 0x6b616f/0x2 : rc = -11
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: 42675:0:(llog.c:615:llog_process_thread()) oak-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x181e9:0x2]record for index 0/6
Oct 21 08:32:23 oak-md1-s1 kernel: LustreError: 42675:0:(lod_dev.c:434:lod_sub_recovery_thread()) oak-MDT0000-osp-MDT0001 get update log failed: rc = -22
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: 41861:0:(mdd_device.c:545:mdd_changelog_llog_init()) oak-MDD0001 : orphan changelog records found, starting from index 0 to index 9422682426, being cleared now
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDT0001: in recovery but waiting for the first client to connect
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDT0001: Connection restored to 6b4880ff-499b-4 (at 10.50.3.30@o2ib2)
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: 42678:0:(ldlm_lib.c:2073:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: 42678:0:(ldlm_lib.c:2073:target_recovery_overseer()) Skipped 1 previous similar message
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDT0001: disconnecting 1636 stale clients
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDT0001: Denying connection for new client 4cc5cd3c-4881-4 (at 10.50.14.2@o2ib2), waiting for 1639 known clients (3 recovered, 0 in progress, and 819 evicted) already passed deadline 2476:54
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: Skipped 212 previous similar messages
Oct 21 08:32:24 oak-md1-s1 kernel: Lustre: oak-MDT0001: nosquash_nids set to 10.49.0.[11-12]@o2ib1 10.50.0.1@o2ib2 10.50.0.[11-12]@o2ib2 10.50.1.[59-60]@o2ib2 10.51.0.[1-2]@o2ib3 10.51.0.[11-18]@o2ib3 10.0.2.[1-3]@o2ib5 10.0.2.[51-58]@o2i
Oct 21 08:32:24 oak-md1-s1 kernel: Lustre: oak-MDT0001: root_squash is set to 99:99
Oct 21 08:32:24 oak-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Oct 21 08:32:24 oak-md1-s1 kernel: Lustre: oak-MDT0001: Connection restored to b7359770-1c3d-4 (at 10.50.1.48@o2ib2)
Oct 21 08:32:24 oak-md1-s1 kernel: Lustre: Skipped 60 previous similar messages
Oct 21 08:32:26 oak-md1-s1 kernel: Lustre: oak-MDT0001: Connection restored to 10.0.2.104@o2ib5 (at 10.0.2.104@o2ib5)
Oct 21 08:32:26 oak-md1-s1 kernel: Lustre: Skipped 124 previous similar messages
Oct 21 08:32:29 oak-md1-s1 kernel: LustreError: 15695:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.210.12.115@tcp1 arrived at 1603294349 with bad export cookie 14346833159486332149
Oct 21 08:32:29 oak-md1-s1 kernel: LustreError: 15695:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 1 previous similar message
Oct 21 08:32:30 oak-md1-s1 kernel: Lustre: oak-MDT0001: Connection restored to 8f081fef-be09-4 (at 10.49.30.36@o2ib1)
Oct 21 08:32:30 oak-md1-s1 kernel: Lustre: Skipped 840 previous similar messages
Oct 21 08:32:58 oak-md1-s1 kernel: Lustre: oak-MDT0001: Connection restored to 10.0.2.117@o2ib5 (at 10.0.2.117@o2ib5)
Oct 21 08:32:58 oak-md1-s1 kernel: Lustre: Skipped 683 previous similar messages
Oct 21 08:33:13 oak-md1-s1 kernel: LustreError: 167-0: oak-MDT0001-osp-MDT0002: This client was evicted by oak-MDT0001; in progress operations using this service will fail.
```

...it started, but changelogs are still not working.
Note the:

```
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: 42675:0:(llog.c:615:llog_process_thread()) oak-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x181e9:0x2]record for index 0/6
```

Any idea?
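One way to inspect the changelog catalog directly is the llog_reader utility shipped with Lustre; a sketch, assuming the MDT is stopped and that /dev/mapper/md1-mdt1 and /mnt/mdt are illustrative device and mount-point names:

```sh
# Mount the MDT backing device read-only via ldiskfs and dump the
# changelog catalog llog to look for bogus (e.g. index 0) entries.
mount -t ldiskfs -o ro /dev/mapper/md1-mdt1 /mnt/mdt
llog_reader /mnt/mdt/changelog_catalog | head
umount /mnt/mdt
```
|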
| Comments |
| Comment by Peter Jones [ 21/Oct/20 ] |
|
Hongchao, can you please advise? Thanks, Peter |
| Comment by Stephane Thiell [ 21/Oct/20 ] |
|
So to summarize this differently: changelogs are on, but not recorded. The changelog_size does not increase; for oak-MDT0001, for example, it stays at mdd.oak-MDT0001.changelog_size=4169832.

```
[root@oak-md1-s1 ~]# lctl get_param mdd.oak-MDT0001.changelog_*
mdd.oak-MDT0001.changelog_deniednext=60
mdd.oak-MDT0001.changelog_gc=0
mdd.oak-MDT0001.changelog_max_idle_indexes=2097446912
mdd.oak-MDT0001.changelog_max_idle_time=2592000
mdd.oak-MDT0001.changelog_min_free_cat_entries=2
mdd.oak-MDT0001.changelog_min_gc_interval=3600
mdd.oak-MDT0001.changelog_size=4169832
mdd.oak-MDT0001.changelog_mask=
CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR HSM MTIME CTIME MIGRT FLRW RESYNC
mdd.oak-MDT0001.changelog_users=
current index: 9422682426
ID    index (idle seconds)
cl2   9422682426 (49316)
```

But changelogs are enabled:

```
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: oak-MDD0001: changelog on
Oct 21 08:32:23 oak-md1-s1 kernel: Lustre: 41861:0:(mdd_device.c:545:mdd_changelog_llog_init()) oak-MDD0001 : orphan changelog records found, starting from index 0 to index 9422682426, being cleared now
```
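A trivial way to confirm that nothing moves (a sketch; the 30-second interval is an arbitrary choice):

```sh
# Poll changelog_size and the reader index; with an active filesystem,
# neither value moving confirms records are not being generated.
while true; do
    lctl get_param mdd.oak-MDT0001.changelog_size \
                   mdd.oak-MDT0001.changelog_users
    sleep 30
done
```
|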
| Comment by Stephane Thiell [ 26/Oct/20 ] |
|
Hi Hongchao, should I just stop each MDT and rename the changelog_catalog and changelog_users files at the ldiskfs level? Any other recommendations to avoid leaking too much space on the MDTs if I do that? Thanks! |
| Comment by Stephane Thiell [ 28/Oct/20 ] |
|
Hi Hongchao, this morning we tried the procedure of renaming the changelog files under ldiskfs, as found in other tickets, e.g.:

```
mv changelog_catalog changelog_catalog.bak
mv changelog_users changelog_users.bak
```

But even after that, the changelogs are still not recorded (size is stuck at 33016):

```
[root@oak-md1-s2 ~]# lctl get_param mdd.oak-MDT0000.changelog_*
mdd.oak-MDT0000.changelog_deniednext=60
mdd.oak-MDT0000.changelog_gc=0
mdd.oak-MDT0000.changelog_max_idle_indexes=2097446912
mdd.oak-MDT0000.changelog_max_idle_time=2592000
mdd.oak-MDT0000.changelog_min_free_cat_entries=2
mdd.oak-MDT0000.changelog_min_gc_interval=3600
mdd.oak-MDT0000.changelog_size=33016
mdd.oak-MDT0000.changelog_mask=
CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR HSM MTIME CTIME MIGRT FLRW RESYNC
mdd.oak-MDT0000.changelog_users=
current index: 1
ID    index (idle seconds)
cl1   0 (254)
```

```
[root@oak-gw02 ~]# lfs changelog_clear oak-MDT0000 cl1 0
[root@oak-gw02 ~]# echo $?
0
[root@oak-gw02 ~]# lfs changelog oak-MDT0000 cl1
2 00MARK 13:22:02.346104356 2020.10.28 0x0 t=[0x40001:0x0:0x0] ef=0xf u=0:0 nid=0@<0:0> p=[0:0x50:0x2]
```

Also, old changelog IDs are still working, which seems broken:

```
[root@oak-gw02 ~]# lfs changelog oak-MDT0000 cl2
2 00MARK 13:22:02.346104356 2020.10.28 0x0 t=[0x40001:0x0:0x0] ef=0xf u=0:0 nid=0@<0:0> p=[0:0x50:0x2]
[root@oak-gw02 ~]# lfs changelog_clear oak-MDT0000 cl2 0
lfs changelog_clear: cannot purge records for 'cl2': No such file or directory (2)
changelog_clear: no changelog user: cl2
```

How could we repair this and start with fresh changelogs? This issue happened after our 2.10 to 2.12 upgrade on all 4 MDTs. Thanks!
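For the record, a sketch of the full procedure around those two renames; the device and mount-point names (/dev/mapper/md1-mdt0, /mnt/mdt0, /lustre/oak/mdt0) are purely illustrative, and the MDT must be stopped first:

```sh
# Stop the MDT, mount its backing device directly as ldiskfs,
# move the changelog llog files aside, then restart the target.
umount /lustre/oak/mdt0                            # illustrative Lustre mount point
mount -t ldiskfs /dev/mapper/md1-mdt0 /mnt/mdt0    # illustrative device name
mv /mnt/mdt0/changelog_catalog /mnt/mdt0/changelog_catalog.bak
mv /mnt/mdt0/changelog_users /mnt/mdt0/changelog_users.bak
umount /mnt/mdt0
mount -t lustre /dev/mapper/md1-mdt0 /lustre/oak/mdt0
```
|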
| Comment by Dominique Martinet [ 30/Oct/20 ] |
|
Hi, I think there are two separate issues:

- The initial problem, before changelogs were manually cleared. The message "orphan changelog records found, starting from index 0 to index 22264575363, being cleared now" indicates that something very weird went wrong: it means the current reader (cl1) was at index 22264575363, which is OK, but an entry for index 0 was found in the changelog catalog itself. I've tried quite a few things, including zero-crossing the catalog before upgrading 2.10->2.12, but couldn't reproduce this; honestly, no idea :/ That said, I was able to reproduce the exact same message by editing the catalog manually and adding a changelog with index 0, and it worked fine even with holes in the catalog, so I don't think anything really bad happened.
- The lack of changelog generation is completely unrelated and was due to nodemaps and audit_mode not defaulting to 1 on upgrade; that was the real problem (see the sketch below). Commit 942a9853f7b4c6fe22729468f1802ab782087e4e from 2.11 says "By default, audit_mode flag is set to 1 in newly created nodemap entries. And it is also set to 1 on 'default' nodemap." but existing maps are read back from disk and fill the void with 0; I'll check quickly if that could be improved.
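A hedged sketch of the workaround this implies: re-enable audit_mode by hand on the nodemaps that were read back from disk with the flag unset. These are the standard lctl nodemap commands; 'default' always exists, while any other nodemap names are site-specific:

```sh
# On the MGS: list which nodemaps came out of the upgrade with audit
# disabled, then set the flag back to 1 for each affected nodemap.
lctl get_param nodemap.*.audit_mode

# 'default' exists everywhere; repeat for each site-specific nodemap.
lctl nodemap_modify --name default --property audit_mode --value 1
```
|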
| Comment by Dominique Martinet [ 30/Oct/20 ] |
|
Unfortunately I don't see how to detect whether a stored nodemap comes from an upgrade or not; the record has nothing like a "format version" that would let us check. With 20/20 hindsight, that probably should have been added together with the new field; at this point, even if we add one now, we won't know whether someone had explicitly disabled audit or not... I'm the type that never reads the docs, but maybe it's already documented that one should be careful with nodemaps on upgrade for that reason?

Unless someone has a miracle idea on how to properly init the nodemaps with nmf_enable_audit / NM_FL_ENABLE_AUDIT set on upgrade, I guess this LU can be closed. |