[LU-16717] Directory restripe breaking lmv stripe settings Created: 06/Apr/23 Updated: 04/Oct/23 Resolved: 31/May/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.2 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Patrick Keller | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | migration_improvements |
| Environment: | ZFS MDT |
| Attachments: | debug_migrate_rmdir_leafdir, dmesg_dommigrate |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While doing several directory restripes at the same time (parent directory only, with "lfs migrate -m -d"), the Lustre client lost connection, with the typical messages on the MDS asking to resume the migration process. Resuming doesn't work; starting the migration process again on the affected directories returns "File descriptor in bad state (77)". The current stripe settings of the affected directories show hash types of "none" and "bad_type", e.g.

lmv_stripe_count: 5
lmv_stripe_offset: 2
lmv_hash_type: none,bad_type
mdtidx           FID[seq:oid:ver]
     2           [0x280000bd1:0x6fb:0x0]
     0           [0x200000400:0xd16:0x0]
     2           [0x280000402:0xd16:0x0]
     1           [0x240000401:0xd16:0x0]
     3           [0x2c0000402:0xd16:0x0]

The directories can still be traversed, but changes to them (adding or removing files) are not possible. Several directory trees are affected across multiple levels. Trying to migrate the directories from the top level returns "Invalid argument (22)" instead of -EBADFD.

Debug messages on the MDS show that lmv_is_sane() fails when the broken directories are accessed:

[Thu Apr 6 09:36:53 2023] LustreError: 1952126:0:(lustre_lmv.h:468:lmv_is_sane2()) insane LMV: magic=0xcd20cd0 count=5 index=1 hash=none:0x20000000 version=3 migrate offset=1 migrate hash=fnv_1a_64:33554434.
[Thu Apr 6 09:36:53 2023] LustreError: 1952126:0:(lustre_lmv.h:446:lmv_is_sane()) insane LMV: magic=0xcd20cd0 count=5 index=1 hash=none:0x20000000 version=3 migrate offset=1 migrate hash=fnv_1a_64:33554434.

Removing the directories is also impossible ("cannot remove <path>: invalid argument"). LFSCK only reports "striped_shards_skipped" but doesn't repair anything. The migration might have been started in parallel on several levels within a directory tree, maybe causing races or corruption. When directories are affected, they are part of a whole affected path of directories from root+1 down to a directory with no further subdirectories, e.g. /root/dir1/dir2/dir3 with directories dir1, dir2 and dir3 all being affected.

Is there any way to repair or remove the affected files? Data loss is not an issue for us, as this is a pre-production test setup, but I want to report the issue anyway because the problem occurred immediately after starting the migration. |
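For reference, a minimal sketch of the commands involved, following the style of the report (the mount point, directory name, MDT index and stripe count are placeholders, not values taken from the affected system):

# restripe a single directory level, as described above
# lfs migrate -m 0 -c 5 -d /mnt/lustre/root/dir1
# inspect the resulting layout; on a broken directory this shows
# lmv_hash_type: none,bad_type as in the output quoted above
# lfs getdirstripe /mnt/lustre/root/dir1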
| Comments |
| Comment by Patrick Keller [ 06/Apr/23 ] |
|
The "lfsck_striped_dir.c" documentation suggests that the "LMV_HASH_FLAG_BAD_TYPE" flag has been introduced to distinguish between cases with a valid hash on master LMV EA and the ones where anyone needs to be trusted because there is no valid hash found yet. However, in my case there seems to be no valid hash to be found anywhere. There should be a way to ignore LMV_HASH_FLAG_BAD_TYPE in order to remove the file. As of right now, it seems the directory is marked read-only as it is supposed to for "LMV_HASH_TYPE_UNKNOWN" but there is no exception to still be able to delete it.
I have a few debug outputs that show lfsck trying to fix directories with bad name hashes (flag = 4), which seem to show the behavior described above. I assume that lfsck set the bad_type flag but can't remove it. |
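For context, a hedged sketch of how a namespace LFSCK scan is typically started and checked (the MDT device name "lustre-MDT0000" is a placeholder, not taken from this system):

# start a namespace LFSCK on one MDT
# lctl lfsck_start -M lustre-MDT0000 -t namespace
# check the status and repair counters, e.g. striped_shards_skipped
# lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace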
| Comment by Andreas Dilger [ 06/Apr/23 ] |
|
keller, For Lustre 2.15 there is automatic MDT space balancing (should keep MDT usage within about 5% of each other). By default the top 3 levels of the directory tree have round-robin MDT directory creation to better utilize all MDTs in the system. This will create 1-stripe directories on remote MDTs to ensure all of the MDTs are being used. The round-robin MDT allocation can be enabled deeper into the directory tree with "lfs setdirstripe -D -c 1 -i -1 --max-inherit-rr=N <dir>", if needed, but the defaults should provide reasonable MDT usage out of the box. With the MDT space balancing there is very little need to use striped directories except for huge single directories. |
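As a concrete sketch of the command mentioned above (the path and the depth value 4 are placeholders chosen for illustration):

# extend round-robin MDT selection for new directories deeper into the tree
# lfs setdirstripe -D -c 1 -i -1 --max-inherit-rr=4 /mnt/lustre/projects
# verify the inherited default layout
# lfs getdirstripe -D /mnt/lustre/projects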
| Comment by Patrick Keller [ 06/Apr/23 ] |
|
The intent was to test the behavior of the file system when doing a lot of directory restriping, to see whether that's something that could be done in a production environment for rebalancing MDTs, and specifically to test the impact on single-directory metadata performance. For production, we will set the round-robin depth to reach the first level within project-specific directories for initial load distribution and will let the load balancing do the rest. The ticket is only meant to point out a possible issue that came up during restriping and/or LFSCK, not to ask for any new features. |
| Comment by Andreas Dilger [ 07/Apr/23 ] |
|
keller, thanks for the background, and I appreciate the bug report. I just wanted to make sure that you weren't intentionally trying to restripe the whole filesystem to fully-striped directories, since we've had a number of significant performance problems when users think that will somehow improve performance, and I wanted to steer you away from that. As I wrote, the 2.15 MDT space balancing and round-robin directory creation have proven very useful. In case we aren't able to reproduce the issue here, it would definitely be useful to collect full debug logs from the MDS (assuming you have a client mounted there also):

# lctl set_param debug=all debug_mb=1024
# lctl clear
# lctl mark "migrate"
# lfs migrate -m -d ... <dir>
# lctl mark "rmdir"
# rmdir <dir>
# lctl dk /tmp/debug.migrate.txt

along with any console errors from the MDS from the time of the original migrate error. |
| Comment by Patrick Keller [ 07/Apr/23 ] |
|
I've attached logs (debug_migrate_rmdir_leafdir) that show the debug output when trying to migrate and remove an affected directory. I ran the command on the MDS with MDT0002 because I think that's where the issue occurs. I created the directory list that I used for directory migration after the first "rc = -22" errors appeared, so the issue might have been present before the restriping of the directories even started. The FIDs affected at that time can't be resolved by lfs fid2path right now due to -EBADFD. I also did restriping of DoM earlier on the same day, so that might actually be the root cause of the issue. I attached the kernel messages for Lustre on the client doing the DoM migration in dmesg_dommigrate. I left the dump_lsm() messages in for completeness, although they seem to be redundant. |
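For reference, resolving one of the FIDs from the console messages would normally look like the sketch below (the filesystem name "lustre" is a placeholder); on the affected directories this currently fails because of -EBADFD:

# lfs fid2path lustre [0x280000bd1:0x6fb:0x0]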
| Comment by Lai Siyao [ 11/Apr/23 ] |
|
Resuming a directory migration is not supported when the hash type is bad. Under this condition the user can still access the directory and its sub-files, but can't unlink them, because the server code doesn't support name lookup with a bad hash (it should try each stripe, as is done on the client). I will look into these two issues:
|
| Comment by Andreas Dilger [ 11/Apr/23 ] |
|
Lai, I've always wondered about cases like this (e.g. a bad hash type). It should be possible for the MDS to "complete" the migration in any case. At worst it could pick a new hash type and move more entries around, but at least it would get the directory out of the "MIGRATION" state. |
| Comment by Gerrit Updater [ 28/Apr/23 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50796 |
| Comment by Gerrit Updater [ 28/Apr/23 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50797 |
| Comment by Lai Siyao [ 28/Apr/23 ] |
|
Patrick, I just pushed two patches to address the issues listed above. Could you try the latter one, https://review.whamcloud.com/c/fs/lustre-release/+/50797, to see whether it fixes your issue? |
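For anyone testing this, one common way to apply a Gerrit change on top of a lustre-release checkout is sketched below; the refs/changes path follows Gerrit's usual convention and the patchset number "1" is an assumption (the exact ref is shown in the change's Download menu):

# git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/97/50797/1
# git cherry-pick FETCH_HEAD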
| Comment by Xing Huang [ 08/May/23 ] |
|
2023-05-08: Two patches are being worked on. |
| Comment by Patrick Keller [ 09/May/23 ] |
|
Sorry for the late response (LUG was in the way). I can confirm that removing the directories was possible in all cases with the patched server. LFSCK didn't seem to repair anything, although I'm not sure whether it's supposed to change anything for directories that had already been broken by my previous LFSCK runs. I will have to set up the filesystem again, as we are going into production and can't do any more testing. |
| Comment by Xing Huang [ 20/May/23 ] |
|
2023-05-20: Two patches are being worked on; the first is under review, the other has passed code review and depends on the first. |
| Comment by Xing Huang [ 29/May/23 ] |
|
2023-05-29: Both patches are ready to land (on the master-next branch). |
| Comment by Gerrit Updater [ 31/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50796/ |
| Comment by Gerrit Updater [ 31/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50797/ |
| Comment by Peter Jones [ 31/May/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 06/Jun/23 ] |
|
"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51235 |
| Comment by Gerrit Updater [ 07/Jun/23 ] |
|
"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51243 |
| Comment by Gerrit Updater [ 02/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51235/ |
| Comment by Gerrit Updater [ 02/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51243/ |