[LU-15913] rename stress test leads to REMOTE_PARENT_DIR corruption Created: 06/Jun/22 Updated: 08/Feb/24 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Artem Blagodarenko (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A stress test with active renaming of files and directories inside a striped directory finished with an error:

Performing actions for 'rename(2,676pp674,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,hhhhhhhhhh,iiiiiiiiii,jjjjjjjjjj,kkkkkkkkkk,llllllllll,mmmmmmmmmm,nnnnnnnnnn,oooooo -> 3,676pp674,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,hhhhhhhhhh,i) error Input/output error (5)' numerrs=1 Fri Jun 3 01:06:30 CDT 2022

and leaves the MDT partition corrupted. E2fsck reports the following symptoms:

1) an incorrect filetype
2) a link to a directory
3) '..' points to REMOTE_PARENT_DIR but should point to the special directory with the sequence name

./data.20220423/server/e2fsck.pre_read_only.kjcf04n03.6.1-010.43.20220504165818.out.kjcf04n03:Entry '14,37275pp37273,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,hhh' in /REMOTE_PARENT_DIR/0x24006497a:0x129bc:0x0 (4014289161) has an incorrect filetype (was 17, should be 2).
./data.20220423/server/e2fsck.pre_read_only.kjcf04n03.6.1-010.43.20220504165818.out.kjcf04n03:Entry '0x24006b05d:0xa4c8:0x0' in /REMOTE_PARENT_DIR (4030089985) is a link to directory /REMOTE_PARENT_DIR/0x24006497a:0x129bc:0x0/14,37275pp37273,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,hhh (4014289162 fid=[0x20006a4b1:0x42c6:0x0]).
./data.20220423/server/e2fsck.pre_read_only.kjcf04n03.6.1-010.43.20220504165818.out.kjcf04n03:'..' in /REMOTE_PARENT_DIR/0x24006497a:0x129bc:0x0/14,37275pp37273,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,hhh (4014289162) is /REMOTE_PARENT_DIR (4030089985), should be /REMOTE_PARENT_DIR/0x24006497a:0x129bc:0x0 (4014289161).
|
| Comments |
| Comment by Artem Blagodarenko (Inactive) [ 06/Jun/22 ] |
|
Now I think this symptom is important.
Jun 2 19:01:42 kjcf04n02 kernel: LustreError: 55470:0:(out_handler.c:910:out_tx_end()) kjcf04-MDT0000-osd: undo for /builddir/build/BUILD/lustre-2.15.0.1_rc2_cray_105_g679d729/lustre/ptlrpc/../../lustre/target/out_handler.c:445: rc = -524
This message hints that an error in out_handler could be the clue.
#define LUSTRE_ENOTSUPP 524 /* Operation is not supported */
This message is printed in out_tx_end() when an undo operation is not supported:

        if (ta->ta_args[i]->undo_fn != NULL)
                ta->ta_args[i]->undo_fn(env, ta->ta_handle, ta->ta_args[i]);
        else
                CERROR("%s: undo for %s:%d: rc = %d\n",
                       dt_obd_name(ta->ta_handle->th_dev),
                       ta->ta_args[i]->file, ta->ta_args[i]->line, -ENOTSUPP);

The message says that the undo operation for out_xattr_set() is not supported. There are two questions now:
|
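For readers unfamiliar with the update-batch machinery, here is a minimal standalone C model of the execute/undo pattern quoted above. The structures and names are simplified stand-ins for illustration only, not Lustre's actual tx_arg/thandle types:

/*
 * Minimal standalone model (simplified stand-ins, not Lustre's actual
 * structures) of the execute/undo pattern that out_tx_end() implements:
 * each queued update may carry an undo callback, and when a later update
 * fails the earlier ones are rolled back -- unless no undo_fn was
 * registered, in which case only an error can be reported, which matches
 * the "undo for ...: rc = -524" message above.
 */
#include <stdio.h>

#define ENOTSUPP 524   /* mirrors LUSTRE_ENOTSUPP */

struct tx_arg {
        int  (*exec_fn)(struct tx_arg *arg);
        void (*undo_fn)(struct tx_arg *arg);   /* may be NULL */
        const char *name;
};

/* Execute updates in order; on failure, undo the already-executed ones. */
static int tx_batch_end(struct tx_arg **args, int count)
{
        int i, rc = 0;

        for (i = 0; i < count; i++) {
                rc = args[i]->exec_fn(args[i]);
                if (rc != 0)
                        break;
        }
        if (rc == 0)
                return 0;

        /* roll back updates 0..i-1 in reverse order */
        while (--i >= 0) {
                if (args[i]->undo_fn != NULL)
                        args[i]->undo_fn(args[i]);
                else
                        fprintf(stderr, "undo for %s: rc = %d\n",
                                args[i]->name, -ENOTSUPP);
        }
        return rc;
}

static int exec_ok(struct tx_arg *a)   { printf("exec %s\n", a->name); return 0; }
static int exec_fail(struct tx_arg *a) { printf("exec %s fails\n", a->name); return -5; }

int main(void)
{
        /* the xattr set has no undo callback; a later failing update forces rollback */
        struct tx_arg xattr_set = { exec_ok,   NULL, "out_xattr_set" };
        struct tx_arg insert    = { exec_fail, NULL, "name insert" };
        struct tx_arg *batch[]  = { &xattr_set, &insert };

        return -tx_batch_end(batch, 2);
}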
| Comment by Andreas Dilger [ 09/Jun/22 ] |
|
This is likely due to the parallel rename locking changes in LU-12125. It would also be useful to determine if this is affecting only directory renames or also file renames. |
| Comment by Cory Spitz [ 10/Jun/22 ] |
|
It appears that only file renames are affected, but artem_blagodarenko can correct me if I'm wrong. A symptom seen before the metadata corruption is EIO returned to rename(). With parallel rename from |
| Comment by Cory Spitz [ 10/Jun/22 ] |
|
Test passed with no new e2fsck errors! I'd say we found our culprit. |
| Comment by Andreas Dilger [ 10/Jun/22 ] |
|
It would be useful to know more details of the reproducer workload. Is this renaming in local directories or striped/remote directories, within one directory or within multiple directories, from a single client or multiple clients? The parallel rename patch should only be renaming within a single local directory, so (in theory) the REMOTE_PARENT_DIR should not be involved. It might be that renames within a striped directory could trigger this, if the "rename within same parent" check is incorrect? |
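To make that last question concrete, here is a hypothetical sketch of how a "rename within same parent" check could go wrong for a striped directory. The structures and helper names are assumptions for illustration, not the actual MDT code: the client-visible parent FID is the same for source and target, but the two names may live in different stripes served by different MDTs, so the rename is not purely local.

/*
 * Hypothetical sketch (assumed structures, not the actual MDT code).
 * For a striped directory the visible parent FID matches for source and
 * target, but the two names may hash to different stripes on different
 * MDTs, so treating the rename as purely local would skip the
 * remote-parent handling and locking.
 */
#include <stdbool.h>
#include <stdint.h>

struct lu_fid_sketch { uint64_t seq; uint32_t oid; uint32_t ver; };

static bool fid_eq(const struct lu_fid_sketch *a, const struct lu_fid_sketch *b)
{
        return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Naive check: compares only the visible parent directory FIDs. */
static bool same_parent_naive(const struct lu_fid_sketch *src_parent,
                              const struct lu_fid_sketch *tgt_parent)
{
        return fid_eq(src_parent, tgt_parent);
}

/*
 * Stricter check: for a striped parent, also require that both names fall
 * into the same stripe, otherwise the rename is effectively cross-MDT.
 */
static bool same_parent_and_shard(const struct lu_fid_sketch *src_parent,
                                  const struct lu_fid_sketch *tgt_parent,
                                  int src_stripe_idx, int tgt_stripe_idx)
{
        return fid_eq(src_parent, tgt_parent) &&
               src_stripe_idx == tgt_stripe_idx;
}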
| Comment by Gerrit Updater [ 11/Jun/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47593 |
| Comment by Andreas Dilger [ 12/Jun/22 ] |
|
Patch 47593 has tunable parameters ("mdt.*.enable_parallel_rename_file|dir", default off) to independently control parallel rename of files and directories. In addition, it contains a fix in the form of disabling parallel renames within a striped directory. It would be useful if HPE could test both the default patch mode and enable_parallel_rename_*=1 to determine whether this is a proper fix for the issue. The patch could be further optimized to allow parallel rename within a single shard of the same directory, and to add additional locking to REMOTE_PARENT_DIR, but it isn't clear whether that is needed, or whether exchanging one level of locking is more efficient than locking in a different area of the code. |
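As a rough illustration of the policy described above, here is a sketch assuming a simplified flag structure; the real patch implements this inside the MDT with its own parameter plumbing and lock modes:

/*
 * Sketch only: a simplified model of the decision described above, with
 * assumed flag names mirroring mdt.*.enable_parallel_rename_file|dir
 * (default off) and a flag saying whether the parent directory is striped.
 */
#include <stdbool.h>

struct rename_policy {
        bool enable_parallel_rename_file;   /* default: false */
        bool enable_parallel_rename_dir;    /* default: false */
};

/* Return true if this rename must take the serializing (global) lock. */
static bool rename_must_serialize(const struct rename_policy *p,
                                  bool renaming_dir, bool parent_striped)
{
        if (parent_striped)
                return true;    /* parallel rename disabled within striped dirs */
        if (renaming_dir)
                return !p->enable_parallel_rename_dir;
        return !p->enable_parallel_rename_file;
}

With both flags at their defaults this serializes every rename, which is why the default patch mode is later described as essentially equivalent to reverting the parallel rename patches.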
| Comment by Cory Spitz [ 13/Jun/22 ] |
|
> I'd say we found our culprit.

Andreas, thanks for posting your patch to try and bypass the problem. We'll get more details about the tests here shortly and we can decide what to do for both the code and the upcoming release of 2.15.0. |
| Comment by Peggy Gazzola [ 13/Jun/22 ] |
|
Regarding the reproducer, the test by default runs in local dirs, remote dirs, striped dirs. It's random, based on a mkdir wrapper script. The failures have mostly (possibly always) involved striped dirs; we'll need to check/verify that. |
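For reference, here is a rough single-client sketch of the kind of rename loop described above. The /mnt/lustre/testdir path, the four-subdirectory layout, and the iteration count are assumptions for illustration; the real harness uses a mkdir wrapper that randomly chooses local, remote, and striped directories and runs from multiple clients:

/*
 * Assumed reproducer sketch only: create files with long names and rename
 * them between subdirectories of a test directory, stopping on the EIO
 * that preceded the metadata corruption.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        char src[512], dst[512];
        long i;

        for (i = 0; i < 1000000; i++) {
                snprintf(src, sizeof(src), "/mnt/lustre/testdir/dir%ld/%ld.%s",
                         i % 4, i, "aaaaaaaaaa,bbbbbbbbbb,cccccccccc");
                snprintf(dst, sizeof(dst), "/mnt/lustre/testdir/dir%ld/%ld.%s",
                         (i + 1) % 4, i, "dddddddddd,eeeeeeeeee,ffffffffff");

                int fd = open(src, O_CREAT | O_WRONLY, 0644);
                if (fd >= 0)
                        close(fd);

                /* EIO from rename() was the first visible symptom */
                if (rename(src, dst) != 0 && errno == EIO) {
                        fprintf(stderr, "rename %s -> %s: %s\n",
                                src, dst, strerror(errno));
                        return 1;
                }
        }
        return 0;
}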
| Comment by Artem Blagodarenko (Inactive) [ 14/Jun/22 ] |
|
Hi Andreas,

Thank you for sharing https://review.whamcloud.com/#/c/47593/

I reverted these patches on Friday:

commit c76f4a2ab19c203e8dd6d826e2921721ea3430b5 (HEAD -> LU-15285-revert)
Author: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Date: Wed Jun 8 17:41:51 2022 -0400

    LUS-10934 Revert "LU-12125 mds: allow parallel directory rename"

    This reverts commit 90979ab390a72a084f6a77cf7fdc29a4329adb41.

    Change-Id: I885a135ccb3aa11d4a349ae4658c60664634af64

commit a6c4b929bd69fd4a0a6dfc083be577f4bdfe7906
Author: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Date: Wed Jun 8 17:39:51 2022 -0400

    Revert "LU-12125 mds: allow parallel regular file rename"

    This reverts commit d76cc65d5d68ed3e04bfbd9b7527f64ab0ee0ca7.

    Change-Id: I37893e0a3b49560003eac0905f54724c4a76a20a

commit 4b3d30030298389b93bfffab46a5f57c8977ba5c
Author: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Date: Wed Jun 8 17:22:44 2022 -0400

    Revert "LU-15285 mdt: fix same-dir racing rename deadlock"

    This reverts commit 82ec537d8b4cc9261828f4efe6b03d8d33f38432.

    Change-Id: I9527f74eeea30c7625cdbad3cc990b3e2c114377

After 2 days and 19 hours we hit 9 rename issues again. We can check your patch. Possibly my revert is incomplete; I am comparing. |
| Comment by Andreas Dilger [ 15/Jun/22 ] |
Artem, the revert patches look correct, so it seems probable that the remaining EIO error is unrelated to the parallel rename? It looks like about 8h per failure. I don't have any info on how often the test was failing before the patches were reverted, but presumably it was failing more often? Does the current EIO failure also report corruption with e2fsck, or is it just the EIO error to userspace? Did this same test ever run against 2.14.0 for the same amount of time?

My patch does three things:
1) adds mdt.*.enable_parallel_rename_file to control parallel rename of regular files (default off)
2) adds mdt.*.enable_parallel_rename_dir to control parallel rename of directories (default off)
3) disables parallel renames within a striped directory
It would be useful to test the patch with at least the default mode (all parallel renames disabled), as well as with parallel renames enabled to see if disabling the striped directory renames also "mostly fixes" the problem. I think at this point that the chance of hitting the "mostly fixed" problem is extremely low, given that this is itself an unlikely workload, so if the patch is also fixing the problem to the same level as reverting the patches, then I think the patch should land and the release is made as-is. |
| Comment by Artem Blagodarenko (Inactive) [ 15/Jun/22 ] |
|
Hi Andreas,

> so it seems probable that the remaining EIO error is unrelated to the parallel rename?

It seems so, but I still hope that my revert is incomplete or incorrect, so I want to see testing results with your patch.

> It looks like about 8h per failure. I don't have any info on how often the test was failing before the patches were reverted, but presumably it was failing more often?

It looks like the rate is about the same. We ran for 48h before and saw about the same number of EIO and e2fsck errors.

> Does the current EIO failure also report corruption with e2fsck, or is it just the EIO error to userspace?

Yes, e2fsck reports the same corruptions:

[root@kjcf04n00 admin]# diff <(grep -h "has an incorrect filetype" *20220610141112*) <(grep -h "has an incorrect filetype" *20220613171537*)
4a5
> Entry '9,14403pp14401,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,h' in /REMOTE_PARENT_DIR/0x2000b8d6e:0x2730:0x0 (3808878414) has an incorrect filetype (was 17, should be 2).
[root@kjcf04n00 admin]#
[root@kjcf04n00 admin]# diff <(grep -h "is a link to directory" *20220610141112*) <(grep -h "is a link to directory" *20220613171537*)
5a6
> Entry '0x2000bb431:0x314:0x0' in /REMOTE_PARENT_DIR (1472988673) is a link to directory /REMOTE_PARENT_DIR/0x2000b8d6e:0x2730:0x0/9,14403pp14401,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,h (3808878415).
6a8
> Entry '0x2000b8df1:0x36e6:0x0' in /REMOTE_PARENT_DIR (1472988673) is a link to directory /REMOTE_PARENT_DIR/0x2000b8d6e:0x164f:0x0/8:2,9035pp8575,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggg (3362505009 fid=[0x2400bac60:0x277b:0x0]).
8a11
> Entry '0x2400b4ee8:0x111a6:0x0' in /REMOTE_PARENT_DIR (4030089985) is a link to directory /REMOTE_PARENT_DIR/0x2400b4ee6:0x13807:0x0/[0x2400b4ea0:0x1dd3:0x0]:0/26,26044pp26042,aaaaaaaaaa,bbbbbbbbbb,cccccccccc,dddddddddd,eeeeeeeeee,ffffffffff,gggggggggg,hhhhhhhhhh,iiiiiiiiii,jjjjjjjjjj,kkkkkkkkkk,llllllllll,mmmmmmmmmm,nnnnnnnnnn,oooooooooo (1296546125).
[root@kjcf04n00 admin]#

> Did this same test ever run against 2.14.0 for the same amount of time?

Andreas, the testing we did previously is currently on hold due to technical problems. Do you have any ideas whether this issue can be reproduced on a smaller environment? Do you think the client count matters? The IO rate?

Thanks.
|
| Comment by Andreas Dilger [ 16/Jun/22 ] |
Artem, my current theory is that the problem is caused by parallel renames by different clients/mountpoints in a single directory, which cause the updates in REMOTE_PARENT_DIR to fail. It is possible that the conflicting renames can only happen while parallel rename is allowed, otherwise the renames are serialized and cannot race. |
| Comment by Gerrit Updater [ 16/Jun/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47643 |
| Comment by Artem Blagodarenko (Inactive) [ 16/Jun/22 ] |
|
> Artem, my current theory is that the problem is caused by parallel renames by different clients/mountpoints in a single directory, which cause the updates in REMOTE_PARENT_DIR to fail. It is possible that the conflicting renames can only happen while parallel rename is allowed, otherwise the renames are serialized and cannot race.

The problem reproduced even with https://review.whamcloud.com/#/c/47593/ applied in its default configuration, so unfortunately serialization doesn't help. |
| Comment by Andreas Dilger [ 16/Jun/22 ] |
The patch in the default mode is essentially the same as reverting the parallel rename patches, so it seems like the parallel rename patches are not the source of the problem. That is a bit contradictory to Cory's previous comment, so some clarification is needed. Was the "problem going away after reverting" just a lucky gap in failures? How many clients/mountpoints/threads are needed to hit this issue, and what type of rename operation is hitting the EIO? I have been trying to reproduce with multiple mountpoints and test directories, and 4x threads per directory creating and renaming files within those directories (because of the focus on the parallel rename patches), but have not hit the problem. This all makes me think that the parallel rename patches are not the cause of the current problems (or are at most making the problem easier to hit), and either this issue predates 2.14.0, or it was possibly caused by one of the other changes in the MDT code. My suggestion would be to test 2.14.0 to see if the problem is hit there, and if not, the code should be bisected to find the patch that causes the problem. Also, given that the problem seems extremely difficult to reproduce outside of a dedicated rename stress test, I don't think this is at risk of being hit by any normal user workload, and it should no longer be considered a blocker for 2.15.0. When the root cause is found we can always issue a patch for 2.15.1, which itself will be released fairly quickly afterward because of el8.6 and other needs. |
| Comment by Artem Blagodarenko (Inactive) [ 16/Jun/22 ] |
|
>That is a bit contradictory to Cory's previous comment, so some clarification is needed. |
| Comment by Artem Blagodarenko (Inactive) [ 30/Jun/22 ] |
|
The problem is fixed with the patch from the |
| Comment by Cory Spitz [ 30/Jun/22 ] |
|
> but avoiding a rollback is a solution for this exact issue |
| Comment by Artem Blagodarenko (Inactive) [ 30/Jun/22 ] |
|
Oops... actually we have patches here that have not landed yet, so it is too early to close this. I have reopened the issue. |
| Comment by Andreas Dilger [ 30/Jun/22 ] |
This is a bit confusing to me, since patch https://review.whamcloud.com/47226 " That said, the root of the problem has been described as the rename failing because a conflicting target name appears that would cause the rename to fail. I don't think this should be happening, given that the source and target FIDs should be locked before the rename is actually done. Is it possible that the |
| Comment by Alex Zhuravlev [ 30/Jun/22 ] |
|
One option is to make the failing update (e.g. the insert) the very first update in the batch, so there would be nothing to roll back. |
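A small sketch of that idea, assuming a hypothetical update-batch representation (not the actual OUT/OSP structures): if the one update that can legitimately fail and has no undo path is executed first, its failure leaves nothing behind to roll back.

/*
 * Sketch only, using an assumed batch representation: move the single
 * failure-prone, non-undoable update to the front while preserving the
 * relative order of the other updates.
 */
#include <stddef.h>

struct update {
        int  (*exec_fn)(struct update *u);
        void (*undo_fn)(struct update *u);      /* NULL: cannot be rolled back */
        int    may_fail;                        /* heuristic: likely to fail? */
};

static void batch_reorder(struct update **updates, int count)
{
        int i;

        for (i = 0; i < count; i++) {
                if (updates[i]->may_fail && updates[i]->undo_fn == NULL) {
                        struct update *u = updates[i];

                        /* shift the earlier updates down by one slot */
                        for (; i > 0; i--)
                                updates[i] = updates[i - 1];
                        updates[0] = u;
                        break;
                }
        }
}

This only helps when a single update in the batch lacks an undo path; if several updates can fail without rollback support, the ordering trick alone is not enough.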
| Comment by Cory Spitz [ 30/Jun/22 ] |
|
> Was testing not being done with 2.15.0-RC5? |
| Comment by Gerrit Updater [ 18/Jul/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47593/ |
| Comment by Gerrit Updater [ 23/Jan/24 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53768 |
| Comment by Gerrit Updater [ 08/Feb/24 ] |
|
"Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53981 |