[LU-10729] replay-dual test_23d: FAIL: Remote creation failed 1 : mkdir: cannot create directory': File exists
Created: 27/Feb/18 Updated: 29/Nov/23 Resolved: 11/Apr/23
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.7, Lustre 2.12.1, Lustre 2.14.0, Lustre 2.12.6 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Elena Gryaznova | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Description |
|
The test fails on a failover configuration (hosts != failoverhosts):
== replay-dual test 23d: c1 rmdir d1, M0 drop update reply and fail M0/M1, c2 mkdir d1 =============== 19:53:39 (1519674819)
fail_loc=0x1701
fail_loc=0
Failing mds1 on fre909
Stopping /mnt/lustre-mds1 (opts:) on fre909
pdsh@fre913: fre909: ssh exited with exit code 1
Failing mds2 on fre910
Stopping /mnt/lustre-mds2 (opts:) on fre910
pdsh@fre913: fre910: ssh exited with exit code 1
reboot facets: mds1
Failover mds1 to fre910
...
fre915: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 0 sec
UUID 1K-blocks Used Available Use% Mounted on
lustre-MDT0000_UUID 1168028 6688 1062924 1% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID 1168028 6272 1063340 1% /mnt/lustre[MDT:1]
lustre-OST0000_UUID 4607356 34096 4306876 1% /mnt/lustre[OST:0]
lustre-OST0001_UUID 4607356 34096 4306876 1% /mnt/lustre[OST:1]
lustre-OST0002_UUID 4607356 34100 4306872 1% /mnt/lustre[OST:2]
lustre-OST0003_UUID 4607356 34100 4306872 1% /mnt/lustre[OST:3]
filesystem_summary: 18429424 136392 17227496 1% /mnt/lustre
fre914: mkdir: cannot create directory '/mnt/lustre2/d23d.replay-dual/remote_dir': File exists
pdsh@fre913: fre914: ssh exited with exit code 1
replay-dual test_23d: @@@@@@ FAIL: Remote creation failed 1
|
| Comments |
| Comment by Minh Diep [ 23/Mar/18 ] |
|
+1 on b2_10 https://testing.hpdd.intel.com/test_sets/9a12c54c-2eab-11e8-b3c6-52540065bddc |
| Comment by James Nunez (Inactive) [ 06/Aug/18 ] |
|
When this test fails, several test suites that run after replay-dual also fail because they cannot clean up this subtest's directory: either the tests never start, or the test suite fails during cleanup after all tests have run. The end of the replay-vbr test_log at https://testing.whamcloud.com/test_sets/95dc7668-98ef-11e8-a9f7-52540065bddc shows:
rm: cannot remove '/mnt/lustre/d23d.replay-dual': Directory not empty
Trace dump:
= /usr/lib64/lustre/tests/replay-vbr.sh:31:main()
replay-vbr: FAIL: test-framework exiting on error
I'm going to attribute these failures to this ticket. |
| Comment by Alex Zhuravlev [ 19/Nov/21 ] |
|
The llog cancel for a previous distributed transaction races with the update from rmdir ../remote_dir. If that llog write wins, rmdir's update is held until the reply for the llog write arrives. |
| Comment by Gerrit Updater [ 19/Nov/21 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45623 |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/45623/ |
| Comment by Peter Jones [ 11/Apr/23 ] |
|
Landed for 2.16 |
| Comment by Andreas Dilger [ 09/Sep/23 ] |
|
I also see this fail intermittently on replay-dual test_22d. |
| Comment by Gerrit Updater [ 12/Sep/23 ] |
|
"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52343 |
| Comment by Gerrit Updater [ 29/Nov/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52343/ |