[LU-14383] race between lookup and migrate Created: 29/Jan/21 Updated: 12/Jan/22 Resolved: 12/Jan/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Vladimir Saveliev | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The following race between mkdir and migrate may end with mkdir failing:
mkdir $DIR1/$tdir
mkdir $DIR2/$tdir/dir2 &
$LFS migrate -m 1 $DIR1/$tdir
Please confirm whether this is expected behavior or has to be fixed. |
| Comments |
| Comment by Gerrit Updater [ 29/Jan/21 ] |
|
Vladimir Saveliev (c17830@cray.com) uploaded a new patch: https://review.whamcloud.com/41364 |
| Comment by Lai Siyao [ 05/Feb/21 ] |
|
This looks to be expected in the current implementation: "mkdir $DIR2/$tdir/dir2" tries to create "dir2" under the current layout, but if the server finds that the layout has changed, it may return an error. To resolve this race, the client needs to refresh the layout of $tdir and try again. |
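The refresh-and-retry idea above can be approximated from the caller's side with a bounded retry loop. This is only a minimal sketch, not Lustre client code: `retry_cmd` and its arguments are hypothetical names, and it simply re-runs the command, whereas a real fix would refresh the layout of $tdir inside the client before retrying.

```shell
# Hypothetical helper (not part of the Lustre test framework):
# run a command up to <attempts> times, sleeping <delay> seconds between tries.
# usage: retry_cmd <attempts> <delay> <command> [args...]
retry_cmd() {
	local attempts="$1" delay="$2" i=1
	shift 2
	while [ "$i" -le "$attempts" ]; do
		# Stop as soon as the command succeeds.
		"$@" && return 0
		# Sleep only if another attempt is still to come.
		[ "$i" -lt "$attempts" ] && sleep "$delay"
		i=$((i + 1))
	done
	# Every attempt failed.
	return 1
}
```

Under these assumptions, the racing step could be written as `retry_cmd 3 1 mkdir $DIR2/$tdir/dir2`. A blind retry like this cannot tell a transient failure from a real one, so it hides the race rather than resolving it.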
| Comment by Vladimir Saveliev [ 09/Feb/21 ] |
ok Btw, the same applies to stat:
@@ -4631,12 +4631,13 @@ test_80c() {
[ $MDSCOUNT -lt 2 ] && skip "needs >= 2 MDTs" && return
mkdir $DIR1/$tdir
+ touch $DIR1/$tdir/file
#define OBD_FAIL_MDS_OBJECT_LOCK_DELAY 0x16b
do_facet mds1 $LCTL set_param fail_loc=0x8000016b
- mkdir $DIR2/$tdir/dir2 &
- MKDIRPID=$!
+ stat $DIR2/$tdir/file &
+ STATPID=$!
$LFS migrate -m 1 $DIR1/$tdir
- wait $MKDIRPID
+ wait $STATPID
[ $? -eq 0 ] || error "stat failed"
}
This results in:
== sanityn test 80c: Lookup and migrate race ========================================================= 18:04:30 (1612796670)
fail_loc=0x8000016b
stat: cannot stat '/mnt/lustre2/d80c.sanityn/file': No such file or directory
If this is ok as well, then sanityn.sh:test_80b() should not break its "accessing the migrating directory" loop with:
stat $migrate_dir2/file5 > /dev/null || {
echo "stat file5 fails"
break
}
With -ENOENT, clients cannot tell whether they should try again. Would it be possible to return -EAGAIN in case of a race with migrate? |
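To illustrate why the error code matters to a caller, here is a small sketch of how a test script could decide whether to retry if a distinct transient error were returned. It is hypothetical: `is_retryable` is an illustrative name, -EAGAIN is only proposed in this comment and is not what the server returns today, and the matched string assumes the glibc message for EAGAIN ("Resource temporarily unavailable").

```shell
# Hypothetical helper: succeed only if the error text indicates a
# transient failure (EAGAIN); anything else, including ENOENT, is final.
is_retryable() {
	case "$1" in
	*"Resource temporarily unavailable"*) return 0 ;;
	*) return 1 ;;
	esac
}
```

With the current -ENOENT result, the message is "No such file or directory" both when the file is really missing and when the lookup raced with migrate, so no such check can separate the two cases.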
| Comment by Lai Siyao [ 09/Feb/21 ] |
|
This issue should be fixed by FID map, which is tracked under LU-7607. The patch is at https://review.whamcloud.com/#/c/38233/, but I haven't had time to update it recently. |