[LU-16467] lod_trans_space_check() fails with -28 during file unlink Created: 12/Jan/23 Updated: 13/Jan/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Mikhail Pershin | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
We have a situation with striped directories and full MDTs: file unlink fails with an -ENOSPC error.
# lfs df
UUID 1K-blocks Used Available Use% Mounted on
ai200x-MDT0000_UUID 139539628 133404116 3719624 98% /lustre/ai200x/client[MDT:0]
ai200x-MDT0001_UUID 139539628 123157904 13966032 90% /lustre/ai200x/client[MDT:1]
ai200x-MDT0002_UUID 139539628 137217096 0 100% /lustre/ai200x/client[MDT:2]
ai200x-MDT0003_UUID 139539628 124383568 12740368 91% /lustre/ai200x/client[MDT:3]
# rm -rf 542162
rm: cannot remove '542162': No space left on device
[root@ai200x-001 blogbench]# lfs getdirstripe 542162
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x240002b16:0x11cec:0x0]
2 [0x2c0001bba:0x11cec:0x0]
3 [0x280000c3e:0x86b8:0x0]
[43349.098414] LustreError: 470595:0:(file.c:249:ll_close_inode_openhandle()) ai200x-clilmv-ff47cef76496c800: inode [0x240001b78:0x8a9c:0x0] mdc close failed: rc = -28
The file itself is placed on MDT0001, which has space, but the unlink operation calls OSP and fails on the osp_statfs() result:
osp_statfs()) ai200x-MDT0002-osp-MDT0001: 34884907 blocks, 598078 free, 0 avail
...
lod_trans_space_check()) ai200x-MDT0002-osp-MDT0001: fail - target state 220: rc = -28
As a result, the file is not removed because MDT0002 is full. This is the result of |
| Comments |
| Comment by Mikhail Pershin [ 12/Jan/23 ] |
|
I've just added a Lustre debug log collected around the rm operation:
# rm /lustre/ai200x/client/blogbench/542327/blog-8/article-7.xml
and the 'blog-8' info, just in case:
# lfs getdirstripe /lustre/ai200x/client/blogbench/542327/blog-8
lmv_stripe_count: 4 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x200004a50:0x12072:0x0]
1 [0x240002b21:0x12072:0x0]
2 [0x2c0001bc2:0x839d:0x0]
3 [0x280000c45:0x11af:0x0] |
| Comment by Andreas Dilger [ 13/Jan/23 ] |
|
I think the straightforward solution here is for lod_trans_space_check() to skip the space check for unlink and rmdir operations, or at least to check "free" instead of "avail" space for those operation types. |
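To make the two options above concrete, here is a minimal standalone sketch in C. It is not the Lustre code. The enum, the struct, and every sketch_* function name are hypothetical, and the real lod_trans_space_check() works from target state flags (the "target state 220" in the log) rather than from a bare zero-avail test. The sketch only models the decision being proposed, using the numbers from the osp_statfs() message (598078 free, 0 avail).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation types, for illustration only (not a Lustre enum). */
enum sketch_op {
	SKETCH_OP_CREATE,
	SKETCH_OP_UNLINK,
	SKETCH_OP_RMDIR,
};

/* Hypothetical subset of the statfs data reported for a remote target. */
struct sketch_statfs {
	uint64_t blocks;	/* total blocks */
	uint64_t bfree;		/* free blocks, including reserved ones */
	uint64_t bavail;	/* blocks available for new allocations */
};

/* Unlink and rmdir release space, so they get the relaxed treatment. */
static bool sketch_op_releases_space(enum sketch_op op)
{
	return op == SKETCH_OP_UNLINK || op == SKETCH_OP_RMDIR;
}

/* Option 1: skip the space check entirely for unlink and rmdir. */
int sketch_space_check_skip(const struct sketch_statfs *sfs, enum sketch_op op)
{
	if (sketch_op_releases_space(op))
		return 0;

	return sfs->bavail == 0 ? -ENOSPC : 0;
}

/* Option 2: for unlink and rmdir, look at "free" instead of "avail". */
int sketch_space_check_free(const struct sketch_statfs *sfs, enum sketch_op op)
{
	uint64_t room = sketch_op_releases_space(op) ? sfs->bfree : sfs->bavail;

	return room == 0 ? -ENOSPC : 0;
}
```

With the MDT0002 values from the log, both variants would let an unlink proceed while still refusing a create. Option 2 is the more conservative of the two, since it still returns -ENOSPC for an unlink once "free" itself reaches zero.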