[LU-16467] lod_trans_space_check() fails with -28 during file unlink Created: 12/Jan/23  Updated: 13/Jan/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Lai Siyao
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File lustre2.log    
Issue Links:
Related
is related to LU-14719 "lfs migrate -m" creates broken agent... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We have a situation with striped directories and full MDTs where file unlink fails with an -ENOSPC error.

# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
ai200x-MDT0000_UUID    139539628   133404116     3719624  98% /lustre/ai200x/client[MDT:0]
ai200x-MDT0001_UUID    139539628   123157904    13966032  90% /lustre/ai200x/client[MDT:1]
ai200x-MDT0002_UUID    139539628   137217096           0 100% /lustre/ai200x/client[MDT:2]
ai200x-MDT0003_UUID    139539628   124383568    12740368  91% /lustre/ai200x/client[MDT:3]
# rm -rf 542162
rm: cannot remove '542162': No space left on device
[root@ai200x-001 blogbench]# lfs getdirstripe 542162
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx		 FID[seq:oid:ver]
     1		 [0x240002b16:0x11cec:0x0]
     2		 [0x2c0001bba:0x11cec:0x0]
     3		 [0x280000c3e:0x86b8:0x0]

[43349.098414] LustreError: 470595:0:(file.c:249:ll_close_inode_openhandle()) ai200x-clilmv-ff47cef76496c800: inode [0x240001b78:0x8a9c:0x0] mdc close failed: rc = -28

The file itself is located on MDT0001, which has free space, but the unlink operation calls into the OSP for MDT0002 and fails after osp_statfs() reports no available space:

osp_statfs()) ai200x-MDT0002-osp-MDT0001: 34884907 blocks, 598078 free, 0 avail
...
lod_trans_space_check()) ai200x-MDT0002-osp-MDT0001: fail - target state 220: rc = -28
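The osp_statfs() numbers show that MDT0002 is not literally out of blocks. Assuming the OSP counts 4 KiB blocks (an inference: 34884907 × 4 equals the 139539628 1K-blocks that lfs df reports for MDT0002), the target still has a couple of GiB free, but none of it is "available":

```latex
\begin{aligned}
34884907 \times 4\ \text{KiB} &= 139539628\ \text{KiB} \quad (\text{MDT0002 size in } \texttt{lfs df}) \\
598078 \times 4\ \text{KiB} &= 2392312\ \text{KiB} \approx 2.3\ \text{GiB free} \\
\text{avail} &= 0
\end{aligned}
```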

As a result, the file is not removed because MDT0002 is full. This seems to be a consequence of the LU-14179 patch, and a possible solution is to skip lod_trans_space_check() for unlink operations.
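The proposed fix can be modelled in a few lines. This is a minimal sketch, not the real LOD code: `Op`, `TargetStatfs` and `trans_space_check()` are hypothetical stand-ins, and treating a target with zero available blocks as full is an assumption about what the real check does.

```python
import errno
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    """Hypothetical operation kinds (not a Lustre type)."""
    CREATE = "create"
    UNLINK = "unlink"


@dataclass
class TargetStatfs:
    """Hypothetical per-target statfs snapshot, in blocks."""
    name: str
    blocks: int
    free: int   # free blocks, including reserved space
    avail: int  # blocks available for new allocations


def trans_space_check(target: TargetStatfs, op: Op) -> int:
    """Return 0 if the transaction may proceed, else -ENOSPC.

    Unlink is never refused for lack of space; other operations fail
    when the target reports no available blocks (assumed policy).
    """
    if op is Op.UNLINK:
        return 0
    return -errno.ENOSPC if target.avail == 0 else 0


# Values from the osp_statfs() debug line for MDT0002 above.
mdt2 = TargetStatfs("ai200x-MDT0002-osp-MDT0001", 34884907, 598078, 0)
for op in Op:
    print(f"{op.value}: rc = {trans_space_check(mdt2, op)}")
```

With the MDT0002 numbers, create is still refused with -ENOSPC while unlink goes through, which is the behaviour the report asks for.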



 Comments   
Comment by Mikhail Pershin [ 12/Jan/23 ]

I've just added a Lustre debug log collected around the rm operation:

# rm /lustre/ai200x/client/blogbench/542327/blog-8/article-7.xml

and the 'blog-8' directory striping info, just in case:
# lfs getdirstripe /lustre/ai200x/client/blogbench/542327/blog-8
lmv_stripe_count: 4 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx		 FID[seq:oid:ver]
     0		 [0x200004a50:0x12072:0x0]		
     1		 [0x240002b21:0x12072:0x0]		
     2		 [0x2c0001bc2:0x839d:0x0]		
     3		 [0x280000c45:0x11af:0x0]
Comment by Andreas Dilger [ 13/Jan/23 ]

I think the straightforward solution here is for lod_trans_space_check() to skip the space check for unlink and rmdir operations, or at least to check "free" instead of "avail" space for those operation types.
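The "free instead of avail" variant can be sketched the same way. Again this is an illustrative model under assumptions: `TargetStatfs` and `space_check()` are hypothetical names, and the rule that only unlink and rmdir fall back to "free" space is taken from the comment above rather than from the Lustre sources.

```python
import errno
from dataclasses import dataclass


@dataclass
class TargetStatfs:
    """Hypothetical snapshot of one target's statfs, in blocks."""
    blocks: int
    free: int   # free blocks, including space reserved by the filesystem
    avail: int  # free blocks usable by new allocations


def space_check(target: TargetStatfs, op: str) -> int:
    """Return 0 if the operation may proceed, else -ENOSPC.

    Assumed policy (illustrative only): unlink and rmdir are judged
    against "free" space, all other operations against "avail".
    """
    usable = target.free if op in ("unlink", "rmdir") else target.avail
    return 0 if usable > 0 else -errno.ENOSPC


# MDT0002 as reported by osp_statfs(): free blocks remain, none available.
mdt2 = TargetStatfs(blocks=34884907, free=598078, avail=0)
for op in ("create", "unlink", "rmdir"):
    print(f"{op}: rc = {space_check(mdt2, op)}")
```

Unlike skipping the check entirely, this variant still refuses an unlink when the target has no free blocks at all, which is the main trade-off between the two options.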

Generated at Sat Feb 10 03:27:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.