[LU-456] statfs reports truncated blocks as freed while they are not Created: 23/Jun/11 Updated: 11/Oct/11 Resolved: 24/Jul/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.1.0, Lustre 1.8.7 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Johann Lombardi (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4946 |
| Description |
|
I am seeing strange behavior with a home-made test doing write_until_ENOSPC/truncate/write_until_ENOSPC. The test is very simple: for i in `seq 2`; do dd ... done. The dd processes should run until they get ENOSPC. While looking at the debug logs, I found that ext4 might be pulling my leg.
From the logs of the 2nd dd, which basically does truncate + writes:
00000080:00200000:2.0:1307742632.935829:0:23441:0:(rw.c:82:ll_truncate()) VFS
00000100:00100000:2.0:1307742632.935994:0:23114:0:(service.c:1705:ptlrpc_server_handle_reque
00000100:00100000:2.0:1307742632.950925:0:23114:0:(service.c:1705:ptlrpc_server_handle_reque
00002000:00000020:2.0:1307742632.950941:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
00002000:00000020:2.0:1307742632.950945:0:23114:0:(ofd_grant.c:741:ofd_grant_prepare_write()
00002000:00000020:2.0:1307742632.950951:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
00002000:00000020:2.0:1307742632.950958:0:23114:0:(ofd_grant.c:457:ofd_grant_check())
00002000:00000020:2.0:1307742632.950961:0:23114:0:(ofd_grant.c:544:ofd_grant())
00000100:00100000:2.0:1307742632.951196:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
11 similar write requests succeeded, and then on the 12th:
00002000:00000020:2.0:1307742633.023255:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
00002000:00000020:2.0:1307742633.023269:0:23114:0:(ofd_grant.c:457:ofd_grant_check())
00002000:00000020:2.0:1307742633.023273:0:23114:0:(ofd_grant.c:544:ofd_grant())
00000100:00100000:2.0:1307742633.025038:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
The write actually failed in fsfilt_map_nblocks():
Although statfs reported the space released by truncate as free, we still cannot allocate it. Needless to say, this breaks grants. All those logs are from the Orion branch, but I can reproduce the same problem with master. |
| Comments |
| Comment by Johann Lombardi (Inactive) [ 23/Jun/11 ] |
|
For the record, I tried to sync and wait for more than 5s between the two dds, and it did not help. A 3rd dd can successfully use the free space (modulo some grant leak due to the client not expecting to be granted space back on write requests failing with ENOSPC). FYI, I use RHEL5. |
| Comment by Peter Jones [ 23/Jun/11 ] |
|
Niu, could you please look into this issue as your top priority? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 24/Jun/11 ] |
|
Hi Johann, I can easily reproduce it on my local test environment, but when I tried to mount the OST with "data=writeback" the problem is gone. Could you confirm that 'data=writeback' doesn't help for you? It seems that in non-writeback mode ext4 makes sure a freed block is not reused until commit; see the following piece of code in ldiskfs_free_blocks(): /* We need to make sure we don't reuse
I think that is why the second dd failed to allocate blocks: the truncate had not been committed yet when the write arrived at the OST. When I change the second dd to:
I also tried your test on a local ext4 mount, but failed to reproduce the problem even in non-writeback mode (I don't see the reason yet and will investigate further), so I suspect this problem is some kind of Lustre bug. Alex, could you take a look at it when you have time? Thank you in advance. |
| Comment by Johann Lombardi (Inactive) [ 24/Jun/11 ] |
|
> I can easily reproduce it on my local test environment, but when I tried to mount the ost with "data=writeback"
Well, maybe I messed up during the test. However, I cannot reproduce the same issue with b1_8, which uses data=ordered.
> Seems for non-writeback mode, ext4 will make sure the block not be reused until commit,
Right, Alex already pointed me to this part of the code.
> /* We need to make sure we don't reuse
Has this code changed recently?
> I think that's the reason of second dd failed on allocating blocks, because the truncate hasn't commit
Well, Lustre does not use the same path as the VFS. We have been messing with locking (i.e. fsfilt_down_truncate_sem()) |
| Comment by Andreas Dilger [ 25/Jun/11 ] |
|
The just-deleted space is not released until there is a journal commit callback. I was just looking at this code with Bobijam to remove the jbd2-jcberr patch from the kernel. It makes sense to have the OFD code do a journal commit and wait for it if the filesystem is so full and an object is being unlinked or truncated. Another option is to not return the blocks to the statfs free pool until the commit callback is run, so that the grant code is not confused. It is probably worthwhile to look at the truncate path to check that there isn't anything else strange going on that prevents the blocks from being reused. |
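The two accounting choices discussed in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyBlockPool` and all of its methods are hypothetical names invented for the illustration and are not ldiskfs or OFD code. The model assumes that truncated blocks sit in a "pending" state until a journal commit, and compares a statfs that already counts them as free with one that hides them until the commit callback runs.

```python
import errno


class ToyBlockPool:
    """Toy model of a block allocator with journaled frees (hypothetical, not ldiskfs code)."""

    def __init__(self, total_blocks):
        self.total = total_blocks
        self.used = 0
        self.pending = 0  # freed by truncate, but not reusable until commit

    def allocate(self, count):
        # Only blocks that are neither in use nor pending release can be handed out.
        if count > self.total - self.used - self.pending:
            raise OSError(errno.ENOSPC, "no space left in toy pool")
        self.used += count

    def truncate(self, count):
        # Blocks leave the file immediately but stay reserved until commit.
        count = min(count, self.used)
        self.used -= count
        self.pending += count

    def commit(self):
        # Stand-in for the journal commit callback: pending blocks become reusable.
        self.pending = 0

    def statfs_free_optimistic(self):
        # Behaviour reported in this ticket: truncated blocks already count as free.
        return self.total - self.used

    def statfs_free_conservative(self):
        # Alternative suggested above: hide pending blocks until the commit callback.
        return self.total - self.used - self.pending


pool = ToyBlockPool(total_blocks=100)
pool.allocate(100)   # first writer fills the device
pool.truncate(100)   # second writer truncates the file before writing
```

In this model the optimistic statfs reports all 100 blocks as free right after the truncate, yet `pool.allocate(1)` still raises ENOSPC until `pool.commit()` is called. The conservative statfs reports 0 free blocks over the same window, so a grant calculation based on it would not promise space that cannot be allocated yet.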
| Comment by Niu Yawei (Inactive) [ 26/Jun/11 ] |
|
Hi Johann, I don't think the problem is caused by our messing with fsfilt_down_truncate_sem(). I tried removing the read lock in ldiskfs_ext_walk_space() and using the write lock to cover the whole of fsfilt_ldiskfs_ext_walk_space(), but it didn't fix the problem.
For local ext4, when block allocation fails with ENOSPC, it waits for the committing transaction to complete and then retries (see ext4_should_retry_alloc(), called by ext4_write_begin()). I think that's why local ext4 doesn't have this problem.
For b1_8, the default 'fo_syncjournal' is 1 (the default 'fo_syncjournal' for master is 0), which means each write on b1_8 has to wait for the current commit to complete. That's why the problem can't be reproduced on b1_8.
Hi, Andreas Thanks |
| Comment by Niu Yawei (Inactive) [ 27/Jun/11 ] |
|
I handled the -ENOSPC error in filter_direct_io() the same way ext4_write_begin() does, and it works for me. I will post a patch for review soon. |
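The retry-on-ENOSPC idea described above can be sketched as a toy model. This is not the patch itself: `ToyDevice`, `force_commit()`, `allocate_with_retry()`, and the `max_retries` bound are hypothetical names and parameters chosen for the illustration. The only assumption taken from the ticket is that blocks released by a truncate become allocatable once the journal commit completes.

```python
import errno


class ToyDevice:
    """Hypothetical stand-in for the OST backing store (not Lustre code)."""

    def __init__(self, free_blocks, pending_blocks):
        self.free = free_blocks
        self.pending = pending_blocks  # released only by a journal commit
        self.commits = 0

    def allocate(self, count):
        if count > self.free:
            raise OSError(errno.ENOSPC, "no space left on toy device")
        self.free -= count

    def force_commit(self):
        # Stand-in for waiting on the committing transaction to complete.
        self.commits += 1
        self.free += self.pending
        self.pending = 0


def allocate_with_retry(dev, count, max_retries=3):
    """Allocate blocks; on ENOSPC, force a commit and retry a bounded number of times.

    Returns the number of retries that were needed.
    """
    retries = 0
    while True:
        try:
            dev.allocate(count)
            return retries
        except OSError as err:
            # Give up on other errors, when the retry budget is spent,
            # or when no commit could release any more blocks.
            if err.errno != errno.ENOSPC or retries >= max_retries:
                raise
            if dev.pending == 0:
                raise
            dev.force_commit()
            retries += 1


# Device looks full, but 64 blocks are waiting for a commit after a truncate.
dev = ToyDevice(free_blocks=0, pending_blocks=64)
retries_needed = allocate_with_retry(dev, 16)
```

Here the first allocation attempt fails with ENOSPC, one forced commit releases the pending blocks, and the second attempt succeeds. A device with no pending blocks still returns ENOSPC to the caller, so a genuinely full filesystem is reported as such.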
| Comment by Niu Yawei (Inactive) [ 27/Jun/11 ] |
|
The patch is posted at http://review.whamcloud.com/1022 |
| Comment by Oleg Drokin [ 30/Jun/11 ] |
|
While I understand the issue, I am not sure this is important enough to be a blocker. Besides, this is the behavior in previous releases as well. |
| Comment by Johann Lombardi (Inactive) [ 30/Jun/11 ] |
|
The real problem is the grant leak caused by this issue. |
| Comment by Oleg Drokin [ 05/Jul/11 ] |
|
I think we handle grant leak on resending already, so there should not be any grant leak to speak of. |
| Comment by Johann Lombardi (Inactive) [ 06/Jul/11 ] |
|
> I think we handle grant leak on resending already, so there should not be any grant leak to speak of.
The leak is not related to resend. The server grants space to the client in a bulk write reply, but the request fails with ENOSPC during commit. |
| Comment by Build Master (Inactive) [ 15/Jul/11 ] |
|
Integrated in Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
|
| Comment by Build Master (Inactive) [ 24/Jul/11 ] |
|
Integrated in Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
|
| Comment by Peter Jones [ 24/Jul/11 ] |
|
Landed for 2.1 |