[LU-456] statfs reports truncated blocks as freed while they are not Created: 23/Jun/11  Updated: 11/Oct/11  Resolved: 24/Jul/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.1.0, Lustre 1.8.7

Type: Bug Priority: Blocker
Reporter: Johann Lombardi (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4946

 Description   

I am seeing a strange behavior with an home-made test doing write_until_ENOSPC/truncate/write_until_ENOSPC. The test is very simple: for i in `seq 2`; do dd ... done. The dd processes should run until they get ENOSPC. While looking at the debug logs, i have actually found that ext4 might be pulling my legs.

>From the Logs of the 2nd dd which basically does truncate + writes:

00000080:00200000:2.0:1307742632.935829:0:23441:0:(rw.c:82:ll_truncate()) VFS
Op:inode=144115205255725059/33554436(ffff8103213c5710) to 0
=> dd truncates the file

00000100:00100000:2.0:1307742632.935994:0:23114:0:(service.c:1705:ptlrpc_server_handle_reque
st()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+6:21865:x1371266995979156:12345-0@lo:10
00000100:00100000:2.0:1307742632.950439:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
st()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+6:21865:x1371266995979156:12345-0@lo:10
Request procesed in 14451us (14480us total) trans 4294967465 rc 0/0
=> truncate request handled fairly quickly

00000100:00100000:2.0:1307742632.950925:0:23114:0:(service.c:1705:ptlrpc_server_handle_reque
st()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+7:21863:x1371266995979157:12345-0@lo:4
=> first write received (4KB write since the client has no grant)

00002000:00000020:2.0:1307742632.950941:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 free: 10375168
avail: 4096
=> The grant code gets statfs data from cache, only 4KB remain, makes sense since the OST
was full.

00002000:00000020:2.0:1307742632.950945:0:23114:0:(ofd_grant.c:741:ofd_grant_prepare_write()
) lustre-OST0000: fs has no space left and statfs too old
=> The grant code decides to refresh statfs info

00002000:00000020:2.0:1307742632.950951:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 free: 174878720
avail: 164167680
=> The refreshed statfs data have the space back, wonderful!

00002000:00000020:2.0:1307742632.950958:0:23114:0:(ofd_grant.c:457:ofd_grant_check())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 granted: 0
ungranted: 4096 grant: 0 dirty: 0
=> We have enough free space to handle this 4KB write, let's it go.

00002000:00000020:2.0:1307742632.950961:0:23114:0:(ofd_grant.c:544:ofd_grant())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 wants: 33554432
current grant 0 granting: 2097152
=> Given that we have plenty of free space, we even grant more space (2MB) back to the
client.

00000100:00100000:2.0:1307742632.951196:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
st()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+7:21863:x1371266995979157:12345-0@lo:4
Request procesed in 275us (297us total) trans 4294967466 rc 0/0
=> Write succeeded.

11 similar writes requests succeeded, and then on the 12th:

00002000:00000020:2.0:1307742633.023255:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 free: 174878720
avail: 154730496
00002000:00000020:2.0:1307742633.023259:0:23114:0:(ofd_grant.c:234:ofd_grant_space_left())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 avail 154730496
left 143192064 unstable 0 tot_grant 11538432 pending 0
=> Still plenty of space remain, we only wrote 12MB after all ...

00002000:00000020:2.0:1307742633.023269:0:23114:0:(ofd_grant.c:457:ofd_grant_check())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 granted: 1048576
ungranted: 0 grant: 10489856 dirty: 1114112
=> The client has grants now, we allow the write.

00002000:00000020:2.0:1307742633.023273:0:23114:0:(ofd_grant.c:544:ofd_grant())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 wants: 33554432
current grant 10424320 granting: 2097152
=> We again grant more space back.

00000100:00100000:2.0:1307742633.025038:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
st()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+17:21863:x1371266995979167:12345-0@lo:4
Request procesed in 1925us (2000us total) trans 0 rc -28/-28
=> but the write failed during commit

The write actually failed on in fsfilt_map_nblocks():
00002000:00001000:1.0:1307747608.661208:0:31786:0:(fsfilt-ldiskfs.c:1135:fsfilt_map_nblocks(
)) blocks 2304-2559 requested for inode 18
00000001:00000001:1.0:1307747608.661287:0:31786:0:(osd_io.c:665:osd_write_commit()) Process
leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)

Although statfs returned that the space released by truncate was freed, we still cannot allocate it. No need to say that this breaks grants.

All those logs are from the Orion branch, but i can reproduce the same problem with master.



 Comments   
Comment by Johann Lombardi (Inactive) [ 23/Jun/11 ]

For the record, i tried to sync & wait for more than 5s between the two dds and it did not help. A 3rd dd can successfully use the free space (modulus some grant leak due to client not expecting to be granted space back on write requests failing with ENOSPC).
I also tried with data=writeback and it not help either.

FYI, i use RHEL5.

Comment by Peter Jones [ 23/Jun/11 ]

Niu

Could you please look into this issue as your top priority?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 24/Jun/11 ]

Hi, Johann

I can easily reproduce it on my local test environment, but when I tried to mount the ost with "data=writeback" the problem is gone, could you confirm that 'data=writeback' doesn't help for you?

Seems for non-writeback mode, ext4 will make sure the block not be reused until commit, see the following piece of code in ldiskfs_free_blocks():

/* We need to make sure we don't reuse

  • block released untill the transaction commit.
  • writeback mode have weak data consistency so
  • don't force data as metadata when freeing block
  • for writeback mode.
    */
    if (metadata == 0 && !ldiskfs_should_writeback_data(inode))
    metadata = 1;

I think that's the reason of second dd failed on allocating blocks, because the truncate hasn't commit yet before the write arrived ost. When I change the second dd to:

  • truncate file to zero
  • sleep 5 seconds
  • dd until ENOSPC
    then I can't reproduce the problem anymore.

I also tried your test on local ext4 mount, but failed to reproduce the problem even with non-writeback mode (don't see the reason yet, will do further investigation), so I suspect this problem is kind of lustre bug.

Alex, could you take a look on it when you have time? Thank you in advance.

Comment by Johann Lombardi (Inactive) [ 24/Jun/11 ]

> I can easily reproduce it on my local test environment, but when I tried to mount the ost with "data=writeback"
> the problem is gone, could you confirm that 'data=writeback' doesn't help for you?

Well, maybe i messed up during the test. However, i cannot reproduce the same issue with b1_8 which uses data=ordered.

> Seems for non-writeback mode, ext4 will make sure the block not be reused until commit,
> see the following piece of code in ldiskfs_free_blocks():

Right, Alex already pointed me to this part of the code.

> /* We need to make sure we don't reuse
>
> block released untill the transaction commit.
> writeback mode have weak data consistency so
> don't force data as metadata when freeing block
> for writeback mode.
> */
> if (metadata == 0 && !ldiskfs_should_writeback_data(inode))
> metadata = 1;

Has this code changed recently?

> I think that's the reason of second dd failed on allocating blocks, because the truncate hasn't commit
> yet before the write arrived ost. When I change the second dd to:
>
> truncate file to zero
> sleep 5 seconds
> dd until ENOSPC
> then I can't reproduce the problem anymore.
>
> I also tried your test on local ext4 mount, but failed to reproduce the problem even with non-writeback
> mode (don't see the reason yet, will do further investigation), so I suspect this problem is kind of
> lustre bug.

Well, lustre does not use the same path as the VFS. We have been messing with locking (i.e. fsfilt_down_truncate_sem())
in ext3_ext_walk_space/ext3_ext_new_extent_cb lately, maybe that's related?

Comment by Andreas Dilger [ 25/Jun/11 ]

The just-deleted space is not released until there is a journal commit callback. I was just looking at this code with Bobijam to remove the jbd2-jcberr patch from the kernel.

It makes sense to have the OFD code do a journal commit and wait for it if the filesystem is so full and an object is being unlinked or truncated. Another option is to not return the blocks to the statfs free pool until the commit callback is run, so that the grant code is not confused.

It is probably worthwhile to look at the truncate path to check that there isn't anything else strange going on that prevents the blocks from being reused.

Comment by Niu Yawei (Inactive) [ 26/Jun/11 ]

Hi, Johann

I think the problem is not caused by we messing with fsfilt_down_truncate_sem(), I tried to remove the read lock in ldiskfs_ext_walk_space(), and use the write lock to cover the whole fsfilt_ldiskfs_ext_walk_space(), but it didn't fix the problem.

For the local ext4, when the block allocation get failed with ENOSPC, it'll wait for the commiting transaction to complete then retry (see ext4_should_retry_alloc() called by ext4_write_begin()), I think that's why the local ext4 doesn't have such problem.

For b1_8, the default 'fo_syncjournal' is 1 (the default 'fo_syncjournal for master is 0), which means each write of b1_8 has to wait for current commiting complete, that's why the problem can't be reproduced on b1_8.

Hi, Andreas
I'll try to handle the ENOSPC like the local ext4 does in OFD code.

Thanks

Comment by Niu Yawei (Inactive) [ 27/Jun/11 ]

I tried to use the same way as ext4_write_begin() to handle the -ENOSPC error in filter_direct_io(), it works for me. Will post a patch for review soon.

Comment by Niu Yawei (Inactive) [ 27/Jun/11 ]

The patch is posted at http://review.whamcloud.com/1022

Comment by Oleg Drokin [ 30/Jun/11 ]

While I understand the issue, I am not sure this is important enough to be a blocker.
It is a corner case in out of space situation and somewhat expected that wnen you are out of space, it takes some time before some space is reclaimed after deletion and you can start writing again.

Besides this is the behavior in previous releases as well.

Comment by Johann Lombardi (Inactive) [ 30/Jun/11 ]

The real problem is the grant leak caused by this issue.

Comment by Oleg Drokin [ 05/Jul/11 ]

I think we handle grant leak on resending already, so there should not be any grant leak to speak of.
OR what is the unhandled case you think of?

Comment by Johann Lombardi (Inactive) [ 06/Jul/11 ]

> I think we handle grant leak on resending already, so there should not be any grant leak to speak of.
> OR what is the unhandled case you think of?

The leak is not related to resend. The server grants space to the client in a bulk write reply, but the request fails with ENOSPC during commit.
In this case, osc_brw_fini_request() does not call osc_update_grant() and the grant returned by the server is lost.

Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/include/obd_support.h
  • lustre/obdfilter/filter_io_26.c
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/obdfilter/filter_io_26.c
  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/obdfilter/filter_io_26.c
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/include/obd_support.h
  • lustre/obdfilter/filter_io_26.c
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/obdfilter/filter_io_26.c
  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 15/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #103
LU-456 Force commit to reuse the just-deleted blocks

Johann Lombardi : bd29c2d95562591b0c063defc31c3cf70ea5a33b
Files :

  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/obdfilter/filter_io_26.c
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/obdfilter/filter_io_26.c
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/obdfilter/filter_io_26.c
  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/include/obd_support.h
  • lustre/obdfilter/filter_io_26.c
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » i686,server,el5,ofa #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/include/obd_support.h
  • lustre/obdfilter/filter_io_26.c
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » i686,client,el5,ofa #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Build Master (Inactive) [ 24/Jul/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #225
LU-456 Force commit to reuse the just-deleted blocks

Oleg Drokin : 346a17e4d8b5c291d776387ace81a5b74bc24141
Files :

  • lustre/tests/sanity.sh
  • lustre/obdfilter/filter_io_26.c
  • lustre/include/obd_support.h
Comment by Peter Jones [ 24/Jul/11 ]

Landed for 2.1

Generated at Sat Feb 10 01:07:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.