Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
None
-
None
-
3
-
4946
Description
I am seeing a strange behavior with an home-made test doing write_until_ENOSPC/truncate/write_until_ENOSPC. The test is very simple: for i in `seq 2`; do dd ... done. The dd processes should run until they get ENOSPC. While looking at the debug logs, i have actually found that ext4 might be pulling my legs.
>From the Logs of the 2nd dd which basically does truncate + writes:
00000080:00200000:2.0:1307742632.935829:0:23441:0:(rw.c:82:ll_truncate()) VFS
Op:inode=144115205255725059/33554436(ffff8103213c5710) to 0
=> dd truncates the file
00000100:00100000:2.0:1307742632.935994:0:23114:0:(service.c:1705:ptlrpc_server_handle_reque
st()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+6:21865:x1371266995979156:12345-0@lo:10
00000100:00100000:2.0:1307742632.950439:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
st()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+6:21865:x1371266995979156:12345-0@lo:10
Request procesed in 14451us (14480us total) trans 4294967465 rc 0/0
=> truncate request handled fairly quickly
00000100:00100000:2.0:1307742632.950925:0:23114:0:(service.c:1705:ptlrpc_server_handle_reque
st()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+7:21863:x1371266995979157:12345-0@lo:4
=> first write received (4KB write since the client has no grant)
00002000:00000020:2.0:1307742632.950941:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 free: 10375168
avail: 4096
=> The grant code gets statfs data from cache, only 4KB remain, makes sense since the OST
was full.
00002000:00000020:2.0:1307742632.950945:0:23114:0:(ofd_grant.c:741:ofd_grant_prepare_write()
) lustre-OST0000: fs has no space left and statfs too old
=> The grant code decides to refresh statfs info
00002000:00000020:2.0:1307742632.950951:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 free: 174878720
avail: 164167680
=> The refreshed statfs data have the space back, wonderful!
00002000:00000020:2.0:1307742632.950958:0:23114:0:(ofd_grant.c:457:ofd_grant_check())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 granted: 0
ungranted: 4096 grant: 0 dirty: 0
=> We have enough free space to handle this 4KB write, let's it go.
00002000:00000020:2.0:1307742632.950961:0:23114:0:(ofd_grant.c:544:ofd_grant())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 wants: 33554432
current grant 0 granting: 2097152
=> Given that we have plenty of free space, we even grant more space (2MB) back to the
client.
00000100:00100000:2.0:1307742632.951196:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
st()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+7:21863:x1371266995979157:12345-0@lo:4
Request procesed in 275us (297us total) trans 4294967466 rc 0/0
=> Write succeeded.
11 similar writes requests succeeded, and then on the 12th:
00002000:00000020:2.0:1307742633.023255:0:23114:0:(ofd_grant.c:177:ofd_grant_statfs())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 free: 174878720
avail: 154730496
00002000:00000020:2.0:1307742633.023259:0:23114:0:(ofd_grant.c:234:ofd_grant_space_left())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 avail 154730496
left 143192064 unstable 0 tot_grant 11538432 pending 0
=> Still plenty of space remain, we only wrote 12MB after all ...
00002000:00000020:2.0:1307742633.023269:0:23114:0:(ofd_grant.c:457:ofd_grant_check())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 granted: 1048576
ungranted: 0 grant: 10489856 dirty: 1114112
=> The client has grants now, we allow the write.
00002000:00000020:2.0:1307742633.023273:0:23114:0:(ofd_grant.c:544:ofd_grant())
lustre-OST0000: cli 30dc3ffd-9567-deb3-c9ed-423e4589e173/ffff810304240400 wants: 33554432
current grant 10424320 granting: 2097152
=> We again grant more space back.
00000100:00100000:2.0:1307742633.025038:0:23114:0:(service.c:1752:ptlrpc_server_handle_reque
st()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc
ll_ost_io_126:30dc3ffd-9567-deb3-c9ed-423e4589e173+17:21863:x1371266995979167:12345-0@lo:4
Request procesed in 1925us (2000us total) trans 0 rc -28/-28
=> but the write failed during commit
The write actually failed on in fsfilt_map_nblocks():
00002000:00001000:1.0:1307747608.661208:0:31786:0:(fsfilt-ldiskfs.c:1135:fsfilt_map_nblocks(
)) blocks 2304-2559 requested for inode 18
00000001:00000001:1.0:1307747608.661287:0:31786:0:(osd_io.c:665:osd_write_commit()) Process
leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
Although statfs returned that the space released by truncate was freed, we still cannot allocate it. No need to say that this breaks grants.
All those logs are from the Orion branch, but i can reproduce the same problem with master.
Attachments
Issue Links
- Trackbacks
-
Changelog 1.8 {}version 1.8.7wc1{} {}Support for networks: socklnd \any kernel supported by Lustre, qswlnd Qsnet kernel modules 5.20 and later, openiblnd IbGold 1.8.2, o2iblnd OFED 1.3, 1.4.1, 1.4.2, 1.5.1, 1.5.2, 1.5.3.1 and 1.5.3.2 gmlnd GM 2.1....