[LU-13420] append to PFL-file without 'eof' component fails Created: 07/Apr/20  Updated: 08/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Upstream, Lustre 2.12.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Alex Zhuravlev Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: llnl

Attachments: Text File append.log    
Issue Links:
Related
is related to LU-10782 Enable tiny write append for singly s... Open
is related to LU-9341 PFL: append should not instantiate fu... Resolved
is related to LU-10665 DoM: append to file causes OST compon... Resolved
is related to LU-17403 lfs migrate: cannot get group lock: N... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
# mkdir /mnt/lustre/d
#  lfs setstripe -E 4M -c 1 -E 64M -c 4 /mnt/lustre/d/f1; echo asd >>/mnt/lustre/d/f1
-bash: echo: write error: Invalid argument

and in the log:

00000004:80000000:1.0:1586276528.329048:0:6464:0:(lod_object.c:6811:lod_declare_update_plain()) lustre-MDT0000-mdtlov: the defined layout [0, 0x4000000) does not covers the write range [0x0, 0xffffffffffffffff)
00000004:00000001:1.0:1586276528.329049:0:6464:0:(lod_object.c:6816:lod_declare_update_plain()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
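
The two debug lines above can be read as a failed range-coverage check. The following is a minimal Python model of that reading, not the code in lod_object.c: the function names, the extent-list representation and the constant names are hypothetical, and only the numeric ranges and rc values are taken from the log.

```python
import errno

# Upper bound the log prints for the append's write range.
WRITE_RANGE_EOF = 0xFFFFFFFFFFFFFFFF
MiB = 1 << 20


def declare_write(extents, write_start, write_end):
    """Return 0 if the layout covers [write_start, write_end), else -EINVAL.

    extents is a list of half-open (start, end) component extents,
    assumed sorted and contiguous starting at offset 0.
    """
    layout_start = extents[0][0]
    layout_end = extents[-1][1]
    if write_start < layout_start or write_end > layout_end:
        return -errno.EINVAL
    return 0


def as_u64(rc):
    """Show a negative return code the way the log prints it (unsigned 64-bit)."""
    return rc & 0xFFFFFFFFFFFFFFFF


# Layout from the reproducer: -E 4M -c 1 -E 64M -c 4 (no EOF component).
no_eof = [(0, 4 * MiB), (4 * MiB, 64 * MiB)]
# Same layout with the last component extended to EOF (-E -1).
with_eof = [(0, 4 * MiB), (4 * MiB, WRITE_RANGE_EOF)]

rc = declare_write(no_eof, 0, WRITE_RANGE_EOF)
print(hex(no_eof[-1][1]), rc, as_u64(rc), hex(as_u64(rc)))
print(declare_write(with_eof, 0, WRITE_RANGE_EOF))
```

In this model the append is rejected because the requested range runs to 0xffffffffffffffff while the layout stops at 0x4000000. The unsigned value 18446744073709551594 in the second log line is simply -22 (EINVAL) printed as a 64-bit unsigned integer.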


 Comments   
Comment by Andreas Dilger [ 07/Apr/20 ]

There are discussions on various options for implementing/optimizing file append in LU-9341 and LU-10782. I think something like what is proposed there (partial file lock, verify EOF hasn't changed/adjust offset after getting lock) would be needed for this to work.

Since using limited-size PFL layouts for log files is a use case that has real benefits (e.g. put an upper limit of 1GB for files in log directory) it would be good to find a way for this to work. Maybe changing the check in lod_declare_update_plain() is enough for the short term, if we understand that this is an O_APPEND file?

Comment by Olaf Faaland [ 19/Jun/20 ]

Ran into this doing testing at LLNL. (used to say: Should I create another ticket, or add topllnl to this? Thanks.)

I've just added our usual tags and will post here until told otherwise, so that I'm not holding up the process.

Our versions involved:
opal lustre-2.12.4_6.chaos-1.ch6.x86_64
jet lustre-2.12.4_6.chaos-1.ch6.x86_64

It's the lack of an EOF component in the layout that triggers the error on my system.

User operation:

bash-4.2$ lfs setstripe -E 16M -c 1 -E 1G -c 2  will_fail
bash-4.2$ date >> will_fail
date: write error: Invalid argument

bash-4.2$ lfs setstripe -E 16M -c 1 -E -1 -c 2  will_succeed
bash-4.2$ date >> will_succeed
bash-4.2$ echo $?
0

And error on console of the lustre client node:

[Thu Jun 18 17:46:06 2020] LustreError: 11-0: lquake-MDT0000-mdc-ffff8a7c68307000: operation ldlm_enqueue to node 172.19.1.111@o2ib100 failed: rc = -22

And on the MDT we see

dk.jet1.1592591626:00000004:80000000:2.0:1592591621.254440:0:15022:0:(lod_object.c:6049:lod_declare_update_plain()) lquake-MDT0000-mdtlov: the defined layout [0, 0x1000000000) does not covers the write range [0x0, 0xffffffffffffffff)

Note that this debug log record comes from a different run of the reproducer than the client console error message above.

Comment by Olaf Faaland [ 19/Jun/20 ]

I'll note our use case here for now; if you want to discuss it elsewhere let me know.

We want to use PFL specifically so we can set a default that results in reasonable striping for all files, without user knowledge or action. That means that a routine operation like appending has to work with PFL, for it to be usable by us. I wouldn't think we are special in that way, but I can't explain why this hasn't come up already.

When I wrote the above I thought this error occurred for any append to any PFL file, and didn't realize it was the lack of an EOF component that triggered the failure.

Is there a reason a user wouldn't want the EOF component? I can't think of one.

Comment by Andreas Dilger [ 20/Jun/20 ]

Olaf, you are correct that this issue only affects PFL files that do not have an EOF component. For most cases it makes sense that the default layout would have an EOF component.

At one time it was proposed that it would be possible to limit the maximum size of files by using a PFL layout where the last component ended at the maximum file size. That might be useful to avoid runaway processes filling the filesystem or an OST. Ironically, this would probably be most useful for log files, which are often written with O_APPEND.

So this is a real issue, but one that is only going to catch people actively doing something strange, and is easily worked around by adding a final component to the file.

Comment by Olaf Faaland [ 20/Jun/20 ]

"At one time it was proposed that it would be possible to limit the maximum size of files by using a PFL layout where the last component ended at the maximum file size. That might be useful to avoid runaway processes filling the filesystem or an OST. Ironically, this would probably be most useful for log files, which are often written with O_APPEND."

Andreas, thanks for explaining. I've seen exactly that here (unintended log file growth without bound). A limit like that would be useful.

"So this is a real issue, but one that is only going to catch people actively doing something strange, and is easily worked around by adding a final component to the file."

In our case it happened to someone by mistake - a sysadmin omitted the eof component accidentally. That's easy to do. Until the issue is fixed, shouldn't there be a warning at the time "lfs setstripe" is run? I'm thinking mostly of Lustre 2.12. We could also/instead produce a one-time message on the console log, but that might be noticed only after jobs have failed.

Comment by Cameron Harr [ 22/Jun/20 ]

I concur that there should be a warning or a default EOF pattern set when "-E -1 ..." isn't specified. It seems too easy to run into this issue, whether by accident or ignorance.
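
As an illustration of the warning being discussed, here is a minimal Python sketch of a check over "lfs setstripe" arguments. It is not part of lfs: the function names and the warning text are hypothetical, the parser only understands "-E value" and "--component-end value" as separate arguments, and treating "eof" as equivalent to "-1" is an assumption.

```python
# "-1" is the form used in this ticket; "eof" is assumed to be an accepted alias.
EOF_VALUES = {"-1", "eof"}


def component_ends(args):
    """Collect the value that follows each -E / --component-end option."""
    ends = []
    for i, arg in enumerate(args[:-1]):
        if arg in ("-E", "--component-end"):
            ends.append(args[i + 1])
    return ends


def missing_eof_warning(args):
    """Return a warning string if the last component does not reach EOF.

    Returns None for non-PFL layouts (no -E at all) and for layouts
    whose last component end is -1 or eof.
    """
    ends = component_ends(args)
    if not ends or ends[-1].lower() in EOF_VALUES:
        return None
    return ("warning: last component ends at %s, not at EOF; "
            "O_APPEND writes to this file may fail with EINVAL" % ends[-1])


# The two layouts from the reproducer earlier in this ticket.
print(missing_eof_warning("-E 16M -c 1 -E 1G -c 2 will_fail".split()))
print(missing_eof_warning("-E 16M -c 1 -E -1 -c 2 will_succeed".split()))
```

A check like this only warns and never rejects the layout, so a deliberately size-limited file can still be created.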

Comment by Andreas Dilger [ 04/Jul/20 ]

I would be OK with a warning printed by "lfs setstripe" in the case of a missing eof component, at least until we fix the "O_APPEND to short layout" issue. I wouldn't want to prevent that usage, since it still has potential value in some cases even without O_APPEND (e.g. limiting maximum file size). It would be even better to fix the O_APPEND case so that it works as expected (allowing appends up to the end of the last extent before reporting EFBIG, like it would if the write exceeded the maximum allowed file size).
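
The append behavior proposed in the comment above could be modeled roughly as follows. This is a minimal Python sketch of one possible reading, not an implementation or an agreed design: the function name and signature are hypothetical, and returning a short write when only part of the data fits is an assumption borrowed from how writes at a file size limit conventionally behave.

```python
import errno

MiB = 1 << 20


def append_result(file_size, layout_end, nbytes):
    """Model an O_APPEND write against a layout that ends at layout_end.

    Returns the number of bytes that would be written, or -EFBIG when
    the file already reaches the end of the last extent.
    """
    if nbytes == 0:
        return 0
    room = layout_end - file_size
    if room <= 0:
        return -errno.EFBIG
    # Assumption: a write that only partly fits becomes a short write.
    return min(nbytes, room)


layout_end = 64 * MiB  # last component of "-E 4M -c 1 -E 64M -c 4"

print(append_result(0, layout_end, 4))               # fits entirely
print(append_result(layout_end - 2, layout_end, 4))  # only part fits
print(append_result(layout_end, layout_end, 4))      # no room left
```

Under this model the reproducer from the description (a 4-byte append to an empty file) would succeed, and only a file that has already grown to the end of its last extent would see EFBIG.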

Generated at Sat Feb 10 03:01:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.