Lustre / LU-13420

append to PFL-file without 'eof' component fails

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Affects Version/s: Upstream, Lustre 2.12.4
    • Severity: 3

    Description

      # mkdir /mnt/lustre/d
      #  lfs setstripe -E 4M -c 1 -E 64M -c 4 /mnt/lustre/d/f1; echo asd >>/mnt/lustre/d/f1
      -bash: echo: write error: Invalid argument
      

      And in the MDT debug log:

      00000004:80000000:1.0:1586276528.329048:0:6464:0:(lod_object.c:6811:lod_declare_update_plain()) lustre-MDT0000-mdtlov: the defined layout [0, 0x4000000) does not covers the write range [0x0, 0xffffffffffffffff)
      00000004:00000001:1.0:1586276528.329049:0:6464:0:(lod_object.c:6816:lod_declare_update_plain()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
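The MDT message indicates that the append was sent as a write covering the whole file, [0, EOF), while the last component of the layout ends at 0x4000000 (64 MiB, the -E 64M above), so lod_declare_update_plain() returned -EINVAL (-22). The check amounts to comparing the requested range with the extent the components cover. A minimal sketch of that comparison follows, assuming a simplified, sorted and contiguous component array; sketch_comp, sketch_layout_covers() and SKETCH_EOF are hypothetical names, not the Lustre implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an "end of file" extent end (-E -1). */
#define SKETCH_EOF UINT64_MAX

/* Hypothetical, simplified PFL component covering [start, end). */
struct sketch_comp {
    uint64_t start;
    uint64_t end;
};

/*
 * Return 0 if the components cover the requested range [wr_start, wr_end),
 * or -EINVAL if they do not. The components are assumed to be sorted and
 * contiguous, as in a PFL layout, so only the first start and the last end
 * need to be compared.
 */
static int sketch_layout_covers(const struct sketch_comp *comps, size_t count,
                                uint64_t wr_start, uint64_t wr_end)
{
    if (count == 0)
        return -EINVAL;
    if (comps[0].start > wr_start || comps[count - 1].end < wr_end)
        return -EINVAL;
    return 0;
}
```

In this sketch, the 4M/64M layout from the reproducer rejects the whole-file range [0, SKETCH_EOF), while the same layout with a final component ending at EOF accepts it, and an ordinary write inside the first 64 MiB is accepted either way.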
      

      Attachments

        1. append.log (2.58 MB, Alex Zhuravlev)

          Activity

            adilger Andreas Dilger added a comment -

            I would be OK with a warning printed by "lfs setstripe" in the case of a missing EOF component, at least until we fix the "O_APPEND to short layout" issue. I wouldn't want to prevent that usage, since it still has potential value in some cases even without O_APPEND (e.g. limiting maximum file size). It would be even better to fix the O_APPEND case so that it works as expected, allowing appends up to the end of the last extent before reporting EFBIG, as would happen if the write exceeded the maximum allowed file size.

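The behaviour described here, appending until the end of the last extent and then failing with EFBIG, could look roughly like the sketch below. It assumes the current file size is already known when the append is checked (in practice that needs the EOF locking discussed in LU-9341 and LU-10782), uses the hypothetical names sketch_append_check() and SKETCH_EOF, and, as a simplification, rejects the whole write rather than doing the short write POSIX allows when only part of the data fits.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for an "end of file" extent end (-E -1). */
#define SKETCH_EOF UINT64_MAX

/*
 * Hypothetical check for an O_APPEND write of `len` bytes to a file of
 * `size` bytes whose last PFL component ends at `layout_end`.
 * Returns 0 if the whole append fits inside the layout, or -EFBIG if it
 * would extend past the end of the last component.
 */
static int sketch_append_check(uint64_t size, uint64_t len, uint64_t layout_end)
{
    if (layout_end == SKETCH_EOF)
        return 0;               /* open-ended layout: no limit from the layout */
    /* Written as a subtraction so that size + len cannot overflow. */
    if (size > layout_end || len > layout_end - size)
        return -EFBIG;
    return 0;
}
```

With a 64 MiB layout, an append that exactly fills the last extent succeeds and the next one fails with EFBIG, which gives the "maximum file size" behaviour for log files mentioned in this discussion.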
            charr Cameron Harr added a comment -

            I concur that there should be a warning or a default EOF pattern set when "-E -1 ..." isn't specified. It seems too easy to run into this issue, whether by accident or ignorance.

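A setstripe-time warning of the kind suggested here only needs to look at the end of the last requested component, since -E -1 means that component extends to EOF. The sketch below assumes the component end offsets have already been parsed into bytes, with SKETCH_EOF standing for -E -1; sketch_warn_missing_eof() and its message text are hypothetical, not existing lfs code or output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for "-E -1" (component extends to end of file). */
#define SKETCH_EOF UINT64_MAX

/*
 * Given the end offsets of the requested components, in order, print a
 * warning and return true if the last one does not extend to EOF.
 * An empty list (no components) is not warned about.
 */
static bool sketch_warn_missing_eof(const uint64_t *comp_ends, size_t count)
{
    if (count == 0 || comp_ends[count - 1] == SKETCH_EOF)
        return false;
    fprintf(stderr,
            "warning: last component ends at %llu bytes, not EOF; "
            "O_APPEND writes to this file may fail\n",
            (unsigned long long)comp_ends[count - 1]);
    return true;
}
```

For the layout in the description (-E 4M ... -E 64M) this would warn, while a layout ending in -E -1 stays silent. A warning rather than an error keeps finite layouts available for the size-limit use case.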
            ofaaland Olaf Faaland added a comment -

            > At one time it was proposed that it would be possible to limit the maximum size of files by using a PFL layout where the last component ended at the maximum file size. That might be useful to avoid runaway processes filling the filesystem or an OST. Ironically, this would probably be most useful for log files, which are often written with O_APPEND.

            Andreas, thanks for explaining. I've seen exactly that here (unintended log file growth without bound). A limit like that would be useful.

            > So this is a real issue, but one that is only going to catch people actively doing something strange, and is easily worked around by adding a final component to the file.

            In our case it happened to someone by mistake: a sysadmin accidentally omitted the EOF component. That's easy to do. Until the issue is fixed, shouldn't there be a warning at the time "lfs setstripe" is run? I'm thinking mostly of Lustre 2.12. We could also, or instead, produce a one-time message on the console log, but that might be noticed only after jobs have failed.

            adilger Andreas Dilger added a comment -

            Olaf, you are correct that this issue only affects PFL files that do not have an EOF component. For most cases it makes sense that the default layout would have an EOF component.

            At one time it was proposed that it would be possible to limit the maximum size of files by using a PFL layout where the last component ended at the maximum file size. That might be useful to avoid runaway processes filling the filesystem or an OST. Ironically, this would probably be most useful for log files, which are often written with O_APPEND.

            So this is a real issue, but one that is only going to catch people actively doing something strange, and is easily worked around by adding a final component to the file.

            ofaaland Olaf Faaland added a comment (edited) -

            I'll note our use case here for now; if you want to discuss it elsewhere let me know.

            We want to use PFL specifically so we can set a default layout that gives reasonable striping for all files, without user knowledge or action. That means routine operations such as appending have to work with PFL layouts for PFL to be usable by us. I wouldn't think we are special in that way, but I can't explain why this hasn't come up already.

            When I wrote the above I thought this error occurred for any append to any PFL file, and didn't realize it was the lack of an EOF component that triggered the failure.

            Is there a reason a user wouldn't want the EOF component? I can't think of one.

            ofaaland Olaf Faaland added a comment (edited) -

            Ran into this while testing at LLNL. (This comment used to say: Should I create another ticket, or add topllnl to this? Thanks.)

            I've just added our usual tags and will post here until told otherwise, so that I'm not holding up the process.

            Our versions involved:
            opal lustre-2.12.4_6.chaos-1.ch6.x86_64
            jet lustre-2.12.4_6.chaos-1.ch6.x86_64

            It's the lack of an EOF component in the layout that triggers the error on my system.

            User operation:

            bash-4.2$ lfs setstripe -E 16M -c 1 -E 1G -c 2  will_fail
            bash-4.2$ date >> will_fail
            date: write error: Invalid argument
            
            bash-4.2$ lfs setstripe -E 16M -c 1 -E -1 -c 2  will_succeed
            bash-4.2$ date >> will_succeed
            bash-4.2$ echo $?
            0
            

            And the error on the console of the Lustre client node:

            [Thu Jun 18 17:46:06 2020] LustreError: 11-0: lquake-MDT0000-mdc-ffff8a7c68307000: operation ldlm_enqueue to node 172.19.1.111@o2ib100 failed: rc = -22
            

            And on the MDT we see:

            dk.jet1.1592591626:00000004:80000000:2.0:1592591621.254440:0:15022:0:(lod_object.c:6049:lod_declare_update_plain()) lquake-MDT0000-mdtlov: the defined layout [0, 0x1000000000) does not covers the write range [0x0, 0xffffffffffffffff)
            

            Note that this debug log record is from a different run of the reproducer than the client console error message above.

            adilger Andreas Dilger added a comment -

            There are discussions on various options for implementing/optimizing file append in LU-9341 and LU-10782. I think something like what is proposed there (partial file lock, verify EOF hasn't changed/adjust offset after getting lock) would be needed for this to work.

            Since using limited-size PFL layouts for log files is a use case that has real benefits (e.g. put an upper limit of 1GB for files in log directory) it would be good to find a way for this to work. Maybe changing the check in lod_declare_update_plain() is enough for the short term, if we understand that this is an O_APPEND file?
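One possible reading of the short-term idea of changing the check in lod_declare_update_plain() for O_APPEND files: when an append request arrives as the whole-file range [0, EOF), as the logs in this ticket show, clamp it to the end of the last component so that all existing components are instantiated instead of returning -EINVAL. The sketch below uses hypothetical names (sketch_range, sketch_clamp_range(), SKETCH_EOF), is not the actual Lustre code, and does not address the EOF race that the partial-lock approaches in LU-9341 and LU-10782 are meant to handle; writes past the last extent would still need to fail, e.g. with EFBIG.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an "end of file" extent end (-E -1). */
#define SKETCH_EOF UINT64_MAX

/* Hypothetical requested range [start, end) for component instantiation. */
struct sketch_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical relaxed check. If the layout already covers the request,
 * nothing changes. For an append, clamp the request to the end of the
 * last component so every existing component gets instantiated.
 * Non-append requests keep the strict behaviour and fail with -EINVAL.
 */
static int sketch_clamp_range(struct sketch_range *req, uint64_t layout_end,
                              bool is_append)
{
    if (req->end <= layout_end)
        return 0;               /* already covered by the layout */
    if (!is_append)
        return -EINVAL;         /* strict check, as seen in the logs */
    req->end = layout_end;      /* instantiate up to the last extent */
    return 0;
}
```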

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Alex Zhuravlev (bzzz)
              Votes: 0
              Watchers: 6
