[LU-7510] (vvp_io.c:1088:vvp_io_commit_write()) Write page 962977 of inode ffff880fbea44b78 failed -28

Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • Lustre 2.5.3
    • Servers and clients: 2.5.4-11chaos-11chaos--PRISTINE-2.6.32-573.7.1.1chaos.ch5.4.x86_64
      ZFS back end
    • 3

    Description

      We have some production apps and rsync processes failing writes with ENOSPC errors, on the ZFS-backed file system only. It is currently ~79% full. There are no server-side errors; -28 errors like the one above appear in the client logs.

      I see that LU-3522 and LU-2049 may have a bearing on this issue. Is there a 2.5 backport or equivalent fix available?
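
The -28 in the client log line is a negated errno value, which on Linux corresponds to ENOSPC. As a small illustration, the sketch below pulls the fields out of a line shaped like the one quoted in this ticket and decodes the return code. The regular expression is modelled on that single line only, and `decode_write_failure` is a hypothetical helper name, so other Lustre versions or message formats may need a different pattern.

```python
import errno
import os
import re

# Pattern modelled on the one log line quoted in this ticket (an assumption:
# other Lustre versions may format the message differently).
WRITE_FAIL_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"Write page (?P<page>\d+) of inode (?P<inode>[0-9a-f]+) "
    r"failed (?P<rc>-?\d+)"
)


def decode_write_failure(log_line):
    """Return the parsed fields of a write-failure line, or None if no match."""
    m = WRITE_FAIL_RE.search(log_line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    code = abs(rc)  # kernel code reports errors as negative errno values
    return {
        "source": f'{m.group("file")}:{m.group("line")}',
        "function": m.group("func"),
        "page": int(m.group("page")),
        "inode": m.group("inode"),
        "rc": rc,
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
        "message": os.strerror(code),
    }


line = ("(vvp_io.c:1088:vvp_io_commit_write()) "
        "Write page 962977 of inode ffff880fbea44b78 failed -28")
info = decode_write_failure(line)
print(info["errno_name"], "-", info["message"])
```

Running this over a client log would make it easy to count how many failed writes are ENOSPC rather than some other error, though the exact log location and line prefix depend on the site's logging setup.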

      Attachments

        1. lu-7510-lbug.txt
          14 kB
        2. zfs.lfs-out.12.02
          10 kB
        3. zfs.tot_granted.12.02
          3 kB

          Activity

            jfc John Fuchs-Chesney (Inactive) made changes -
            Resolution New: Done [ 10000 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]

            jfc John Fuchs-Chesney (Inactive) added a comment:

            Thanks Ruth.

            ~ jfc.

            ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment:

            The file system usage has been reduced to ~70%, and we haven't seen -28 issues or LBUGs since then.

            You can close this one; we'll consider the fix for the -28 issues to be an upgrade to Lustre 2.8 on the servers at some point in the future.

            If the LBUG recurs I'll open a new ticket.

            Thanks,
            Ruth
            pjones Peter Jones made changes -
            End date New: 26/May/16
            Start date New: 01/Dec/15

            utopiabound Nathaniel Clark added a comment:

            The LBUG in question hasn't been changed, though the grant code has been reworked (a la LU-2049) upstream. The negative grant resulting in the LBUG should be a separate bug, though it's probably 2.5 only.

            ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment:

            And a specific question: is the LBUG likely addressed by changes upstream, or should this be a separate ticket?

            ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment:

            Nearly all OSS nodes on this file system became inaccessible yesterday; three of them showed the LBUG at ofd_grant.c:352:ofd_grant_incoming with negative grant values. I disabled the automated grant release workaround in case it is related to this occurrence. The OSTs are 77-79% full at the moment. After that, another OSS went down with the same LBUG.

            This coincides with the addition of a new cluster, but we haven't done any I/O from it so far, just mounting. Any advice/thoughts?

            ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment:

            Each of the OSTs has shown a couple of decreases, in the 3.8-3.9T range.

            ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment:

            After deactivating the OSTs on that node, the rate of increase is slower, but it is still much larger than on all the others and has not decreased so far, at about 3.7T.
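
The comments above do not name the quantity being compared across OSTs. Assuming it is a per-OST grant total (the attachment name zfs.tot_granted.12.02 hints at this, but the ticket does not confirm it), a rough way to spot a target whose value sits far above its peers might look like the sketch below. The `name.tot_granted=value` input format, the file system name `fs`, the sample values, and the 3x-median threshold are all illustrative assumptions, not data from this ticket.

```python
import re
import statistics

# Assumed input: one "<target>.tot_granted=<bytes>" entry per line.
PARAM_RE = re.compile(r"^(?P<name>\S+)\.tot_granted=(?P<value>\d+)$")


def parse_tot_granted(text):
    """Map each target name to its granted byte count."""
    grants = {}
    for raw in text.splitlines():
        m = PARAM_RE.match(raw.strip())
        if m:
            grants[m.group("name")] = int(m.group("value"))
    return grants


def find_outliers(grants, factor=3.0):
    """Return targets whose value exceeds `factor` times the median."""
    if not grants:
        return {}
    median = statistics.median(grants.values())
    return {name: v for name, v in grants.items() if v > factor * median}


# Made-up sample values; only the ~3.7T magnitude echoes the comment above.
sample = """\
obdfilter.fs-OST0000.tot_granted=120000000000
obdfilter.fs-OST0001.tot_granted=110000000000
obdfilter.fs-OST0002.tot_granted=3700000000000
obdfilter.fs-OST0003.tot_granted=130000000000
"""
grants = parse_tot_granted(sample)
outliers = find_outliers(grants)
for name, value in outliers.items():
    print(f"{name}: {value / 1e12:.2f}T granted")
```

A median-based threshold is used here because a single very large value would distort a mean; the factor would need tuning against real numbers.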
            ruth.klundt@gmail.com Ruth Klundt (Inactive) made changes -
            Attachment New: lu-7510-lbug.txt [ 21303 ]

            People

              utopiabound Nathaniel Clark
              ruth.klundt@gmail.com Ruth Klundt (Inactive)