Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.7.0

    Description

      The OSS crashed unexpectedly with no sign or indication, i.e. no lustre-log or error messages (please see the console image).

      /var/log/messages is attached

      Attachments

        1. 23-6.png
          47 kB
        2. log.28119.gz
          388 kB
        3. lustre-logs.tgz
          0.2 kB
        4. messages13
          271 kB
        5. panda-oss-23-6_messages
          1003 kB

        Activity

          [LU-6584] OSS hit LBUG and crash
          pjones Peter Jones added a comment -

          Will SDSC be able to try this patch out to confirm whether it fixes the issues that they have been experiencing?


          tappro Mikhail Pershin added a comment -

          It seems the cause of this issue is an int type overflow in lnb_rc. Instead of writing (eof - file_offset) directly into lnb_rc, we first have to check that it is not negative.
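
          A minimal sketch of the clamp being described, using a hypothetical pared-down struct and function name (only lnb_rc is named above; the real niobuf_local has more members, and the actual fix is the osd_read_prep() patch referenced in the next comment):

            #include <stdint.h>

            /* Hypothetical stand-in for the fields involved; illustrative only. */
            struct niobuf_local_sketch {
                int64_t lnb_file_offset;   /* start offset of this buffer in the file */
                int     lnb_len;           /* buffer length in bytes */
                int     lnb_rc;            /* bytes valid in this buffer, or -errno */
            };

            /* Clamp (eof - file_offset) before storing it into the int-sized lnb_rc,
             * so an offset far beyond a short file cannot wrap to a positive int. */
            static void fill_lnb_rc(struct niobuf_local_sketch *lnb, int64_t eof)
            {
                int64_t diff = eof - lnb->lnb_file_offset;

                if (diff <= 0)
                    lnb->lnb_rc = 0;                /* read starts at or past EOF */
                else if (diff >= lnb->lnb_len)
                    lnb->lnb_rc = lnb->lnb_len;     /* whole buffer lies within the file */
                else
                    lnb->lnb_rc = (int)diff;        /* partial page at EOF; fits in an int */
            }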

          gerrit Gerrit Updater added a comment -

          Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/16685
          Subject: LU-6584 osd: prevent int type overflow in osd_read_prep()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 687338302147dad5b09b964b8615a3b3adb78a7d


          rpwagner Rick Wagner (Inactive) added a comment -

          Hi Andreas, since our last update to the code tree based on http://review.whamcloud.com/#/c/14926/ we've been stable. It's possible that we've pulled in a bug fix along with the debugging patch, although I couldn't point to a specific one.

          We are looking at ZFS 0.6.5 to get away from the unreleased version for ZFS we've had to run. I would probably do that along with another rebase to a later unpatched tag of Lustre, maybe once LU-4865 is included.

          On a related note, I think this issue could be removed from the 2.8 blocker list, since we started with patched versions of Lustre and ZFS.


          adilger Andreas Dilger added a comment -

          Hi Rick, any news on this front? Have you looked into upgrading to ZFS 0.6.5 to get the native large block support? The patch http://review.whamcloud.com/15127 "LU-4865 zfs: grow block size by write pattern" should also help performance when dealing with files under 1MB in size.


          rpwagner Rick Wagner (Inactive) added a comment -

          We've scheduled a maintenance window for Sep. 8 to roll out this latest patch after testing.

          Andreas, I'll consider changing the recordsize on some of the OSTs. The most likely scenario where we get solid information from this is if the LBUG is still hit on one of the OSSes with the changed setting. I am being a little cautious considering this since it will mean having a ZFS dataset with varying recordsizes. I don't believe the ZFS layer will care, but it's not something I've dealt with before.

          bobijam Zhenyu Xu added a comment -

          http://review.whamcloud.com/#/c/14926/ has been updated to add more remote/local buffer checks.


          adilger Andreas Dilger added a comment -

          Rick, the other possible avenue for debugging is to disable the 1MB blocksize tunable on one or more of your OST datasets, and see if this correlates with a reduction or elimination of the occurrence of this failure. This is one of the main deltas between your ZFS environment and other ZFS users, so this would allow us to isolate the memory corruption to the code handling 1MB blocksize.


          adilger Andreas Dilger added a comment -

          Bobijam,
          can you please make a new patch that checks the contents of niobuf_remote when it is first accessed by the OST (tgt_brw_read() and tgt_brw_write()) to verify that the contents are sane, and print out all the values under D_BUFFS debugging. If the values are incorrect, a CERROR() should be printed and an -EPROTO error returned to the client, and we can debug this problem as network corruption.

          This niobuf verification should be in a helper function that can also be called before the currently-failing LASSERT() checks are hit (and elsewhere in the code if you think it is helpful), and those functions can return an -EIO error to the caller rather than triggering the LASSERT. At that point the client should resend the BRW RPC due to brw_interpret()->osc_recoverable_error() and hopefully it will succeed on the second try.

          While I don't think this is a proper solution, it will at least tell us if the corruption is happening on the client and/or on the network, or in memory on the OSS, and it will potentially allow debugging to continue without the high frequency of OSS failures.

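          A minimal sketch, under assumed field names and limits, of the kind of sanity-check helper being requested here. The real niobuf_remote layout, the per-RPC size cap, and Lustre's CERROR()/CDEBUG() logging differ, so this only illustrates the reject-with--EPROTO idea rather than the actual patch in http://review.whamcloud.com/#/c/14926/:

            #include <errno.h>
            #include <stdint.h>
            #include <stdio.h>

            /* Hypothetical pared-down remote niobuf; the real structure carries an
             * offset, length and flags for each fragment of the BRW request. */
            struct remote_niobuf_sketch {
                uint64_t offset;
                uint32_t len;
            };

            #define MAX_BRW_BYTES (4U << 20)   /* assumed per-RPC byte cap for this sketch */

            /* Validate the niobufs received from the client before the OST uses them:
             * on nonsense values, log the offending entry and return -EPROTO so the
             * request is rejected instead of tripping an LASSERT deeper in the I/O path. */
            static int verify_remote_niobufs(const struct remote_niobuf_sketch *rnb, int count)
            {
                uint64_t total = 0;

                for (int i = 0; i < count; i++) {
                    if (rnb[i].len == 0 || rnb[i].len > MAX_BRW_BYTES) {
                        fprintf(stderr, "bad niobuf %d: offset %llu len %u\n",
                                i, (unsigned long long)rnb[i].offset, rnb[i].len);
                        return -EPROTO;
                    }
                    if (i > 0 && rnb[i].offset < rnb[i - 1].offset + rnb[i - 1].len) {
                        fprintf(stderr, "overlapping niobuf %d: offset %llu\n",
                                i, (unsigned long long)rnb[i].offset);
                        return -EPROTO;
                    }
                    total += rnb[i].len;
                }
                return total <= MAX_BRW_BYTES ? 0 : -EPROTO;
            }

          A caller in tgt_brw_read()/tgt_brw_write() would pass the error back to the client, which would then resend the BRW RPC as described above.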

          rpwagner Rick Wagner (Inactive) added a comment -

          Andreas, yes, we're using 1MB block sizes on the ZFS datasets that handle the OSTs.


          dimm Dmitry Mishin (Inactive) added a comment -

          We're using the f1512ee61e commit from the master ZFS branch (large block support). It's later than 0.6.4.1, and I had problems running with the latest master version.


          People

            Assignee: tappro Mikhail Pershin
            Reporter: haisong Haisong Cai (Inactive)
            Votes: 0
            Watchers: 15

            Dates

              Created:
              Updated:
              Resolved: