Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.7.0
    • Severity: 4
    • Rank: 9223372036854775807

    Description

      With no prior sign or indication (i.e. no lustre-log or error messages), the OSS unexpectedly crashed (please see the console image).

      /var/log/messages is attached

      Attachments

        1. 23-6.png
          47 kB
        2. log.28119.gz
          388 kB
        3. lustre-logs.tgz
          0.2 kB
        4. messages13
          271 kB
        5. panda-oss-23-6_messages
          1003 kB

        Activity

          [LU-6584] OSS hit LBUG and crash
          pjones Peter Jones added a comment -

          Fix landed for 2.8. We'll reopen if this issue is still hit on Hyperion. If there is still an issue at SDSC and it is not, as hoped, a duplicate of this issue, then please open a new ticket to track that issue.

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16685/
          Subject: LU-6584 osd: prevent int type overflow in osd_read_prep()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: efe3842c76b8041a048457779554ffa5ba76567d

          tappro Mikhail Pershin added a comment -

          Rick, this particular issue existed in the IO READ code path and isn't related to LU-7106. I checked the OSD code quickly and didn't notice other similar issues at first glance.

          rpwagner Rick Wagner (Inactive) added a comment -

          Yes, we're scheduling a PM and will push this out. Could this patch be related to LU-7106? In other words, could the current code create an error that propagates back to the client as ENOSPC even when there's capacity on the OST?

          pjones Peter Jones added a comment -

          Will SDSC be able to try this patch out to confirm whether it fixes the issues that they have been experiencing?

          tappro Mikhail Pershin added a comment -

          It seems the cause of this issue is an int type overflow in lnb_rc. Instead of writing (eof - file_offset) directly into lnb_rc, we first have to check that it is not negative.
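
          The check described in this comment can be sketched in isolation. The following is a minimal illustration in C, not the code from the patch: the function names and parameters are hypothetical, offsets and lengths are assumed to be non-negative, and only the int-typed result (playing the role of lnb_rc) is modelled. The first function shows how a negative value can end up in the result when a page starts past EOF; the second checks before subtracting.

```c
#include <stdint.h>

/* Illustrative only: stores (eof - file_offset) into an int without
 * checking its sign. Not the original Lustre code. */
int readable_unchecked(int64_t eof, int64_t file_offset, int len)
{
    if (eof >= file_offset + len)
        return len;                     /* whole page lies before EOF */
    return (int)(eof - file_offset);    /* negative when file_offset > eof */
}

/* Checked variant: never returns a negative byte count.
 * Assumes eof >= 0, file_offset >= 0 and len >= 0. */
int readable_checked(int64_t eof, int64_t file_offset, int len)
{
    int64_t remaining;

    if (eof <= file_offset)
        return 0;                       /* page starts at or past EOF */

    remaining = eof - file_offset;      /* strictly positive here */

    /* The result is capped at len, so it always fits in an int. */
    return remaining < len ? (int)remaining : len;
}
```

          For a 4096-byte page starting at offset 8192 in a 4096-byte object, the unchecked variant returns -4096, while the checked variant returns 0.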

          gerrit Gerrit Updater added a comment -

          Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/16685
          Subject: LU-6584 osd: prevent int type overflow in osd_read_prep()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 687338302147dad5b09b964b8615a3b3adb78a7d

          People

            Assignee: tappro Mikhail Pershin
            Reporter: haisong Haisong Cai (Inactive)
            Votes: 0
            Watchers: 15

            Dates

              Created:
              Updated:
              Resolved: