[LU-5297] osp_sync_thread can't handle invalid record gracefully

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.6.0
    • None
    • 3
    • 14781

    Description

      osp_sync_process_queues() currently assumes that none of the records to be processed are corrupted; it lacks error-handling code for invalid records.

      One solution could be to regard an invalid record as committed and skip processing it.
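
      The proposal above can be illustrated with a small, self-contained sketch. This is not the Lustre code: the record layout, the type values, and the names rec_is_valid() and process_queue() are all hypothetical, chosen only to show the idea of marking an invalid record as committed and moving on instead of trying to process it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record types; these are NOT the real Lustre llog opcodes. */
enum rec_type { REC_UNLINK = 1, REC_SETATTR = 2 };

struct sync_rec {
	int  type;      /* one of enum rec_type; anything else counts as invalid */
	bool committed; /* true once the record needs no further processing */
};

struct sync_stats {
	int sent;    /* valid records for which an RPC would be issued */
	int skipped; /* invalid records that were treated as committed */
};

/* Assumed validity check: only the two known types are accepted. */
static bool rec_is_valid(const struct sync_rec *rec)
{
	return rec->type == REC_UNLINK || rec->type == REC_SETATTR;
}

/* Walk the queue once; invalid records are marked committed and skipped. */
static struct sync_stats process_queue(struct sync_rec *recs, size_t n)
{
	struct sync_stats st = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (recs[i].committed)
			continue;
		if (!rec_is_valid(&recs[i])) {
			/* Do not abort the loop: pretend the record is done. */
			recs[i].committed = true;
			st.skipped++;
			continue;
		}
		/* A real implementation would send an RPC here and mark the
		 * record committed only after the reply; the sketch just counts. */
		st.sent++;
	}
	return st;
}
```

      In this model a corrupted record never stops the loop; it is simply counted as skipped and flagged so that later cleanup can drop it.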


          Activity


            pjones Peter Jones added a comment - Landed for 2.8

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14925/
            Subject: LU-5297 osp: process unsuccessful osp sync records properly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 591f8771df00e1c3279019281e5f7d2e7c7e4877

            gerrit Gerrit Updater added a comment -

            Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/14925
            Subject: LU-5297 osp: decrease opd_syn_changes for an invalid record
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 25ceff8e1de80f98e701f1ed5630a9b9308c2ddf

            niu Niu Yawei (Inactive) added a comment - I don't plan to make a patch yet; we should probably move it to 2.7.

            adilger Andreas Dilger added a comment - Niu, were you planning to make a patch for this, or should this bug be moved to 2.7.0?

            niu Niu Yawei (Inactive) added a comment -

            "Niu, could you explain in what part the patch is not complete? what specific cases it doesn't handle?"

            That patch decreases opd_syn_rpc_in_flight and opd_syn_rpc_in_progress when skipping an invalid record. That's not enough, because opd_syn_changes isn't decreased and the sync thread will break unexpectedly, and the invalid record isn't deleted at the end. I'm afraid it can cause further trouble.
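
            The bookkeeping gap described in this comment can be shown with a toy model. Only the three counter names come from the comment; the struct layout, the helper functions, and the idea of an "idle" check are assumptions made for illustration and do not reproduce the actual osp_sync code.

```c
#include <stdbool.h>

/* Toy model: field names are taken from the comment, types are assumed. */
struct osp_model {
	int opd_syn_changes;         /* assumed: records still waiting to be synced */
	int opd_syn_rpc_in_flight;   /* assumed: bumped when a record is picked up */
	int opd_syn_rpc_in_progress; /* assumed: bumped when a record is picked up */
};

struct rec_model {
	bool valid;
	bool deleted; /* set when the record is dropped from the log */
};

/* Assumed accounting when the thread takes a record off the queue. */
static void pick_record(struct osp_model *d)
{
	d->opd_syn_rpc_in_flight++;
	d->opd_syn_rpc_in_progress++;
}

/* Incomplete skip: only the two RPC counters are rolled back. */
static void skip_incomplete(struct osp_model *d, struct rec_model *r)
{
	(void)r; /* the record itself is left untouched */
	d->opd_syn_rpc_in_flight--;
	d->opd_syn_rpc_in_progress--;
}

/* Complete skip: also drop the pending change and delete the record. */
static void skip_complete(struct osp_model *d, struct rec_model *r)
{
	d->opd_syn_rpc_in_flight--;
	d->opd_syn_rpc_in_progress--;
	d->opd_syn_changes--;
	r->deleted = true;
}

/* Hypothetical check: the model is idle only when every counter is zero. */
static bool sync_idle(const struct osp_model *d)
{
	return d->opd_syn_changes == 0 &&
	       d->opd_syn_rpc_in_flight == 0 &&
	       d->opd_syn_rpc_in_progress == 0;
}
```

            With one pending invalid record, pick_record() followed by skip_incomplete() leaves opd_syn_changes at 1 and the record undeleted, so the model never looks idle; skip_complete() returns all three counters to zero and flags the record as deleted.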

            bzzz Alex Zhuravlev added a comment - Niu, could you explain in what part the patch is not complete? what specific cases it doesn't handle?

            niu Niu Yawei (Inactive) added a comment -

            "Is this related to LU-5188 "osp: return 1 if osp_sync_xxx_job issue RPC"? What is the severity of this problem (i.e. what impact does it have on normal operation)? Will it cause OST objects to be leaked if they are unlinked when the OST is offline, and still offline when the MDS processes the llog records?"

            That patch was trying to handle errors in the osp thread, but it looks incomplete. I think the severity isn't high, because there aren't any corrupted records in normal usage.

            adilger Andreas Dilger added a comment - Is this related to LU-5188 "osp: return 1 if osp_sync_xxx_job issue RPC"? What is the severity of this problem (i.e. what impact does it have on normal operation)? Will it cause OST objects to be leaked if they are unlinked when the OST is offline, and still offline when the MDS processes the llog records?

            People

              emoly.liu Emoly Liu
              niu Niu Yawei (Inactive)
              Votes: 0
              Watchers: 6
