DLC: Address a minor inconsistency between net and route adding behavior

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.7.0
    • Lustre 2.7.0
    • None
    • 3
    • 16846

    Description

      When adding an already existing net, we fail with -EEXIST, but when adding an already existing route, we silently ignore the entry. Adding an existing route should fail with -EEXIST as well.
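The requested behavior can be illustrated with a small user-space model. This is only a sketch: `RouteTable` and `add_route` are hypothetical names invented for illustration and are not the LNet kernel code or the DLC API. It shows a duplicate route being rejected with -EEXIST instead of being ignored.

```python
import errno


class RouteTable:
    """Toy stand-in for a route list (hypothetical, not the LNet implementation)."""

    def __init__(self):
        self._routes = set()

    def add_route(self, net, gateway):
        """Return 0 on success, or -EEXIST if the same route is already present."""
        key = (net, gateway)
        if key in self._routes:
            # Report the duplicate instead of silently ignoring it.
            return -errno.EEXIST
        self._routes.add(key)
        return 0


table = RouteTable()
first_rc = table.add_route("tcp1", "10.0.0.1@tcp")
second_rc = table.add_route("tcp1", "10.0.0.1@tcp")
```

Here `first_rc` is 0 and `second_rc` is `-errno.EEXIST`, which mirrors what the description says already happens for nets.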

          Activity

            [LU-6045] DLC: Address a minor inconsistency between net and route adding behavior

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13116/
            Subject: LU-6045 lnet: return appropriate errno when adding route
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0f9f4b1b234bdc12a2604225d8c6398a355b75a4

            gerrit Gerrit Updater added the comment above.

            The way it works is that lnetctl doesn't stop its batch processing when a single operation fails.

            For example, say you are provisioning N routes in a batch and one of them already exists. That single operation fails, but the rest of the operations succeed. When batch processing is complete, the returned YAML block reports which operations in the batch failed and for what reason. In this case it's useful to specify a seq_no for each operation so that the YAML report can be matched against the batch request.
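The batch flow described above can be sketched roughly as follows. This is an illustrative model only, not lnetctl source: `run_batch`, `add_route`, and the dictionary keys in the failure records are assumptions made for the example, and the real report is a YAML block whose exact layout is not reproduced here.

```python
import errno
import os


def run_batch(operations, apply_op):
    """Apply every operation without stopping; return one record per failure."""
    failures = []
    for op in operations:
        rc = apply_op(op)
        if rc != 0:
            failures.append({
                "seq_no": op["seq_no"],     # lets the caller match report to request
                "errno": rc,
                "descr": os.strerror(-rc),  # human-readable reason
            })
    return failures


# Hypothetical pre-existing state: one route is already configured.
existing = {("tcp2", "10.0.0.2@tcp")}


def add_route(op):
    """Toy single-operation request: 0 on success, -EEXIST on a duplicate."""
    key = (op["net"], op["gateway"])
    if key in existing:
        return -errno.EEXIST
    existing.add(key)
    return 0


batch = [
    {"seq_no": 1, "net": "tcp1", "gateway": "10.0.0.1@tcp"},
    {"seq_no": 2, "net": "tcp2", "gateway": "10.0.0.2@tcp"},
    {"seq_no": 3, "net": "tcp3", "gateway": "10.0.0.3@tcp"},
]
failures = run_batch(batch, add_route)
```

In this run only the operation with seq_no 2 is reported as failed, while the routes for seq_no 1 and 3 are still added.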

            The main idea here is to offload from the kernel the decision of whether or not to fail. The kernel always reports back the actual state of the requested operation, and user space decides what to do with that return code. I believe this is useful since the DLC model never uses the kernel to operate on batches: it only requests a single operation at a time, and batch processing always occurs in user space. In my opinion, this model is superior to having the kernel perform batch processing, since that would entail parsing the YAML batch description in the kernel, which is not preferred.

            Another option is to parse in user space and form a batch data structure, but again this is not optimal because it limits the number of operations per batch to the maximum size of the data structure passed to the kernel.

            As I mentioned, the DLC API currently processes all the batch requests and reports which ones failed without halting the batch. If needed, we can further enhance the API so that it either halts or continues batch processing on failure, depending on how much flexibility we would like to expose to the user.

            As of now, I don't see a real use case to allow batch processing to be halted.
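For illustration, the possible enhancement mentioned above (letting the caller choose whether a failure halts the batch) might look like the sketch below. The `halt_on_failure` parameter and the function names are hypothetical; no such option is described in this ticket as existing in the DLC API.

```python
import errno


def run_batch_with_policy(operations, apply_op, halt_on_failure=False):
    """Return (seq_no, rc) pairs; optionally stop at the first failure."""
    results = []
    for op in operations:
        rc = apply_op(op)
        results.append((op["seq_no"], rc))
        if rc != 0 and halt_on_failure:
            break  # policy decision made in user space, not by the kernel
    return results


def fail_second(op):
    """Toy operation: only seq_no 2 fails, with -EEXIST."""
    return -errno.EEXIST if op["seq_no"] == 2 else 0


ops = [{"seq_no": n} for n in (1, 2, 3)]
continued = run_batch_with_policy(ops, fail_second)
halted = run_batch_with_policy(ops, fail_second, halt_on_failure=True)
```

With the default policy all three operations are attempted; with `halt_on_failure=True` the third is never issued.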

            Note: the current behavior when parsing the modprobe route configuration is to ignore specific failures and continue provisioning the rest of the routes. This patch does not alter that behavior, but transfers the decision to halt or not to halt (in the case of DLC) to user space.

            ashehata Amir Shehata (Inactive) added the comment above.

            I have a higher-level question for end users of lnetctl, and possibly of YAML configs (as it relates to LU-6043): does it make more sense to return -EEXIST when trying to configure a device that is already configured, to silently succeed in that case, or possibly to have a userspace-level option to decide (e.g. like mkdir -p or rm -f do)?

            adilger Andreas Dilger added the comment above.
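The userspace-level option raised in that question, in the spirit of mkdir -p, could be sketched as a thin wrapper around a call that always reports the true state. This is a hypothetical illustration: `add_idempotent`, `ignore_existing`, and `add_net` are invented names, not lnetctl options or DLC functions.

```python
import errno


def add_idempotent(add_fn, item, ignore_existing=False):
    """Call add_fn(item); optionally treat -EEXIST as success, like mkdir -p."""
    rc = add_fn(item)
    if rc == -errno.EEXIST and ignore_existing:
        return 0
    return rc


# Hypothetical state: net "tcp0" is already configured.
configured = {"tcp0"}


def add_net(name):
    """Toy add operation that always reports the actual outcome."""
    if name in configured:
        return -errno.EEXIST
    configured.add(name)
    return 0


strict_rc = add_idempotent(add_net, "tcp0")
lenient_rc = add_idempotent(add_net, "tcp0", ignore_existing=True)
```

The underlying call still returns -EEXIST in both cases; only the wrapper's policy differs, so `strict_rc` is `-errno.EEXIST` and `lenient_rc` is 0.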

            Amir Shehata (amir.shehata@intel.com) uploaded a new patch: http://review.whamcloud.com/13116
            Subject: LU-6045 lnet: return appropriate errno when adding route
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f7ce83ee0e5ee34b255d13ee2d914ca6be749706

            gerrit Gerrit Updater added the comment above.

            People

              ashehata Amir Shehata (Inactive)
              ashehata Amir Shehata (Inactive)
              Votes: 0
              Watchers: 6
