[LU-6045] DLC: Address a minor inconsistency between net and route adding behavior Created: 18/Dec/14  Updated: 12/Jan/15  Resolved: 12/Jan/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Major
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6043 DLC: lnetctl import does not configur... Resolved
Severity: 3
Rank (Obsolete): 16846

 Description   

When adding a net that already exists, we fail with -EEXIST, but when adding a route that already exists, we silently ignore the entry. Adding an existing route should fail with -EEXIST as well.
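
The requested behavior can be sketched as a small user-space Python model. This is not the LNet kernel code: the `add_net` and `add_route` functions and the set-based tables are hypothetical stand-ins, and only the -EEXIST convention comes from this ticket.

```python
import errno

# Hypothetical in-memory stand-ins for the kernel's net and route tables.
nets = set()
routes = set()

def add_net(net):
    """Return 0 on success, -EEXIST if the net is already configured."""
    if net in nets:
        return -errno.EEXIST
    nets.add(net)
    return 0

def add_route(net, gateway):
    """Mirror add_net: a duplicate route fails with -EEXIST instead of being ignored."""
    key = (net, gateway)
    if key in routes:
        return -errno.EEXIST
    routes.add(key)
    return 0
```

With this model, the second attempt to add the same (net, gateway) pair returns the same error code as the second attempt to add the same net.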



 Comments   
Comment by Gerrit Updater [ 18/Dec/14 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: http://review.whamcloud.com/13116
Subject: LU-6045 lnet: return appropriate errno when adding route
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f7ce83ee0e5ee34b255d13ee2d914ca6be749706

Comment by Andreas Dilger [ 18/Dec/14 ]

I have a higher-level question for end users of lnetctl, and possibly YAML configs (as it relates to LU-6043): does it make more sense to return -EEXIST when trying to configure a device that is already configured, should it silently succeed if it is already configured, or should there be a userspace-level option to decide (e.g. like mkdir -p or rm -f do)?
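
To illustrate the third option, here is a minimal sketch of a userspace wrapper that behaves like mkdir -p when asked to. The `apply_config` function and its `ignore_existing` flag are hypothetical and do not correspond to any existing lnetctl option; `operation` stands in for a single configuration request that returns 0 or a negative errno.

```python
import errno

def apply_config(operation, ignore_existing=False):
    """Run one configuration call and optionally treat -EEXIST as success.

    `operation` is any zero-argument callable returning 0 or a negative
    errno (a hypothetical stand-in for one configuration request).
    """
    rc = operation()
    if rc == -errno.EEXIST and ignore_existing:
        return 0
    return rc
```

Only -EEXIST is masked; any other error is still returned to the caller unchanged.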

Comment by Amir Shehata (Inactive) [ 02/Jan/15 ]

The way it works is that lnetctl doesn't stop its batch process when a single operation fails.

For example, say you are provisioning N routes in a batch, and one of them already exists. That single operation fails, but the rest of the operations succeed. When the batch processing is complete, the returned YAML block reports which operations in the batch failed and for what reason. In this case it's useful to specify a seq_no for each operation so that the YAML report can be associated with the batch request.
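
The batch flow described above can be sketched roughly as follows. This is an illustration only: `run_batch`, the `do_operation` callback, and the layout of the failure records are assumptions, not the actual DLC API or the exact YAML that lnetctl returns. Only the seq_no idea and the continue-on-failure behavior come from the description.

```python
import errno
import os

def run_batch(operations, do_operation):
    """Apply every operation and collect failures keyed by seq_no.

    The loop never stops early: a failed operation is recorded and the
    remaining operations are still attempted.
    """
    failures = []
    for op in operations:
        rc = do_operation(op)  # 0 on success, negative errno on failure
        if rc != 0:
            failures.append({
                "seq_no": op["seq_no"],
                "errno": rc,
                "descr": os.strerror(-rc),
            })
    return failures
```

The caller can match each failure record to its request through seq_no and decide how to treat it, for example by ignoring records whose errno is -EEXIST.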

The main idea here is to offload the decision whether or not to fail from the kernel. The kernel always reports back the actual state of the requested operation. User space can decide what to do with this return code. I believe this is useful since the DLC model never uses the kernel to operate on batches. It only requests one single operation to be completed at a time. Batch processing always occurs in user space. In my opinion, this model is superior to getting the kernel to perform batch processing, since that would entail parsing the YAML batch description in the kernel, which is not preferred.

Another option is to parse in user space and form a batch data structure, but again this is not optimal because it limits the number of operations per batch to the maximum size of the data structure passed to the kernel.

As I mentioned, the DLC API currently processes all the batch requests and reports which ones failed without halting the batch. If needed, we can further enhance the API to either halt or continue batch processing on failure, depending on what type of flexibility we would like to expose to the user.
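
One possible shape for such an enhancement, as a hedged sketch: the `run_batch_with_policy` function and its `halt_on_error` flag are hypothetical and are not part of the current DLC API. The default keeps today's continue-on-failure behavior.

```python
def run_batch_with_policy(operations, do_operation, halt_on_error=False):
    """Return (results, halted).

    results is a list of (index, rc) pairs for every operation attempted;
    halted is True only if processing stopped early on a failure.
    """
    results = []
    for index, op in enumerate(operations):
        rc = do_operation(op)  # 0 on success, negative errno on failure
        results.append((index, rc))
        if rc != 0 and halt_on_error:
            return results, True
    return results, False
```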

As of now, I don't see a real use case to allow batch processing to be halted.

Note: the current behavior when parsing modprobe route configuration is to ignore specific failures and continue provisioning the rest of the routes. This patch does not alter this behavior, but in the DLC case it transfers the decision to halt or not to user space.

Comment by Gerrit Updater [ 07/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13116/
Subject: LU-6045 lnet: return appropriate errno when adding route
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0f9f4b1b234bdc12a2604225d8c6398a355b75a4

Generated at Sat Feb 10 01:56:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.