[LU-11735] o2iblnd - bad check for fmr Created: 05/Dec/18  Updated: 24/Sep/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11152 sanity test_133g: ost1 find /proc/fs... Resolved
is related to LU-11552 improper FMR/FastReg pool cleanup Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
9b790ba0f5606c0a91563828fa43f5e4ae210425 LU-11152 lnd: test fpo_fmr_poool pointer instead of special bool

Introduced an issue.

You can't perform the following check:

if (!IS_ERR_OR_NULL(fpo->fmr.fpo_fmr_pool))

because

 340 »·······union {                                           
 341 »·······»·······struct {  
 342 »·······»·······»·······struct ib_fmr_pool *fpo_fmr_pool; /* IB FMR pool */
 343 »·······»·······} fmr;                                                          
 344 »·······»·······struct { /* For fast registration */
 345 »·······»·······»·······struct list_head  fpo_pool_list;
 346 »·······»·······»·······int»····»·······  fpo_pool_size;
 347 »·······»·······} fast_reg;                                       
 348 »·······};

It's part of a union so if we have fast reg enabled, then it's possible that the check is successful even though we don't have fmr enabled.



 Comments   
Comment by James A Simmons [ 05/Dec/18 ]

lets make it an enum and clean up the code. I also dislike the union. The union assumes you can only have IB hardware that does either FMR or FastReg on a node. With multi-rail it is very possible to have IB hardware card using FMR and another using FastReg. 

Comment by Andreas Dilger [ 06/Dec/18 ]

James, we need to either fix this or revert the original LU-11152 patch ASAP for 2.12.

Comment by James A Simmons [ 06/Dec/18 ]

Sure. The truth is that the o2iblnd pool handling is a piece of garbage and should be replaced with 

ib_mr_pool_*() functions that handle this much more gracefully.

Comment by Andreas Dilger [ 06/Dec/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33802
Subject: Revert "LU-11152 lnd: test fpo_fmr_poool pointer instead of special bool"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b169980b805b78df9b05e10079d27c4a6d3dbbc1

Comment by Peter Jones [ 09/Dec/18 ]

Revert of LU-11152 landed for 2.12 RC2 so this ticket moved to 2.13 as a lower priority

Comment by Gerrit Updater [ 20/Dec/18 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33901
Subject: LU-11735 lnd: clean up fmr/FastReg condition
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9a164082c703222bd5a14469ed1624ef18ee473e

Comment by Amir Shehata (Inactive) [ 20/Dec/18 ]

I've tried to describe this area of the code here:
https://wiki.whamcloud.com/display/LNet/More+on+FMR+and+Fast+Reg
If there are suggestions on how to clarify the documentation, please add comments on the wiki page above

Comment by Cory Spitz [ 18/Dec/19 ]

FYI:
> Revert of LU-11152 landed for 2.12 RC2 so this ticket moved to 2.13 as a lower priority
Note, while code associated with LU-11152 was reverted, that code was wrongly associated with LU-11152. It is really associated with LU-11552. Simple typo.

Generated at Sat Feb 10 02:46:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.