[LU-12262] Improve sbi_flags checking Created: 01/May/19  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11930 sbi_flags printing broke Resolved
is related to LU-11825 Remove LU-8964/pio feature & supporti... Resolved
is related to LU-10948 client cache open lock after N opens Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In LU-11930, Oleg described a problem with printing sbi_flags:
'

Nowadays if we do get_param on lustre-....sbi_flags we get back:

error: get_param: reading 'llite.lustre-ffff88009a0ca800.sbi_flags': Invalid argument

and then in the kernel

[  788.311079] LustreError: 10351:0:(lproc_llite.c:967:ll_sbi_flags_seq_show()) lustre: Revise array LL_SBI_FLAGS to match sbi flags please.

The result is of course every test that checks feature flags is now not finding anything it's looking for and skips. in particular al layoutswaps (test 184*) and a bunch of other stuff is now in perma-skip mode.

This was broken in LU-11825 https://review.whamcloud.com/33912 it appears by removing a flag from LL_SBI_FLAGS which is something that should NEVER be done.

Also I think we ned to convert that error message into an assertion.

Problems like here we'd catch sooner and if some client has some flags set incorrectly we are probably ok panicking there as well as there's clearly some unexpected memory corruption or whatnot going on?"

This ticket is to capture his suggestions at the end, to improve checking so we can't accidentally remove a flag like this in the future.



 Comments   
Comment by Andreas Dilger [ 29/Oct/19 ]

It looks like this was fixed in patch https://review.whamcloud.com/34187 "LU-11930 llite: Restore pio flags".

Comment by Andreas Dilger [ 29/Oct/19 ]

Note that the actual LL_SBI_* values are not exposed outside of the lustre/llite code. The real bug was that a flag value was removed, and the corresponding line in the LL_SBI_FLAGS array was also removed, but LL_SBI_FLAGS needs to be a contiguous array in LL_SBI_* order.

Comment by James A Simmons [ 30/Oct/19 ]

I think Patrick was trying to prevent the lose of your idea:

enum ll_sbi_flags {
 LL_SBI_NOLCK, /* DLM locking disabled (O_DIRECT only) */
 :
 :
 LL_SBI_TINY_WRITE, /* tiny write support */
 LL_SBI_LAST /* sentinel at end of list */
 };

 #define LL_SBI_FLAGS { \
 [LL_SBI_NOLCK] = "nolck", \
 [LL_SBI_CHECKSUM] = "checksum", \

 
It's just no one got around to it.

Comment by Andreas Dilger [ 29/May/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43871
Subject: LU-12262 llite: improve ll_sbi_flags_seq_show()
Project: fs/lustre-release
Branch: master
Current Patch Set: 2
Commit: d6c0836e03f07c2c7bed3cf7173fa8a29e00ebe6

Comment by Gerrit Updater [ 10/Aug/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/44541
Subject: LU-12262 llite: harden ll_sbi ll_flags
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 235a51483680d88c9467cb8e868a4bf1065a802f

Comment by Gerrit Updater [ 10/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44541/
Subject: LU-12262 llite: harden ll_sbi ll_flags
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 47e6f6abdacd6a3c1fdb320b4c9c331c0f493cce

Generated at Sat Feb 10 02:51:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.