[LU-13136] (layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server) Created: 14/Jan/20  Updated: 19/Apr/20  Resolved: 28/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.14.0, Lustre 2.12.5

Type: Bug Priority: Minor
Reporter: Stephane Thiell Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.6, Lustre client 2.13.0 from WC


Issue Links:
Related
is related to LU-10181 DoM performance optimization Resolved
is related to LU-13143 detect console spew during (interop) ... Open
is related to LU-13438 Rhel8.1 / lustre-client 2.12.4-1 Open
is related to LU-13382 Wrong buffer for field 'niobuf_inline... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Since we upgraded Sherlock compute nodes from Lustre client 2.12 LTS to Lustre client 2.13.0, we're seeing the following error messages on the 2.13 clients:

Jan 10 17:45:03 sh-109-11.int kernel: LustreError: 363749:0:(layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)  req@ffff8a4bdc2de300 x1652789462777344/t108684297018(108684297018) o101->oak-MDT0001-mdc-ffff8a63bc27e000@10.0.2.52@o2ib5:12/10 lens 592/648 e 0 to 0 dl 1578707147 ref 3 fl Complete:RPQU/4/0 rc 0/0 job:''
Jan 10 18:08:42 sh-109-11.int kernel: LustreError: 404571:0:(layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)  req@ffff8a4bdd684800 x1652789475552640/t764228865102(764228865102) o101->oak-MDT0000-mdc-ffff8a63bc27e000@10.0.2.52@o2ib5:12/10 lens 584/672 e 0 to 0 dl 1578708566 ref 3 fl Complete:RPQU/4/0 rc 0/0 job:''
Jan 10 18:31:37 sh-109-11.int kernel: LustreError: 406707:0:(layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)  req@ffff8a4fbb5d2400 x1652789596142720/t764553338818(764553338818) o101->oak-MDT0000-mdc-ffff8a63bc27e000@10.0.2.52@o2ib5:12/10 lens 584/672 e 0 to 0 dl 1578709940 ref 3 fl Complete:RPQU/4/0 rc 0/0 job:''
Jan 10 18:55:58 sh-109-11.int kernel: LustreError: 409784:0:(layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)  req@ffff8a60041f1680 x1652789724785856/t764900612344(764900612344) o101->oak-MDT0000-mdc-ffff8a63bc27e000@10.0.2.52@o2ib5:12/10 lens 584/672 e 0 to 0 dl 1578711402 ref 3 fl Complete:RPQU/4/0 rc 0/0 job:''
Jan 10 19:21:06 sh-109-11.int kernel: LustreError: 412271:0:(layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)  req@ffff8a4bc148a400 x1652789839725568/t765251846397(765251846397) o101->oak-MDT0000-mdc-ffff8a63bc27e000@10.0.2.52@o2ib5:12/10 lens 584/672 e 0 to 0 dl 1578712910 ref 3 fl Complete:RPQU/4/0 rc 0/0 job:''
Jan 11 00:25:18 sh-109-11.int kernel: LustreError: 363221:0:(layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)  req@ffff8a49e8ebba80 x1652790011215872/t108687007176(108687007176) o101->oak-MDT0001-mdc-ffff8a63bc27e000@10.0.2.52@o2ib5:12/10 lens 592/648 e 0 to 0 dl 1578731162 ref 3 fl Complete:RPQU/4/0 rc 0/0 job:''
Jan 11 00:58:19 sh-109-11.int kernel: LustreError: 435607:0:(layout.c:2121:__req_capsule_get()) @@@ Wrong buffer for field 'niobuf_inline' (7 of 7) in format 'LDLM_INTENT_OPEN', 0 vs. 0 (server)  req@ffff8a5317f8e300 x1652790028203136/t767649780647(767649780647) o101->oak-MDT0000-mdc-ffff8a63bc27e000@10.0.2.52@o2ib5:12/10 lens 592/648 e 0 to 0 dl 1578733107 ref 3 fl Complete:RPQU/4/0 rc 0/0 job:''

It is only happening with Oak which is still running Lustre 2.10.8 (servers), not with Fir which is running Lustre 2.12.3_4 (servers).

It's unclear if this has a negative impact. Oak is accessible and seems to work as expected. The error messages are a bit annoying though.

Thanks!
Stephane



 Comments   
Comment by Andreas Dilger [ 15/Jan/20 ]

Mike, could you please take a look? It appears that the niobuf_inline buffer was added as part of patch https://review.whamcloud.com/23011 "LU-10181 mdt: read on open for DoM files", so it makes sense that this is unlikely to work with 2.10 servers, but the client shouldn't be spewing an error.

Mike,

  • is this message indicating any problem, or (as I suspect) just complaining that the read-on-open buffer is not available, since the 2.10.x server isn't sending any data?
  • is there some tunable parameter that could disable the read-on-open functionality temporarily?
  • can you please make a patch for the clients?

At first glance, it may be enough to replace the call to req_capsule_has_field() in ll_dom_finish_open() with a call to req_capsule_field_present(). My understanding is that the former checks whether the named field might be present in a particular message format, while the latter checks whether the field is actually present in the message being processed.
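The distinction between the two checks can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the Lustre implementation: every toy_* name and the f0..f6 placeholder field names are hypothetical, and "present" is modeled simply as "the field's index is below the number of buffers the sender actually packed", mirroring the "(7 of 7)" in the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FIELDS 8

/* Hypothetical stand-in for a message format: the fields a message MAY carry. */
struct toy_format {
	const char *name;
	size_t nfields;
	const char *fields[TOY_MAX_FIELDS];
};

/* Hypothetical stand-in for a received message: bufcount is what was packed. */
struct toy_message {
	const struct toy_format *fmt;
	size_t bufcount;
};

/* Placeholder field names; only "niobuf_inline" is taken from the log. */
static const struct toy_format toy_open_fmt = {
	.name = "TOY_INTENT_OPEN",
	.nfields = 8,
	.fields = { "f0", "f1", "f2", "f3", "f4", "f5", "f6", "niobuf_inline" },
};

/* Return the index of the field in the format, or -1 if it is not declared. */
static int toy_field_index(const struct toy_format *fmt, const char *field)
{
	for (size_t i = 0; i < fmt->nfields; i++)
		if (strcmp(fmt->fields[i], field) == 0)
			return (int)i;
	return -1;
}

/* Format-level check: could this kind of message carry the field at all? */
static bool toy_has_field(const struct toy_message *msg, const char *field)
{
	return toy_field_index(msg->fmt, field) >= 0;
}

/* Message-level check: did the sender actually pack a buffer for the field? */
static bool toy_field_present(const struct toy_message *msg, const char *field)
{
	int idx = toy_field_index(msg->fmt, field);

	assert(msg->bufcount <= TOY_MAX_FIELDS);
	return idx >= 0 && (size_t)idx < msg->bufcount;
}
```

In this model, a reply from an older server that packs only 7 buffers passes toy_has_field() for "niobuf_inline" but fails toy_field_present(), so a caller that gates on the second check skips the buffer lookup instead of logging an error.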

Comment by Mikhail Pershin [ 15/Jan/20 ]

Andreas, right, req_capsule_field_present() must be used there. The error message is only a complaint, not a sign of a real problem; I will prepare a patch. Disabling read-on-open will not help: the feature is a server parameter, and the client was supposed to ignore old servers silently. We could use the mdc_dom_min_repsize option on the client to set the niobuf length to zero, but that value is only the 'extra data length', and the buffer still has a header, so its size is not zero. I think that could arguably also be addressed in the patch, since there is no sense in passing a header when the expected buffer length is zero; the whole buffer size should be set to zero. That said, it is better to keep this as is: changing it would cause a protocol change and compatibility issues, while the benefit is mostly aesthetic.

Comment by Gerrit Updater [ 15/Jan/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37249
Subject: LU-13136 dom: check read-on-open buffer presents in reply
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a12e36d71a9d06cca4aad69de28e9af052f9168a

Comment by Gerrit Updater [ 28/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37249/
Subject: LU-13136 dom: check read-on-open buffer presents in reply
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 58bea527100b50abf3df2dbab0ed6d6b42b69d86

Comment by Gerrit Updater [ 09/Apr/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38188
Subject: LU-13136 dom: check read-on-open buffer presents in reply
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 2df4913d85a3826636d27bbe6cf75a4c7dd21bfd

Comment by Gerrit Updater [ 19/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38188/
Subject: LU-13136 dom: check read-on-open buffer presents in reply
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: de60bf29f4a4f6b1443850ce5797c23b4290f36e
