[LU-12140] DOM: limitation of data size Created: 01/Apr/19  Updated: 01/Jun/19  Resolved: 01/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.1
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Critical
Reporter: Alexander Boyko Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None
Environment:

configuration with router, MDS_FS_MKFS_OPTS="-O large_xattr", sanity-dom/sanityn


Epic/Theme: lnet
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During sanity-dom testing the following issue appeared:

 [ 1644.726837] LNetError: 3137:0:(lib-move.c:4143:lnet_parse()) 192.168.8.1@tcp, src 192.168.8.1@tcp: bad PUT payload 1051832 (1048576 max expected)

I added some debugging to capture vmcores from the sender.

Here is the analysis from crash:

 
md = {
     start = 0xffff880098200100,
     length = 1051832,
     threshold = 0,
     max_size = -1742733056,
     options = 0,
     user_ptr = 0xffff880098200000,
     eq_handle = {
       cookie = 23
     },
     bulk_handle = {
       cookie = 0
     }
   },
msg_niov = 1,
 msg_iov = 0xffff88009995aba0,
 
msg_kiov = 0x0,
 
ffff880135be9800
rc_fmt = 0xffffffffc095d080 <RQF_LDLM_INTENT_OPEN>,
 
static const struct req_msg_field *ldlm_intent_open_server[] = {
       &RMF_PTLRPC_BODY,
       &RMF_DLM_REP,
       &RMF_MDT_BODY,
       &RMF_MDT_MD,
       &RMF_ACL,
       &RMF_CAPA1,
       &RMF_CAPA2,
       &RMF_NIOBUF_INLINE,
};
rc_area = {{4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295, 4294967295}, {184, 112, 216, 2432, 260, 0, 0, 1048592, 4294967295, 4294967295}}
 }
 
crash> p 184+112+216+2432+260+1048592
$15 = 1051796
 

The DOM size during open was 1 MiB, so the total length of the LNet message was 1051796 bytes (the sum of the buffers above), which does not fit within the LNET_MTU limit of 1048576 bytes. The router therefore reports the error.
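
The arithmetic above can be reproduced outside crash. The following is a minimal sketch in C, not Lustre code. It assumes the limit is the 1048576 bytes printed in the LNetError line and that the 4294967295 entries in rc_area mark unset slots; the names LNET_MTU_SKETCH, UNSET_LEN, reply_total_len, fits_lnet_mtu and reply_lens are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Limit taken from the LNetError line ("1048576 max expected"). */
#define LNET_MTU_SKETCH 1048576u
/* Assumption: rc_area slots holding 4294967295 were never set. */
#define UNSET_LEN UINT32_MAX

static_assert(LNET_MTU_SKETCH == (1u << 20), "limit is 1 MiB");

/* Sum the buffer lengths, skipping unset slots. */
static uint64_t reply_total_len(const uint32_t *lens, size_t count)
{
	uint64_t total = 0;

	for (size_t i = 0; i < count; i++) {
		if (lens[i] != UNSET_LEN)
			total += lens[i];
	}
	return total;
}

/* Return 1 when a payload of this length fits within the limit. */
static int fits_lnet_mtu(uint64_t len)
{
	return len <= LNET_MTU_SKETCH;
}

/* Second rc_area row from the vmcore in this ticket. */
static const uint32_t reply_lens[] = {
	184, 112, 216, 2432, 260, 0, 0, 1048592, UNSET_LEN, UNSET_LEN
};
```

With these values, reply_total_len(reply_lens, 10) gives 1051796, and both that sum and md.length (1051832) fail fits_lnet_mtu. The last set buffer alone (1048592 bytes) is already 16 bytes over the limit.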
This means the LNet layer cannot handle a DOM stripe size of 1 MiB. It is probably also a problem for PFL when the first stripe is located on the MDS.
The workaround for sanity-dom testing is to decrease DOM_SIZE in sanity-dom.sh.
The MDS should also limit this size to prevent such misbehavior.
I've assigned this to Mikhail, though I'm not sure that is right.
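
The suggestion that the MDS should limit the size can be sketched as a ceiling calculation. This is only an illustration under stated assumptions, not the landed patch (which, per its subject, adds a message length check in lnet). The helpers max_inline_len and clamp_inline_len are hypothetical, and the overhead argument is an input rather than a known protocol constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Limit taken from the LNetError line ("1048576 max expected"). */
#define LNET_MTU_SKETCH 1048576u

/*
 * Hypothetical helper: largest inline data length that keeps the whole
 * message within the limit. 'overhead' covers bytes that are part of
 * the message but not of the listed buffers (an assumption here).
 */
static uint32_t max_inline_len(const uint32_t *other_lens, size_t count,
			       uint32_t overhead)
{
	uint64_t used = overhead;

	assert(other_lens != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		used += other_lens[i];

	if (used >= LNET_MTU_SKETCH)
		return 0; /* no room left for inline data */
	return (uint32_t)(LNET_MTU_SKETCH - used);
}

/* Clamp a requested inline length to the computed ceiling. */
static uint32_t clamp_inline_len(uint32_t requested, uint32_t ceiling)
{
	return requested < ceiling ? requested : ceiling;
}

/* Non-inline buffer lengths from the vmcore in this ticket. */
static const uint32_t sample_other_lens[] = { 184, 112, 216, 2432, 260 };
```

For the buffers in this vmcore, taking the 36-byte difference between md.length (1051832) and the buffer sum (1051796) as the overhead, max_inline_len(sample_other_lens, 5, 36) comes to 1045336 bytes. Whether that 36-byte difference is fixed in general is not established by this ticket.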



 Comments   
Comment by Gerrit Updater [ 28/May/19 ]

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/34975
Subject: LU-12140 lnet: adds checking msg len
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aeb963fc19a6f78777e91ee3fd98ad93c7d9acca

Comment by Gerrit Updater [ 01/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34975/
Subject: LU-12140 lnet: adds checking msg len
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4d43a6c3b182485ffaf7d94c726653b1a36d1b9b

Comment by Peter Jones [ 01/Jun/19 ]

Landed for 2.13

Generated at Sat Feb 10 02:50:00 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.