[LU-11424] incorrect amount of cpts copied to lnet_cpts Created: 25/Sep/18  Updated: 23/Feb/21  Resolved: 06/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.5
Fix Version/s: Lustre 2.12.0, Lustre 2.10.6

Type: Bug Priority: Critical
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None
Environment:

Any LNet setup


Issue Links:
Related
is related to LU-13235 Fix wrong size in lnet_net_append_cpts Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

While porting the Multi-Rail work to the Linux Lustre client, a bug surfaced that also exists in the OpenSFS branch but has not yet shown up there. The wrong size is used when copying the cpts to net_cpts, which causes the following crash:

RIP: 0010:lnet_match2mt.isra.8+0x2b/0x40 [lnet]
2018-09-14 21:49:43 [ 4076.107190] Code: 66 66 66 90 83 3d 4c 9e 02 00 01 74 23 31 c0 f6 47 08 02 75 01 c3 48 89 f2 53 48 8b 5f 30 31 f6 48 89 d7 e8 77 3c ff ff 48 98 <48> 8b 04 c3 5b c3 48 8b 47 30 48 8b 00 c3 0f 1f 80 00 00 00 00 66
2018-09-14 21:49:43 [ 4076.128243] RSP: 0018:ffffc9000fc27b90 EFLAGS: 00010286
2018-09-14 21:49:43 [ 4076.134580] RAX: 000000001d5677f8 RBX: ffff88104a417e70 RCX: 000000000000003f
2018-09-14 21:49:43 [ 4076.142817] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000ffffffff
2018-09-14 21:49:43 [ 4076.151028] RBP: ffffc9000fc27bc8 R08: 0000000000000001 R09: 0000000000000001
2018-09-14 21:49:43 [ 4076.159219] R10: 0000000000000000 R11: ffffc9000fc27a88 R12: 0000000000000001
2018-09-14 21:49:43 [ 4076.167398] R13: 0000000000000001 R14: ffff8808581c1060 R15: ffff88104ca21200
2018-09-14 21:49:43 [ 4076.175565] lnet_mt_of_attach+0x72/0x1b0 [lnet]
2018-09-14 21:49:43 [ 4076.181190] LNetMEAttach+0x60/0x1f0 [lnet]
2018-09-14 21:49:43 [ 4076.186388] ptl_send_rpc+0x26f/0xbb0 [ptlrpc]
2018-09-14 21:49:43 [ 4076.191812] ? libcfs_debug_msg+0x57/0x80 [libcfs]
2018-09-14 21:49:43 [ 4076.197604] ptlrpc_send_new_req+0x4c9/0x860 [ptlrpc]
2018-09-14 21:49:43 [ 4076.203653] ptlrpc_check_set.part.21+0x855/0x18b0 [ptlrpc]
2018-09-14 21:49:43 [ 4076.210209] ? try_to_del_timer_sync+0x4d/0x80
2018-09-14 21:49:43 [ 4076.215640] ? del_timer_sync+0x35/0x40
2018-09-14 21:49:43 [ 4076.220462] ptlrpcd_check+0x3ae/0x3f0 [ptlrpc]
2018-09-14 21:49:43 [ 4076.225972] ptlrpcd+0x2be/0x320 [ptlrpc]
2018-09-14 21:49:43 [ 4076.230930] ? wait_woken+0x80/0x80
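
The size mismatch described above can be illustrated with a minimal standalone sketch. It is not the LNet code: the function names are hypothetical, and the element type is assumed to be a 32-bit integer. The point is only that memcpy() takes a byte count, so passing the number of entries copies too little whenever an entry is wider than one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper showing the buggy pattern: ncpts is an element
 * count, but memcpy() interprets its third argument as bytes, so only
 * the first ncpts bytes of the source array are copied. */
static void copy_cpts_buggy(uint32_t *dst, const uint32_t *src, size_t ncpts)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, ncpts);
}

/* Hypothetical helper showing the corrected pattern: scale the element
 * count by the element size so every entry is copied. */
static void copy_cpts_fixed(uint32_t *dst, const uint32_t *src, size_t ncpts)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, ncpts * sizeof(*src));
}
```

With four 32-bit entries, the buggy variant copies 4 bytes (one entry) and leaves the remaining three entries of the destination with whatever they held before.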



 Comments   
Comment by Gerrit Updater [ 25/Sep/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33229
Subject: LU-11424 lnet: copy the correct amount of cpts to lnet_cpts
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 860bdd7db9abe8a2e9c94d47d43b9de15e860320

Comment by Gerrit Updater [ 05/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33229/
Subject: LU-11424 lnet: copy the correct amount of cpts to lnet_cpts
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5afe99cac9a7be5bf776d68c65b2fe51b31591ae

Comment by Gerrit Updater [ 06/Oct/18 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33312
Subject: LU-11424 lnet: copy the correct amount of cpts to lnet_cpts
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 1530f84ddbfb537f20a473eb13c6b1f909ed2e35

Comment by Gerrit Updater [ 22/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33312/
Subject: LU-11424 lnet: copy the correct amount of cpts to lnet_cpts
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: daa4a4ed46495c1a2229d792c79e01208370eeef

Comment by Alexander Boyko [ 10/Jan/19 ]

I think there are some more memcpy errors.

8cbb8cd3 (Amir Shehata      2015-12-11 20:02:54 -0800  204)
8cbb8cd3 (Amir Shehata      2015-12-11 20:02:54 -0800  205)             memcpy(array, net->net_cpts, net->net_ncpts);
8cbb8cd3 (Amir Shehata      2015-12-11 20:02:54 -0800  206)             loc = array + net->net_ncpts;
8cbb8cd3 (Amir Shehata      2015-12-11 20:02:54 -0800  207)             memcpy(loc, added_cpts, j);
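
A rough sketch of how the append in the lines quoted above might look with byte counts scaled by the element size. This is a standalone illustration, not the patch that was submitted: append_cpts and its parameters are hypothetical names, and the entries are assumed to be 32-bit integers. Note that the pointer arithmetic in `array + net->net_ncpts` is already expressed in elements, so under that assumption only the two memcpy() lengths would need scaling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated array holding
 * old[0..nold) followed by added[0..nadded), or NULL if there is
 * nothing to copy or the allocation fails. Caller frees the result. */
static uint32_t *append_cpts(const uint32_t *old, size_t nold,
			     const uint32_t *added, size_t nadded)
{
	uint32_t *array;

	assert(old != NULL || nold == 0);
	assert(added != NULL || nadded == 0);

	if (nold + nadded == 0)
		return NULL;

	array = calloc(nold + nadded, sizeof(*array));
	if (array == NULL)
		return NULL;

	/* memcpy() lengths are in bytes, so scale by the element size. */
	if (nold > 0)
		memcpy(array, old, nold * sizeof(*array));
	/* array + nold advances by whole elements, not bytes. */
	if (nadded > 0)
		memcpy(array + nold, added, nadded * sizeof(*array));

	return array;
}
```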
Comment by Gerrit Updater [ 31/Oct/19 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/36636
Subject: LU-11424 lnet: copy the correct amount of cpts to lnet_cpts
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d488dd308275babfad8c4cea5d2dc31cb1459a39

Comment by Andreas Dilger [ 12/Feb/20 ]

This issue was closed for the 2.12.0 release. Adding new patches in a later release makes it harder to track the new patch and where it needs to be fixed. I've moved the patch over to LU-13235 instead.

Generated at Sat Feb 10 02:43:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.