[LU-14095] Multiple tests crash with “ASSERTION( rsi->h.cache_list.next == ((void *)0) ) failed” Created: 29/Oct/20  Updated: 16/Dec/20  Resolved: 14/Dec/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: Sebastien Buisson
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL8.2 DNE/SSK


Issue Links:
Related
is related to LU-14151 GSS context initialization fails on R... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A variety of tests are crashing with the following assertion failure:

[ 4493.310724] LustreError: 13384:0:(gss_svc_upcall.c:236:rsi_put()) ASSERTION( rsi->h.cache_list.next == ((void *)0) ) failed: 
[ 4493.312757] LustreError: 13384:0:(gss_svc_upcall.c:236:rsi_put()) LBUG
[ 4493.314408] Pid: 13384, comm: kworker/0:0 4.18.0-193.6.3.el8_lustre.x86_64 #1 SMP Fri Sep 25 21:03:21 UTC 2020
[ 4493.316084] Call Trace TBD:
[ 4493.316572] Kernel panic - not syncing: LBUG
[ 4493.317295] CPU: 0 PID: 13384 Comm: kworker/0:0 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-193.6.3.el8_lustre.x86_64 #1
[ 4493.319387] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 4493.320460] Workqueue: events_power_efficient do_cache_clean [sunrpc]
[ 4493.321533] Call Trace:
[ 4493.322057]  dump_stack+0x5c/0x80
[ 4493.322652]  panic+0xe7/0x2a9
[ 4493.323246]  lbug_with_loc.cold.10+0x18/0x18 [libcfs]
[ 4493.324166]  rsi_put+0x10f/0x140 [ptlrpc_gss]
[ 4493.324920]  cache_clean+0x2a4/0x2e0 [sunrpc]
[ 4493.325691]  do_cache_clean+0xa/0x60 [sunrpc]
[ 4493.326449]  process_one_work+0x1a7/0x3b0
[ 4493.327136]  worker_thread+0x30/0x390
[ 4493.327755]  ? create_worker+0x1a0/0x1a0
[ 4493.328425]  kthread+0x112/0x130
[ 4493.328983]  ? kthread_flush_work_fn+0x10/0x10
[ 4493.329745]  ret_from_fork+0x35/0x40

So far, this has been seen only on RHEL8.2, in the security test groups dne-ssk and dne-selinux-ssk. It first appeared on 29 Oct 2020 with 2.13.56.46, in
sanity-sec test_16 https://testing.whamcloud.com/test_sets/b52da834-abb7-4080-9469-9bad89885f38

Other tests that crash the same way:
recovery-small test_4 https://testing.whamcloud.com/test_sets/12054d95-b53a-4383-9fed-73e454728408
recovery-small test_10e https://testing.whamcloud.com/test_sets/67e2fa12-1d98-40ba-a7a0-4ce181b1f6d4
sanity-sec test_0 https://testing.whamcloud.com/test_sets/2d07b4f8-e891-4116-8c13-a2f1ff2370ae
sanity-sec test_17 https://testing.whamcloud.com/test_sets/c0ebf4e3-e06d-4773-9cdb-c785e65b8e29



 Comments   
Comment by Gerrit Updater [ 02/Nov/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/40514
Subject: LU-14095 gss: use hlist_unhashed() instead of ->next
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2d54ed4884ded9ad34daf9a7b2bf4663c7ae6f70

Comment by Gerrit Updater [ 18/Nov/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/40686
Subject: LU-14095 ssk: default rounds of Miller-Rabin for DH_check
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 74cdb66804e27de86f128aa4af8c8774baa19ed3

Comment by Gerrit Updater [ 03/Dec/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40514/
Subject: LU-14095 gss: use hlist_unhashed() instead of ->next
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a619ceabf44a561bcde1d7128b382f41deca602f

Comment by Gerrit Updater [ 09/Dec/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/40914
Subject: LU-14095 gss: use RCU protection for sunrpc cache
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 80dc81244bef7b15928ec54b5388ef78836037e8

Comment by Gerrit Updater [ 13/Dec/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40686/
Subject: LU-14095 ssk: default rounds of Miller-Rabin for DH_check
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0fece1af57e74efa5a7248f57495e2bddf72bb38

Comment by Gerrit Updater [ 14/Dec/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40914/
Subject: LU-14095 gss: use RCU protection for sunrpc cache
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 803a59b87d9b0de8c059447902db176dfd37a24a

Comment by Peter Jones [ 14/Dec/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 16/Dec/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/41001
Subject: LU-14095 ssk: default rounds of Miller-Rabin for DH_check
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a7a50a0297056c548bdf70cdc07f82690cad5568

Comment by Gerrit Updater [ 16/Dec/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/41002
Subject: LU-14095 gss: use RCU protection for sunrpc cache
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: fa524856520c8c00931284cb4614e3a53c3b0b6b

Comment by Gerrit Updater [ 16/Dec/20 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/40996
Subject: LU-14095 gss: use hlist_unhashed() instead of ->next
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ca15730539b70a9065d4e6c5ffafe18440e04f05
