[LU-12338] shared key/gss code sleeps in atomic context. Created: 25/May/19  Updated: 07/Oct/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Sebastien Buisson
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I added shared-key (SSK) testing to my test rig, and it looks like there are several places where we sleep while holding a spinlock when SSK is enabled:

[  211.138071] Lustre: DEBUG MARKER: comparing 1000 newly copied files at Sat May 25 02:59:49 EDT 2019
[  221.144833] Lustre: DEBUG MARKER: finished at Sat May 25 02:59:59 EDT 2019 (32)
[  221.579442] Lustre: 588:0:(gss_cli_upcall.c:395:gss_do_ctx_fini_rpc()) client finishing forward ctx ffff8800d4204f00 idx 0x22bcca48b310ff2 (0->lustre-OST0001_UUID)
[  221.583203] Lustre: 2808:0:(sec_gss.c:1228:gss_cli_ctx_fini_common()) gss.keyring@ffff880114c0cc00: destroy ctx ffff8800d4204f00(0->lustre-OST0001_UUID)
[  221.587574] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/lustre/ptlrpc/sec_gc.c:79
[  221.591244] in_atomic(): 1, irqs_disabled(): 0, pid: 2808, name: socknal_sd01_00
[  221.592958] CPU: 4 PID: 2808 Comm: socknal_sd01_00 Kdump: loaded Tainted: G           OE  ------------   3.10.0-7.6-debug #4
[  221.595999] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  221.597611] Call Trace:
[  221.598437]  [<ffffffff817b2bf2>] dump_stack+0x19/0x1b
[  221.599896]  [<ffffffff810c3bc9>] __might_sleep+0xd9/0x100
[  221.601459]  [<ffffffffa04e8202>] sptlrpc_gc_del_sec+0x32/0x110 [ptlrpc]
[  221.603125]  [<ffffffffa04dd359>] sptlrpc_sec_put+0x29/0x70 [ptlrpc]
[  221.605025]  [<ffffffffa072c5f7>] ctx_destroy_kr+0xc7/0x300 [ptlrpc_gss]
[  221.606808]  [<ffffffffa072cc9d>] gss_sec_release_ctx_kr+0x2d/0xa0 [ptlrpc_gss]
[  221.608511]  [<ffffffffa04dc792>] sptlrpc_cli_ctx_put+0x42/0xb0 [ptlrpc]
[  221.609902]  [<ffffffffa04de5b1>] sptlrpc_req_put_ctx+0x91/0x1e0 [ptlrpc]
[  221.611295]  [<ffffffffa04a1258>] __ptlrpc_req_finished+0x248/0x750 [ptlrpc]
[  221.613196]  [<ffffffffa04a1770>] ptlrpc_req_finished+0x10/0x20 [ptlrpc]
[  221.615457]  [<ffffffffa04b8505>] request_out_callback+0xc5/0x2a0 [ptlrpc]
[  221.618027]  [<ffffffffa04b804a>] ptlrpc_master_callback+0x2a/0xc0 [ptlrpc]
[  221.620624]  [<ffffffffa01af73b>] lnet_eq_enqueue_event+0x2b/0x140 [lnet]
[  221.622914]  [<ffffffffa01ac32e>] lnet_detach_md+0xde/0x170 [lnet]
[  221.625034]  [<ffffffffa01ae692>] lnet_finalize+0x602/0x7c0 [lnet]
[  221.626357]  [<ffffffffa00f69ce>] ksocknal_tx_done+0x9e/0x1f0 [ksocklnd]
[  221.627867]  [<ffffffffa00fb8d0>] ksocknal_scheduler+0x350/0xd60 [ksocklnd]
[  221.629275]  [<ffffffff810b6050>] ? wake_up_atomic_t+0x30/0x30
[  221.630491]  [<ffffffffa00fb580>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
[  221.631926]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[  221.632967]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[  221.634247]  [<ffffffff817c7c77>] ret_from_fork_nospec_begin+0x21/0x21
[  221.635488]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140

This is in runtests.

Here's the full report; the sleeping happens only on the client: http://testing.linuxhacker.ru:3333/lustre-reports/274/testresults/runtests-ssk-ldiskfs-ldiskfs-SSK-centos7_x86_64-centos7_x86_64/
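
For illustration only: in the trace above, sptlrpc_sec_put() reaches sptlrpc_gc_del_sec() from the socknal scheduler thread while in_atomic() is 1. One common way to avoid this class of problem is to defer the final release to a context that is allowed to sleep. The following is a minimal single-threaded userspace sketch of that pattern, not a proposed patch. Every name in it (demo_sec, demo_sec_put_direct, demo_sec_put_deferred, demo_run_deferred) is hypothetical, and the in_atomic_ctx flag only stands in for the kernel's atomic-context state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are Lustre or kernel APIs. */

static bool in_atomic_ctx;     /* stands in for "we are in atomic context" */
static int  sleeps_in_atomic;  /* counts what __might_sleep() would complain about */

struct demo_sec {
    int refcount;
    bool destroyed;
    struct demo_sec *next_deferred;  /* link in the deferred-release list */
};

static struct demo_sec *deferred_head;

/* Models a cleanup routine that may sleep (for example, one that takes a mutex). */
static void demo_sec_destroy(struct demo_sec *sec)
{
    if (in_atomic_ctx)
        sleeps_in_atomic++;  /* sleeping here is the bug being modelled */
    sec->destroyed = true;
}

/* Naive put: the last reference runs the sleeping cleanup in the caller's context. */
static void demo_sec_put_direct(struct demo_sec *sec)
{
    if (--sec->refcount == 0)
        demo_sec_destroy(sec);
}

/* Deferred put: the last reference only queues the object, which never sleeps. */
static void demo_sec_put_deferred(struct demo_sec *sec)
{
    if (--sec->refcount == 0) {
        sec->next_deferred = deferred_head;
        deferred_head = sec;
    }
}

/* Models a worker that runs later in process context, where sleeping is allowed. */
static void demo_run_deferred(void)
{
    assert(!in_atomic_ctx);
    while (deferred_head != NULL) {
        struct demo_sec *sec = deferred_head;

        deferred_head = sec->next_deferred;
        sec->next_deferred = NULL;
        demo_sec_destroy(sec);
    }
}
```

In this model, calling demo_sec_put_direct() with in_atomic_ctx set increments sleeps_in_atomic, while demo_sec_put_deferred() followed later by demo_run_deferred() does not. In real kernel code the list would need its own locking and the worker would be scheduled through an existing deferral mechanism; which one fits the GSS keyring code is not something this report establishes.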



 Comments   
Comment by Andreas Dilger [ 07/Oct/19 ]

Still seeing this with current master.

Generated at Sat Feb 10 02:51:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.