[LU-12570] sanity test 134a crash with SSK in use Created: 22/Jul/19  Updated: 04/Oct/19  Resolved: 28/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12592 sanity 134a hit panic time to time. Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When running sanity with SSK (shared-secret key) enabled, it always crashes in test 134a like this:

LustreError: 32759:0:(ofd_internal.h:412:ofd_info()) ASSERTION( info ) failed:
LustreError: 32759:0:(ofd_internal.h:412:ofd_info()) LBUG
Pid: 32759, comm: mdt00_008 3.10.0-7.6-debug #1 SMP Fri Jul 12 02:40:17 EDT 2019
Call Trace:
[<ffffffffa01928cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[<ffffffffa019297c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<ffffffffa0e94dca>] ofd_exit+0x0/0x236 [ofd]
[<ffffffffa0e9397b>] ofd_lvbo_update+0xd2b/0xe30 [ofd]
[<ffffffffa05ec99c>] ldlm_handle_ast_error+0x45c/0x820 [ptlrpc]
[<ffffffffa05ee6ea>] ldlm_cb_interpret+0x19a/0x700 [ptlrpc]
[<ffffffffa0608071>] ptlrpc_check_set.part.23+0x491/0x1e00 [ptlrpc]
[<ffffffffa0609a3b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[<ffffffffa0609ddc>] ptlrpc_set_wait+0x31c/0x790 [ptlrpc]
[<ffffffffa05c7e35>] ldlm_run_ast_work+0xd5/0x380 [ptlrpc]
[<ffffffffa05fe8c5>] ldlm_reclaim_full+0x425/0x7a0 [ptlrpc]
[<ffffffffa05f0338>] ldlm_handle_enqueue0+0x138/0x15d0 [ptlrpc]
[<ffffffffa0676b42>] tgt_enqueue+0x62/0x210 [ptlrpc]
[<ffffffffa067ef85>] tgt_request_handle+0x985/0x1630 [ptlrpc]
[<ffffffffa0622568>] ptlrpc_server_handle_request+0x258/0xb00 [ptlrpc]
[<ffffffffa062670a>] ptlrpc_main+0xcba/0x2500 [ptlrpc]
[<ffffffff810b4ed4>] kthread+0xe4/0xf0
[<ffffffff817c8c5d>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff


 Comments   
Comment by Alex Zhuravlev [ 22/Jul/19 ]

It's an MDT thread, which seems to be missing LCT_DT_THREAD?

Comment by Alex Zhuravlev [ 22/Jul/19 ]
static int mds_start_ptlrpc_service(struct mds_device *m)
...
			.tc_ctx_tags		= LCT_MD_THREAD,
}

i.e. LCT_DT_THREAD is missing?
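
For readers unfamiliar with context tags, here is a minimal standalone sketch of why a thread set up with only the MD tag would get a NULL back when OFD code asks for its per-thread info. This is a simplified model, not Lustre code: every `model_*` / `MODEL_*` name is hypothetical, and the real key lookup is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag bits standing in for LCT_MD_THREAD / LCT_DT_THREAD. */
enum model_tag {
	MODEL_MD_THREAD = 1 << 0,
	MODEL_DT_THREAD = 1 << 1,
};

/* A key names one kind of per-thread data and the thread tags it applies to. */
struct model_key {
	unsigned int tags;	/* which thread types get a value for this key */
	int index;		/* slot in the env's value array */
	void *value;		/* stand-in for the allocated per-thread data */
};

static int model_ofd_storage;	/* stand-in for OFD per-thread info */
static int model_mdt_storage;	/* stand-in for MDT per-thread info */

static const struct model_key model_ofd_key = {
	MODEL_DT_THREAD, 0, &model_ofd_storage
};
static const struct model_key model_mdt_key = {
	MODEL_MD_THREAD, 1, &model_mdt_storage
};

static const struct model_key *const model_keys[] = {
	&model_ofd_key, &model_mdt_key,
};
#define MODEL_KEY_COUNT (sizeof(model_keys) / sizeof(model_keys[0]))

/* Per-thread environment: one slot per key, filled only on a tag match. */
struct model_env {
	unsigned int tags;
	void *values[MODEL_KEY_COUNT];
};

/* Fill the slots of every key whose tags intersect the thread's tags. */
static void model_env_init(struct model_env *env, unsigned int tags)
{
	size_t i;

	assert(env != NULL);
	env->tags = tags;
	for (i = 0; i < MODEL_KEY_COUNT; i++) {
		const struct model_key *key = model_keys[i];

		env->values[key->index] =
			(key->tags & tags) ? key->value : NULL;
	}
}

/* Returns NULL when the env was never given a value for this key. */
static void *model_key_get(const struct model_env *env,
			   const struct model_key *key)
{
	return env->values[key->index];
}
```

In this model, an env initialised with `MODEL_MD_THREAD` alone returns NULL for `model_ofd_key`, which is the situation the `ASSERTION( info )` in the trace appears to catch. Initialising it with `MODEL_MD_THREAD | MODEL_DT_THREAD` fills both slots.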

Comment by Alex Zhuravlev [ 22/Jul/19 ]

It would be great to have a log/dump for this case.

Comment by Oleg Drokin [ 22/Jul/19 ]

sure thing.

master: http://testing.linuxhacker.ru:3333/lustre-reports/1455/testresults/sanity2-ldiskfs-DNE-SSK-centos7_x86_64-centos7_x86_64/

b2_12: http://testing.linuxhacker.ru:3333/lustre-reports/1460/testresults/sanity-ldiskfs-DNE-SSK-centos7_x86_64-centos7_x86_64/

Comment by Andreas Dilger [ 30/Jul/19 ]

Alex, is this just a matter of adding LCT_DT_THREAD to mds_start_ptlrpc_service() setting up the threads? What is the impact/overhead of doing this (if any)? Would it be better to limit ldlm_reclaim_ns() to only clean up locks in the same namespace type as mentioned in LU-12592?
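
To make the second option concrete, here is a rough standalone sketch of a reclaim pass that only touches namespaces of the caller's own type. It is an illustration under assumptions, not the actual ldlm_reclaim_ns() logic: the `model_*` types and the flat namespace array are hypothetical.

```c
#include <stddef.h>

/* Hypothetical namespace types; the real enum has more members. */
enum model_ns_type {
	MODEL_NS_MDT,
	MODEL_NS_OST,
};

/* Hypothetical, heavily reduced view of a server-side lock namespace. */
struct model_ns {
	enum model_ns_type type;
	int unused_locks;	/* locks that could be reclaimed */
};

/*
 * Reclaim up to `want` locks, skipping every namespace whose type differs
 * from `type`. Returns the number of locks reclaimed.
 */
static int model_reclaim_same_type(struct model_ns *nss, size_t count,
				   enum model_ns_type type, int want)
{
	int reclaimed = 0;
	size_t i;

	for (i = 0; i < count && reclaimed < want; i++) {
		int take;

		if (nss[i].type != type)
			continue;	/* e.g. an MDT thread never enters OST lock code */

		take = nss[i].unused_locks;
		if (take > want - reclaimed)
			take = want - reclaimed;

		nss[i].unused_locks -= take;
		reclaimed += take;
	}

	return reclaimed;
}
```

With a filter like this, an MDT service thread would never run OST lock callbacks during reclaim, so it would not reach ofd_lvbo_update() the way the trace in the description does. The cost is that the reclaim has fewer locks to choose from.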

Comment by Sebastien Buisson [ 19/Aug/19 ]

Hi,

I have never hit this crash myself, and I have many examples of sanity test 134a passing with SSK enabled, for instance all custom-103 sessions triggered from https://review.whamcloud.com/34380 (the latest one was run with the patch rebased on August 1st).

I am not sure it is an issue with SSK. I am wondering if the crash you experience still occurs with the modification suggested by Alex (LCT_DT_THREAD).

Comment by Alex Zhuravlev [ 19/Aug/19 ]

Sorry for the late response. Adding LCT_DT_THREAD is not quite enough: the problem is that the client is trying to cancel extent locks and is sending them to the MDT.
IIRC, Alexey Lyashkov reported this in another ticket; I will try to find that one.

Comment by Gerrit Updater [ 13/Sep/19 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36179
Subject: LU-12570 mdt: request env for DT threads
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2b586c40f22ed9e096478d9caf634c42b3fff224

Comment by Gerrit Updater [ 27/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36179/
Subject: LU-12570 mdt: request env for DT threads
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1f94d5eb2be4e921e909d8f18523dcab91bb6531

Comment by Peter Jones [ 28/Sep/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 28/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36312
Subject: LU-12570 mdt: request env for DT threads
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a561e4848e6e03a380974ed74ee10d41a3c25a98

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36312/
Subject: LU-12570 mdt: request env for DT threads
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: dee23bcfb821e339ca3b3df2fb08fa946f7d09a1
