[LU-12781] sanity test_272a crashes with SSK Created: 18/Sep/19 Updated: 03/Jan/20 Resolved: 03/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sebastien Buisson | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | gss | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
With the recent landing of patch " I used the following patches to trigger tests:
Without SSK, test_272a does not crash.
With SSK, test_272a crashed unless patch "
The crash is due to a failed assertion:
[ 406.653680] Lustre: DEBUG MARKER: == sanity test 272a: DoM migration: new layout with the same DOM component =========================== 08:37:07 (1568795827)
[ 406.726294] format at mdt_io.c:215:mdt_rw_hpreq_check doesn't end in newline
[ 406.743661] format at mdt_io.c:215:mdt_rw_hpreq_check doesn't end in newline
[ 406.792396] LustreError: 15793:0:(pack_generic.c:454:lustre_shrink_msg_v2()) ASSERTION( msg->lm_buflens[segment] >= newlen ) failed:
[ 406.793584] LustreError: 15793:0:(pack_generic.c:454:lustre_shrink_msg_v2()) LBUG
[ 406.794352] Pid: 15793, comm: mdt00_002 3.10.0-957.27.2.el7_lustre.x86_64 #1 SMP Thu Sep 12 03:53:14 UTC 2019
[ 406.795309] Call Trace:
[ 406.795600] [<ffffffffc09188ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 406.796459] [<ffffffffc091895c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 406.797125] [<ffffffffc0e32c54>] lustre_shrink_msg+0x164/0x200 [ptlrpc]
[ 406.797912] [<ffffffffc146e11e>] gss_svc_authorize+0x16e/0x5b0 [ptlrpc_gss]
[ 406.798676] [<ffffffffc0e647c5>] sptlrpc_svc_wrap_reply+0x55/0x1d0 [ptlrpc]
[ 406.799455] [<ffffffffc0e2eca8>] ptlrpc_send_reply+0x1e8/0x830 [ptlrpc]
[ 406.800340] [<ffffffffc0ded6be>] target_send_reply_msg+0x8e/0x170 [ptlrpc]
[ 406.801092] [<ffffffffc0df7d4e>] target_send_reply+0x30e/0x730 [ptlrpc]
[ 406.801847] [<ffffffffc0e9d3d1>] tgt_request_handle+0x2f1/0x15c0 [ptlrpc]
[ 406.802620] [<ffffffffc0e42516>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 406.803501] [<ffffffffc0e4604c>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[ 406.804193] [<ffffffff954c2e81>] kthread+0xd1/0xe0
[ 406.804779] [<ffffffff95b77c37>] ret_from_fork_nospec_end+0x0/0x39
[ 406.805484] [<ffffffffffffffff>] 0xffffffffffffffff
[ 406.806058] Kernel panic - not syncing: LBUG
[ 406.806628] CPU: 1 PID: 15793 Comm: mdt00_002 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.x86_64 #1
[ 406.807770] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 406.808333] Call Trace:
[ 406.808604] [<ffffffff95b65147>] dump_stack+0x19/0x1b
[ 406.809120] [<ffffffff95b5e850>] panic+0xe8/0x21f
[ 406.809595] [<ffffffffc09189ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 406.810219] [<ffffffffc0e32c54>] lustre_shrink_msg+0x164/0x200 [ptlrpc]
[ 406.810867] [<ffffffffc146e11e>] gss_svc_authorize+0x16e/0x5b0 [ptlrpc_gss]
[ 406.811570] [<ffffffffc0e647c5>] sptlrpc_svc_wrap_reply+0x55/0x1d0 [ptlrpc]
[ 406.812272] [<ffffffffc0e2eca8>] ptlrpc_send_reply+0x1e8/0x830 [ptlrpc]
[ 406.812946] [<ffffffffc0ded6be>] target_send_reply_msg+0x8e/0x170 [ptlrpc]
[ 406.813633] [<ffffffffc0df7d4e>] target_send_reply+0x30e/0x730 [ptlrpc]
[ 406.814305] [<ffffffffc0e362d7>] ? lustre_msg_set_last_committed+0x27/0xa0 [ptlrpc]
[ 406.815083] [<ffffffffc0e9d3d1>] tgt_request_handle+0x2f1/0x15c0 [ptlrpc]
[ 406.815752] [<ffffffffc0a60f3e>] ? libcfs_nid2str_r+0xfe/0x130 [lnet]
[ 406.816412] [<ffffffffc0e42516>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[ 406.817157] [<ffffffff954cfeb4>] ? __wake_up+0x44/0x50
[ 406.817689] [<ffffffffc0e4604c>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[ 406.818302] [<ffffffff954d1ad0>] ? finish_task_switch+0x50/0x1c0
[ 406.818914] [<ffffffffc0e454a0>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[ 406.819620] [<ffffffff954c2e81>] kthread+0xd1/0xe0
[ 406.820102] [<ffffffff954c2db0>] ? insert_kthread_work+0x40/0x40
[ 406.820685] [<ffffffff95b77c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 406.821312] [<ffffffff954c2db0>] ? insert_kthread_work+0x40/0x40 |
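The failed check in the log, ASSERTION( msg->lm_buflens[segment] >= newlen ), states that a reply segment may only be shrunk to a length no larger than its current one. The following is a minimal sketch of that invariant, not the Lustre implementation: struct sketch_msg, sketch_shrink_seg() and sketch_demo() are hypothetical names made up for illustration, and the sketch returns -1 where the real code would LBUG.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SEGS 8

/* Hypothetical stand-in for a multi-segment message header;
 * the real Lustre message layout is not reproduced here. */
struct sketch_msg {
	unsigned int bufcount;                 /* number of segments in use */
	unsigned int buflens[SKETCH_MAX_SEGS]; /* current length of each segment */
};

/* Shrink one segment to newlen.
 * Returns 0 on success, or -1 in the cases where an assertion-based
 * version would fail: bad segment index, or newlen larger than the
 * segment's current length (a "shrink" that would actually grow it). */
static int sketch_shrink_seg(struct sketch_msg *msg, unsigned int segment,
			     unsigned int newlen)
{
	if (msg == NULL || segment >= msg->bufcount)
		return -1;
	if (msg->buflens[segment] < newlen)
		return -1;
	msg->buflens[segment] = newlen;
	return 0;
}

/* Usage example: shrinking 128 -> 32 is allowed, 32 -> 256 is rejected. */
static void sketch_demo(void)
{
	struct sketch_msg msg = { .bufcount = 2, .buflens = { 64, 128 } };

	assert(sketch_shrink_seg(&msg, 1, 32) == 0);
	assert(sketch_shrink_seg(&msg, 1, 256) == -1);
}
```

In terms of this sketch, the crash corresponds to the second call in sketch_demo(): the reply path asked for a new length greater than the recorded segment length. Why that happens only with SSK is not shown by the log itself.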
| Comments |
| Comment by Sebastien Buisson [ 18/Sep/19 ] |
|
Mike, any advice on this? |
| Comment by Oleg Drokin [ 21/Oct/19 ] |
|
I tried to run all of our tests with SSK on, and some more failed with this: sanity-pfl, racer and replay-single. See the test session here: http://testing.linuxhacker.ru:3333/lustre-reports/3825/results-retry3.html |
| Comment by Andreas Dilger [ 01/Nov/19 ] |
|
On a semi-related note, please fix the mdt_rw_hpreq_check() message format to include a newline as part of this patch. |
| Comment by Andreas Dilger [ 10/Nov/19 ] |
|
This seems possibly related to the other lu_buf shrinking issue that you are both working on? |
| Comment by Gerrit Updater [ 11/Nov/19 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36727 |
| Comment by Gerrit Updater [ 11/Nov/19 ] |
|
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36732 |
| Comment by Mikhail Pershin [ 12/Nov/19 ] |
|
With the latest patch, SSK looks better. Sebastien, can you try this patch in your SSK testing, please? |
| Comment by Sebastien Buisson [ 12/Nov/19 ] |
|
Hi Mike, it looks better indeed. I commented on the patch. |
| Comment by Gerrit Updater [ 03/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36732/ |
| Comment by Peter Jones [ 03/Jan/20 ] |
|
Landed for 2.14 |