Details
- Bug
- Resolution: Unresolved
- Major
- Lustre 2.16.0
- 3
- 9223372036854775807
Description
The MDT hit the following assertion during the "flock" scaling test. Servers and clients are running master (commit 8011e33370).
[ 1912.913309] LustreError: 218855:0:(ldlm_flock.c:221:ldlm_flock_deadlock()) ASSERTION( req != lock ) failed:
[ 1912.915070] LustreError: 218855:0:(ldlm_flock.c:221:ldlm_flock_deadlock()) LBUG
[ 1912.916394] CPU: 4 PID: 218855 Comm: mdt01_001 Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
[ 1912.920720] Hardware name: DDN SFA400NVX2TE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 1912.922239] Call Trace:
[ 1912.922818] dump_stack+0x41/0x60
[ 1912.923510] lbug_with_loc.cold.8+0x5/0x58 [libcfs]
[ 1912.924413] ldlm_flock_deadlock.isra.19+0x1fb/0x240 [ptlrpc]
[ 1912.925519] ldlm_process_flock_lock+0x116f/0x3250 [ptlrpc]
[ 1912.926594] ? lustre_msg_get_flags+0x2a/0x90 [ptlrpc]
[ 1912.927582] ldlm_lock_enqueue+0x226/0x890 [ptlrpc]
[ 1912.928547] ldlm_handle_enqueue+0x421/0x1750 [ptlrpc]
[ 1912.929555] tgt_enqueue+0xa8/0x230 [ptlrpc]
[ 1912.930442] tgt_request_handle+0x3f4/0x1a30 [ptlrpc]
[ 1912.931428] ? ptlrpc_update_export_timer+0x3d/0x500 [ptlrpc]
[ 1912.932496] ptlrpc_server_handle_request+0x2aa/0xcf0 [ptlrpc]
[ 1912.933572] ? lprocfs_counter_add+0x10e/0x180 [obdclass]
[ 1912.934578] ptlrpc_main+0xc9e/0x15c0 [ptlrpc]
[ 1912.935463] ? __schedule+0x2d9/0x870
[ 1912.936188] ? ptlrpc_wait_event+0x5b0/0x5b0 [ptlrpc]
[ 1912.937146] kthread+0x134/0x150
[ 1912.937803] ? set_kthread_struct+0x50/0x50
[ 1912.938581] ret_from_fork+0x1f/0x40
[ 1919.422438] Kernel panic - not syncing: LBUG
[ 1919.423416] CPU: 4 PID: 218855 Comm: mdt01_001 Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
[ 1919.425520] Hardware name: DDN SFA400NVX2TE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 1919.427037] Call Trace:
[ 1919.427606] dump_stack+0x41/0x60
[ 1919.428278] panic+0xe7/0x2ac
[ 1919.428906] lbug_with_loc.cold.8+0x2f/0x58 [libcfs]
[ 1919.429802] ldlm_flock_deadlock.isra.19+0x1fb/0x240 [ptlrpc]
[ 1919.430891] ldlm_process_flock_lock+0x116f/0x3250 [ptlrpc]
[ 1919.431946] ? lustre_msg_get_flags+0x2a/0x90 [ptlrpc]
[ 1919.432911] ldlm_lock_enqueue+0x226/0x890 [ptlrpc]
[ 1919.433852] ldlm_handle_enqueue+0x421/0x1750 [ptlrpc]
[ 1919.434818] tgt_enqueue+0xa8/0x230 [ptlrpc]
[ 1919.435669] tgt_request_handle+0x3f4/0x1a30 [ptlrpc]
[ 1919.436623] ? ptlrpc_update_export_timer+0x3d/0x500 [ptlrpc]
[ 1919.437653] ptlrpc_server_handle_request+0x2aa/0xcf0 [ptlrpc]
[ 1919.438694] ? lprocfs_counter_add+0x10e/0x180 [obdclass]
[ 1919.439668] ptlrpc_main+0xc9e/0x15c0 [ptlrpc]
[ 1919.440519] ? __schedule+0x2d9/0x870
[ 1919.441200] ? ptlrpc_wait_event+0x5b0/0x5b0 [ptlrpc]
[ 1919.442126] kthread+0x134/0x150
[ 1919.442750] ? set_kthread_struct+0x50/0x50
[ 1919.443494] ret_from_fork+0x1f/0x40
The problem occurred when test_5c ("merge 20k flocks") of performance-sanity.sh ran on 16 clients simultaneously.
test_5c() {
    touch $DIR/$tfile
    for((i=0; i < 20001; i++)) {
        echo "R$((i * 10)), 5"
        [ $i -eq 20000 ] && echo "W0,99999999" && echo "T0" && continue
    } | flocks_test 6 $DIR/$tfile
    rm -r $DIR/$tfile
}
run_test 5c "merge 20k flocks"
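To inspect the command stream that test_5c pipes into flocks_test without touching a Lustre mount, the loop can be pulled out into a standalone function. This is only a sketch: the name gen_flock_cmds and its count argument are hypothetical and not part of performance-sanity.sh, and the ticket does not spell out how flocks_test interprets the R, W and T lines, so the function reproduces the text of the stream and nothing more.

```shell
#!/bin/bash
# Hypothetical helper (not in performance-sanity.sh): print the same lines
# that the test_5c loop writes to the pipe, for a configurable iteration count.
gen_flock_cmds() {
    local count=${1:-20001}   # test_5c iterates 20001 times (i = 0..20000)
    local i
    for ((i = 0; i < count; i++)); do
        # One "R<offset>, 5" line per iteration, offsets spaced 10 apart.
        echo "R$((i * 10)), 5"
        # On the final iteration the test also emits a W line and a T line.
        if [ "$i" -eq $((count - 1)) ]; then
            echo "W0,99999999"
            echo "T0"
        fi
    done
}
```

With the default count the function prints 20,001 R lines followed by the single W line and the single T line. A smaller count, for example gen_flock_cmds 3, gives a stream short enough to read by eye.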
Issue Links
- is related to LU-17589 Flock blocking information becomes stale (Resolved)