[LU-11269] ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed Created: 21/Aug/18  Updated: 16/Sep/20  Resolved: 20/Feb/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.12.3
Fix Version/s: Lustre 2.14.0, Lustre 2.12.5, Lustre 2.12.6

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Duplicate
is duplicated by LU-12636 LustreError: 25940:0:(client.c:1088:p... Resolved
Related
is related to LU-11128 replay-single test timeout Resolved
is related to LU-13498 sanity test 56w fails with '/usr/bin/... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Still seeing this issue:

Feb 6 16:27:00 bcn1833 kernel: LustreError: 119911:0:(client.c:1087:ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed:
Feb 6 16:27:00 bcn1833 kernel: LustreError: 119911:0:(client.c:1087:ptlrpc_set_add_req()) LBUG
Feb 6 16:27:00 bcn1833 kernel: Pid: 119911, comm: namd2 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019
Feb 6 16:27:00 bcn1833 kernel: Call Trace:
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc11c37cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc11c387c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1878ecc>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc187d809>] ptlrpc_queue_wait+0x79/0x230 [ptlrpc]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1864942>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1453b35>] osc_enqueue_base+0x2c5/0x6b0 [osc]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc145eb7b>] osc_lock_enqueue+0x38b/0x840 [osc]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc16fce35>] cl_lock_enqueue+0x65/0x120 [obdclass]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc12801a5>] lov_lock_enqueue+0x95/0x150 [lov]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc16fce35>] cl_lock_enqueue+0x65/0x120 [obdclass]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc16fd3c7>] cl_lock_request+0x67/0x1f0 [obdclass]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc170123b>] cl_io_lock+0x2bb/0x3d0 [obdclass]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc170156a>] cl_io_loop+0xba/0x1c0 [obdclass]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc157798b>] ll_file_io_generic+0x63b/0xc90 [lustre]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1578479>] ll_file_aio_write+0x289/0x660 [lustre]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1578950>] ll_file_write+0x100/0x1c0 [lustre]
Feb 6 16:27:00 bcn1833 kernel: [<ffffffff9fe4a7f0>] vfs_write+0xc0/0x1f0
Feb 6 16:27:00 bcn1833 kernel: [<ffffffff9fe4b60f>] SyS_write+0x7f/0xf0
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffa038dede>] system_call_fastpath+0x25/0x2a
Feb 6 16:27:00 bcn1833 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
Feb 6 16:27:00 bcn1833 kernel: Kernel panic - not syncing: LBUG


 Comments   
Comment by Alex Zhuravlev [ 21/Aug/18 ]

Oleg, please try https://review.whamcloud.com/#/c/32980/

Comment by Oleg Drokin [ 21/Aug/18 ]

added to my testing

Comment by Sarah Liu [ 05/Sep/18 ]

Hit similar issue on soak, lustre-master-ib 2.11.54_91_g605a139

client dmesg

[ 2388.749104] alg: No test for adler32 (adler32-zlib)
[ 2389.628880] Lustre: Lustre: Build Version: 2.11.54_91_g605a139
[ 2389.806899] LNet: Using FMR for registration
[ 2389.825192] LNet: Added LNI 192.168.1.129@o2ib [8/256/0/180]
[ 3597.244535] LustreError: 11-0: soaked-MDT0003-mdc-ffff8bc97a7fd000: operation mds_connect to node 192.168.1.111@o2ib failed: rc = -16
[ 3597.258019] LustreError: 11-0: soaked-MDT0002-mdc-ffff8bc97a7fd000: operation mds_connect to node 192.168.1.110@o2ib failed: rc = -16
[ 3597.273043] LustreError: 11-0: soaked-OST0002-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16
[ 3597.275373] Lustre: Mounted soaked-client
[ 3597.291219] LustreError: 11-0: soaked-OST0006-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16
[ 3597.304919] LustreError: 11-0: soaked-OST000a-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16
[ 3597.318665] LustreError: 11-0: soaked-OST000e-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16
Sep  4 22:01:01 soak-29 TIME: Time stamp for console
[ 6457.411019] LustreError: 4511:0:(client.c:1082:ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed:
[ 6457.424658] LustreError: 4511:0:(client.c:1082:ptlrpc_set_add_req()) LBUG
[ 6457.432264] Pid: 4511, comm: simul 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
[ 6457.442107] Call Trace:
[ 6457.444861]  [<ffffffffc0b907cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 6457.452209]  [<ffffffffc0b9087c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 6457.459158]  [<ffffffffc0ea9dec>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc]
[ 6457.466718]  [<ffffffffc0eae735>] ptlrpc_queue_wait+0x75/0x220 [ptlrpc]
[ 6457.474167]  [<ffffffffc0e936a2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[ 6457.481618]  [<ffffffffc101e965>] osc_enqueue_base+0x2b5/0x6a0 [osc]
[ 6457.488776]  [<ffffffffc102917b>] osc_lock_enqueue+0x38b/0x840 [osc]
[ 6457.495916]  [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass]
[ 6457.503386]  [<ffffffffc0fcf285>] lov_lock_enqueue+0x95/0x150 [lov]
[ 6457.510458]  [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass]
[ 6457.517903]  [<ffffffffc0ce5377>] cl_lock_request+0x67/0x1f0 [obdclass]
[ 6457.525352]  [<ffffffffc0ce926b>] cl_io_lock+0x2bb/0x3d0 [obdclass]
[ 6457.532408]  [<ffffffffc0ce95fb>] cl_io_loop+0x11b/0xc70 [obdclass]
[ 6457.539477]  [<ffffffffc10c6db2>] ll_file_io_generic+0x4e2/0xd10 [lustre]
[ 6457.547145]  [<ffffffffc10c7b32>] ll_file_aio_write+0x372/0x540 [lustre]
[ 6457.554681]  [<ffffffffc10c7da4>] ll_file_write+0xa4/0x170 [lustre]
[ 6457.561731]  [<ffffffffb1a1b490>] vfs_write+0xc0/0x1f0
[ 6457.567514]  [<ffffffffb1a1c2bf>] SyS_write+0x7f/0xf0
[ 6457.573197]  [<ffffffffb1f20795>] system_call_fastpath+0x1c/0x21
[ 6457.579954]  [<ffffffffffffffff>] 0xffffffffffffffff
[ 6457.585582] Kernel panic - not syncing: LBUG
[ 6457.591152] CPU: 18 PID: 4511 Comm: simul Tainted: G          IOE  ------------   3.10.0-862.9.1.el7.x86_64 #1
[ 6457.603125] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[ 6457.615578] Call Trace:
[ 6457.619126]  [<ffffffffb1f0e84e>] dump_stack+0x19/0x1b
[ 6457.625657]  [<ffffffffb1f08b50>] panic+0xe8/0x21f
[ 6457.631785]  [<ffffffffc0b908cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 6457.639483]  [<ffffffffc0ea9dec>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc]
[ 6457.647758]  [<ffffffffc0eae735>] ptlrpc_queue_wait+0x75/0x220 [ptlrpc]
[ 6457.655965]  [<ffffffffc0e936a2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[ 6457.664156]  [<ffffffffc0e8e7c0>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
[ 6457.673637]  [<ffffffffc1028710>] ? osc_lock_lockless_cancel+0xe0/0xe0 [osc]
[ 6457.682276]  [<ffffffffc1027ad0>] ? osc_lock_upcall+0x580/0x580 [osc]
[ 6457.690247]  [<ffffffffc101e965>] osc_enqueue_base+0x2b5/0x6a0 [osc]
[ 6457.698079]  [<ffffffffc1027550>] ? osc_lock_lvb_update+0x330/0x330 [osc]
[ 6457.706337]  [<ffffffffc102917b>] osc_lock_enqueue+0x38b/0x840 [osc]
[ 6457.714139]  [<ffffffffc1027550>] ? osc_lock_lvb_update+0x330/0x330 [osc]
[ 6457.722390]  [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass]
[ 6457.730416]  [<ffffffffc0fcf285>] lov_lock_enqueue+0x95/0x150 [lov]
[ 6457.738100]  [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass]
[ 6457.746155]  [<ffffffffc0ce5377>] cl_lock_request+0x67/0x1f0 [obdclass]
[ 6457.754191]  [<ffffffffc0ce926b>] cl_io_lock+0x2bb/0x3d0 [obdclass]
[ 6457.761754]  [<ffffffffc0ce95fb>] cl_io_loop+0x11b/0xc70 [obdclass]
[ 6457.769300]  [<ffffffffc10c6db2>] ll_file_io_generic+0x4e2/0xd10 [lustre]
[ 6457.777391]  [<ffffffffc10c7b32>] ll_file_aio_write+0x372/0x540 [lustre]
[ 6457.785395]  [<ffffffffc10c7da4>] ll_file_write+0xa4/0x170 [lustre]
[ 6457.792946]  [<ffffffffb1a1b490>] vfs_write+0xc0/0x1f0
[ 6457.799194]  [<ffffffffb1a1c2bf>] SyS_write+0x7f/0xf0
[ 6457.805311]  [<ffffffffb1f20795>] system_call_fastpath+0x1c/0x21
[ 6457.812550] Kernel Offset: 0x30800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Comment by Jian Yu [ 09/Dec/18 ]

+1 on master branch while running sanity-sec test 20 in review-dne-zfs-part-2 test group:
https://testing.whamcloud.com/test_sets/dcbe721c-f9ab-11e8-b756-52540065bddc

Comment by Sarah Liu [ 22/Aug/19 ]

also hit this on SOAK on client side, lustre-b2_12-ib #35

[180971.440478] Lustre: soaked-OST0007-osc-ffffa0200627b800: Connection restored to 192.168.1.107@o2ib (at 192.168.1.107@o2ib)
[182441.030502] LustreError: 61428:0:(client.c:1088:ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed: 
[182441.044298] LustreError: 61428:0:(client.c:1088:ptlrpc_set_add_req()) LBUG
[182441.052078] Pid: 61428, comm: simul 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
[182441.062176] Call Trace:
[182441.065025]  [<ffffffffc0a2a7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[182441.072438]  [<ffffffffc0a2a87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[182441.079472]  [<ffffffffc0e0002c>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc]
[182441.087102]  [<ffffffffc0e04959>] ptlrpc_queue_wait+0x79/0x230 [ptlrpc]
[182441.094614]  [<ffffffffc0de97c2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[182441.102122]  [<ffffffffc0f77a15>] osc_enqueue_base+0x2c5/0x6b0 [osc]
[182441.109335]  [<ffffffffc0f82a1b>] osc_lock_enqueue+0x38b/0x840 [osc]
[182441.116555]  [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass]
[182441.124096]  [<ffffffffc0f281a5>] lov_lock_enqueue+0x95/0x150 [lov]
[182441.131205]  [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass]
[182441.138727]  [<ffffffffc0b9e8a7>] cl_lock_request+0x67/0x1f0 [obdclass]
[182441.146221]  [<ffffffffc0ba270b>] cl_io_lock+0x2bb/0x3d0 [obdclass]
[182441.153335]  [<ffffffffc0ba2a3a>] cl_io_loop+0xba/0x1c0 [obdclass]
[182441.160371]  [<ffffffffc104f530>] ll_file_io_generic+0x590/0xcb0 [lustre]
[182441.168080]  [<ffffffffc1050942>] ll_file_aio_read+0x352/0x410 [lustre]
[182441.175584]  [<ffffffffc1050b00>] ll_file_read+0x100/0x1c0 [lustre]
[182441.182695]  [<ffffffffac2416ff>] vfs_read+0x9f/0x170
[182441.188443]  [<ffffffffac2425bf>] SyS_read+0x7f/0xf0
[182441.194092]  [<ffffffffac775ddb>] system_call_fastpath+0x22/0x27
[182441.200916]  [<ffffffffffffffff>] 0xffffffffffffffff
[182441.206599] Kernel panic - not syncing: LBUG
[182441.211472] CPU: 0 PID: 61428 Comm: simul Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.21.3.el7.x86_64 #1
[182441.224185] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[182441.236801] Call Trace:
[182441.239627]  [<ffffffffac763107>] dump_stack+0x19/0x1b
[182441.245457]  [<ffffffffac75c810>] panic+0xe8/0x21f
[182441.250903]  [<ffffffffc0a2a8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[182441.257911]  [<ffffffffc0e0002c>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc]
[182441.265499]  [<ffffffffc0e04959>] ptlrpc_queue_wait+0x79/0x230 [ptlrpc]
[182441.272988]  [<ffffffffc0de97c2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[182441.280469]  [<ffffffffac63ce30>] ? dev_queue_xmit+0x10/0x20
[182441.286893]  [<ffffffffc0de4990>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
[182441.295732]  [<ffffffffc0f81fb0>] ? osc_lock_lockless_cancel+0xe0/0xe0 [osc]
[182441.303689]  [<ffffffffc0f80d30>] ? osc_lock_upcall+0x580/0x580 [osc]
[182441.310969]  [<ffffffffc0f77a15>] osc_enqueue_base+0x2c5/0x6b0 [osc]
[182441.318162]  [<ffffffffc0f807b0>] ? osc_lock_lvb_update+0x330/0x330 [osc]
[182441.325828]  [<ffffffffc0f82a1b>] osc_lock_enqueue+0x38b/0x840 [osc]
[182441.333020]  [<ffffffffc0f807b0>] ? osc_lock_lvb_update+0x330/0x330 [osc]
[182441.340694]  [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass]
[182441.348176]  [<ffffffffc0f281a5>] lov_lock_enqueue+0x95/0x150 [lov]
[182441.355267]  [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass]
[182441.362756]  [<ffffffffc0b9e8a7>] cl_lock_request+0x67/0x1f0 [obdclass]
[182441.370245]  [<ffffffffc0ba270b>] cl_io_lock+0x2bb/0x3d0 [obdclass]
[182441.377345]  [<ffffffffc0ba2a3a>] cl_io_loop+0xba/0x1c0 [obdclass]
[182441.384345]  [<ffffffffc104f530>] ll_file_io_generic+0x590/0xcb0 [lustre]
[182441.392022]  [<ffffffffc1050942>] ll_file_aio_read+0x352/0x410 [lustre]
[182441.399506]  [<ffffffffc1050b00>] ll_file_read+0x100/0x1c0 [lustre]
[182441.406596]  [<ffffffffac2416ff>] vfs_read+0x9f/0x170
[182441.412329]  [<ffffffffac2425bf>] SyS_read+0x7f/0xf0
[182441.417977]  [<ffffffffac775d21>] ? system_call_after_swapgs+0xae/0x146
[182441.425455]  [<ffffffffac775ddb>] system_call_fastpath+0x22/0x27
[182441.432256]  [<ffffffffac775d21>] ? system_call_after_swapgs+0xae/0x146
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-957.21.3.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Tue Jun 18 16:35:19 UTC 2019
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.21.3.el7.x86_64 ro console=ttyS0,115200 LANG=en_US.UTF-8 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr disable_cpu_apicid=0 elfcorehdr=871860K


Comment by Gerrit Updater [ 10/Feb/20 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37523
Subject: LU-11269 ptlrpc: do not expose transient IDLE state
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d9f015f84433100177792a8402ef3a289703d57d

Comment by Gerrit Updater [ 20/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37523/
Subject: LU-11269 ptlrpc: do not expose transient IDLE state
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ea8d2ecc783fbaff12c581935ac426b9b8567031

Comment by Peter Jones [ 20/Feb/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 20/Feb/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37649
Subject: LU-11269 ptlrpc: do not expose transient IDLE state
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 22b238fb902d90229f25b0d109540b362d14be87

Comment by Gerrit Updater [ 10/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37649/
Subject: LU-11269 ptlrpc: do not expose transient IDLE state
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: e668075a922a2a39247c00fe5f47d7a4368967f3

Comment by Gerrit Updater [ 23/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37722/
Subject: LU-11269 ptlrpc: request's counter in import
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b09afdf57643cbc1c437a42b4babb0837dd19e65

Comment by Gerrit Updater [ 23/Apr/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38340
Subject: LU-11269 ptlrpc: request's counter in import
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 5cef888e8524932c5913191e40d79b96e668b32a

Comment by Gerrit Updater [ 30/Apr/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38437
Subject: LU-11269 ptlrpc: revert "request's counter in import"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0098de040cc3953034807c61f742077a3025beb0

Comment by Gerrit Updater [ 01/May/20 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38445
Subject: LU-11269 debug: try to reproduce sanity/56w issue
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0fbd7702f177f7854c67f1638a2d3046719f9021

Comment by Jay Lan (Inactive) [ 07/Jul/20 ]

This review https://review.whamcloud.com/38340 (for b2_12) was created on April 23rd. Is it ready for landing?

Comment by Gerrit Updater [ 11/Jul/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38340/
Subject: LU-11269 ptlrpc: request's counter in import
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: aeee2ce6793ace7e617e1df49735d473957dd06e

Generated at Sat Feb 10 02:42:25 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.