[LU-11269] ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed Created: 21/Aug/18 Updated: 16/Sep/20 Resolved: 20/Feb/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.12.3 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.5, Lustre 2.12.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
Still seeing this issue: Feb 6 16:27:00 bcn1833 kernel: LustreError: 119911:0:(client.c:1087:ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed: Feb 6 16:27:00 bcn1833 kernel: LustreError: 119911:0:(client.c:1087:ptlrpc_set_add_req()) LBUG Feb 6 16:27:00 bcn1833 kernel: Pid: 119911, comm: namd2 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 Feb 6 16:27:00 bcn1833 kernel: Call Trace: Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc11c37cc>] libcfs_call_trace+0x8c/0xc0 [libcfs] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc11c387c>] lbug_with_loc+0x4c/0xa0 [libcfs] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1878ecc>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc187d809>] ptlrpc_queue_wait+0x79/0x230 [ptlrpc] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1864942>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1453b35>] osc_enqueue_base+0x2c5/0x6b0 [osc] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc145eb7b>] osc_lock_enqueue+0x38b/0x840 [osc] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc16fce35>] cl_lock_enqueue+0x65/0x120 [obdclass] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc12801a5>] lov_lock_enqueue+0x95/0x150 [lov] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc16fce35>] cl_lock_enqueue+0x65/0x120 [obdclass] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc16fd3c7>] cl_lock_request+0x67/0x1f0 [obdclass] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc170123b>] cl_io_lock+0x2bb/0x3d0 [obdclass] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc170156a>] cl_io_loop+0xba/0x1c0 [obdclass] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc157798b>] ll_file_io_generic+0x63b/0xc90 [lustre] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1578479>] ll_file_aio_write+0x289/0x660 [lustre] Feb 6 16:27:00 bcn1833 kernel: [<ffffffffc1578950>] ll_file_write+0x100/0x1c0 [lustre] Feb 6 16:27:00 bcn1833 kernel: [<ffffffff9fe4a7f0>] vfs_write+0xc0/0x1f0 Feb 6 16:27:00 bcn1833 kernel: [<ffffffff9fe4b60f>] SyS_write+0x7f/0xf0 Feb 6 16:27:00 bcn1833 kernel: [<ffffffffa038dede>] system_call_fastpath+0x25/0x2a Feb 6 16:27:00 bcn1833 kernel: [<ffffffffffffffff>] 0xffffffffffffffff Feb 6 16:27:00 bcn1833 kernel: Kernel panic - not syncing: LBUG |
| Comments |
| Comment by Alex Zhuravlev [ 21/Aug/18 ] |
|
Oleg, please try https://review.whamcloud.com/#/c/32980/ |
| Comment by Oleg Drokin [ 21/Aug/18 ] |
|
added to my testing |
| Comment by Sarah Liu [ 05/Sep/18 ] |
|
Hit similar issue on soak, lustre-master-ib 2.11.54_91_g605a139 client dmesg [ 2388.749104] alg: No test for adler32 (adler32-zlib) [ 2389.628880] Lustre: Lustre: Build Version: 2.11.54_91_g605a139 [ 2389.806899] LNet: Using FMR for registration [ 2389.825192] LNet: Added LNI 192.168.1.129@o2ib [8/256/0/180] [ 3597.244535] LustreError: 11-0: soaked-MDT0003-mdc-ffff8bc97a7fd000: operation mds_connect to node 192.168.1.111@o2ib failed: rc = -16 [ 3597.258019] LustreError: 11-0: soaked-MDT0002-mdc-ffff8bc97a7fd000: operation mds_connect to node 192.168.1.110@o2ib failed: rc = -16 [ 3597.273043] LustreError: 11-0: soaked-OST0002-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16 [ 3597.275373] Lustre: Mounted soaked-client [ 3597.291219] LustreError: 11-0: soaked-OST0006-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16 [ 3597.304919] LustreError: 11-0: soaked-OST000a-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16 [ 3597.318665] LustreError: 11-0: soaked-OST000e-osc-ffff8bc97a7fd000: operation ost_connect to node 192.168.1.106@o2ib failed: rc = -16 Sep 4 22:01:01 soak-29 TIME: Time stamp for console [ 6457.411019] LustreError: 4511:0:(client.c:1082:ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed: [ 6457.424658] LustreError: 4511:0:(client.c:1082:ptlrpc_set_add_req()) LBUG [ 6457.432264] Pid: 4511, comm: simul 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 [ 6457.442107] Call Trace: [ 6457.444861] [<ffffffffc0b907cc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [ 6457.452209] [<ffffffffc0b9087c>] lbug_with_loc+0x4c/0xa0 [libcfs] [ 6457.459158] [<ffffffffc0ea9dec>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc] [ 6457.466718] [<ffffffffc0eae735>] ptlrpc_queue_wait+0x75/0x220 [ptlrpc] [ 6457.474167] [<ffffffffc0e936a2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [ 6457.481618] [<ffffffffc101e965>] osc_enqueue_base+0x2b5/0x6a0 [osc] [ 6457.488776] [<ffffffffc102917b>] osc_lock_enqueue+0x38b/0x840 [osc] [ 6457.495916] [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass] [ 6457.503386] [<ffffffffc0fcf285>] lov_lock_enqueue+0x95/0x150 [lov] [ 6457.510458] [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass] [ 6457.517903] [<ffffffffc0ce5377>] cl_lock_request+0x67/0x1f0 [obdclass] [ 6457.525352] [<ffffffffc0ce926b>] cl_io_lock+0x2bb/0x3d0 [obdclass] [ 6457.532408] [<ffffffffc0ce95fb>] cl_io_loop+0x11b/0xc70 [obdclass] [ 6457.539477] [<ffffffffc10c6db2>] ll_file_io_generic+0x4e2/0xd10 [lustre] [ 6457.547145] [<ffffffffc10c7b32>] ll_file_aio_write+0x372/0x540 [lustre] [ 6457.554681] [<ffffffffc10c7da4>] ll_file_write+0xa4/0x170 [lustre] [ 6457.561731] [<ffffffffb1a1b490>] vfs_write+0xc0/0x1f0 [ 6457.567514] [<ffffffffb1a1c2bf>] SyS_write+0x7f/0xf0 [ 6457.573197] [<ffffffffb1f20795>] system_call_fastpath+0x1c/0x21 [ 6457.579954] [<ffffffffffffffff>] 0xffffffffffffffff [ 6457.585582] Kernel panic - not syncing: LBUG [ 6457.591152] CPU: 18 PID: 4511 Comm: simul Tainted: G IOE ------------ 3.10.0-862.9.1.el7.x86_64 #1 [ 6457.603125] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [ 6457.615578] Call Trace: [ 6457.619126] [<ffffffffb1f0e84e>] dump_stack+0x19/0x1b [ 6457.625657] [<ffffffffb1f08b50>] panic+0xe8/0x21f [ 6457.631785] [<ffffffffc0b908cb>] lbug_with_loc+0x9b/0xa0 [libcfs] [ 6457.639483] [<ffffffffc0ea9dec>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc] [ 6457.647758] [<ffffffffc0eae735>] ptlrpc_queue_wait+0x75/0x220 [ptlrpc] [ 6457.655965] [<ffffffffc0e936a2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [ 6457.664156] [<ffffffffc0e8e7c0>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc] [ 6457.673637] [<ffffffffc1028710>] ? osc_lock_lockless_cancel+0xe0/0xe0 [osc] [ 6457.682276] [<ffffffffc1027ad0>] ? osc_lock_upcall+0x580/0x580 [osc] [ 6457.690247] [<ffffffffc101e965>] osc_enqueue_base+0x2b5/0x6a0 [osc] [ 6457.698079] [<ffffffffc1027550>] ? osc_lock_lvb_update+0x330/0x330 [osc] [ 6457.706337] [<ffffffffc102917b>] osc_lock_enqueue+0x38b/0x840 [osc] [ 6457.714139] [<ffffffffc1027550>] ? osc_lock_lvb_update+0x330/0x330 [osc] [ 6457.722390] [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass] [ 6457.730416] [<ffffffffc0fcf285>] lov_lock_enqueue+0x95/0x150 [lov] [ 6457.738100] [<ffffffffc0ce4de5>] cl_lock_enqueue+0x65/0x120 [obdclass] [ 6457.746155] [<ffffffffc0ce5377>] cl_lock_request+0x67/0x1f0 [obdclass] [ 6457.754191] [<ffffffffc0ce926b>] cl_io_lock+0x2bb/0x3d0 [obdclass] [ 6457.761754] [<ffffffffc0ce95fb>] cl_io_loop+0x11b/0xc70 [obdclass] [ 6457.769300] [<ffffffffc10c6db2>] ll_file_io_generic+0x4e2/0xd10 [lustre] [ 6457.777391] [<ffffffffc10c7b32>] ll_file_aio_write+0x372/0x540 [lustre] [ 6457.785395] [<ffffffffc10c7da4>] ll_file_write+0xa4/0x170 [lustre] [ 6457.792946] [<ffffffffb1a1b490>] vfs_write+0xc0/0x1f0 [ 6457.799194] [<ffffffffb1a1c2bf>] SyS_write+0x7f/0xf0 [ 6457.805311] [<ffffffffb1f20795>] system_call_fastpath+0x1c/0x21 [ 6457.812550] Kernel Offset: 0x30800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) |
| Comment by Jian Yu [ 09/Dec/18 ] |
|
+1 on master branch while running sanity-sec test 20 in review-dne-zfs-part-2 test group: |
| Comment by Sarah Liu [ 22/Aug/19 ] |
|
also hit this on SOAK on client side, lustre-b2_12-ib #35 [180971.440478] Lustre: soaked-OST0007-osc-ffffa0200627b800: Connection restored to 192.168.1.107@o2ib (at 192.168.1.107@o2ib) [182441.030502] LustreError: 61428:0:(client.c:1088:ptlrpc_set_add_req()) ASSERTION( req->rq_import->imp_state != LUSTRE_IMP_IDLE ) failed: [182441.044298] LustreError: 61428:0:(client.c:1088:ptlrpc_set_add_req()) LBUG [182441.052078] Pid: 61428, comm: simul 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 [182441.062176] Call Trace: [182441.065025] [<ffffffffc0a2a7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [182441.072438] [<ffffffffc0a2a87c>] lbug_with_loc+0x4c/0xa0 [libcfs] [182441.079472] [<ffffffffc0e0002c>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc] [182441.087102] [<ffffffffc0e04959>] ptlrpc_queue_wait+0x79/0x230 [ptlrpc] [182441.094614] [<ffffffffc0de97c2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [182441.102122] [<ffffffffc0f77a15>] osc_enqueue_base+0x2c5/0x6b0 [osc] [182441.109335] [<ffffffffc0f82a1b>] osc_lock_enqueue+0x38b/0x840 [osc] [182441.116555] [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass] [182441.124096] [<ffffffffc0f281a5>] lov_lock_enqueue+0x95/0x150 [lov] [182441.131205] [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass] [182441.138727] [<ffffffffc0b9e8a7>] cl_lock_request+0x67/0x1f0 [obdclass] [182441.146221] [<ffffffffc0ba270b>] cl_io_lock+0x2bb/0x3d0 [obdclass] [182441.153335] [<ffffffffc0ba2a3a>] cl_io_loop+0xba/0x1c0 [obdclass] [182441.160371] [<ffffffffc104f530>] ll_file_io_generic+0x590/0xcb0 [lustre] [182441.168080] [<ffffffffc1050942>] ll_file_aio_read+0x352/0x410 [lustre] [182441.175584] [<ffffffffc1050b00>] ll_file_read+0x100/0x1c0 [lustre] [182441.182695] [<ffffffffac2416ff>] vfs_read+0x9f/0x170 [182441.188443] [<ffffffffac2425bf>] SyS_read+0x7f/0xf0 [182441.194092] [<ffffffffac775ddb>] system_call_fastpath+0x22/0x27 [182441.200916] [<ffffffffffffffff>] 0xffffffffffffffff [182441.206599] Kernel panic - not syncing: LBUG [182441.211472] CPU: 0 PID: 61428 Comm: simul Kdump: loaded Tainted: G OE ------------ 3.10.0-957.21.3.el7.x86_64 #1 [182441.224185] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 [182441.236801] Call Trace: [182441.239627] [<ffffffffac763107>] dump_stack+0x19/0x1b [182441.245457] [<ffffffffac75c810>] panic+0xe8/0x21f [182441.250903] [<ffffffffc0a2a8cb>] lbug_with_loc+0x9b/0xa0 [libcfs] [182441.257911] [<ffffffffc0e0002c>] ptlrpc_set_add_req+0xfc/0x100 [ptlrpc] [182441.265499] [<ffffffffc0e04959>] ptlrpc_queue_wait+0x79/0x230 [ptlrpc] [182441.272988] [<ffffffffc0de97c2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [182441.280469] [<ffffffffac63ce30>] ? dev_queue_xmit+0x10/0x20 [182441.286893] [<ffffffffc0de4990>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc] [182441.295732] [<ffffffffc0f81fb0>] ? osc_lock_lockless_cancel+0xe0/0xe0 [osc] [182441.303689] [<ffffffffc0f80d30>] ? osc_lock_upcall+0x580/0x580 [osc] [182441.310969] [<ffffffffc0f77a15>] osc_enqueue_base+0x2c5/0x6b0 [osc] [182441.318162] [<ffffffffc0f807b0>] ? osc_lock_lvb_update+0x330/0x330 [osc] [182441.325828] [<ffffffffc0f82a1b>] osc_lock_enqueue+0x38b/0x840 [osc] [182441.333020] [<ffffffffc0f807b0>] ? osc_lock_lvb_update+0x330/0x330 [osc] [182441.340694] [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass] [182441.348176] [<ffffffffc0f281a5>] lov_lock_enqueue+0x95/0x150 [lov] [182441.355267] [<ffffffffc0b9e315>] cl_lock_enqueue+0x65/0x120 [obdclass] [182441.362756] [<ffffffffc0b9e8a7>] cl_lock_request+0x67/0x1f0 [obdclass] [182441.370245] [<ffffffffc0ba270b>] cl_io_lock+0x2bb/0x3d0 [obdclass] [182441.377345] [<ffffffffc0ba2a3a>] cl_io_loop+0xba/0x1c0 [obdclass] [182441.384345] [<ffffffffc104f530>] ll_file_io_generic+0x590/0xcb0 [lustre] [182441.392022] [<ffffffffc1050942>] ll_file_aio_read+0x352/0x410 [lustre] [182441.399506] [<ffffffffc1050b00>] ll_file_read+0x100/0x1c0 [lustre] [182441.406596] [<ffffffffac2416ff>] vfs_read+0x9f/0x170 [182441.412329] [<ffffffffac2425bf>] SyS_read+0x7f/0xf0 [182441.417977] [<ffffffffac775d21>] ? system_call_after_swapgs+0xae/0x146 [182441.425455] [<ffffffffac775ddb>] system_call_fastpath+0x22/0x27 [182441.432256] [<ffffffffac775d21>] ? system_call_after_swapgs+0xae/0x146 [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Initializing cgroup subsys cpu [ 0.000000] Initializing cgroup subsys cpuacct [ 0.000000] Linux version 3.10.0-957.21.3.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Tue Jun 18 16:35:19 UTC 2019 [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.21.3.el7.x86_64 ro console=ttyS0,115200 LANG=en_US.UTF-8 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr disable_cpu_apicid=0 elfcorehdr=871860K |
| Comment by Gerrit Updater [ 10/Feb/20 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37523 |
| Comment by Gerrit Updater [ 20/Feb/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37523/ |
| Comment by Peter Jones [ 20/Feb/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 20/Feb/20 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37649 |
| Comment by Gerrit Updater [ 10/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37649/ |
| Comment by Gerrit Updater [ 23/Apr/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37722/ |
| Comment by Gerrit Updater [ 23/Apr/20 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38340 |
| Comment by Gerrit Updater [ 30/Apr/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38437 |
| Comment by Gerrit Updater [ 01/May/20 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38445 |
| Comment by Jay Lan (Inactive) [ 07/Jul/20 ] |
|
This review https://review.whamcloud.com/38340 (for b2_12) was created on April 23rd. Is it ready for landing? |
| Comment by Gerrit Updater [ 11/Jul/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38340/ |