[LU-10429] soak, LBUG lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed: Created: 22/Dec/17 Updated: 26/Feb/19 Resolved: 25/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Cliff White (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
Soak test cluster, tip of master plus patch for |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
MDS completed a failover/failback sequence, then LBUGged.
Bogus recovery timer.

[ 573.543826] Lustre: soaked-MDT0002: Connection restored to soaked-MDT0002-lwp-OST0000_UUID (at 192.168.1.102@o2ib)
[ 573.580891] Lustre: Skipped 11 previous similar messages
[ 620.036501] Lustre: soaked-MDT0002: Denying connection for new client b3e244d6-b85b-a278-a032-0b483389bc28(at 192.168.1.116@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:58
[ 620.590408] Lustre: soaked-MDT0002: Denying connection for new client b894efcc-1695-2739-719d-dce1f0686406(at 192.168.1.136@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:57
[ 621.816145] Lustre: soaked-MDT0002: Denying connection for new client 62ac41a6-bf69-92f4-befc-577b5fee6b59(at 192.168.1.138@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:56
[ 621.888798] Lustre: Skipped 1 previous similar message
[ 624.206173] Lustre: soaked-MDT0002: Denying connection for new client 3f82d520-6a51-5083-afbc-e5ea95b116e5(at 192.168.1.142@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:54
[ 624.279133] Lustre: Skipped 3 previous similar messages
[ 628.413382] Lustre: soaked-MDT0002: Denying connection for new client 470fa3e8-31ff-e6f2-850c-c32c2fccf087(at 192.168.1.120@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:49
[ 628.486832] Lustre: Skipped 4 previous similar messages
[ 630.186023] Lustre: 2476:0:(ldlm_lib.c:2029:target_recovery_overseer()) recovery is aborted, evict exports in recovery
[ 630.248533] Lustre: soaked-MDT0002: disconnecting 28 stale clients
[ 630.275436] Lustre: soaked-MDT0002: Recovery over after 3:00, of 31 clients 3 recovered and 28 were evicted.
[ 637.790744] Lustre: soaked-MDT0002: Connection restored to a4e0fd52-cac9-d111-1b57-d9203252395d (at 192.168.1.135@o2ib)
[ 637.829810] Lustre: Skipped 31 previous similar messages
[ 738.737321] LNet: 2239:0:(o2iblnd_cb.c:3198:kiblnd_check_conns()) Timed out tx for 192.168.1.115@o2ib: 0 seconds
[ 738.862656] LNet: 2239:0:(o2iblnd_cb.c:3198:kiblnd_check_conns()) Skipped 1 previous similar message
[ 834.963313] Lustre: soaked-OST0004-osc-MDT0002: Connection restored to 192.168.1.106@o2ib (at 192.168.1.106@o2ib)
[ 834.978267] Lustre: Skipped 16 previous similar messages
[ 930.946444] LustreError: 2580:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[ 930.946461] LustreError: 2578:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[ 930.946465] LustreError: 2578:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[ 930.946466] Pid: 2578, comm: mdt01_012
[ 930.946467] Call Trace:
[ 930.946474] LustreError: 2539:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[ 930.946480] LustreError: 2539:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[ 930.946482] Pid: 2539, comm: mdt00_011
[ 930.946483] Call Trace:
[ 930.946488] LustreError: 2538:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[ 930.946493] LustreError: 2538:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[ 930.946494] Pid: 2538, comm: mdt00_010
[ 930.946495] Call Trace:
[ 930.946498] [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 930.946504] LustreError: 2536:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[ 930.946508] LustreError: 2536:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[ 930.946510] Pid: 2536, comm: mdt00_009
[ 930.946510] Call Trace:
[ 930.946512] [<ffffffffc097383c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[ 930.946523] [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 930.946525] [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 930.946530] LustreError: 2586:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[ 930.946531] [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[ 930.946538] LustreError: 2530:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[ 930.946540] LustreError: 2586:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[ 930.946543] [<ffffffffc185e858>] lod_env_info.part.10+0x0/0x36 [lod]
[ 930.946545] [<ffffffffc097383c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[ 930.946549] LustreError: 2530:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[ 930.946551] Pid:

System crash dumped, dump available on soak. |
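For readers unfamiliar with the failing check, the sketch below restates the asserted condition in isolation: the bytes occupied by the recorded entries must be strictly less than the allocated buffer size, so there is always room for one more entry. Only the field names `op_array`, `op_count`, and `op_size` come from the assertion text in the log. The struct name, the element type, the helper names, and the grow-on-demand policy are all hypothetical illustrations. This is not the Lustre source and not necessarily what the eventual fix does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in: only the three field names appear in the log. */
struct ost_pool_sketch {
	uint32_t *op_array;	/* OST indices recorded so far (type assumed) */
	size_t	  op_count;	/* number of valid entries */
	size_t	  op_size;	/* allocated size of op_array, in bytes */
};

/* The condition from the ASSERTION text: room for at least one more entry. */
static bool pool_has_room(const struct ost_pool_sketch *inuse)
{
	return inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size;
}

/*
 * Record an OST index. Instead of asserting when the buffer is full,
 * this sketch doubles the buffer (an assumed policy, for illustration).
 * Returns 0 on success, -1 if the allocation fails.
 */
static int pool_mark_in_use(struct ost_pool_sketch *inuse, uint32_t ost)
{
	if (!pool_has_room(inuse)) {
		size_t new_size = inuse->op_size ?
			inuse->op_size * 2 : 4 * sizeof(inuse->op_array[0]);
		uint32_t *tmp = realloc(inuse->op_array, new_size);

		if (tmp == NULL)
			return -1;
		inuse->op_array = tmp;
		inuse->op_size = new_size;
	}
	inuse->op_array[inuse->op_count++] = ost;
	return 0;
}
```

With a zero-initialized pool, repeated calls to `pool_mark_in_use()` keep the invariant true before every store, which is the property the LBUG reports as violated.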
| Comments |
| Comment by Peter Jones [ 22/Dec/17 ] |
|
Bobijam, can you please look into this one? Thanks, Peter |
| Comment by Oleg Drokin [ 22/Dec/17 ] |
|
Unrelated, but wow, quite a timeout! "to recover in 71579:49" |
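For scale: reading "71579:49" as minutes:seconds (the second field counts down one per second in the log) gives 71579 * 60 + 49 = 4,294,789 seconds, roughly 49.7 days. That is about 178 seconds short of 2^32 milliseconds (4,294,967 whole seconds). Whether the timer actually wrapped an unsigned 32-bit millisecond value is only a guess from the arithmetic and is not confirmed by anything in this ticket. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Convert a "minutes:seconds" countdown into total seconds. */
static uint64_t countdown_to_seconds(uint64_t minutes, uint64_t seconds)
{
	return minutes * 60 + seconds;
}

/*
 * Distance, in whole seconds, between a countdown value and 2^32
 * milliseconds. The comparison point is an assumption made for
 * illustration, not a documented property of the recovery timer.
 */
static int64_t seconds_short_of_u32_ms_wrap(uint64_t total_seconds)
{
	const uint64_t wrap_seconds = (UINT64_C(1) << 32) / 1000; /* 4294967 */

	return (int64_t)wrap_seconds - (int64_t)total_seconds;
}
```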
| Comment by Cliff White (Inactive) [ 22/Dec/17 ] |
|
We see this sort of timeout frequently on soak. There are several bugs |
| Comment by Zhenyu Xu [ 25/Dec/17 ] |
|
What's the gerrit page of this build? I want to check whether this image contains the |
| Comment by Gerrit Updater [ 17/Jan/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/30889 |
| Comment by Gerrit Updater [ 25/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30889/ |
| Comment by Peter Jones [ 25/Jan/18 ] |
|
Landed for 2.11 |