[LU-10429] soak, LBUG lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
Created: 22/Dec/17  Updated: 26/Feb/19  Resolved: 25/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Critical
Reporter: Cliff White (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: soak
Environment:

Soak test cluster, tip of master plus patch for LU-10321


Attachments: Text File vmcore-dmesg.txt    
Issue Links:
Duplicate
is duplicated by LU-11935 MDS hit LBUG: (lod_qos.c:862:lod_comp... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

MDS completed a failover/failback sequence, then LBUGged. Note also the bogus recovery timer in the log below.

[  573.543826] Lustre: soaked-MDT0002: Connection restored to soaked-MDT0002-lwp-OST0000_UUID (at 192.168.1.102@o2ib)
[  573.580891] Lustre: Skipped 11 previous similar messages
[  620.036501] Lustre: soaked-MDT0002: Denying connection for new client b3e244d6-b85b-a278-a032-0b483389bc28(at 192.168.1.116@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:58
[  620.590408] Lustre: soaked-MDT0002: Denying connection for new client b894efcc-1695-2739-719d-dce1f0686406(at 192.168.1.136@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:57
[  621.816145] Lustre: soaked-MDT0002: Denying connection for new client 62ac41a6-bf69-92f4-befc-577b5fee6b59(at 192.168.1.138@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:56
[  621.888798] Lustre: Skipped 1 previous similar message
[  624.206173] Lustre: soaked-MDT0002: Denying connection for new client 3f82d520-6a51-5083-afbc-e5ea95b116e5(at 192.168.1.142@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:54
[  624.279133] Lustre: Skipped 3 previous similar messages
[  628.413382] Lustre: soaked-MDT0002: Denying connection for new client 470fa3e8-31ff-e6f2-850c-c32c2fccf087(at 192.168.1.120@o2ib), waiting for 31 known clients (3 recovered, 0 in progress, and 0 evicted) to recover in 71579:49
[  628.486832] Lustre: Skipped 4 previous similar messages
[  630.186023] Lustre: 2476:0:(ldlm_lib.c:2029:target_recovery_overseer()) recovery is aborted, evict exports in recovery
[  630.248533] Lustre: soaked-MDT0002: disconnecting 28 stale clients
[  630.275436] Lustre: soaked-MDT0002: Recovery over after 3:00, of 31 clients 3 recovered and 28 were evicted.
[  637.790744] Lustre: soaked-MDT0002: Connection restored to a4e0fd52-cac9-d111-1b57-d9203252395d (at 192.168.1.135@o2ib)
[  637.829810] Lustre: Skipped 31 previous similar messages
[  738.737321] LNet: 2239:0:(o2iblnd_cb.c:3198:kiblnd_check_conns()) Timed out tx for 192.168.1.115@o2ib: 0 seconds
[  738.862656] LNet: 2239:0:(o2iblnd_cb.c:3198:kiblnd_check_conns()) Skipped 1 previous similar message
[  834.963313] Lustre: soaked-OST0004-osc-MDT0002: Connection restored to 192.168.1.106@o2ib (at 192.168.1.106@o2ib)
[  834.978267] Lustre: Skipped 16 previous similar messages
[  930.946444] LustreError: 2580:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[  930.946461] LustreError: 2578:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[  930.946465] LustreError: 2578:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[  930.946466] Pid: 2578, comm: mdt01_012
[  930.946467]
Call Trace:
[  930.946474] LustreError: 2539:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[  930.946480] LustreError: 2539:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[  930.946482] Pid: 2539, comm: mdt00_011
[  930.946483]
Call Trace:
[  930.946488] LustreError: 2538:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[  930.946493] LustreError: 2538:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[  930.946494] Pid: 2538, comm: mdt00_010
[  930.946495]
Call Trace:   
[  930.946498]  [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[  930.946504] LustreError: 2536:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[  930.946508] LustreError: 2536:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[  930.946510] Pid: 2536, comm: mdt00_009
[  930.946510]
Call Trace:   
[  930.946512]  [<ffffffffc097383c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[  930.946523]  [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[  930.946525]  [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[  930.946530] LustreError: 2586:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[  930.946531]  [<ffffffffc09737ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[  930.946538] LustreError: 2530:0:(lod_qos.c:858:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[  930.946540] LustreError: 2586:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[  930.946543]  [<ffffffffc185e858>] lod_env_info.part.10+0x0/0x36 [lod]
[  930.946545]  [<ffffffffc097383c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[  930.946549] LustreError: 2530:0:(lod_qos.c:858:lod_comp_ost_in_use()) LBUG
[  930.946551] Pid:

System crash dumped, dump available on soak.



 Comments   
Comment by Peter Jones [ 22/Dec/17 ]

Bobijam

Can you please look into this one?

Thanks

Peter

Comment by Oleg Drokin [ 22/Dec/17 ]

Unrelated, but wow, quite a timeout! "to recover in 71579:49"

Comment by Cliff White (Inactive) [ 22/Dec/17 ]

We see this sort of timeout frequently on soak. There are several bugs filed for it.

Comment by Zhenyu Xu [ 25/Dec/17 ]

What is the Gerrit page for this build? I want to check whether this image contains the LU-10297 fix.

Comment by Gerrit Updater [ 17/Jan/18 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/30889
Subject: LU-10429 lod: LBUG lod_comp_ost_in_use()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8a8a913468db56862c526d2153b45f0471f70a3f

Comment by Gerrit Updater [ 25/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30889/
Subject: LU-10429 lod: LBUG lod_comp_ost_in_use()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 487cb7382571e090739455167ce0d7ff6d17c57f

Comment by Peter Jones [ 25/Jan/18 ]

Landed for 2.11

Generated at Sat Feb 10 02:35:01 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.