[LU-11463] DOM: osc_extent_make_ready()) ASSERTION( last_oap_count > 0 ) failed Created: 03/Oct/18 Updated: 23/Jan/21 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Upstream |
| Type: | Bug | Priority: | Major |
| Reporter: | Mikhail Pershin | Assignee: | Mikhail Pershin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Separate DOM-related issue from the original one in |
| Comments |
| Comment by Mikhail Pershin [ 03/Oct/18 ] |
|
Copy of message from Oleg about this issue:
[35616.314978] LustreError: 11162:0:(osc_cache.c:1141:osc_extent_make_ready()) ASSERTION( last_oap_count > 0 ) failed:
[35616.322765] LustreError: 11162:0:(osc_cache.c:1141:osc_extent_make_ready()) LBUG
[35616.326801] Pid: 11162, comm: ldlm_bl_05 3.10.0-7.5-debug #1 SMP Sun Jun 3 13:35:38 EDT 2018
[35616.329285] Call Trace:
[35616.330443] [<ffffffffa01eb7dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[35616.331870] [<ffffffffa01eb88c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[35616.334288] [<ffffffffa0d3eda6>] osc_extent_make_ready+0x936/0xe70 [osc]
[35616.335640] [<ffffffffa0d45ab3>] osc_cache_writeback_range+0x4f3/0x1260 [osc]
[35616.337641] [<ffffffffa0e0bac7>] mdc_lock_flush+0x2e7/0x3f0 [mdc]
[35616.339070] [<ffffffffa0e0bfb4>] mdc_ldlm_blocking_ast+0x2f4/0x3f0 [mdc]
[35616.343073] [<ffffffffa0b01bd4>] ldlm_cancel_callback+0x84/0x320 [ptlrpc]
[35616.344357] [<ffffffffa0b189b0>] ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
[35616.345719] [<ffffffffa0b1e7e7>] ldlm_cli_cancel+0x157/0x620 [ptlrpc]
[35616.347470] [<ffffffffa0e0be4a>] mdc_ldlm_blocking_ast+0x18a/0x3f0 [mdc]
[35616.348765] [<ffffffffa0b2a67f>] ldlm_handle_bl_callback+0xff/0x530 [ptlrpc]
[35616.351095] [<ffffffffa0b2afb1>] ldlm_bl_thread_main+0x501/0x680 [ptlrpc]
[35616.352495] [<ffffffff810ae864>] kthread+0xe4/0xf0
[35616.353757] [<ffffffff81783777>] ret_from_fork_nospec_end+0x0/0x39 |
| Comment by James A Simmons [ 07/Jan/20 ] |
|
I can easily reproduce this with the Linux Lustre client. |
| Comment by Mikhail Pershin [ 08/Jan/20 ] |
|
James, this issue hasn't been seen for a long time; it looks like the client is missing some related fixes. Where can I check its commit history? |
| Comment by James A Simmons [ 08/Jan/20 ] |
|
Currently I have a local kernel tree that is up to Lustre 2.12.0. Do you know which patches fixed this issue? |
| Comment by James A Simmons [ 10/Feb/20 ] |
|
One of the patches that landed during 2.13 fixed this issue. We can reopen if it shows up again. |
| Comment by James A Simmons [ 26/Feb/20 ] |
|
Sorry, but adding direct I/O support to the Linux client exposes this problem again. I can easily reproduce it with sanity.sh test |
| Comment by Mikhail Pershin [ 26/Feb/20 ] |
|
James, could you describe how to reproduce that? Does it happen in a particular test, or only occasionally? Are any specific parameters needed for testing? |
| Comment by Arshad Hussain [ 12/Apr/20 ] |
|
James, if possible, could you please point me to the patch that fixed it initially? I also hit this while locally testing my own patch (https://review.whamcloud.com/#/c/9275/); test 150b triggered it. I would like hints on what fixed it and how. This is on the latest master: 742897a
[ 6559.142453] LustreError: 18592:0:(osc_cache.c:1124:osc_extent_make_ready()) ASSERTION( last_oap_count > 0 ) failed:
[ 6559.142454] LustreError: 18592:0:(osc_cache.c:1124:osc_extent_make_ready()) LBUG
[ 6559.142456] Pid: 18592, comm: ptlrpcd_00_01 3.10.0-957.el7_lustre.x86_64 #1 SMP Fri Dec 21 21:49:33 UTC 2018
[ 6559.142456] Call Trace:
[ 6559.142477] [<ffffffffc0475e4c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 6559.142481] [<ffffffffc0475efc>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 6559.142490] [<ffffffffc0b6334c>] osc_extent_make_ready+0xb4c/0xe40 [osc]
[ 6559.142496] [<ffffffffc0b662eb>] osc_io_unplug0+0xe9b/0x18b0 [osc]
[ 6559.142501] [<ffffffffc0b3f773>] brw_queue_work+0x33/0xd0 [osc]
[ 6559.142537] [<ffffffffc08f27fa>] work_interpreter+0x3a/0xf0 [ptlrpc]
[ 6559.142561] [<ffffffffc08fb220>] ptlrpc_check_set+0x510/0x1ed0 [ptlrpc]
[ 6559.142587] [<ffffffffc0929e0b>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
[ 6559.142611] [<ffffffffc092a0a0>] ptlrpcd+0x1b0/0x6a0 [ptlrpc]
[ 6559.142614] [<ffffffff876c1c31>] kthread+0xd1/0xe0
[ 6559.142617] [<ffffffff87d74c37>] ret_from_fork_nospec_end+0x0/0x39
[ 6559.142631] [<ffffffffffffffff>] 0xffffffffffffffff
[ 6559.142631] Kernel panic - not syncing: LBUG
[ 6559.142634] CPU: 1 PID: 18592 Comm: ptlrpcd_00_01 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
[ 6559.142635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 6559.142635] Call Trace:
[ 6559.142639] [<ffffffff87d61dc1>] dump_stack+0x19/0x1b
[ 6559.142641] [<ffffffff87d5b4d0>] panic+0xe8/0x21f
[ 6559.142646] [<ffffffffc0475f4b>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 6559.142653] [<ffffffffc0b6334c>] osc_extent_make_ready+0xb4c/0xe40 [osc]
[ 6559.142672] [<ffffffffc0b662eb>] osc_io_unplug0+0xe9b/0x18b0 [osc]
[ 6559.142677] [<ffffffffc0b3f773>] brw_queue_work+0x33/0xd0 [osc]
[ 6559.142699] [<ffffffffc08f27fa>] work_interpreter+0x3a/0xf0 [ptlrpc]
[ 6559.142721] [<ffffffffc08fb220>] ptlrpc_check_set+0x510/0x1ed0 [ptlrpc]
[ 6559.142746] [<ffffffffc0929e0b>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
[ 6559.142769] [<ffffffffc092a0a0>] ptlrpcd+0x1b0/0x6a0 [ptlrpc]
[ 6559.142772] [<ffffffff8762a59e>] ? __switch_to+0xce/0x580
[ 6559.142773] [<ffffffff876c2d00>] ? wake_up_atomic_t+0x30/0x30
[ 6559.142796] [<ffffffffc0929ef0>] ? ptlrpcd_check+0x590/0x590 [ptlrpc]
[ 6559.142798] [<ffffffff876c1c31>] kthread+0xd1/0xe0
[ 6559.142799] [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40
[ 6559.142801] [<ffffffff87d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 6559.142802] [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40 |
| Comment by Alex Zhuravlev [ 11/Jan/21 ] |
|
https://testing.whamcloud.com/test_sessions/0f2a364a-716a-4eec-9786-3fa81b3143c2 |
| Comment by James A Simmons [ 11/Jan/21 ] |
|
I can easily reproduce it with the Linux client (https://github.com/jasimmons1973/lustre). If you run sanity test 398b, it crashes every time. sanity-flr.sh test 200 also triggers it. |
| Comment by Mikhail Pershin [ 11/Jan/21 ] |
|
Please note that Alex's report and James's comment are not about DoM, as the ticket says, I assume |
| Comment by James A Simmons [ 23/Jan/21 ] |
|
I'm still seeing failures with the Linux client. It might be specific to the Linux version, so please keep this ticket open. |