DOM: osc_extent_make_ready()) ASSERTION( last_oap_count > 0 ) failed

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Upstream

    Description

      This ticket separates the DOM-related issue from the original one in LU-10407, because the two may have different causes.

          Activity

            [LU-11463] DOM: osc_extent_make_ready()) ASSERTION( last_oap_count > 0 ) failed

            simmonsja James A Simmons added a comment - This failure in the upstream client was due to the interval implementation we had with kms handling. It was fixed in the OpenSFS tree port, and that fix has been backported to the native client. I do see a very similar error, but it's not kms related.

            simmonsja James A Simmons added a comment - I'm still seeing failures with the Linux client. It might be specific to the Linux version, so please keep this ticket open.

            tappro Mikhail Pershin added a comment - Please note that Alex's report and James's comment are not about DoM, which is what this ticket covers. I assume LU-10407 is the proper place to report non-DoM issues of this kind.

            simmonsja James A Simmons added a comment - I can easily reproduce it with the Linux client (https://github.com/jasimmons1973/lustre). Running sanity test 398b crashes it every time, as does sanity-flr.sh test 200.
            bzzz Alex Zhuravlev added a comment - https://testing.whamcloud.com/test_sessions/0f2a364a-716a-4eec-9786-3fa81b3143c2
            arshad512 Arshad Hussain added a comment - - edited

            James, if possible, could you please point me to the patch that fixed it initially? I also hit this while locally testing my own patch (https://review.whamcloud.com/#/c/9275/); test 150b triggered it. I would like some hints on what was fixed and how. This is on the latest master: 742897a LU-13274 uapi: make lnet UAPI headers C99 compliant

            [ 6559.142453] LustreError: 18592:0:(osc_cache.c:1124:osc_extent_make_ready()) ASSERTION( last_oap_count > 0 ) failed:
            [ 6559.142454] LustreError: 18592:0:(osc_cache.c:1124:osc_extent_make_ready()) LBUG
            [ 6559.142456] Pid: 18592, comm: ptlrpcd_00_01 3.10.0-957.el7_lustre.x86_64 #1 SMP Fri Dec 21 21:49:33 UTC 2018
            [ 6559.142456] Call Trace:
            [ 6559.142477] [<ffffffffc0475e4c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
            [ 6559.142481] [<ffffffffc0475efc>] lbug_with_loc+0x4c/0xa0 [libcfs]
            [ 6559.142490] [<ffffffffc0b6334c>] osc_extent_make_ready+0xb4c/0xe40 [osc]
            [ 6559.142496] [<ffffffffc0b662eb>] osc_io_unplug0+0xe9b/0x18b0 [osc]
            [ 6559.142501] [<ffffffffc0b3f773>] brw_queue_work+0x33/0xd0 [osc]
            [ 6559.142537] [<ffffffffc08f27fa>] work_interpreter+0x3a/0xf0 [ptlrpc]
            [ 6559.142561] [<ffffffffc08fb220>] ptlrpc_check_set+0x510/0x1ed0 [ptlrpc]
            [ 6559.142587] [<ffffffffc0929e0b>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
            [ 6559.142611] [<ffffffffc092a0a0>] ptlrpcd+0x1b0/0x6a0 [ptlrpc]
            [ 6559.142614] [<ffffffff876c1c31>] kthread+0xd1/0xe0
            [ 6559.142617] [<ffffffff87d74c37>] ret_from_fork_nospec_end+0x0/0x39
            [ 6559.142631] [<ffffffffffffffff>] 0xffffffffffffffff
            [ 6559.142631] Kernel panic - not syncing: LBUG
            [ 6559.142634] CPU: 1 PID: 18592 Comm: ptlrpcd_00_01 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
            [ 6559.142635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
            [ 6559.142635] Call Trace:
            [ 6559.142639] [<ffffffff87d61dc1>] dump_stack+0x19/0x1b
            [ 6559.142641] [<ffffffff87d5b4d0>] panic+0xe8/0x21f
            [ 6559.142646] [<ffffffffc0475f4b>] lbug_with_loc+0x9b/0xa0 [libcfs]
            [ 6559.142653] [<ffffffffc0b6334c>] osc_extent_make_ready+0xb4c/0xe40 [osc]
            [ 6559.142672] [<ffffffffc0b662eb>] osc_io_unplug0+0xe9b/0x18b0 [osc]
            [ 6559.142677] [<ffffffffc0b3f773>] brw_queue_work+0x33/0xd0 [osc]
            [ 6559.142699] [<ffffffffc08f27fa>] work_interpreter+0x3a/0xf0 [ptlrpc]
            [ 6559.142721] [<ffffffffc08fb220>] ptlrpc_check_set+0x510/0x1ed0 [ptlrpc]
            [ 6559.142746] [<ffffffffc0929e0b>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
            [ 6559.142769] [<ffffffffc092a0a0>] ptlrpcd+0x1b0/0x6a0 [ptlrpc]
            [ 6559.142772] [<ffffffff8762a59e>] ? __switch_to+0xce/0x580
            [ 6559.142773] [<ffffffff876c2d00>] ? wake_up_atomic_t+0x30/0x30
            [ 6559.142796] [<ffffffffc0929ef0>] ? ptlrpcd_check+0x590/0x590 [ptlrpc]
            [ 6559.142798] [<ffffffff876c1c31>] kthread+0xd1/0xe0
            [ 6559.142799] [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40
            [ 6559.142801] [<ffffffff87d74c37>] ret_from_fork_nospec_begin+0x21/0x21
            [ 6559.142802] [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40
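
            The assertion in the trace above checks that the last page of an extent still has a positive byte count when the extent is made ready for writing. The following is a minimal user-space sketch of how such a count could drop to zero, assuming the count is derived from the known minimum size (kms) of the file. It is not the Lustre source: MODEL_PAGE_SIZE, model_refresh_count() and model_make_ready_last_page() are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this model; a real kernel defines its own. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical helper: number of valid bytes in page `index` of a file
 * whose known minimum size is `kms` bytes.
 */
static unsigned int model_refresh_count(uint64_t index, uint64_t kms)
{
	uint64_t start = index * MODEL_PAGE_SIZE;
	uint64_t end = start + MODEL_PAGE_SIZE;

	if (start >= kms)
		return 0;	/* page lies wholly beyond kms, e.g. after a truncate */
	if (end > kms)
		return (unsigned int)(kms - start);	/* partial page at end of file */
	return MODEL_PAGE_SIZE;	/* page lies wholly inside the file */
}

/*
 * Hypothetical stand-in for the failing check: the last page of an
 * extent must still have at least one byte to write.
 */
static unsigned int model_make_ready_last_page(uint64_t last_index, uint64_t kms)
{
	unsigned int last_count = model_refresh_count(last_index, kms);

	assert(last_count > 0);
	assert(last_count <= MODEL_PAGE_SIZE);
	return last_count;
}
```

            In this model, a last page at index 1 with kms equal to 4096 gets a count of 0 and would trip the first assert, which is the kind of state the LBUG reports. Whether a stale kms, a truncate race, or something else produced that state in the runs discussed here is not established by this sketch.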

            tappro Mikhail Pershin added a comment - James, could you describe how to reproduce that? Does it happen in a particular test, or only occasionally? Are there any specific testing parameters?

            simmonsja James A Simmons added a comment - Sorry, but adding direct I/O support to the Linux client exposes this problem again. I can easily reproduce it with sanity.sh test 

            simmonsja James A Simmons added a comment - One of the patches that landed during 2.13 fixed this issue. We can reopen if it shows up again.

            simmonsja James A Simmons added a comment - Currently I have a local kernel tree that is up to Lustre 2.12.0. Do you know which patches fixed this issue?

            People

              tappro Mikhail Pershin
              tappro Mikhail Pershin