Lustre / LU-6271
(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.7.0
    • Labels: Hyperon
    • Severity: 3
    • 17581

    Description

      While performing failover testing:

      LustreError: 167-0: lustre-OST0000-osc-ffff88083b339000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:777:ldlm_resource_complain()) lustre-OST0000-osc-ffff88083b339000: namespace resource [0x2d097bb:0x0:0x0].0 (ffff88106a992e40) refcount nonzero (1) after lock cleanup; forcing cleanup.
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1374:ldlm_resource_dump()) --- Resource: [0x2d097bb:0x0:0x0].0 (ffff88106a992e40) refcount = 3
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1377:ldlm_resource_dump()) Granted locks (in reverse order):
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1380:ldlm_resource_dump()) ### ### ns: lustre-OST0000-osc-ffff88083b339000 lock: ffff88106a60d540/0xa1f402a8812988f4 lrc: 3/0,1 mode: PW/PW res: [0x2d097bb:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0x126400020000 nid: local remote: 0x72a9f4c80e66e07b expref: -99 pid: 51570 timeout: 0 lvb_type: 1
      Feb 23 09:22:48 iwc260 kernel: Lustre: lustre-OST0000-osc-ffff88083b339000: Connection restored to lustre-OST0000 (at 192.168.120.13@o2ib)
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) LBUG
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) LBUG
      Feb 23 09:22:49 iwc260 kernel: Pid: 90252, comm: ldlm_bl_36
      Feb 23 09:22:49 iwc260 kernel:
      Feb 23 09:22:49 iwc260 kernel: Call Trace:
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0435895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0435e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2ba56>] discard_cb+0x156/0x190 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2bdcc>] osc_page_gang_lookup+0x1ac/0x330 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2b900>] ? discard_cb+0x0/0x190 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2c094>] osc_lock_discard_pages+0x144/0x240 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa04461c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2b900>] ? discard_cb+0x0/0x190 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2298b>] osc_lock_flush+0x8b/0x260 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b22e08>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0761a6c>] ldlm_cancel_callback+0x6c/0x170 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077450a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0779120>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b22c3b>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa04461c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077cb60>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077d0c1>] ldlm_bl_thread_main+0x291/0x3f0 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077ce30>] ? ldlm_bl_thread_main+0x0/0x3f0 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      Attachments

        1. iwc189.lbug.log.txt.gz
          0.2 kB
        2. iwc260.log.gz
          4.81 MB
        3. l-23.LU-6271.txt.gz
          0.2 kB
        4. lustre-log.iwc37.txt
          329 kB
        5. r2i1n4.messages.gz
          96 kB
        6. server_messages.tar.gz
          924 kB

        Issue Links

          Activity


            Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16727/
            Subject: LU-6271 osc: faulty assertion in osc_object_prune()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c5e84d53198142919b526fd304c6782005e38bdb

            Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/16727
            Subject: LU-6271 osc: check oo_npages inside spin lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e220fbbfa93069a783430d15a26dc3518006bc1c

            Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16456/
            Subject: LU-6271 osc: further OSC cleanup after eviction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e8b421531c166b91ab5c1f417570c544bcdd050c
            Jinshan Xiong (Inactive) added a comment (edited) -

            Hi Dongyang, I took a look at your latest patch 14915, and I suggest you not apply it at your own site yet, because it is not production-ready.

            Jinshan Xiong (Inactive) added a comment -

            This problem can only happen at eviction, so even patch 14989 is good enough for daily use.
            Jinshan Xiong (Inactive) added a comment (edited) -

            Hi Dongyang, the problem may be fixed in patch set 7, but feel free to use patch 14915 at your own site.

            Li Dongyang (Inactive) added a comment -

            With patch set 6 of 16456, the client won't crash with the gplck reproducer and a single eviction.
            However, the client still crashes if:
            1. Run the gplck reproducer and evict the client once.
            2. After the client reconnects, do not kill gplck; run another instance of gplck with a different gid.
            3. Evict the client again; the client crashes with the same stack trace: ASSERTION( !ext->oe_hp ) failed.

            This makes me think the problem should be fixed in the ldlm layer, as in http://review.whamcloud.com/#/c/14915/ , not in the osc layer.
            Thanks,
            Dongyang
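The three reproduction steps above can be sketched as a script. Everything site-specific here is an assumption (the gplck arguments, file path, OST name, and client NID are hypothetical), and DRYRUN defaults to "echo" so the script only prints the commands instead of executing them:

```shell
#!/bin/sh
# Dry-run sketch of the reproducer; adjust every value for your site.
DRYRUN=${DRYRUN:-echo}
CLIENT_NID="192.168.120.14@o2ib"   # hypothetical client NID
TESTFILE="/mnt/lustre/testfile"    # hypothetical file under group lock

# Step 1: run the group-lock reproducer, then evict the client once
# (evict_client is the server-side knob; run this part on the OSS).
$DRYRUN gplck --gid 1 "$TESTFILE" &
$DRYRUN lctl set_param obdfilter.lustre-OST0000.evict_client="$CLIENT_NID"

# Step 2: after the client reconnects, keep the first gplck running and
# start a second instance with a different group id.
$DRYRUN gplck --gid 2 "$TESTFILE" &

# Step 3: evict the client again; the client then LBUGs with
# ASSERTION( !ext->oe_hp ) failed.
$DRYRUN lctl set_param obdfilter.lustre-OST0000.evict_client="$CLIENT_NID"
wait
```

Running it as-is only echoes the intended commands; set DRYRUN to the empty string on a real test cluster to execute them.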

            Jinshan Xiong (Inactive) added a comment -

            Thanks. Please check whether the latest patch fixes the problem.

            Li Dongyang (Inactive) added a comment -

            Hi Jinshan,
            See my earlier comment from 28/Jul/15 8:39 PM.

            Jinshan Xiong (Inactive) added a comment -

            Hi Dongyang, please upload your test cases.

            Li Dongyang (Inactive) added a comment -

            OK, I've been running the scripts with patch set 5 of 16456 for some time, and general IO seems fine.
            But when I tried the group lock, the client crashed immediately after eviction:

            <0>LustreError: 1697:0:(osc_cache.c:2907:osc_cache_writeback_range()) ASSERTION( !ext->oe_hp ) failed: 
            <0>LustreError: 1697:0:(osc_cache.c:2907:osc_cache_writeback_range()) LBUG
            <4>Pid: 1697, comm: ldlm_bl_11
            <4>
            <4>Call Trace:
            <4> [<ffffffffa021d875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4> [<ffffffffa021de77>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4> [<ffffffffa0922085>] osc_cache_writeback_range+0x1275/0x1280 [osc]
            <4> [<ffffffffa090b545>] osc_lock_flush+0x175/0x260 [osc]
            <4> [<ffffffffa090b8d8>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
            <4> [<ffffffffa050ba57>] ldlm_cancel_callback+0x87/0x280 [ptlrpc]
            <4> [<ffffffff81060530>] ? __dequeue_entity+0x30/0x50
            <4> [<ffffffff8100969d>] ? __switch_to+0x7d/0x340
            <4> [<ffffffffa051e84a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
            <4> [<ffffffffa05234bc>] ldlm_cli_cancel+0x9c/0x3e0 [ptlrpc]
            <4> [<ffffffffa090b70b>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
            <4> [<ffffffff810672c2>] ? default_wake_function+0x12/0x20
            <4> [<ffffffffa0527400>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
            <4> [<ffffffffa0527ee4>] ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
            <4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
            <4> [<ffffffffa0527a60>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
            <4> [<ffffffff810a101e>] kthread+0x9e/0xc0
            <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
            <4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
            <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
            <4>
            <0>Kernel panic - not syncing: LBUG
            <4>Pid: 1697, comm: ldlm_bl_11 Not tainted 2.6.32-573.3.1.el6.x86_64 #1
            <4>Call Trace:
            <4> [<ffffffff81537c54>] ? panic+0xa7/0x16f
            <4> [<ffffffffa021decb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
            <4> [<ffffffffa0922085>] ? osc_cache_writeback_range+0x1275/0x1280 [osc]
            <4> [<ffffffffa090b545>] ? osc_lock_flush+0x175/0x260 [osc]
            <4> [<ffffffffa090b8d8>] ? osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
            <4> [<ffffffffa050ba57>] ? ldlm_cancel_callback+0x87/0x280 [ptlrpc]
            <4> [<ffffffff81060530>] ? __dequeue_entity+0x30/0x50
            <4> [<ffffffff8100969d>] ? __switch_to+0x7d/0x340
            <4> [<ffffffffa051e84a>] ? ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
            <4> [<ffffffffa05234bc>] ? ldlm_cli_cancel+0x9c/0x3e0 [ptlrpc]
            <4> [<ffffffffa090b70b>] ? osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
            <4> [<ffffffff810672c2>] ? default_wake_function+0x12/0x20
            <4> [<ffffffffa0527400>] ? ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
            <4> [<ffffffffa0527ee4>] ? ldlm_bl_thread_main+0x484/0x700 [ptlrpc]
            <4> [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
            <4> [<ffffffffa0527a60>] ? ldlm_bl_thread_main+0x0/0x700 [ptlrpc]
            <4> [<ffffffff810a101e>] ? kthread+0x9e/0xc0
            <4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
            <4> [<ffffffff810a0f80>] ? kthread+0x0/0xc0
            <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
            

            People

              Assignee: Jinshan Xiong (Inactive)
              Reporter: Cliff White (Inactive)
              Votes: 0
              Watchers: 25
