Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1723

osc_page_delete()) ASSERTION(0) failed

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.1.2
    • None
    • Client RHEL-6.2 2.6.32-220.23.1.el6.x86_64
    • 3
    • 4210

    Description

      LBUG on production cluster while node was running a user job.

      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) page@ffff88106917b5c0[1 ffff8806726e8f48:1263 ^(null)_ffff88106917b500 4 0 1 (null) (null) 0x1]
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) page@ffff88106917b500[1 ffff880770b2c4c8:1263 ^ffff88106917b5c0_(null) 4 0 1 (null) (null) 0x0]
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) vvp-page@ffff8809d88c6be0(1:0:0) vm@ffffea0034f6ce20 c0000000000001 3:0 0 1263 lru
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) lov-page@ffff8809f22e93a8
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff>
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) end page@ffff88106917b5c0
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) Trying to teardown failed: -16
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:426:osc_page_delete()) ASSERTION(0) failed
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:426:osc_page_delete()) LBUG
      Aug 8 16:00:02 sand-4-52 kernel: Pid: 64082, comm: calculate_propa
      Aug 8 16:00:02 sand-4-52 kernel:
      Aug 8 16:00:02 sand-4-52 kernel: Call Trace:
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0446855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0446e95>] lbug_with_loc+0x75/0xe0 [libcfs]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0451d86>] libcfs_assertion_failed+0x66/0x70 [libcfs]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa07f8df6>] osc_page_delete+0x236/0x240 [osc]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0553d1e>] cl_page_delete0+0xce/0x400 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa055128e>] ? cl_env_get+0x19e/0x350 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0550bf6>] ? cl_env_peek+0x36/0x110 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa055408d>] cl_page_delete+0x3d/0xf0 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0560bde>] ? cl_io_is_going+0xe/0x20 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa08e791b>] ll_releasepage+0x10b/0x150 [lustre]
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81168ff0>] ? mem_cgroup_uncharge_cache_page+0x10/0x20
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff811100b0>] try_to_release_page+0x30/0x60
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112a4f1>] shrink_page_list.clone.0+0x4f1/0x5c0
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112a8bb>] shrink_inactive_list+0x2fb/0x740
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112b5cf>] shrink_zone+0x38f/0x520
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112c374>] zone_reclaim+0x354/0x410
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112cfc0>] ? isolate_pages_global+0x0/0x350
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81122874>] get_page_from_freelist+0x694/0x820
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81123af1>] __alloc_pages_nodemask+0x111/0x940
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8116a728>] ? __mem_cgroup_try_charge+0x78/0x420
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff811586ca>] alloc_pages_vma+0x9a/0x150
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81172015>] do_huge_pmd_anonymous_page+0x145/0x370
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8113c79a>] handle_mm_fault+0x25a/0x2b0
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81042c29>] __do_page_fault+0x139/0x480
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814ed250>] ? thread_return+0x4e/0x76e
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814f2c8e>] do_page_fault+0x3e/0xa0
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814f0045>] page_fault+0x25/0x30
      Aug 8 16:00:03 sand-4-52 kernel:
      Aug 8 16:00:03 sand-4-52 kernel: LustreError: dumping log to /tmp/lustre-log.1344438003.64082

      Attachments

        Activity

          [LU-1723] osc_page_delete()) ASSERTION(0) failed

          OK. Thank you. It just looks a lot alike.
          I will keep reading code.

          2012-11-06T14:57:10.872333-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_cache.c:2367:osc_teardown_async_page()) extent ffff88060eedfe58@

          {[23 -> 23/255], [2|0|-|cache|wi|ffff88020e18f8c8], [4096|1|+|-|ffff8801f9ed9c18|256| (null)]}

          trunc at 23.
          2012-11-06T14:57:10.872389-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) page@ffff88080e8c8bc0[2 ffff88020cefad08:23 ^ (null)_ffff88080e8c8b00 4 0 1 (null) (null) 0x0]
          2012-11-06T14:57:10.902445-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) page@ffff88080e8c8b00[1 ffff88020d8c1f58:23 ^ffff88080e8c8bc0_ (null) 4 0 1 (null) (null) 0x0]
          2012-11-06T14:57:10.902494-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88080e8cf5a0(0:0:0) vm@ffffea001c6a99b8 e00000000000063 7:0 0 23 lru
          2012-11-06T14:57:10.902508-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff8808050a7888
          2012-11-06T14:57:10.958005-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88080d78ed20: 1< 0x845fed 258 0 + - > 2< 94208 0 4096 0x0 0x520 | (null) ffff880804350700 ffff88020e18f8c8 > 3< + ffff8801f8f577b0 0 0 0 > 4< 0 0 8 33824768 - | - - + - > 5< - - + - | 0 - | 1 - ->
          2012-11-06T14:57:10.958049-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) end page@ffff88080e8c8bc0
          2012-11-06T14:57:10.983504-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16
          2012-11-06T14:57:10.983536-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed:
          2012-11-06T14:57:10.983549-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:412:osc_page_delete()) LBUG
          2012-11-06T14:57:10.983570-06:00 c0-0c1s6n0 Pid: 5270, comm: fsx-linux
          2012-11-06T14:57:10.983582-06:00 c0-0c1s6n0 Call Trace:
          2012-11-06T14:57:10.983605-06:00 c0-0c1s6n0 [<ffffffff810063b1>] try_stack_unwind+0x161/0x1a0
          2012-11-06T14:57:11.009114-06:00 c0-0c1s6n0 [<ffffffff81004bf9>] dump_trace+0x89/0x440
          2012-11-06T14:57:11.009138-06:00 c0-0c1s6n0 [<ffffffffa014e887>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
          2012-11-06T14:57:11.009161-06:00 c0-0c1s6n0 [<ffffffffa014ede7>] lbug_with_loc+0x47/0xc0 [libcfs]
          2012-11-06T14:57:11.009173-06:00 c0-0c1s6n0 [<ffffffffa0671d41>] osc_page_delete+0x2d1/0x2e0 [osc]
          2012-11-06T14:57:11.034700-06:00 c0-0c1s6n0 [<ffffffffa02b0095>] cl_page_delete0+0xd5/0x4e0 [obdclass]
          2012-11-06T14:57:11.034737-06:00 c0-0c1s6n0 [<ffffffffa02b04e2>] cl_page_delete+0x42/0x120 [obdclass]
          2012-11-06T14:57:11.034762-06:00 c0-0c1s6n0 [<ffffffffa07f2e2d>] ll_invalidatepage+0x8d/0x170 [lustre]
          2012-11-06T14:57:11.034774-06:00 c0-0c1s6n0 [<ffffffffa07ea290>] ll_page_mkwrite+0x7c0/0x840 [lustre]
          2012-11-06T14:57:11.034820-06:00 c0-0c1s6n0 [<ffffffff81107cb7>] __do_fault+0xe7/0x570
          2012-11-06T14:57:11.034833-06:00 c0-0c1s6n0 [<ffffffff811081e4>] handle_pte_fault+0xa4/0xcd0
          2012-11-06T14:57:11.060561-06:00 c0-0c1s6n0 [<ffffffff81108fbe>] handle_mm_fault+0x1ae/0x240
          2012-11-06T14:57:11.060588-06:00 c0-0c1s6n0 [<ffffffff81025471>] do_page_fault+0x191/0x410
          2012-11-06T14:57:11.060600-06:00 c0-0c1s6n0 [<ffffffff81301b5f>] page_fault+0x1f/0x30
          2012-11-06T14:57:11.060647-06:00 c0-0c1s6n0 [<00000000200422b3>] 0x200422b3
          2012-11-06T14:57:11.060660-06:00 c0-0c1s6n0 Kernel panic - not syncing: LBUG

          bkorb Bruce Korb (Inactive) added a comment - OK. Thank you. It just looks a lot alike. I will keep reading code. 2012-11-06T14:57:10.872333-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_cache.c:2367:osc_teardown_async_page()) extent ffff88060eedfe58@ {[23 -> 23/255], [2|0|-|cache|wi|ffff88020e18f8c8], [4096|1|+|-|ffff8801f9ed9c18|256| (null)]} trunc at 23. 2012-11-06T14:57:10.872389-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) page@ffff88080e8c8bc0 [2 ffff88020cefad08:23 ^ (null)_ffff88080e8c8b00 4 0 1 (null) (null) 0x0] 2012-11-06T14:57:10.902445-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) page@ffff88080e8c8b00 [1 ffff88020d8c1f58:23 ^ffff88080e8c8bc0_ (null) 4 0 1 (null) (null) 0x0] 2012-11-06T14:57:10.902494-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88080e8cf5a0(0:0:0) vm@ffffea001c6a99b8 e00000000000063 7:0 0 23 lru 2012-11-06T14:57:10.902508-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff8808050a7888 2012-11-06T14:57:10.958005-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88080d78ed20: 1< 0x845fed 258 0 + - > 2< 94208 0 4096 0x0 0x520 | (null) ffff880804350700 ffff88020e18f8c8 > 3< + ffff8801f8f577b0 0 0 0 > 4< 0 0 8 33824768 - | - - + - > 5< - - + - | 0 - | 1 - -> 2012-11-06T14:57:10.958049-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) end page@ffff88080e8c8bc0 2012-11-06T14:57:10.983504-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16 2012-11-06T14:57:10.983536-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed: 2012-11-06T14:57:10.983549-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:412:osc_page_delete()) LBUG 2012-11-06T14:57:10.983570-06:00 c0-0c1s6n0 Pid: 5270, comm: fsx-linux 2012-11-06T14:57:10.983582-06:00 c0-0c1s6n0 Call Trace: 2012-11-06T14:57:10.983605-06:00 c0-0c1s6n0 [<ffffffff810063b1>] try_stack_unwind+0x161/0x1a0 2012-11-06T14:57:11.009114-06:00 c0-0c1s6n0 [<ffffffff81004bf9>] dump_trace+0x89/0x440 2012-11-06T14:57:11.009138-06:00 c0-0c1s6n0 [<ffffffffa014e887>] libcfs_debug_dumpstack+0x57/0x80 [libcfs] 2012-11-06T14:57:11.009161-06:00 c0-0c1s6n0 [<ffffffffa014ede7>] lbug_with_loc+0x47/0xc0 [libcfs] 2012-11-06T14:57:11.009173-06:00 c0-0c1s6n0 [<ffffffffa0671d41>] osc_page_delete+0x2d1/0x2e0 [osc] 2012-11-06T14:57:11.034700-06:00 c0-0c1s6n0 [<ffffffffa02b0095>] cl_page_delete0+0xd5/0x4e0 [obdclass] 2012-11-06T14:57:11.034737-06:00 c0-0c1s6n0 [<ffffffffa02b04e2>] cl_page_delete+0x42/0x120 [obdclass] 2012-11-06T14:57:11.034762-06:00 c0-0c1s6n0 [<ffffffffa07f2e2d>] ll_invalidatepage+0x8d/0x170 [lustre] 2012-11-06T14:57:11.034774-06:00 c0-0c1s6n0 [<ffffffffa07ea290>] ll_page_mkwrite+0x7c0/0x840 [lustre] 2012-11-06T14:57:11.034820-06:00 c0-0c1s6n0 [<ffffffff81107cb7>] __do_fault+0xe7/0x570 2012-11-06T14:57:11.034833-06:00 c0-0c1s6n0 [<ffffffff811081e4>] handle_pte_fault+0xa4/0xcd0 2012-11-06T14:57:11.060561-06:00 c0-0c1s6n0 [<ffffffff81108fbe>] handle_mm_fault+0x1ae/0x240 2012-11-06T14:57:11.060588-06:00 c0-0c1s6n0 [<ffffffff81025471>] do_page_fault+0x191/0x410 2012-11-06T14:57:11.060600-06:00 c0-0c1s6n0 [<ffffffff81301b5f>] page_fault+0x1f/0x30 2012-11-06T14:57:11.060647-06:00 c0-0c1s6n0 [<00000000200422b3>] 0x200422b3 2012-11-06T14:57:11.060660-06:00 c0-0c1s6n0 Kernel panic - not syncing: LBUG

          this is a different issue from what have described by Oleg

          jay Jinshan Xiong (Inactive) added a comment - this is a different issue from what have described by Oleg

          OK, well we (Cory at Cray) see the same symptom, even after the LU-2170 patch.
          So there must be another cause for this.

          > Denis Kondratenko added a comment - 06/Nov/12 4:31 AM
          > Wally,
          > was this fixed by http://jira.whamcloud.com/browse/LU-2170 ?
          > [...]
          > Wally Wang added a comment - 06/Nov/12 3:19 PM
          > No, attached is the SP2 version of the cdump and it is with the LU-2170 applied.

          The SP2 version of the cdump was, unfortunately, corrupt, so I've been futzing
          with the pre-SP2 version. That earlier dump may well have been caused by the
          issue fixed in LU-2170, but the later dump not. (Presuming Wally is correct about
          the patch having been applied.)

          I can attach the SP2 console log, if you like. But not the "lctl dk" dump,
          pending another crash dump.

          bkorb Bruce Korb (Inactive) added a comment - OK, well we (Cory at Cray) see the same symptom, even after the LU-2170 patch. So there must be another cause for this. > Denis Kondratenko added a comment - 06/Nov/12 4:31 AM > Wally, > was this fixed by http://jira.whamcloud.com/browse/LU-2170 ? > [...] > Wally Wang added a comment - 06/Nov/12 3:19 PM > No, attached is the SP2 version of the cdump and it is with the LU-2170 applied. The SP2 version of the cdump was, unfortunately, corrupt, so I've been futzing with the pre-SP2 version. That earlier dump may well have been caused by the issue fixed in LU-2170 , but the later dump not. (Presuming Wally is correct about the patch having been applied.) I can attach the SP2 console log, if you like. But not the "lctl dk" dump, pending another crash dump.

          In LU-2170, I discovered that a write lock can be matched and enqueued by a glimpse process so that the write lock will be marked with INTENT(this write lock was wrongly converted to glimpse lock) in the enqueue RPC, then the lock was denied by OST, then the client calls cl_lock_delete() to delete this lock and grants cl_lock to make it go through with glimpse. After glimpse process finishes to use this lock, it will be deleted w/o canceling.

          Depending on the time when this lock is found by other process, difference symptoms will show up. For example, in osc_extent_merge(), I discovered that one of the extent's oe_lock pointed to that "glimpse write" lock, and another extent had the normal one. For the osc_page_delete() case, that was because one dirty page was added into the "glimpse write" lock, since there is no cancel for this lock, the page still remained in the cache after the lock was deleted. Afterwards, when a read lock was enqueued and canceled, it discovered this page and then hit the assert.

          jay Jinshan Xiong (Inactive) added a comment - In LU-2170 , I discovered that a write lock can be matched and enqueued by a glimpse process so that the write lock will be marked with INTENT(this write lock was wrongly converted to glimpse lock) in the enqueue RPC, then the lock was denied by OST, then the client calls cl_lock_delete() to delete this lock and grants cl_lock to make it go through with glimpse. After glimpse process finishes to use this lock, it will be deleted w/o canceling. Depending on the time when this lock is found by other process, difference symptoms will show up. For example, in osc_extent_merge(), I discovered that one of the extent's oe_lock pointed to that "glimpse write" lock, and another extent had the normal one. For the osc_page_delete() case, that was because one dirty page was added into the "glimpse write" lock, since there is no cancel for this lock, the page still remained in the cache after the lock was deleted. Afterwards, when a read lock was enqueued and canceled, it discovered this page and then hit the assert.

          I am a little unclear as to how this was diagnosed as a duplicate of LU-2170.
          The assertions are different. This "assert(0)" was initiated because
          osc_page_delete() was called while the opg->ops_oap field referenced a
          osc_async_page structure with a non-empty oap_rpc_item list. (That caused
          osc_teardown_async_page to return EBUSY (16) and osc_page_delete handles
          that by asserting that zero is not zero.)

          So, I don't see the obvious reason the osc_extent_merge assertion would be
          seen as a duplicate of this issue.

          bkorb Bruce Korb (Inactive) added a comment - I am a little unclear as to how this was diagnosed as a duplicate of LU-2170 . The assertions are different. This "assert(0)" was initiated because osc_page_delete() was called while the opg->ops_oap field referenced a osc_async_page structure with a non-empty oap_rpc_item list. (That caused osc_teardown_async_page to return EBUSY (16) and osc_page_delete handles that by asserting that zero is not zero.) So, I don't see the obvious reason the osc_extent_merge assertion would be seen as a duplicate of this issue.

          duplicate of LU-2170

          jay Jinshan Xiong (Inactive) added a comment - duplicate of LU-2170
          green Oleg Drokin added a comment -

          I just hit this running racer on a small local fs (That gets ENOSPC easily):

          [12126.733752] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Write page 0 of inode ffff88004431db20 failed -28
          [12126.734303] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Skipped 708 previous similar messages
          [12198.367486] LustreError: 26187:0:(osc_cache.c:2343:osc_teardown_async_page()) extent ffff88002223ef38@{[256 -> 402/511], [2|0|-|cache|wi|ffff880088af7eb0], [602112|147|+|-|ffff880032770ed8|256|(null)]} trunc at 256.
          [12198.368328] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff880011a6def0[2 ffff8800169d7f88:256 ^(null)_ffff8800868dfef0 4 0 1 (null) (null) 0x0]
          [12198.369065] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff8800868dfef0[1 ffff88004b855f98:256 ^ffff880011a6def0_(null) 4 0 1 (null) (null) 0x0]
          [12198.369829] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88009f775af8(0:0:0) vm@ffffea00011fc5f8 20000000000035 2:0 0 256 lru
          [12198.370386] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff880010cfbae8
          [12198.370856] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88008e13ef08: 1< 0x845fed 258 0 + - > 2< 1048576 0 4096 0x0 0x520 | (null) ffff88006f5c8cb0 ffff880088af7eb0 > 3< + ffff880051112540 0 0 0 > 4< 0 0 8 35164160 - | - - + - > 5< - - + - | 0 - | 147 - ->
          [12198.371911] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) end page@ffff880011a6def0
          [12198.372377] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16
          [12198.372850] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed: 
          [12198.373329] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) LBUG
          [12198.373602] Pid: 26187, comm: ldlm_bl_21
          [12198.373826] 
          [12198.373827] Call Trace:
          [12198.374850]  [<ffffffffa0caa915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
          [12198.375155]  [<ffffffffa0caaf27>] lbug_with_loc+0x47/0xb0 [libcfs]
          [12198.375436]  [<ffffffffa04cec1d>] osc_page_delete+0x29d/0x2a0 [osc]
          [12198.375752]  [<ffffffffa11d8e65>] cl_page_delete0+0xd5/0x500 [obdclass]
          [12198.376052]  [<ffffffffa11d92d2>] cl_page_delete+0x42/0x120 [obdclass]
          [12198.376357]  [<ffffffffa0ac37cd>] ll_invalidatepage+0x8d/0x180 [lustre]
          [12198.376692]  [<ffffffffa0ad592f>] vvp_page_discard+0x9f/0x140 [lustre]
          [12198.376989]  [<ffffffffa11d6358>] cl_page_invoid+0x78/0x170 [obdclass]
          [12198.377307]  [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass]
          [12198.377607]  [<ffffffffa11d6463>] cl_page_discard+0x13/0x20 [obdclass]
          [12198.377903]  [<ffffffffa11e08ae>] check_and_discard_cb+0x13e/0x160 [obdclass]
          [12198.378220]  [<ffffffffa11db54c>] cl_page_gang_lookup+0x1fc/0x410 [obdclass]
          [12198.378527]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.378834]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.379144]  [<ffffffffa11de03e>] cl_lock_discard_pages+0x11e/0x1f0 [obdclass]
          [12198.379607]  [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass]
          [12198.379888]  [<ffffffffa04cfb80>] osc_lock_flush+0x110/0x200 [osc]
          [12198.380167]  [<ffffffffa04cfcc9>] osc_lock_cancel+0x59/0x1a0 [osc]
          [12198.380454]  [<ffffffffa11dbda5>] cl_lock_cancel0+0x75/0x160 [obdclass]
          [12198.380752]  [<ffffffffa11dca0b>] cl_lock_cancel+0x13b/0x140 [obdclass]
          [12198.381045]  [<ffffffffa04d19ba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc]
          [12198.381368]  [<ffffffffa12dbaf3>] ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
          [12198.381831]  [<ffffffffa12dbf31>] ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc]
          [12198.382119]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
          [12198.382408]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.382692]  [<ffffffff8100c14a>] child_rip+0xa/0x20
          [12198.382958]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.383826]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.384126]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
          [12198.384378] 
          [12198.455840] Kernel panic - not syncing: LBUG
          [12198.456115] Pid: 26187, comm: ldlm_bl_21 Not tainted 2.6.32-debug #6
          [12198.456423] Call Trace:
          [12198.456632]  [<ffffffff814f75e4>] ? panic+0xa0/0x168
          [12198.456887]  [<ffffffffa0caaf7b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
          [12198.457201]  [<ffffffffa04cec1d>] ? osc_page_delete+0x29d/0x2a0 [osc]
          [12198.457516]  [<ffffffffa11d8e65>] ? cl_page_delete0+0xd5/0x500 [obdclass]
          [12198.457816]  [<ffffffffa11d92d2>] ? cl_page_delete+0x42/0x120 [obdclass]
          [12198.458127]  [<ffffffffa0ac37cd>] ? ll_invalidatepage+0x8d/0x180 [lustre]
          [12198.458420]  [<ffffffffa0ad592f>] ? vvp_page_discard+0x9f/0x140 [lustre]
          [12198.458718]  [<ffffffffa11d6358>] ? cl_page_invoid+0x78/0x170 [obdclass]
          [12198.459014]  [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass]
          [12198.459328]  [<ffffffffa11d6463>] ? cl_page_discard+0x13/0x20 [obdclass]
          [12198.459627]  [<ffffffffa11e08ae>] ? check_and_discard_cb+0x13e/0x160 [obdclass]
          [12198.460126]  [<ffffffffa11db54c>] ? cl_page_gang_lookup+0x1fc/0x410 [obdclass]
          [12198.460617]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.460970]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.462162]  [<ffffffffa11de03e>] ? cl_lock_discard_pages+0x11e/0x1f0 [obdclass]
          [12198.462646]  [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass]
          [12198.462932]  [<ffffffffa04cfb80>] ? osc_lock_flush+0x110/0x200 [osc]
          [12198.463225]  [<ffffffffa04cfcc9>] ? osc_lock_cancel+0x59/0x1a0 [osc]
          [12198.463533]  [<ffffffffa11dbda5>] ? cl_lock_cancel0+0x75/0x160 [obdclass]
          [12198.463829]  [<ffffffffa11dca0b>] ? cl_lock_cancel+0x13b/0x140 [obdclass]
          [12198.464120]  [<ffffffffa04d19ba>] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc]
          [12198.464436]  [<ffffffffa12dbaf3>] ? ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
          [12198.464898]  [<ffffffffa12dbf31>] ? ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc]
          [12198.465240]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
          [12198.465533]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.465815]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
          [12198.466084]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.466393]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.466699]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
          

          I have vmcore for this occurence

          green Oleg Drokin added a comment - I just hit this running racer on a small local fs (That gets ENOSPC easily): [12126.733752] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Write page 0 of inode ffff88004431db20 failed -28 [12126.734303] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Skipped 708 previous similar messages [12198.367486] LustreError: 26187:0:(osc_cache.c:2343:osc_teardown_async_page()) extent ffff88002223ef38@{[256 -> 402/511], [2|0|-|cache|wi|ffff880088af7eb0], [602112|147|+|-|ffff880032770ed8|256|(null)]} trunc at 256. [12198.368328] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff880011a6def0[2 ffff8800169d7f88:256 ^(null)_ffff8800868dfef0 4 0 1 (null) (null) 0x0] [12198.369065] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff8800868dfef0[1 ffff88004b855f98:256 ^ffff880011a6def0_(null) 4 0 1 (null) (null) 0x0] [12198.369829] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88009f775af8(0:0:0) vm@ffffea00011fc5f8 20000000000035 2:0 0 256 lru [12198.370386] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff880010cfbae8 [12198.370856] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88008e13ef08: 1< 0x845fed 258 0 + - > 2< 1048576 0 4096 0x0 0x520 | (null) ffff88006f5c8cb0 ffff880088af7eb0 > 3< + ffff880051112540 0 0 0 > 4< 0 0 8 35164160 - | - - + - > 5< - - + - | 0 - | 147 - -> [12198.371911] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) end page@ffff880011a6def0 [12198.372377] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16 [12198.372850] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed: [12198.373329] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) LBUG [12198.373602] Pid: 26187, comm: ldlm_bl_21 [12198.373826] [12198.373827] Call Trace: [12198.374850] [<ffffffffa0caa915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [12198.375155] [<ffffffffa0caaf27>] lbug_with_loc+0x47/0xb0 [libcfs] [12198.375436] [<ffffffffa04cec1d>] osc_page_delete+0x29d/0x2a0 [osc] [12198.375752] [<ffffffffa11d8e65>] cl_page_delete0+0xd5/0x500 [obdclass] [12198.376052] [<ffffffffa11d92d2>] cl_page_delete+0x42/0x120 [obdclass] [12198.376357] [<ffffffffa0ac37cd>] ll_invalidatepage+0x8d/0x180 [lustre] [12198.376692] [<ffffffffa0ad592f>] vvp_page_discard+0x9f/0x140 [lustre] [12198.376989] [<ffffffffa11d6358>] cl_page_invoid+0x78/0x170 [obdclass] [12198.377307] [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass] [12198.377607] [<ffffffffa11d6463>] cl_page_discard+0x13/0x20 [obdclass] [12198.377903] [<ffffffffa11e08ae>] check_and_discard_cb+0x13e/0x160 [obdclass] [12198.378220] [<ffffffffa11db54c>] cl_page_gang_lookup+0x1fc/0x410 [obdclass] [12198.378527] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.378834] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.379144] [<ffffffffa11de03e>] cl_lock_discard_pages+0x11e/0x1f0 [obdclass] [12198.379607] [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass] [12198.379888] [<ffffffffa04cfb80>] osc_lock_flush+0x110/0x200 [osc] [12198.380167] [<ffffffffa04cfcc9>] osc_lock_cancel+0x59/0x1a0 [osc] [12198.380454] [<ffffffffa11dbda5>] cl_lock_cancel0+0x75/0x160 [obdclass] [12198.380752] [<ffffffffa11dca0b>] cl_lock_cancel+0x13b/0x140 [obdclass] [12198.381045] [<ffffffffa04d19ba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc] [12198.381368] [<ffffffffa12dbaf3>] ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc] [12198.381831] [<ffffffffa12dbf31>] ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc] [12198.382119] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [12198.382408] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.382692] [<ffffffff8100c14a>] child_rip+0xa/0x20 [12198.382958] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.383826] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.384126] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [12198.384378] [12198.455840] Kernel panic - not syncing: LBUG [12198.456115] Pid: 26187, comm: ldlm_bl_21 Not tainted 2.6.32-debug #6 [12198.456423] Call Trace: [12198.456632] [<ffffffff814f75e4>] ? panic+0xa0/0x168 [12198.456887] [<ffffffffa0caaf7b>] ? lbug_with_loc+0x9b/0xb0 [libcfs] [12198.457201] [<ffffffffa04cec1d>] ? osc_page_delete+0x29d/0x2a0 [osc] [12198.457516] [<ffffffffa11d8e65>] ? cl_page_delete0+0xd5/0x500 [obdclass] [12198.457816] [<ffffffffa11d92d2>] ? cl_page_delete+0x42/0x120 [obdclass] [12198.458127] [<ffffffffa0ac37cd>] ? ll_invalidatepage+0x8d/0x180 [lustre] [12198.458420] [<ffffffffa0ad592f>] ? vvp_page_discard+0x9f/0x140 [lustre] [12198.458718] [<ffffffffa11d6358>] ? cl_page_invoid+0x78/0x170 [obdclass] [12198.459014] [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass] [12198.459328] [<ffffffffa11d6463>] ? cl_page_discard+0x13/0x20 [obdclass] [12198.459627] [<ffffffffa11e08ae>] ? check_and_discard_cb+0x13e/0x160 [obdclass] [12198.460126] [<ffffffffa11db54c>] ? cl_page_gang_lookup+0x1fc/0x410 [obdclass] [12198.460617] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.460970] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.462162] [<ffffffffa11de03e>] ? cl_lock_discard_pages+0x11e/0x1f0 [obdclass] [12198.462646] [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass] [12198.462932] [<ffffffffa04cfb80>] ? osc_lock_flush+0x110/0x200 [osc] [12198.463225] [<ffffffffa04cfcc9>] ? osc_lock_cancel+0x59/0x1a0 [osc] [12198.463533] [<ffffffffa11dbda5>] ? cl_lock_cancel0+0x75/0x160 [obdclass] [12198.463829] [<ffffffffa11dca0b>] ? cl_lock_cancel+0x13b/0x140 [obdclass] [12198.464120] [<ffffffffa04d19ba>] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc] [12198.464436] [<ffffffffa12dbaf3>] ? ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc] [12198.464898] [<ffffffffa12dbf31>] ? ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc] [12198.465240] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [12198.465533] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.465815] [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [12198.466084] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.466393] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.466699] [<ffffffff8100c140>] ? child_rip+0x0/0x20 I have vmcore for this occurence

          Maybe the output was truncated, from the output you posted:

          Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff>
          

          only paragraph 1 <...> and 2 <...> were printed, but I expect to see all 5 paragraphs.

          How easy you can reproduce this problem, if possible can you please trigger it and post the output on jira again?

          jay Jinshan Xiong (Inactive) added a comment - Maybe the output was truncated, from the output you posted: Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff> only paragraph 1 <...> and 2 <...> were printed, but I expect to see all 5 paragraphs. How easy you can reproduce this problem, if possible can you please trigger it and post the output on jira again?

          Hi Jinshan,

          I don't understand why you say it mismatches, can you elaborate please?

          wjt27 Wojciech Turek added a comment - Hi Jinshan, I don't understand why you say it mismatches, can you elaborate please?
          jay Jinshan Xiong (Inactive) added a comment - - edited

          Hi Wojciech Turek, Thanks. It looks like the code mismatches the output from your first comment.

          BTW, you can use:

          {code} /* C code */ {code}

          to quote source code on jira.

          jay Jinshan Xiong (Inactive) added a comment - - edited Hi Wojciech Turek, Thanks. It looks like the code mismatches the output from your first comment. BTW, you can use: {code} /* C code */ {code} to quote source code on jira.
          wjt27 Wojciech Turek added a comment - - edited
          static int osc_page_print(const struct lu_env *env,
                                    const struct cl_page_slice *slice,
                                    void *cookie, lu_printer_t printer)
          {
                  struct osc_page       *opg = cl2osc_page(slice);
                  struct osc_async_page *oap = &opg->ops_oap;
                  struct osc_object     *obj = cl2osc(slice->cpl_obj);
                  struct client_obd     *cli = &osc_export(obj)->exp_obd->u.cli;
                  struct lov_oinfo      *loi = obj->oo_oinfo;
          
                  return (*printer)(env, cookie, LUSTRE_OSC_NAME"-page@%p: "
                                    "1< %#x %d %u %s %s %s > "
                                    "2< "LPU64" %u %u %#x %#x | %p %p %p %p %p > "
                                    "3< %s %p %d %lu %d > "
                                    "4< %d %d %d %lu %s | %s %s %s %s > "
                                    "5< %s %s %s %s | %d %s %s | %d %s %s>\n",
                                    opg,
                                    /* 1 */
                                    oap->oap_magic, oap->oap_cmd,
                                    oap->oap_interrupted,
                                    osc_list(&oap->oap_pending_item),
                                    osc_list(&oap->oap_urgent_item),
                                    osc_list(&oap->oap_rpc_item),
                                    /* 2 */
                                    oap->oap_obj_off, oap->oap_page_off, oap->oap_count,
                                    oap->oap_async_flags, oap->oap_brw_flags,
                                    oap->oap_request,
                                    oap->oap_cli, oap->oap_loi, oap->oap_caller_ops,
                                    oap->oap_caller_data,
                                    /* 3 */
                                    osc_list(&opg->ops_inflight),
                                    opg->ops_submitter, opg->ops_transfer_pinned,
                                    osc_submit_duration(opg), opg->ops_srvlock,
                                    /* 4 */
                                    cli->cl_r_in_flight, cli->cl_w_in_flight,
                                    cli->cl_max_rpcs_in_flight,
                                    cli->cl_avail_grant,
                                    osc_list(&cli->cl_cache_waiters),
                                    osc_list(&cli->cl_loi_ready_list),
                                    osc_list(&cli->cl_loi_hp_ready_list),
                                    osc_list(&cli->cl_loi_write_list),
                                    osc_list(&cli->cl_loi_read_list),
                                    /* 5 */
                                    osc_list(&loi->loi_ready_item),
                                    osc_list(&loi->loi_hp_ready_item),
                                    osc_list(&loi->loi_write_item),
                                    osc_list(&loi->loi_read_item),
                                    loi->loi_read_lop.lop_num_pending,
                                    osc_list(&loi->loi_read_lop.lop_pending),
                                    osc_list(&loi->loi_read_lop.lop_urgent),
                                    loi->loi_write_lop.lop_num_pending,
                                    osc_list(&loi->loi_write_lop.lop_pending),
                                    osc_list(&loi->loi_write_lop.lop_urgent));
          }
          
          wjt27 Wojciech Turek added a comment - - edited static int osc_page_print( const struct lu_env *env, const struct cl_page_slice *slice, void *cookie, lu_printer_t printer) { struct osc_page *opg = cl2osc_page(slice); struct osc_async_page *oap = &opg->ops_oap; struct osc_object *obj = cl2osc(slice->cpl_obj); struct client_obd *cli = &osc_export(obj)->exp_obd->u.cli; struct lov_oinfo *loi = obj->oo_oinfo; return (*printer)(env, cookie, LUSTRE_OSC_NAME "-page@%p: " "1< %#x %d %u %s %s %s > " "2< " LPU64 " %u %u %#x %#x | %p %p %p %p %p > " "3< %s %p %d %lu %d > " "4< %d %d %d %lu %s | %s %s %s %s > " "5< %s %s %s %s | %d %s %s | %d %s %s>\n" , opg, /* 1 */ oap->oap_magic, oap->oap_cmd, oap->oap_interrupted, osc_list(&oap->oap_pending_item), osc_list(&oap->oap_urgent_item), osc_list(&oap->oap_rpc_item), /* 2 */ oap->oap_obj_off, oap->oap_page_off, oap->oap_count, oap->oap_async_flags, oap->oap_brw_flags, oap->oap_request, oap->oap_cli, oap->oap_loi, oap->oap_caller_ops, oap->oap_caller_data, /* 3 */ osc_list(&opg->ops_inflight), opg->ops_submitter, opg->ops_transfer_pinned, osc_submit_duration(opg), opg->ops_srvlock, /* 4 */ cli->cl_r_in_flight, cli->cl_w_in_flight, cli->cl_max_rpcs_in_flight, cli->cl_avail_grant, osc_list(&cli->cl_cache_waiters), osc_list(&cli->cl_loi_ready_list), osc_list(&cli->cl_loi_hp_ready_list), osc_list(&cli->cl_loi_write_list), osc_list(&cli->cl_loi_read_list), /* 5 */ osc_list(&loi->loi_ready_item), osc_list(&loi->loi_hp_ready_item), osc_list(&loi->loi_write_item), osc_list(&loi->loi_read_item), loi->loi_read_lop.lop_num_pending, osc_list(&loi->loi_read_lop.lop_pending), osc_list(&loi->loi_read_lop.lop_urgent), loi->loi_write_lop.lop_num_pending, osc_list(&loi->loi_write_lop.lop_pending), osc_list(&loi->loi_write_lop.lop_urgent)); }

          People

            jay Jinshan Xiong (Inactive)
            wjt27 Wojciech Turek
            Votes:
            2 Vote for this issue
            Watchers:
            13 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: