Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1723

osc_page_delete()) ASSERTION(0) failed

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.1.2
    • None
    • Client RHEL-6.2 2.6.32-220.23.1.el6.x86_64
    • 3
    • 4210

    Description

      LBUG on production cluster while node was running a user job.

      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) page@ffff88106917b5c0[1 ffff8806726e8f48:1263 ^(null)_ffff88106917b500 4 0 1 (null) (null) 0x1]
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) page@ffff88106917b500[1 ffff880770b2c4c8:1263 ^ffff88106917b5c0_(null) 4 0 1 (null) (null) 0x0]
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) vvp-page@ffff8809d88c6be0(1:0:0) vm@ffffea0034f6ce20 c0000000000001 3:0 0 1263 lru
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) lov-page@ffff8809f22e93a8
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff>
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) end page@ffff88106917b5c0
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) Trying to teardown failed: -16
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:426:osc_page_delete()) ASSERTION(0) failed
      Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:426:osc_page_delete()) LBUG
      Aug 8 16:00:02 sand-4-52 kernel: Pid: 64082, comm: calculate_propa
      Aug 8 16:00:02 sand-4-52 kernel:
      Aug 8 16:00:02 sand-4-52 kernel: Call Trace:
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0446855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0446e95>] lbug_with_loc+0x75/0xe0 [libcfs]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0451d86>] libcfs_assertion_failed+0x66/0x70 [libcfs]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa07f8df6>] osc_page_delete+0x236/0x240 [osc]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0553d1e>] cl_page_delete0+0xce/0x400 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa055128e>] ? cl_env_get+0x19e/0x350 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0550bf6>] ? cl_env_peek+0x36/0x110 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa055408d>] cl_page_delete+0x3d/0xf0 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0560bde>] ? cl_io_is_going+0xe/0x20 [obdclass]
      Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa08e791b>] ll_releasepage+0x10b/0x150 [lustre]
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81168ff0>] ? mem_cgroup_uncharge_cache_page+0x10/0x20
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff811100b0>] try_to_release_page+0x30/0x60
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112a4f1>] shrink_page_list.clone.0+0x4f1/0x5c0
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112a8bb>] shrink_inactive_list+0x2fb/0x740
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112b5cf>] shrink_zone+0x38f/0x520
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112c374>] zone_reclaim+0x354/0x410
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112cfc0>] ? isolate_pages_global+0x0/0x350
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81122874>] get_page_from_freelist+0x694/0x820
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81123af1>] __alloc_pages_nodemask+0x111/0x940
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8116a728>] ? __mem_cgroup_try_charge+0x78/0x420
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff811586ca>] alloc_pages_vma+0x9a/0x150
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81172015>] do_huge_pmd_anonymous_page+0x145/0x370
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8113c79a>] handle_mm_fault+0x25a/0x2b0
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81042c29>] __do_page_fault+0x139/0x480
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814ed250>] ? thread_return+0x4e/0x76e
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814f2c8e>] do_page_fault+0x3e/0xa0
      Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814f0045>] page_fault+0x25/0x30
      Aug 8 16:00:03 sand-4-52 kernel:
      Aug 8 16:00:03 sand-4-52 kernel: LustreError: dumping log to /tmp/lustre-log.1344438003.64082

      Attachments

        Activity

          [LU-1723] osc_page_delete()) ASSERTION(0) failed
          green Oleg Drokin added a comment -

          I just hit this running racer on a small local fs (That gets ENOSPC easily):

          [12126.733752] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Write page 0 of inode ffff88004431db20 failed -28
          [12126.734303] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Skipped 708 previous similar messages
          [12198.367486] LustreError: 26187:0:(osc_cache.c:2343:osc_teardown_async_page()) extent ffff88002223ef38@{[256 -> 402/511], [2|0|-|cache|wi|ffff880088af7eb0], [602112|147|+|-|ffff880032770ed8|256|(null)]} trunc at 256.
          [12198.368328] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff880011a6def0[2 ffff8800169d7f88:256 ^(null)_ffff8800868dfef0 4 0 1 (null) (null) 0x0]
          [12198.369065] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff8800868dfef0[1 ffff88004b855f98:256 ^ffff880011a6def0_(null) 4 0 1 (null) (null) 0x0]
          [12198.369829] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88009f775af8(0:0:0) vm@ffffea00011fc5f8 20000000000035 2:0 0 256 lru
          [12198.370386] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff880010cfbae8
          [12198.370856] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88008e13ef08: 1< 0x845fed 258 0 + - > 2< 1048576 0 4096 0x0 0x520 | (null) ffff88006f5c8cb0 ffff880088af7eb0 > 3< + ffff880051112540 0 0 0 > 4< 0 0 8 35164160 - | - - + - > 5< - - + - | 0 - | 147 - ->
          [12198.371911] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) end page@ffff880011a6def0
          [12198.372377] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16
          [12198.372850] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed: 
          [12198.373329] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) LBUG
          [12198.373602] Pid: 26187, comm: ldlm_bl_21
          [12198.373826] 
          [12198.373827] Call Trace:
          [12198.374850]  [<ffffffffa0caa915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
          [12198.375155]  [<ffffffffa0caaf27>] lbug_with_loc+0x47/0xb0 [libcfs]
          [12198.375436]  [<ffffffffa04cec1d>] osc_page_delete+0x29d/0x2a0 [osc]
          [12198.375752]  [<ffffffffa11d8e65>] cl_page_delete0+0xd5/0x500 [obdclass]
          [12198.376052]  [<ffffffffa11d92d2>] cl_page_delete+0x42/0x120 [obdclass]
          [12198.376357]  [<ffffffffa0ac37cd>] ll_invalidatepage+0x8d/0x180 [lustre]
          [12198.376692]  [<ffffffffa0ad592f>] vvp_page_discard+0x9f/0x140 [lustre]
          [12198.376989]  [<ffffffffa11d6358>] cl_page_invoid+0x78/0x170 [obdclass]
          [12198.377307]  [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass]
          [12198.377607]  [<ffffffffa11d6463>] cl_page_discard+0x13/0x20 [obdclass]
          [12198.377903]  [<ffffffffa11e08ae>] check_and_discard_cb+0x13e/0x160 [obdclass]
          [12198.378220]  [<ffffffffa11db54c>] cl_page_gang_lookup+0x1fc/0x410 [obdclass]
          [12198.378527]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.378834]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.379144]  [<ffffffffa11de03e>] cl_lock_discard_pages+0x11e/0x1f0 [obdclass]
          [12198.379607]  [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass]
          [12198.379888]  [<ffffffffa04cfb80>] osc_lock_flush+0x110/0x200 [osc]
          [12198.380167]  [<ffffffffa04cfcc9>] osc_lock_cancel+0x59/0x1a0 [osc]
          [12198.380454]  [<ffffffffa11dbda5>] cl_lock_cancel0+0x75/0x160 [obdclass]
          [12198.380752]  [<ffffffffa11dca0b>] cl_lock_cancel+0x13b/0x140 [obdclass]
          [12198.381045]  [<ffffffffa04d19ba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc]
          [12198.381368]  [<ffffffffa12dbaf3>] ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
          [12198.381831]  [<ffffffffa12dbf31>] ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc]
          [12198.382119]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
          [12198.382408]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.382692]  [<ffffffff8100c14a>] child_rip+0xa/0x20
          [12198.382958]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.383826]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.384126]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
          [12198.384378] 
          [12198.455840] Kernel panic - not syncing: LBUG
          [12198.456115] Pid: 26187, comm: ldlm_bl_21 Not tainted 2.6.32-debug #6
          [12198.456423] Call Trace:
          [12198.456632]  [<ffffffff814f75e4>] ? panic+0xa0/0x168
          [12198.456887]  [<ffffffffa0caaf7b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
          [12198.457201]  [<ffffffffa04cec1d>] ? osc_page_delete+0x29d/0x2a0 [osc]
          [12198.457516]  [<ffffffffa11d8e65>] ? cl_page_delete0+0xd5/0x500 [obdclass]
          [12198.457816]  [<ffffffffa11d92d2>] ? cl_page_delete+0x42/0x120 [obdclass]
          [12198.458127]  [<ffffffffa0ac37cd>] ? ll_invalidatepage+0x8d/0x180 [lustre]
          [12198.458420]  [<ffffffffa0ad592f>] ? vvp_page_discard+0x9f/0x140 [lustre]
          [12198.458718]  [<ffffffffa11d6358>] ? cl_page_invoid+0x78/0x170 [obdclass]
          [12198.459014]  [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass]
          [12198.459328]  [<ffffffffa11d6463>] ? cl_page_discard+0x13/0x20 [obdclass]
          [12198.459627]  [<ffffffffa11e08ae>] ? check_and_discard_cb+0x13e/0x160 [obdclass]
          [12198.460126]  [<ffffffffa11db54c>] ? cl_page_gang_lookup+0x1fc/0x410 [obdclass]
          [12198.460617]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.460970]  [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
          [12198.462162]  [<ffffffffa11de03e>] ? cl_lock_discard_pages+0x11e/0x1f0 [obdclass]
          [12198.462646]  [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass]
          [12198.462932]  [<ffffffffa04cfb80>] ? osc_lock_flush+0x110/0x200 [osc]
          [12198.463225]  [<ffffffffa04cfcc9>] ? osc_lock_cancel+0x59/0x1a0 [osc]
          [12198.463533]  [<ffffffffa11dbda5>] ? cl_lock_cancel0+0x75/0x160 [obdclass]
          [12198.463829]  [<ffffffffa11dca0b>] ? cl_lock_cancel+0x13b/0x140 [obdclass]
          [12198.464120]  [<ffffffffa04d19ba>] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc]
          [12198.464436]  [<ffffffffa12dbaf3>] ? ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
          [12198.464898]  [<ffffffffa12dbf31>] ? ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc]
          [12198.465240]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
          [12198.465533]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.465815]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
          [12198.466084]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.466393]  [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
          [12198.466699]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
          

          I have vmcore for this occurence

          green Oleg Drokin added a comment - I just hit this running racer on a small local fs (That gets ENOSPC easily): [12126.733752] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Write page 0 of inode ffff88004431db20 failed -28 [12126.734303] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Skipped 708 previous similar messages [12198.367486] LustreError: 26187:0:(osc_cache.c:2343:osc_teardown_async_page()) extent ffff88002223ef38@{[256 -> 402/511], [2|0|-|cache|wi|ffff880088af7eb0], [602112|147|+|-|ffff880032770ed8|256|(null)]} trunc at 256. [12198.368328] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff880011a6def0[2 ffff8800169d7f88:256 ^(null)_ffff8800868dfef0 4 0 1 (null) (null) 0x0] [12198.369065] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff8800868dfef0[1 ffff88004b855f98:256 ^ffff880011a6def0_(null) 4 0 1 (null) (null) 0x0] [12198.369829] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88009f775af8(0:0:0) vm@ffffea00011fc5f8 20000000000035 2:0 0 256 lru [12198.370386] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff880010cfbae8 [12198.370856] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88008e13ef08: 1< 0x845fed 258 0 + - > 2< 1048576 0 4096 0x0 0x520 | (null) ffff88006f5c8cb0 ffff880088af7eb0 > 3< + ffff880051112540 0 0 0 > 4< 0 0 8 35164160 - | - - + - > 5< - - + - | 0 - | 147 - -> [12198.371911] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) end page@ffff880011a6def0 [12198.372377] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16 [12198.372850] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed: [12198.373329] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) LBUG [12198.373602] Pid: 26187, comm: ldlm_bl_21 [12198.373826] [12198.373827] Call Trace: [12198.374850] [<ffffffffa0caa915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [12198.375155] [<ffffffffa0caaf27>] lbug_with_loc+0x47/0xb0 [libcfs] [12198.375436] [<ffffffffa04cec1d>] osc_page_delete+0x29d/0x2a0 [osc] [12198.375752] [<ffffffffa11d8e65>] cl_page_delete0+0xd5/0x500 [obdclass] [12198.376052] [<ffffffffa11d92d2>] cl_page_delete+0x42/0x120 [obdclass] [12198.376357] [<ffffffffa0ac37cd>] ll_invalidatepage+0x8d/0x180 [lustre] [12198.376692] [<ffffffffa0ad592f>] vvp_page_discard+0x9f/0x140 [lustre] [12198.376989] [<ffffffffa11d6358>] cl_page_invoid+0x78/0x170 [obdclass] [12198.377307] [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass] [12198.377607] [<ffffffffa11d6463>] cl_page_discard+0x13/0x20 [obdclass] [12198.377903] [<ffffffffa11e08ae>] check_and_discard_cb+0x13e/0x160 [obdclass] [12198.378220] [<ffffffffa11db54c>] cl_page_gang_lookup+0x1fc/0x410 [obdclass] [12198.378527] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.378834] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.379144] [<ffffffffa11de03e>] cl_lock_discard_pages+0x11e/0x1f0 [obdclass] [12198.379607] [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass] [12198.379888] [<ffffffffa04cfb80>] osc_lock_flush+0x110/0x200 [osc] [12198.380167] [<ffffffffa04cfcc9>] osc_lock_cancel+0x59/0x1a0 [osc] [12198.380454] [<ffffffffa11dbda5>] cl_lock_cancel0+0x75/0x160 [obdclass] [12198.380752] [<ffffffffa11dca0b>] cl_lock_cancel+0x13b/0x140 [obdclass] [12198.381045] [<ffffffffa04d19ba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc] [12198.381368] [<ffffffffa12dbaf3>] ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc] [12198.381831] [<ffffffffa12dbf31>] ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc] [12198.382119] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [12198.382408] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.382692] [<ffffffff8100c14a>] child_rip+0xa/0x20 [12198.382958] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.383826] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.384126] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [12198.384378] [12198.455840] Kernel panic - not syncing: LBUG [12198.456115] Pid: 26187, comm: ldlm_bl_21 Not tainted 2.6.32-debug #6 [12198.456423] Call Trace: [12198.456632] [<ffffffff814f75e4>] ? panic+0xa0/0x168 [12198.456887] [<ffffffffa0caaf7b>] ? lbug_with_loc+0x9b/0xb0 [libcfs] [12198.457201] [<ffffffffa04cec1d>] ? osc_page_delete+0x29d/0x2a0 [osc] [12198.457516] [<ffffffffa11d8e65>] ? cl_page_delete0+0xd5/0x500 [obdclass] [12198.457816] [<ffffffffa11d92d2>] ? cl_page_delete+0x42/0x120 [obdclass] [12198.458127] [<ffffffffa0ac37cd>] ? ll_invalidatepage+0x8d/0x180 [lustre] [12198.458420] [<ffffffffa0ad592f>] ? vvp_page_discard+0x9f/0x140 [lustre] [12198.458718] [<ffffffffa11d6358>] ? cl_page_invoid+0x78/0x170 [obdclass] [12198.459014] [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass] [12198.459328] [<ffffffffa11d6463>] ? cl_page_discard+0x13/0x20 [obdclass] [12198.459627] [<ffffffffa11e08ae>] ? check_and_discard_cb+0x13e/0x160 [obdclass] [12198.460126] [<ffffffffa11db54c>] ? cl_page_gang_lookup+0x1fc/0x410 [obdclass] [12198.460617] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.460970] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass] [12198.462162] [<ffffffffa11de03e>] ? cl_lock_discard_pages+0x11e/0x1f0 [obdclass] [12198.462646] [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass] [12198.462932] [<ffffffffa04cfb80>] ? osc_lock_flush+0x110/0x200 [osc] [12198.463225] [<ffffffffa04cfcc9>] ? osc_lock_cancel+0x59/0x1a0 [osc] [12198.463533] [<ffffffffa11dbda5>] ? cl_lock_cancel0+0x75/0x160 [obdclass] [12198.463829] [<ffffffffa11dca0b>] ? cl_lock_cancel+0x13b/0x140 [obdclass] [12198.464120] [<ffffffffa04d19ba>] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc] [12198.464436] [<ffffffffa12dbaf3>] ? ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc] [12198.464898] [<ffffffffa12dbf31>] ? ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc] [12198.465240] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [12198.465533] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.465815] [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [12198.466084] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.466393] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] [12198.466699] [<ffffffff8100c140>] ? child_rip+0x0/0x20 I have vmcore for this occurence

          Maybe the output was truncated, from the output you posted:

          Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff>
          

          only paragraph 1 <...> and 2 <...> were printed, but I expect to see all 5 paragraphs.

          How easy you can reproduce this problem, if possible can you please trigger it and post the output on jira again?

          jay Jinshan Xiong (Inactive) added a comment - Maybe the output was truncated, from the output you posted: Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff> only paragraph 1 <...> and 2 <...> were printed, but I expect to see all 5 paragraphs. How easy you can reproduce this problem, if possible can you please trigger it and post the output on jira again?

          Hi Jinshan,

          I don't understand why you say it mismatches, can you elaborate please?

          wjt27 Wojciech Turek added a comment - Hi Jinshan, I don't understand why you say it mismatches, can you elaborate please?
          jay Jinshan Xiong (Inactive) added a comment - - edited

          Hi Wojciech Turek, Thanks. It looks like the code mismatches the output from your first comment.

          BTW, you can use:

          {code} /* C code */ {code}

          to quote source code on jira.

          jay Jinshan Xiong (Inactive) added a comment - - edited Hi Wojciech Turek, Thanks. It looks like the code mismatches the output from your first comment. BTW, you can use: {code} /* C code */ {code} to quote source code on jira.
          wjt27 Wojciech Turek added a comment - - edited
          static int osc_page_print(const struct lu_env *env,
                                    const struct cl_page_slice *slice,
                                    void *cookie, lu_printer_t printer)
          {
                  struct osc_page       *opg = cl2osc_page(slice);
                  struct osc_async_page *oap = &opg->ops_oap;
                  struct osc_object     *obj = cl2osc(slice->cpl_obj);
                  struct client_obd     *cli = &osc_export(obj)->exp_obd->u.cli;
                  struct lov_oinfo      *loi = obj->oo_oinfo;
          
                  return (*printer)(env, cookie, LUSTRE_OSC_NAME"-page@%p: "
                                    "1< %#x %d %u %s %s %s > "
                                    "2< "LPU64" %u %u %#x %#x | %p %p %p %p %p > "
                                    "3< %s %p %d %lu %d > "
                                    "4< %d %d %d %lu %s | %s %s %s %s > "
                                    "5< %s %s %s %s | %d %s %s | %d %s %s>\n",
                                    opg,
                                    /* 1 */
                                    oap->oap_magic, oap->oap_cmd,
                                    oap->oap_interrupted,
                                    osc_list(&oap->oap_pending_item),
                                    osc_list(&oap->oap_urgent_item),
                                    osc_list(&oap->oap_rpc_item),
                                    /* 2 */
                                    oap->oap_obj_off, oap->oap_page_off, oap->oap_count,
                                    oap->oap_async_flags, oap->oap_brw_flags,
                                    oap->oap_request,
                                    oap->oap_cli, oap->oap_loi, oap->oap_caller_ops,
                                    oap->oap_caller_data,
                                    /* 3 */
                                    osc_list(&opg->ops_inflight),
                                    opg->ops_submitter, opg->ops_transfer_pinned,
                                    osc_submit_duration(opg), opg->ops_srvlock,
                                    /* 4 */
                                    cli->cl_r_in_flight, cli->cl_w_in_flight,
                                    cli->cl_max_rpcs_in_flight,
                                    cli->cl_avail_grant,
                                    osc_list(&cli->cl_cache_waiters),
                                    osc_list(&cli->cl_loi_ready_list),
                                    osc_list(&cli->cl_loi_hp_ready_list),
                                    osc_list(&cli->cl_loi_write_list),
                                    osc_list(&cli->cl_loi_read_list),
                                    /* 5 */
                                    osc_list(&loi->loi_ready_item),
                                    osc_list(&loi->loi_hp_ready_item),
                                    osc_list(&loi->loi_write_item),
                                    osc_list(&loi->loi_read_item),
                                    loi->loi_read_lop.lop_num_pending,
                                    osc_list(&loi->loi_read_lop.lop_pending),
                                    osc_list(&loi->loi_read_lop.lop_urgent),
                                    loi->loi_write_lop.lop_num_pending,
                                    osc_list(&loi->loi_write_lop.lop_pending),
                                    osc_list(&loi->loi_write_lop.lop_urgent));
          }
          
          wjt27 Wojciech Turek added a comment - - edited static int osc_page_print( const struct lu_env *env, const struct cl_page_slice *slice, void *cookie, lu_printer_t printer) { struct osc_page *opg = cl2osc_page(slice); struct osc_async_page *oap = &opg->ops_oap; struct osc_object *obj = cl2osc(slice->cpl_obj); struct client_obd *cli = &osc_export(obj)->exp_obd->u.cli; struct lov_oinfo *loi = obj->oo_oinfo; return (*printer)(env, cookie, LUSTRE_OSC_NAME "-page@%p: " "1< %#x %d %u %s %s %s > " "2< " LPU64 " %u %u %#x %#x | %p %p %p %p %p > " "3< %s %p %d %lu %d > " "4< %d %d %d %lu %s | %s %s %s %s > " "5< %s %s %s %s | %d %s %s | %d %s %s>\n" , opg, /* 1 */ oap->oap_magic, oap->oap_cmd, oap->oap_interrupted, osc_list(&oap->oap_pending_item), osc_list(&oap->oap_urgent_item), osc_list(&oap->oap_rpc_item), /* 2 */ oap->oap_obj_off, oap->oap_page_off, oap->oap_count, oap->oap_async_flags, oap->oap_brw_flags, oap->oap_request, oap->oap_cli, oap->oap_loi, oap->oap_caller_ops, oap->oap_caller_data, /* 3 */ osc_list(&opg->ops_inflight), opg->ops_submitter, opg->ops_transfer_pinned, osc_submit_duration(opg), opg->ops_srvlock, /* 4 */ cli->cl_r_in_flight, cli->cl_w_in_flight, cli->cl_max_rpcs_in_flight, cli->cl_avail_grant, osc_list(&cli->cl_cache_waiters), osc_list(&cli->cl_loi_ready_list), osc_list(&cli->cl_loi_hp_ready_list), osc_list(&cli->cl_loi_write_list), osc_list(&cli->cl_loi_read_list), /* 5 */ osc_list(&loi->loi_ready_item), osc_list(&loi->loi_hp_ready_item), osc_list(&loi->loi_write_item), osc_list(&loi->loi_read_item), loi->loi_read_lop.lop_num_pending, osc_list(&loi->loi_read_lop.lop_pending), osc_list(&loi->loi_read_lop.lop_urgent), loi->loi_write_lop.lop_num_pending, osc_list(&loi->loi_write_lop.lop_pending), osc_list(&loi->loi_write_lop.lop_urgent)); }
          jay Jinshan Xiong (Inactive) added a comment - - edited

          Hi Wojciech Turek,

          can you please post osc_page_print() function from your source code here? Also which pages have been applied. Thanks in advance.

          jay Jinshan Xiong (Inactive) added a comment - - edited Hi Wojciech Turek, can you please post osc_page_print() function from your source code here? Also which pages have been applied. Thanks in advance.

          The clients are running lustre-2.1.2

          lustre-client-2.1.2-2.6.32_220.23.1.el6.x86_64.x86_64
          lustre-client-modules-2.1.2-2.6.32_220.23.1.el6.x86_64.x86_64

          Lustre-2.1.2 changelog suggests that it already has a fix for LU-1320

          wjt27 Wojciech Turek added a comment - The clients are running lustre-2.1.2 lustre-client-2.1.2-2.6.32_220.23.1.el6.x86_64.x86_64 lustre-client-modules-2.1.2-2.6.32_220.23.1.el6.x86_64.x86_64 Lustre-2.1.2 changelog suggests that it already has a fix for LU-1320

          Can you please tell me what exactly version you installed on the client node?

          Also, this patch may help: "LU-1320 llite: fix a race between readpage and releasepage".

          jay Jinshan Xiong (Inactive) added a comment - Can you please tell me what exactly version you installed on the client node? Also, this patch may help: " LU-1320 llite: fix a race between readpage and releasepage".

          People

            jay Jinshan Xiong (Inactive)
            wjt27 Wojciech Turek
            Votes:
            2 Vote for this issue
            Watchers:
            13 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: