Lustre / LU-6271

(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.7.0
    • Environment: Hyperon
    • Severity: 3
    • 17581

    Description

      While performing failover testing:

      LustreError: 167-0: lustre-OST0000-osc-ffff88083b339000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:777:ldlm_resource_complain()) lustre-OST0000-osc-ffff88083b339000: namespace resource [0x2d097bb:0x0:0x0].0 (ffff88106a992e40) refcount nonzero (1) after lock cleanup; forcing cleanup.
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1374:ldlm_resource_dump()) --- Resource: [0x2d097bb:0x0:0x0].0 (ffff88106a992e40) refcount = 3
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1377:ldlm_resource_dump()) Granted locks (in reverse order):
      Feb 23 09:22:48 iwc260 kernel: LustreError: 51585:0:(ldlm_resource.c:1380:ldlm_resource_dump()) ### ### ns: lustre-OST0000-osc-ffff88083b339000 lock: ffff88106a60d540/0xa1f402a8812988f4 lrc: 3/0,1 mode: PW/PW res: [0x2d097bb:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0x126400020000 nid: local remote: 0x72a9f4c80e66e07b expref: -99 pid: 51570 timeout: 0 lvb_type: 1
      Feb 23 09:22:48 iwc260 kernel: Lustre: lustre-OST0000-osc-ffff88083b339000: Connection restored to lustre-OST0000 (at 192.168.120.13@o2ib)
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) LBUG
      Feb 23 09:22:49 iwc260 kernel: LustreError: 90252:0:(osc_cache.c:3150:discard_cb()) LBUG
      Feb 23 09:22:49 iwc260 kernel: Pid: 90252, comm: ldlm_bl_36
      Feb 23 09:22:49 iwc260 kernel:
      Feb 23 09:22:49 iwc260 kernel: Call Trace:
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0435895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0435e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2ba56>] discard_cb+0x156/0x190 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2bdcc>] osc_page_gang_lookup+0x1ac/0x330 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2b900>] ? discard_cb+0x0/0x190 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2c094>] osc_lock_discard_pages+0x144/0x240 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa04461c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2b900>] ? discard_cb+0x0/0x190 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b2298b>] osc_lock_flush+0x8b/0x260 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b22e08>] osc_ldlm_blocking_ast+0x2a8/0x3c0 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0761a6c>] ldlm_cancel_callback+0x6c/0x170 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077450a>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0779120>] ldlm_cli_cancel+0x60/0x360 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa0b22c3b>] osc_ldlm_blocking_ast+0xdb/0x3c0 [osc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa04461c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077cb60>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077d0c1>] ldlm_bl_thread_main+0x291/0x3f0 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
      Feb 23 09:22:49 iwc260 kernel: [<ffffffffa077ce30>] ? ldlm_bl_thread_main+0x0/0x3f0 [ptlrpc]
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      Feb 23 09:22:49 iwc260 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
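
      The failed check matches the expansion of libcfs's ergo() macro, ergo(a, b) == (!(a) || (b)), so the assertion reads: if a page is CPT_CACHEABLE, its backing VM page must not be dirty. In other words, a dirty cacheable page survived all the way to the discard path instead of being cleaned (written back or truncated) first. A simplified sketch of the call site, not the verbatim osc_cache.c source:

        /* ergo(a, b) is libcfs's "a implies b": */
        #define ergo(a, b) (!(a) || (b))

        static int discard_cb(const struct lu_env *env, struct cl_io *io,
                              struct osc_page *ops)
        {
                struct cl_page *page = ops->ops_cl.cpl_page;

                /*
                 * By the time a lock's pages are being discarded, any
                 * cacheable page must already be clean: dirty data should
                 * have been flushed (or truncated away) beforehand.  A
                 * CPT_CACHEABLE page that is still PageDirty here means
                 * the eviction cleanup raced with, or skipped, writeback.
                 */
                LASSERT(ergo(page->cp_type == CPT_CACHEABLE,
                             !PageDirty(cl_page_vmpage(page))));

                /* ... discard the page and continue the gang scan ... */
                return CLP_GANG_OKAY;
        }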
      

      Attachments

        1. iwc189.lbug.log.txt.gz
          0.2 kB
        2. iwc260.log.gz
          4.81 MB
        3. l-23.LU-6271.txt.gz
          0.2 kB
        4. lustre-log.iwc37.txt
          329 kB
        5. r2i1n4.messages.gz
          96 kB
        6. server_messages.tar.gz
          924 kB


          Activity

            jay Jinshan Xiong (Inactive) added a comment (edited)

            To nip this in the bud - it seems like any ongoing work should move to a new ticket...

            To be clear, the customer was using a build with their own patches; there is no evidence that this issue can be seen on master.

            pjones Peter Jones added a comment

            To nip this in the bud - it seems like any ongoing work should move to a new ticket...


            jay Jinshan Xiong (Inactive) added a comment

            From the backtrace, the truncate process was waiting for i_alloc_sem while holding the inode mutex, but none of the dumped stacks show anyone holding i_alloc_sem. Unfortunately I don't have the SLE11-SP3 kernel source code on hand, otherwise I could take a look.

            One alternative is to get a kernel dump when you see this problem and search for the address of the i_alloc_sem variable; you can probably find it on some process's stack.
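
            A minimal crash(8) session sketching that search; the semaphore address is a placeholder to be taken from the real dump:

              crash> struct inode.i_alloc_sem -o       # offset of i_alloc_sem within struct inode
              crash> search -t <i_alloc_sem address>   # search every task's kernel stack for that address
              crash> bt <pid>                          # then inspect whichever task the search hits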


            jay Jinshan Xiong (Inactive) added a comment

            From the log, the root cause was obviously a deadlock; it has nothing to do with these patches.

            Nov 16 14:22:19 r2i1n4 kernel: [274907.716536] shald_dpd.out   D ffff88201d872010     0   825    772 0x00000000
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716541]  ffff88201d873c30 0000000000000082 ffff88201d872010 0000000000010900
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716548]  0000000000010900 0000000000010900 0000000000010900 ffff88201d873fd8
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716554]  ffff88201d873fd8 0000000000010900 ffff881fbd5ac440 ffff881020932600
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716561] Call Trace:
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716568]  [<ffffffff814651a8>] __mutex_lock_slowpath+0xf8/0x150
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716575]  [<ffffffff81464c3a>] mutex_lock+0x1a/0x40
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716585]  [<ffffffff810fd10a>] generic_file_aio_write+0x3a/0xb0
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716622]  [<ffffffffa0d51e2e>] vvp_io_write_start+0x12e/0x530 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716707]  [<ffffffffa088d539>] cl_io_start+0x69/0x140 [obdclass]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716790]  [<ffffffffa0890993>] cl_io_loop+0xa3/0x190 [obdclass]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716860]  [<ffffffffa0cf0cc7>] ll_file_io_generic+0x757/0x830 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716892]  [<ffffffffa0d01e14>] ll_file_aio_write+0x1b4/0x5d0 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716926]  [<ffffffffa0d02415>] ll_file_write+0x1e5/0x270 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716942]  [<ffffffff8115ed4e>] vfs_write+0xce/0x140
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716951]  [<ffffffff8115eec3>] sys_write+0x53/0xa0
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716958]  [<ffffffff8146e1f2>] system_call_fastpath+0x16/0x1b
            Nov 16 14:22:19 r2i1n4 kernel: [274907.716971]  [<00007fffec69579d>] 0x7fffec69579c
            
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717055] shald_dpd.out   D ffff88201dc04010     0   826    772 0x00000000
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717060]  ffff88201dc05b70 0000000000000086 ffff88201dc04010 0000000000010900
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717067]  0000000000010900 0000000000010900 0000000000010900 ffff88201dc05fd8
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717073]  ffff88201dc05fd8 0000000000010900 ffff881fd7fce600 ffffffff81a11020
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717080] Call Trace:
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717087]  [<ffffffff814651a8>] __mutex_lock_slowpath+0xf8/0x150
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717094]  [<ffffffff81464c3a>] mutex_lock+0x1a/0x40
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717121]  [<ffffffffa0d1adb0>] ll_setattr_raw+0x2c0/0xb60 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717143]  [<ffffffff81178faf>] notify_change+0x19f/0x2f0
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717151]  [<ffffffff8115d517>] do_truncate+0x57/0x80
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717160]  [<ffffffff8116c053>] do_last+0x603/0x800
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717168]  [<ffffffff8116ceb9>] path_openat+0xd9/0x420
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717175]  [<ffffffff8116d33c>] do_filp_open+0x4c/0xc0
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717183]  [<ffffffff8115de5f>] do_sys_open+0x17f/0x250
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717191]  [<ffffffff8146e1f2>] system_call_fastpath+0x16/0x1b
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717202]  [<00007fffeb9057bd>] 0x7fffeb9057bc
            
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717285] shald_dpd.out   D 0000000000000000     0   827    772 0x00000000
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717290]  ffff881fbd5afa60 0000000000000086 ffff881fbd5ae010 0000000000010900
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717297]  0000000000010900 0000000000010900 0000000000010900 ffff881fbd5affd8
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717303]  ffff881fbd5affd8 0000000000010900 ffff881fa07c20c0 ffff881020fa2080
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717310] Call Trace:
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717318]  [<ffffffff814661a5>] rwsem_down_failed_common+0xb5/0x160
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717329]  [<ffffffff81264533>] call_rwsem_down_write_failed+0x13/0x20
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717337]  [<ffffffff814654bc>] down_write+0x1c/0x20
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717370]  [<ffffffffa0d51159>] vvp_io_setattr_start+0x129/0x170 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717447]  [<ffffffffa088d539>] cl_io_start+0x69/0x140 [obdclass]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717528]  [<ffffffffa0890993>] cl_io_loop+0xa3/0x190 [obdclass]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717605]  [<ffffffffa0d4905f>] cl_setattr_ost+0x24f/0x2b0 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717663]  [<ffffffffa0d1b2e4>] ll_setattr_raw+0x7f4/0xb60 [lustre]
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717683]  [<ffffffff81178faf>] notify_change+0x19f/0x2f0
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717692]  [<ffffffff8115d517>] do_truncate+0x57/0x80
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717699]  [<ffffffff8116c053>] do_last+0x603/0x800
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717708]  [<ffffffff8116ceb9>] path_openat+0xd9/0x420
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717715]  [<ffffffff8116d33c>] do_filp_open+0x4c/0xc0
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717723]  [<ffffffff8115de5f>] do_sys_open+0x17f/0x250
            Nov 16 14:22:19 r2i1n4 kernel: [274907.717731]  [<ffffffff8146e1f2>] system_call_fastpath+0x16/0x1b 
            

            You could try patch 14915, but most likely you're going to end up with the same problem.

            icostelloddn Ian Costello added a comment (edited)

            NOTE: the 2.5.39-ddn7 build has the patches:

            20401c7 LU-6271 osc: faulty assertion in osc_object_prune()
            b08341e LU-6271 osc: further OSC cleanup after eviction
            42e9257 LU-6271 osc: handle osc eviction correctly

            I think we should seriously consider using the patch below (written by Dongyang Li):
            http://review.whamcloud.com/#/c/14915/

            I am going to test the above patch at NCI, where we are able to reproduce the crash and the inactive OST (due to client eviction)...


            icostelloddn Ian Costello added a comment

            In short, clients were evicted suddenly and the OSTs stayed INACTIVE for those clients until the clients were forcibly restarted.
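
            A quick way to confirm that state from an affected client is via the standard lctl parameters (output details vary by release; the target name below matches the logs above):

              # client-side device list; an inactive OSC shows IN instead of UP
              lctl dl

              # import state for the affected OST
              lctl get_param osc.lustre-OST0000-osc-*.import

              # check the OSC's active flag; as a last resort it can be flipped
              # back with: lctl set_param osc.lustre-OST0000-osc-*.active=1
              lctl get_param osc.lustre-OST0000-osc-*.active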


            icostelloddn Ian Costello added a comment

            Uploaded 2 log files detailing the problem; the customer is running the ddn-ieel branch 2.5.39-ddn7.


            icostelloddn Ian Costello added a comment

            Server messages attached.


            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 25
