Lustre / LU-3655

Reoccurrence of permanent eviction scenario


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.5
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9404

    Description

      Hi,

      I am afraid we are again hitting the issue described in LU-2683 and LU-1690. This time, however, we are running Lustre 2.1.5, which includes the 4 patches from LU-874. We have also backported patch http://review.whamcloud.com/5208 from LU-2683 into our sources.

      So these 5 patches may not be enough to fix this problem.

      Here is the information collected from the crash:

      crash>dmesg
      ...
      LustreError: 65257:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
      LustreError: 65257:0:(cl_io.c:967:cl_io_cancel()) Canceling ongoing page trasmission
      ...
      
      crash> ps | grep 65257
        65257 2 5 ffff880fe2ac27d0 IN 0.0 0 0 [ldlm_bl_62]
      crash> bt 65257
      PID: 65257 TASK: ffff880fe2ac27d0 CPU: 5 COMMAND: "ldlm_bl_62"
       #0 [ffff880fe32a7ae0] schedule at ffffffff81484c15
       #1 [ffff880fe32a7ba8] cfs_waitq_wait at ffffffffa055a6de [libcfs]
       #2 [ffff880fe32a7bb8] cl_sync_io_wait at ffffffffa067f3cb [obdclass]
       #3 [ffff880fe32a7c58] cl_io_submit_sync at ffffffffa067f643 [obdclass]
       #4 [ffff880fe32a7cb8] cl_lock_page_out at ffffffffa0676997 [obdclass]
       #5 [ffff880fe32a7d28] osc_lock_flush at ffffffffa0a6abaf [osc]
       #6 [ffff880fe32a7d78] osc_lock_cancel at ffffffffa0a6acbf [osc]
       #7 [ffff880fe32a7dc8] cl_lock_cancel0 at ffffffffa0675575 [obdclass]
       #8 [ffff880fe32a7df8] cl_lock_cancel at ffffffffa067639b [obdclass]
       #9 [ffff880fe32a7e18] osc_ldlm_blocking_ast at ffffffffa0a6bd9a [osc]
      #10 [ffff880fe32a7e88] ldlm_handle_bl_callback at ffffffffa07a0293 [ptlrpc]
      #11 [ffff880fe32a7eb8] ldlm_bl_thread_main at ffffffffa07a06d1 [ptlrpc]
      #12 [ffff880fe32a7f48] kernel_thread at ffffffff8100412a
      
      
      crash> dmesg | grep 'SYNC IO'
      LustreError: 3140:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
      LustreError: 63611:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
      LustreError: 65257:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
      LustreError: 65316:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
      LustreError: 65235:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
      LustreError: 65277:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
      LustreError: 63605:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages
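
      To triage output like the above across many threads, a small helper can extract the PID and decoded error of each cl_sync_io_wait() failure. This is a hypothetical Python sketch, not part of Lustre's tooling: the regular expression assumes the exact message format shown in this dmesg output, and the decoding assumes Linux errno values, where 110 is ETIMEDOUT (so -110 means the synchronous I/O timed out).

```python
import errno
import re

# Assumed format, copied from the dmesg output in this report, e.g.
# "LustreError: 65257:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, ..."
SYNC_IO_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"SYNC IO failed with error: (?P<rc>-?\d+), try to cancel (?P<pages>\d+) remaining pages"
)


def parse_sync_io_failures(lines):
    """Return (pid, errno name, remaining pages) for each SYNC IO failure line."""
    results = []
    for line in lines:
        m = SYNC_IO_RE.search(line)
        if m is None:
            continue  # not a cl_sync_io_wait() failure message
        rc = int(m.group("rc"))
        # The kernel code reports negated errno values; on Linux 110 is ETIMEDOUT.
        name = errno.errorcode.get(-rc, f"errno {-rc}")
        results.append((int(m.group("pid")), name, int(m.group("pages"))))
    return results


if __name__ == "__main__":
    sample = [
        "LustreError: 3140:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages",
        "LustreError: 65257:0:(cl_io.c:967:cl_io_cancel()) Canceling ongoing page trasmission",
        "LustreError: 65257:0:(cl_io.c:1702:cl_sync_io_wait()) SYNC IO failed with error: -110, try to cancel 1 remaining pages",
    ]
    for pid, name, pages in parse_sync_io_failures(sample):
        print(pid, name, pages)
```

      The PIDs it reports can then be fed to the crash utility's bt command, as was done above for PID 65257.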
      

      Sebastien.

      Attachments

        1. oss.log (1.0 kB)
        2. sync_io.log (6 kB)


          People

            Assignee: Niu Yawei (Inactive)
            Reporter: Sebastien Buisson (Inactive)
            Votes: 0
            Watchers: 8
