Lustre / LU-13645

Various data corruptions possible in Lustre.


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Labels: None
    • Severity: 3
    • 9223372036854775807

    Description

      Two groups of data corruption cases are possible in Lustre, and both come down to a lock being cancelled without an OSC object assigned to it.
      This is possible for the DoM and Lock Ahead cases.
      The Lock Ahead bug has a partial fix - LU-11670/LUS-6747.

      1) The first bug concerns the situation where the check_and_discard function can find a lock without l_ast_data assigned; this prevents it from discarding pages from the page cache and leaves them as they are.
      The next lock cancel will find this lock and skip the page discard due to the lack of an assigned OSC object. The pages can later be read from the page cache by ll_do_fast_read, which relies on the page flags and returns data straight from the cache.
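      To make the sequence concrete, below is a minimal user-space C model of this failure mode (not the actual Lustre code; the names cancel_lock, fast_read and the one-page "cache" are purely illustrative): a lock cancel that can only reach the pages through the attached object silently skips the discard when that object is missing, and a later fast read trusts the uptodate flag and returns the stale data.

            #include <stdio.h>
            #include <string.h>

            /* Toy model: a DLM-like lock that may lack an attached object
             * (l_ast_data), plus a one-page "page cache". */
            struct obj  { int unused; };            /* stands in for the osc/mdc object */
            struct lock { struct obj *ast_data; };  /* NULL models a lock w/o l_ast_data */

            static char page_cache[64];             /* one cached "page" */
            static int  page_uptodate;              /* models the PG_uptodate flag */

            /* Lock cancel: like the blocking AST, it can only find the pages to
             * discard through the object attached to the lock.  With ast_data == NULL
             * the discard is silently skipped -- the bug being modelled. */
            static void cancel_lock(struct lock *lk)
            {
                    if (lk->ast_data == NULL)
                            return;                 /* nothing to flush: pages survive */
                    page_uptodate = 0;              /* a proper cancel drops the pages */
            }

            /* Fast read: trusts the uptodate flag, never rechecks the lock. */
            static const char *fast_read(void)
            {
                    return page_uptodate ? page_cache : "(cache miss, real I/O needed)";
            }

            int main(void)
            {
                    struct lock lk = { .ast_data = NULL }; /* lock exists, object never set */

                    strcpy(page_cache, "old contents");    /* data cached under this lock */
                    page_uptodate = 1;

                    cancel_lock(&lk);               /* server revokes the lock ... */
                    /* ... the file is rewritten on the server side here ... */

                    printf("client reads: %s\n", fast_read()); /* still "old contents" */
                    return 0;
            }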

      For the Lock Ahead case there are no logs or other confirmation, but it looks possible.
      For the DoM case this is a confirmed scenario.
      Trace of the second lock cancel:

            ldlm_bl_13-35551 [034] 164201.591130: funcgraph_entry:                   |  ll_dom_lock_cancel() {
            ldlm_bl_13-35551 [034] 164201.591132: funcgraph_entry:                   |    cl_env_get() {
            ldlm_bl_13-35551 [034] 164201.591132: funcgraph_entry:        0.054 us   |      _raw_read_lock();
            ldlm_bl_13-35551 [034] 164201.591132: funcgraph_entry:        0.039 us   |      lu_env_refill();
            ldlm_bl_13-35551 [034] 164201.591133: funcgraph_entry:        0.046 us   |      cl_env_init0();
            ldlm_bl_13-35551 [034] 164201.591133: funcgraph_entry:        0.035 us   |      lu_context_enter();
            ldlm_bl_13-35551 [034] 164201.591133: funcgraph_entry:        0.034 us   |      lu_context_enter();
            ldlm_bl_13-35551 [034] 164201.591134: funcgraph_exit:         1.811 us   |    }
            ldlm_bl_13-35551 [034] 164201.591134: funcgraph_entry:                   |    cl_object_flush() {
            ldlm_bl_13-35551 [034] 164201.591134: funcgraph_entry:                   |      lov_object_flush() {
            ldlm_bl_13-35551 [034] 164201.591134: funcgraph_entry:        0.115 us   |        down_read();
            ldlm_bl_13-35551 [034] 164201.591135: funcgraph_entry:                   |        lov_flush_composite() {
            ldlm_bl_13-35551 [034] 164201.591135: funcgraph_entry:                   |          cl_object_flush() {
            ldlm_bl_13-35551 [034] 164201.591135: funcgraph_entry:                   |            mdc_object_flush() {
            ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry:                   |              mdc_dlm_blocking_ast0() {
            ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry:                   |                lock_res_and_lock() {
            ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry:        0.114 us   |                  _raw_spin_lock();
            ldlm_bl_13-35551 [034] 164201.591136: funcgraph_entry:        0.030 us   |                  _raw_spin_lock();
            ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit:         0.677 us   |                }
            ldlm_bl_13-35551 [034] 164201.591137: funcgraph_entry:        0.031 us   |                unlock_res_and_lock();
            ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit:         1.363 us   |              }
            ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit:         1.674 us   |            }
            ldlm_bl_13-35551 [034] 164201.591137: funcgraph_exit:         2.207 us   |          }
            ldlm_bl_13-35551 [034] 164201.591138: funcgraph_exit:         2.596 us   |        }
            ldlm_bl_13-35551 [034] 164201.591138: funcgraph_entry:        0.042 us   |        up_read();
            ldlm_bl_13-35551 [034] 164201.591138: funcgraph_exit:         3.714 us   |      }
            ldlm_bl_13-35551 [034] 164201.591138: funcgraph_exit:         4.279 us   |    }
            ldlm_bl_13-35551 [034] 164201.591138: funcgraph_entry:                   |    cl_env_put() {
            ldlm_bl_13-35551 [034] 164201.591138: funcgraph_entry:        0.034 us   |      lu_context_exit();
            ldlm_bl_13-35551 [034] 164201.591139: funcgraph_entry:        0.030 us   |      lu_context_exit();
            ldlm_bl_13-35551 [034] 164201.591139: funcgraph_entry:        0.030 us   |      _raw_read_lock();
            ldlm_bl_13-35551 [034] 164201.591139: funcgraph_exit:         0.990 us   |    }
            ldlm_bl_13-35551 [034] 164201.591140: funcgraph_exit:         8.253 us   |  }
      

      It is easy to see that mdc_dlm_blocking_ast0 exits right at the beginning, which means the lock is not granted or has no l_ast_data (i.e. no OSC object) assigned. The data was later served from the page cache:

                <...>-40843 [000] 164229.430007: funcgraph_entry:                   |  ll_do_fast_read() {
                 <...>-40843 [000] 164229.430009: funcgraph_entry:                   |    generic_file_read_iter() {
                 <...>-40843 [000] 164229.430010: funcgraph_entry:        0.044 us   |      _cond_resched();
                 <...>-40843 [000] 164229.430010: funcgraph_entry:                   |      pagecache_get_page() {
                 <...>-40843 [000] 164229.430010: funcgraph_entry:        0.706 us   |        find_get_entry();
                 <...>-40843 [000] 164229.430011: funcgraph_exit:         1.078 us   |      }
                 <...>-40843 [000] 164229.430012: funcgraph_entry:                   |      mark_page_accessed() {
                 <...>-40843 [000] 164229.430012: funcgraph_entry:        0.088 us   |        activate_page();
                 <...>-40843 [000] 164229.430012: funcgraph_entry:        0.143 us   |        workingset_activation();
                 <...>-40843 [000] 164229.430013: funcgraph_exit:         0.925 us   |      }
                 <...>-40843 [000] 164229.430014: funcgraph_entry:        0.032 us   |      _cond_resched();
                 <...>-40843 [000] 164229.430014: funcgraph_entry:                   |      pagecache_get_page() {
                 <...>-40843 [000] 164229.430014: funcgraph_entry:        0.070 us   |        find_get_entry();
                 <...>-40843 [000] 164229.430014: funcgraph_exit:         0.401 us   |      }
                 <...>-40843 [000] 164229.430015: funcgraph_entry:                   |      mark_page_accessed() {
                 <...>-40843 [000] 164229.430015: funcgraph_entry:        0.037 us   |        activate_page();
                 <...>-40843 [000] 164229.430015: funcgraph_entry:        0.039 us   |        workingset_activation();
                 <...>-40843 [000] 164229.430015: funcgraph_exit:         0.649 us   |      }
      ....
      

      Short description of how it was hit:
      getattr_by_name returns the "DoM" bit in the reply while the client already holds a DoM lock, but no I/O has been done under this lock.

      2) DoM read-on-open corruption.
      The scenario is nearly the same as the previous one. Open returns data which is moved into the page cache very early, with the Uptodate flag set, but no OSC object is assigned to the lock.
      The data is read with ll_do_fast_read with no real I/O, plus a lock match in mdc_enqueue_send().
      The lock is then cancelled without flushing the pages, and the client continues to read stale data via ll_do_fast_read.
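      Continuing the same toy model, the sketch below contrasts the read-on-open sequence with and without the object attached at lock match time. The match_lock/attach flag is hypothetical and only marks where associating the object with the matched lock would let the cancel find and drop the pages; it is not the actual fix.

            #include <stdio.h>
            #include <string.h>

            struct obj  { int unused; };
            struct lock { struct obj *ast_data; };

            static char page_cache[64];
            static int  page_uptodate;

            /* DoM read-on-open: data lands in the cache and is marked uptodate
             * very early, before any I/O runs under the lock. */
            static void open_with_data(void)
            {
                    strcpy(page_cache, "contents at open time");
                    page_uptodate = 1;
            }

            /* Lock match during enqueue.  Only if 'attach' is set is the object
             * associated with the lock, so a later cancel can reach the pages. */
            static void match_lock(struct lock *lk, struct obj *o, int attach)
            {
                    lk->ast_data = attach ? o : NULL;
            }

            static void cancel_lock(struct lock *lk)
            {
                    if (lk->ast_data != NULL)
                            page_uptodate = 0;      /* pages dropped only via the object */
            }

            static const char *fast_read(void)
            {
                    return page_uptodate ? page_cache : "(miss: re-fetch from server)";
            }

            int main(void)
            {
                    struct obj  o;
                    struct lock lk;

                    open_with_data();
                    match_lock(&lk, &o, 0);         /* bug: object never attached */
                    cancel_lock(&lk);               /* cancel cannot drop the pages */
                    printf("without attach: %s\n", fast_read()); /* stale data */

                    open_with_data();
                    match_lock(&lk, &o, 1);         /* object attached at match time */
                    cancel_lock(&lk);
                    printf("with attach:    %s\n", fast_read()); /* forced re-fetch */
                    return 0;
            }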

      ...

      People

        Assignee: Alexey Lyashkov
        Reporter: Alexey Lyashkov