  1. Lustre
  2. LU-10413

Side-effect of 'stat' on data writeback

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0

    Description

      There is quite old code in ldlm_handle_gl_callback() that flushes data on glimpse:

      static void ldlm_handle_gl_callback()
      {
      ...
              if (lock->l_granted_mode == LCK_PW &&
                  !lock->l_readers && !lock->l_writers &&
                  ktime_after(ktime_get(),
                              ktime_add(lock->l_last_used,
                                        ktime_set(10, 0)))) {
                      unlock_res_and_lock(lock);
                      if (ldlm_bl_to_thread_lock(ns, NULL, lock))
                              ldlm_handle_bl_callback(ns, NULL, lock);
      
                      EXIT;
                      return;
              }
      ...
      }
      

      It flushes the data under a lock if the lock has not been used on the client for more than 10 seconds. That means we have a sort of additional, uncontrolled flusher for client dirty data. For example, a regular 'stat' operation (stat-ahead?) on one client may act as a 'flusher' for all other clients with dirty data. I see two problems with that:
      1. A client usually has its own flush policy and timings, but they don't work as expected. E.g. the client flusher interval is 30 seconds, but data will still be flushed more often (every 10 seconds).
      2. Even if this feature is useful, the timeout value is hardcoded to 10 seconds.

      Solutions:
      1. Remove this code. The kernel flusher works with Lustre and its default is 30 seconds; each Lustre client may be tuned by its administrator to any value, including 10 seconds if needed. We don't need another level of control on this.

      2. Keep the code but exclude glimpses from stat-ahead and similar things. The original idea of that code was to flush dirty data because a glimpse is a sign that somebody is going to use the file. With stat-ahead that is no longer true.

      3. Keep the code but introduce a parameter for it instead of the hardcoded 10 seconds. Zero means the feature is disabled.
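
Solution 3 could look roughly like the sketch below. It is a standalone model of the decision only, not Lustre code: `struct gl_flush_policy`, `struct lock_state`, `glimpse_should_flush()` and the field names are hypothetical stand-ins for the lock fields tested in ldlm_handle_gl_callback(), and time is simplified to whole seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunable (not an existing Lustre parameter): how long a PW
 * lock must sit unused before a glimpse cancels it. 0 disables the feature. */
struct gl_flush_policy {
	uint32_t idle_timeout_sec;
};

/* Simplified stand-in for the lock state checked in the glimpse handler. */
struct lock_state {
	bool     is_pw;          /* granted mode is LCK_PW */
	unsigned readers;        /* models l_readers */
	unsigned writers;        /* models l_writers */
	uint64_t last_used_sec;  /* models l_last_used, in seconds */
};

/* Return true if a glimpse arriving at now_sec should trigger a flush. */
bool glimpse_should_flush(const struct gl_flush_policy *policy,
			  const struct lock_state *lock,
			  uint64_t now_sec)
{
	/* Zero means the glimpse-driven flush is switched off entirely. */
	if (policy->idle_timeout_sec == 0)
		return false;

	/* Only an unused PW lock is a candidate, as in the original check. */
	if (!lock->is_pw || lock->readers != 0 || lock->writers != 0)
		return false;

	/* Strictly after last_used + timeout, mirroring ktime_after(). */
	return now_sec > lock->last_used_sec + policy->idle_timeout_sec;
}
```

With `idle_timeout_sec` set to 10 this reproduces the current behaviour; setting it to 0 leaves flushing entirely to the client's own flusher, which is effectively solution 1 without removing the code.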

            People

              tappro Mikhail Pershin