Side-effect of 'stat' on data writeback

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0

    Description

      There is quite old code in ldlm_handle_gl_callback() that flushes data on glimpse:

      static void ldlm_handle_gl_callback()
      {
      ...
              if (lock->l_granted_mode == LCK_PW &&
                  !lock->l_readers && !lock->l_writers &&
                  ktime_after(ktime_get(),
                              ktime_add(lock->l_last_used,
                                        ktime_set(10, 0)))) {
                      unlock_res_and_lock(lock);
                      if (ldlm_bl_to_thread_lock(ns, NULL, lock))
                              ldlm_handle_bl_callback(ns, NULL, lock);
      
                      EXIT;
                      return;
              }
      ...
      }
      

      It flushes the lock's data if the lock has been unused on the client for more than 10 seconds. That means we have an additional, uncontrolled flusher for client dirty data. For example, a regular 'stat' operation (stat-ahead?) on one client may act as a 'flusher' for all other clients with dirty data. I see two problems with that:
      1. A client usually has its own flush policy and timings, but they don't work as expected. E.g. the client flusher time is 30 seconds, but data may still be flushed more often (on a 10-second basis).
      2. Even if this feature is useful, the timeout value is hardcoded as 10 seconds.
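
As a rough illustration of the condition quoted above, the check can be modelled in plain userspace C. This is only a sketch: the `sketch_` types and names are hypothetical stand-ins, not Lustre definitions, and time is reduced to whole seconds instead of ktime_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the lock fields the check reads. */
enum sketch_lock_mode { SKETCH_LCK_PR, SKETCH_LCK_PW };

struct sketch_lock {
	enum sketch_lock_mode granted_mode;
	unsigned int readers;   /* active read references */
	unsigned int writers;   /* active write references */
	int64_t last_used_sec;  /* time of last use, in seconds */
};

/* Mirrors the hardcoded ktime_set(10, 0) in the quoted code. */
#define SKETCH_GLIMPSE_AGE_SEC 10

/*
 * Returns true when a glimpse arriving at now_sec would take the
 * flush path: a PW lock with no users whose last use is more than
 * 10 seconds in the past (strict comparison, like ktime_after()).
 */
static bool sketch_glimpse_flushes(const struct sketch_lock *lock,
				   int64_t now_sec)
{
	return lock->granted_mode == SKETCH_LCK_PW &&
	       lock->readers == 0 && lock->writers == 0 &&
	       now_sec > lock->last_used_sec + SKETCH_GLIMPSE_AGE_SEC;
}
```

With a lock last used at t=100, a glimpse at t=110 leaves the data alone, while one at t=111 triggers the flush, regardless of the client's own 30-second writeback setting.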

      Solutions:
      1. Remove this code. The kernel flusher works with Lustre and its default is 30 seconds; the administrator of each Lustre client can tune it to any value, including 10 seconds if needed. We don't need another level of control on top of this.

      2. Keep the code but exclude glimpses from stat-ahead and similar sources. The original idea of the code was to flush dirty data because a glimpse is a sign that somebody is going to use the file. With stat-ahead that is no longer true.

      3. Keep the code but introduce a parameter for it in place of the hardcoded 10 seconds. Zero means the feature is disabled.
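
A minimal sketch of what solution 3 might look like, assuming a per-client age limit expressed in seconds. The function and parameter names are hypothetical and do not come from the Lustre source; only the "zero disables" rule is taken from the proposal above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tunable check for flush-on-glimpse.
 *
 * idle_sec:      how long the lock has been unused, in seconds
 * age_limit_sec: configured limit; 0 disables the feature
 */
static bool sketch_glimpse_flush_due(uint64_t idle_sec,
				     unsigned int age_limit_sec)
{
	if (age_limit_sec == 0)
		return false;	/* feature disabled */

	/* Flush only when the lock has been idle longer than the limit. */
	return idle_sec > age_limit_sec;
}
```

Under this sketch, passing 10 reproduces the current hardcoded behaviour, a larger value such as 30 would line up with the kernel flusher default mentioned in solution 1, and 0 turns flush-on-glimpse off.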

          Activity

            [LU-10413] Side-effect of 'stat' on data writeback
            pjones Peter Jones added a comment -

            Landed for 2.12

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32113/
            Subject: LU-10413 ldlm: expose dirty age limit for flush-on-glimpse
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 69727e45b4c0194f97c74df65b45fbf6a23235c4

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32113
            Subject: LU-10413 ldlm: expose dirty age limit for flush-on-glimpse
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: deea36344bce23db938fa04f428da26283706886

            adilger Andreas Dilger added a comment -

            Patrick, the / 100 is because that parameter is stored in centisecond units, not seconds (AFAIK).
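
To make the unit conversion concrete, here is a small sketch of the `/ 100` step, assuming (as the comment above suggests) that the value is held in centiseconds. The helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Convert a value stored in centiseconds (1/100 s) to whole seconds.
 * Integer division truncates any fractional second.
 */
static inline uint32_t sketch_centisecs_to_secs(uint32_t centisecs)
{
	return centisecs / 100;
}
```

For example, a stored value of 3000 corresponds to 30 seconds and 1000 to 10 seconds.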

            paf Patrick Farrell (Inactive) added a comment -

            A further thought:
            Given the 10 second delay, it's actually going to be pretty rare that there's any dirty data to flush here. You'd have to have a situation where dirty data was still present 10 seconds after use. The OSC writeback policies already guarantee that's a small amount of data, unless the system is extremely busy and unable to get the data written out quickly. It's just hard to imagine a real workload where this would significantly change the effective writeback behavior. At most it flushes some small files or trailing bits of data a bit faster.

            This seems like a good optimization. Quite good, in fact. I like Andreas' suggestion to hook it to the dirty_writeback_interval - 10 seconds vs 30 seconds doesn't seem like a huge change (both seem like fairly large numbers for jobs that are doing "create one place, open/stat in another" as sequential operations). Not sure about the /100 part...?

            tappro Mikhail Pershin added a comment - - edited

            So this is a question of balance between the ability to cache a file's attributes on clients and the flushing of dirty data on other clients. The best strategy depends on how the cluster is used, and I think the only thing we can do here is to expose this parameter via procfs and describe the feature better somewhere, so that administrators can use it as needed.

            People

              tappro Mikhail Pershin
              tappro Mikhail Pershin