[LU-10413] Side-effect of 'stat' on data writeback - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.12.0
Affects Version/s: None
Labels: None
Severity: 3

Description

There is quite old code in ldlm_handle_gl_callback() which flushes data on glimpse:

static void ldlm_handle_gl_callback()
{
...
        if (lock->l_granted_mode == LCK_PW &&
            !lock->l_readers && !lock->l_writers &&
            ktime_after(ktime_get(),
                        ktime_add(lock->l_last_used,
                                  ktime_set(10, 0)))) {
                unlock_res_and_lock(lock);
                if (ldlm_bl_to_thread_lock(ns, NULL, lock))
                        ldlm_handle_bl_callback(ns, NULL, lock);

                EXIT;
                return;
        }
...
}

It flushes the data under a lock if the lock has gone unused on the client for more than 10 seconds. That means we have a sort of additional, uncontrolled flusher for client dirty data. For example, a regular 'stat' operation (stat-ahead?) on one client may act as a 'flusher' for all other clients with dirty data. I see two problems with that:
1. A client usually has its own flush policy and timings, but they don't work as expected. E.g. the client flusher interval is 30 seconds, but data will still be flushed more often (on a 10-second basis).
2. Even if this feature is useful, the timeout value is hardcoded as 10 seconds.
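
To make the interaction with a 30-second client flusher concrete, here is a minimal userspace sketch of the check quoted above. It is a model only, not the Lustre code: the struct, the enum, and the function name are hypothetical stand-ins for the lock fields the snippet reads, and time is reduced to whole seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock modes; only PW matters to the check. */
enum model_lock_mode { MODEL_LCK_PR, MODEL_LCK_PW };

/* Hypothetical stand-in for the lock fields read by the quoted condition. */
struct model_lock {
	enum model_lock_mode granted_mode;
	unsigned int readers;
	unsigned int writers;
	int64_t last_used_sec;	/* time of last use, in seconds */
};

/* The hardcoded value from the quoted snippet: ktime_set(10, 0). */
#define MODEL_GLIMPSE_IDLE_SEC 10

/*
 * Returns true when a glimpse arriving at now_sec would take the
 * cancel path, which is what writes out the client's dirty data.
 * The comparison is strict, matching ktime_after().
 */
bool model_glimpse_triggers_flush(const struct model_lock *lock,
				  int64_t now_sec)
{
	return lock->granted_mode == MODEL_LCK_PW &&
	       lock->readers == 0 && lock->writers == 0 &&
	       now_sec > lock->last_used_sec + MODEL_GLIMPSE_IDLE_SEC;
}
```

In this model, a PW lock last used at t=100 is flushed by any glimpse from t=111 onward, well before a 30-second flusher would have acted at t=130.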

Solutions:
1. Remove this code. The kernel flusher works with Lustre and its default is 30 seconds; each Lustre client may be tuned by its administrator to any value needed, including 10 seconds. We don't need another level of control on this.

2. Keep the code but exclude glimpses from stat-ahead and similar things. The original idea of that code was to flush dirty data because a glimpse is a sign that somebody is going to use this file. With stat-ahead that is no longer true.

3. Keep the code but introduce a parameter for it instead of the hardcoded 10 seconds. Zero means the feature is disabled.
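
Options 2 and 3 could be combined in one decision function. The following is a userspace sketch under stated assumptions, not a proposed patch: the policy struct, the tunable name, and the idea that a glimpse request can be marked as speculative (e.g. issued by stat-ahead) are all hypothetical and do not come from the Lustre source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-namespace policy replacing the hardcoded 10 seconds. */
struct glimpse_policy {
	int64_t idle_timeout_sec;	/* 0 disables flush-on-glimpse (option 3) */
	bool skip_speculative;		/* ignore stat-ahead-like glimpses (option 2) */
};

/* Hypothetical summary of one incoming glimpse and the lock it hits. */
struct glimpse_request {
	int64_t now_sec;
	int64_t lock_last_used_sec;
	bool lock_is_idle_pw;	/* PW granted, no readers, no writers */
	bool speculative;	/* assumed flag: glimpse came from stat-ahead */
};

/* Decide whether this glimpse should cancel the lock and flush its data. */
bool should_flush_on_glimpse(const struct glimpse_policy *p,
			     const struct glimpse_request *r)
{
	if (p->idle_timeout_sec == 0)
		return false;	/* feature disabled by the administrator */
	if (p->skip_speculative && r->speculative)
		return false;	/* nobody is known to want this file yet */
	if (!r->lock_is_idle_pw)
		return false;	/* lock is in use or not a write lock */
	return r->now_sec > r->lock_last_used_sec + p->idle_timeout_sec;
}
```

With idle_timeout_sec set to 10 and skip_speculative off, this reproduces the current behaviour; setting the timeout to 0 corresponds to option 1 without removing the code.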

Issue Links

is related to

LU-10279 sanityn test_101c: FAIL: Found WRITE RPC but expect none (Resolved)

People

Assignee: Mikhail Pershin

Reporter: Mikhail Pershin

Votes: 0

Watchers: 7

Dates

Created: 20/Dec/17 10:28 AM

Updated: 06/May/18 4:31 AM

Resolved: 06/May/18 4:31 AM