Details
Type: Improvement
Resolution: Unresolved
Priority: Major
Description
This ticket describes a way we could potentially read stale data in Lustre, but to be clear, we are not exposed to this today. There is no bug in current Lustre versions, but there is a change we should make for the future. Please read on for the fine details.
Lustre's fast read functionality lets us read cached data without doing a full IO. This is great, as it hugely speeds up reads of already cached data.
However, it's extremely important that we make sure this cached data is safe to use. We flush the cache when the covering OST lock is cancelled, but we do not flush the cache when we lose the layout lock. Instead, we flush the cache only when we notice the layout has changed, which happens when we get a new layout lock.
This opens up the possibility of reading stale data: if our cached data is from a mirror which becomes stale, and we do not check the layout before reading, we would read stale data from our cache. Alternatively, we could always flush the cache when we lose the layout lock, but this is undesirable for other reasons.
This means it is essential to refresh the layout before doing a fast read. Otherwise we could read stale data if the layout has changed and, e.g., the mirror our data came from is now stale.
Today, we do this incidentally in file_read_confine_iter, which is good as it means we are not exposed, but not good because it is implicit. We need to explicitly recognize this case and handle it directly in the fast read code, otherwise the code is fragile to future changes.
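To make the intended ordering concrete, here is a minimal toy model in C of a fast read that refreshes the layout explicitly before trusting the cache. This is only a sketch: every name in it (toy_server, toy_client, toy_layout_refresh, toy_full_read, toy_fast_read) is hypothetical and is not a real Lustre symbol, and the layout is reduced to a single generation number purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SIZE 16

/* Hypothetical stand-in for server state; not a real Lustre structure. */
struct toy_server {
	unsigned int layout_gen;        /* bumped whenever the layout changes */
	char data[TOY_CACHE_SIZE];      /* contents of the currently valid mirror */
};

/* Hypothetical stand-in for client state; not a real Lustre structure. */
struct toy_client {
	bool layout_lock_held;          /* false once the layout lock is lost */
	unsigned int layout_gen;        /* layout generation the cache was filled under */
	bool cache_valid;
	char cache[TOY_CACHE_SIZE];
};

/*
 * Re-take the layout lock if we lost it. The cache is flushed only if the
 * layout actually changed, not merely because the lock was lost.
 */
void toy_layout_refresh(struct toy_client *c, const struct toy_server *s)
{
	if (c->layout_lock_held)
		return;
	if (c->layout_gen != s->layout_gen) {
		c->cache_valid = false;
		c->layout_gen = s->layout_gen;
	}
	c->layout_lock_held = true;
}

/* Full IO path: refresh the layout, read from the server, fill the cache. */
void toy_full_read(struct toy_client *c, const struct toy_server *s,
		   char *buf, size_t len)
{
	assert(len <= TOY_CACHE_SIZE);
	toy_layout_refresh(c, s);
	memcpy(c->cache, s->data, TOY_CACHE_SIZE);
	c->cache_valid = true;
	memcpy(buf, c->cache, len);
}

/*
 * Fast read path. The layout refresh is an explicit first step here rather
 * than a side effect of some other helper. Returns true if the read was
 * served from cache, false if the caller must fall back to a full read.
 */
bool toy_fast_read(struct toy_client *c, const struct toy_server *s,
		   char *buf, size_t len)
{
	assert(len <= TOY_CACHE_SIZE);
	toy_layout_refresh(c, s);
	if (!c->cache_valid)
		return false;
	memcpy(buf, c->cache, len);
	return true;
}
```

In this model, if the layout lock is lost and the generation changes while data is cached, the next toy_fast_read flushes the cache and reports a miss instead of returning the old contents. Removing the toy_layout_refresh call from toy_fast_read reproduces the stale read described above.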