Isami, thanks for filing this bug. It is always useful to track issues like this, even if they do not appear to be causing any significant problems, since I suspect it is a sign that the timestamps aren't being handled correctly somewhere, and this could lead to much larger timestamp skews.

Just a note about the "failure rate" of test results in Maloo: the percentage reported there means "1 of the past 100 test runs failed" (i.e. the test was just run once and it failed), so it isn't at all clear whether this means "every run with this patch will fail from now on" or "this was a fluke and the real failure rate is 1 in 10000". If you want statistically significant numbers on how often this is failing, please try something like:
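(assuming the autotest harness honors the test framework's ONLY and ONLY_REPEAT variables; the exact Test-Parameters line below is a sketch, so adjust it to whatever syntax the harness actually accepts)

    Test-Parameters: trivial testlist=sanityn env=ONLY=39,ONLY_REPEAT=100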
or similar, so that it runs sanityn test_39* several times.

There typically shouldn't be any "drift" in the timestamps in this manner; everything should be derived only from the client clock. There are a couple of places where this might be introduced:
- different nanosecond times on different CPU cores due to clock drift. I doubt this is the case anymore, but it used to happen in the past.
- skew between userspace and kernel timestamps. This could only be the case if the test is somehow using the clock from userspace.
- a bug in the code allowing the MDS or OSS timestamps to affect the file. This is the most likely cause.

It would be possible to test the last hypothesis by setting wildly different times on the MDS and OSS nodes than on the client (either or both in the past or future), and then tracing where the bad timestamp is coming from (which should be easy if the timestamps can clearly be distinguished as to which node they came from); see the sketch below.
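A minimal sketch of that experiment (the hostnames, mount point, and use of pdsh are placeholders; any remote shell works, and NTP would need to be disabled on the servers first so the skewed times stick):

    # skew the server clocks far away from the client clock
    pdsh -w mds1 'date -s "2005-01-01 00:00:00"'   # MDS far in the past
    pdsh -w oss1 'date -s "2030-01-01 00:00:00"'   # OSS far in the future

    # from the client, create/modify a file and inspect its timestamps
    touch /mnt/lustre/tfile
    stat /mnt/lustre/tfile
    # all of atime/mtime/ctime should match the client clock; a 2005 or
    # 2030 date means the value leaked in from the MDS or OSS respectively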