Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
Lustre 2.12.6 on the clients, Lustre 2.12.8_6_g5457c37 on the servers, both on Scientific Linux 7.9
Description
On some clients (9 out of 242), a stale version of a file is read unless direct I/O is used. The file is small and stored entirely on the MDT:
[wgs34] /root # lfs getstripe /lustre/fs24/files/test.py
/lustre/fs24/files/test.py
  lcm_layout_gen:    2
  lcm_mirror_count:  1
  lcm_entry_count:   2
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   524288
      lmm_stripe_count:  0
      lmm_stripe_size:   524288
      lmm_pattern:       mdt
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 524288
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   524288
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1
If I copy the file on one of the affected machines, the copy contains the old version that has since been overwritten. If I read it with direct I/O, the file has the correct content:
[pax9-02] /root # dd if=/lustre/fs24/files/test.py of=bla iflag=direct bs=4096
0+1 records in
0+1 records out
642 bytes (642 B) copied, 0.000977625 s, 657 kB/s
There are no Lustre errors on the clients or the servers. Neither sync nor lflush had any effect, but unmounting and remounting the file system fixed the problem.
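To find affected clients or files without copying each one by hand, the comparison between a normal (page-cache) read and a direct I/O read can be scripted. The following is only a minimal sketch, assuming Linux and Python 3.7 or later; the helper names `read_all` and `is_stale` are made up for illustration and are not part of Lustre. It treats the direct I/O read as the reference because that is what returned the correct content above.

```python
import hashlib
import mmap
import os
import sys


def read_all(path, extra_flags=0, block=4096):
    """Read a whole file in block-sized, block-aligned chunks.

    extra_flags may be os.O_DIRECT to bypass the client page cache.
    """
    # An anonymous mmap is page-aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, block)
    try:
        fd = os.open(path, os.O_RDONLY | extra_flags)
        try:
            chunks = []
            offset = 0
            while True:
                n = os.preadv(fd, [buf], offset)
                chunks.append(buf[:n])
                # Assumption: a short read on a regular file means end of file.
                if n < block:
                    break
                offset += n
            return b"".join(chunks)
        finally:
            os.close(fd)
    finally:
        buf.close()


def is_stale(path):
    """Return True if the cached read differs from the direct I/O read."""
    cached = hashlib.sha256(read_all(path)).hexdigest()
    direct = hashlib.sha256(read_all(path, os.O_DIRECT)).hexdigest()
    return cached != direct


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, "STALE" if is_stale(p) else "ok")
```

Run on a suspect client with the file as argument, for example `python3 check_stale.py /lustre/fs24/files/test.py` (the script name is arbitrary). A `STALE` result only says that the two reads disagree on this client; it does not identify the cause.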