Details
- Bug
- Resolution: Duplicate
- Critical
- Lustre 2.1.0
- None
- RHEL6
- 3
- 6592
Description
While porting the zero-copy patch to RHEL6, we found data corruption during I/O while a raid5/6 array is reconstructing. I think this should affect RHEL5 as well.
It is easy to reproduce with:
echo 32 > /sys/block/md0/md/stripe_cache_size
echo 0 > /proc/fs/lustre/obdfilter/<ost_name>/writethrough_cache_enable
echo 0 > /proc/fs/lustre/obdfilter/<ost_name>/read_cache_enable
Then fail one of the disks with:
mdadm /dev/mdX --fail /dev/....
After that, verify whether the data is correct:
[root@sjlustre1-o1 ~]# dd if=/dev/urandom of=test.1 oflag=direct bs=128k count=8
8+0 records in
8+0 records out
1048576 bytes (1.0 MB) copied, 0.157819 seconds, 6.6 MB/s
[root@sjlustre1-o1 ~]# md5sum test.1
4ec4d0b67a2b3341795706605e0b0a28 test.1
[root@sjlustre1-o1 ~]# md5sum test.1 > test.1.md5
[root@sjlustre1-o1 ~]# dd if=test.1 iflag=direct of=/lustre/stry/test.1 oflag=direct bs=128k
8+0 records in
8+0 records out
1048576 bytes (1.0 MB) copied, 0.319458 seconds, 3.3 MB/s
[root@sjlustre1-o1 ~]# dd if=/lustre/stry/test.1 iflag=direct of=test.2 oflag=direct bs=128k
8+0 records in
8+0 records out
1048576 bytes (1.0 MB) copied, 0.114691 seconds, 9.1 MB/s
[root@sjlustre1-o1 ~]# md5sum test.1 test.2
4ec4d0b67a2b3341795706605e0b0a28 test.1
426c976b75fa3ce5b5ae22b5195f85fd test.2
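The manual check above could be wrapped in a small helper so it can be repeated while the array is degraded. This is only a sketch: the function name `roundtrip_check` and its second argument are made up for illustration, and direct I/O is applied only on the target side, on the assumption that the local temporary directory may not support O_DIRECT.

```shell
#!/bin/sh
# Sketch only: roundtrip_check and its arguments are illustrative names,
# not part of Lustre or of the zcopy patch.
# Writes 1 MB of random data into target_dir, reads it back and compares
# the MD5 sums, mirroring the manual dd/md5sum steps in the report.
roundtrip_check() {
    target_dir=$1            # directory on the filesystem under test
    use_direct=${2:-yes}     # pass "no" where O_DIRECT is unsupported
    iflag=
    oflag=
    if [ "$use_direct" = yes ]; then
        iflag="iflag=direct"
        oflag="oflag=direct"
    fi

    src=$(mktemp) || return 2
    back=$(mktemp) || return 2
    copy="$target_dir/roundtrip.$$"

    # 8 x 128k blocks, the same sizes as in the transcript.
    dd if=/dev/urandom of="$src" bs=128k count=8 2>/dev/null || return 2
    # $oflag and $iflag are left unquoted on purpose so that an empty
    # value expands to no argument at all.
    dd if="$src" of="$copy" bs=128k $oflag 2>/dev/null || return 2
    dd if="$copy" of="$back" bs=128k $iflag 2>/dev/null || return 2

    sum_src=$(md5sum < "$src" | cut -d' ' -f1)
    sum_back=$(md5sum < "$back" | cut -d' ' -f1)
    rm -f "$src" "$back" "$copy"

    if [ "$sum_src" = "$sum_back" ]; then
        echo "OK $sum_src"
    else
        echo "MISMATCH $sum_src $sum_back"
        return 1
    fi
}
```

For example, `roundtrip_check /lustre/stry` run after failing a disk would be expected to print a MISMATCH line when the corruption is hit.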
After investigation, the problem was identified as two bugs in the zcopy patch:
1) raid5 sets the UPTODATE flag on a stripe that still holds stale pointers from DIO, and then tries to copy data from those pointers during the READ phase.
2) An issue with restoring pages from the stripe cache.
Please verify whether this is also an issue in a RHEL5 environment (we don't have one at the moment).