Lustre / LU-468

md-raid corruptions for zero copy patch.

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • Lustre 2.1.0
    • Lustre 2.1.0
    • None
    • RHEL6
    • 3
    • 6592

    Description

      While porting the zero copy patch to RHEL6, we found data corruption during I/O while a raid5/6 array is reconstructing. I think this should affect RHEL5 as well.

      It is easy to reproduce with:

      echo 32 > /sys/block/md0/md/stripe_cache_size
      echo 0 > /proc/fs/lustre/obdfilter/<ost_name>/writethrough_cache_enable
      echo 0 > /proc/fs/lustre/obdfilter/<ost_name>/read_cache_enable

      and then failing one of the disks with
      mdadm /dev/mdX --fail /dev/....

      After that, verify that the data is correct:

      [root@sjlustre1-o1 ~]# dd if=/dev/urandom of=test.1 oflag=direct bs=128k count=8
      8+0 records in
      8+0 records out
      1048576 bytes (1.0 MB) copied, 0.157819 seconds, 6.6 MB/s
      [root@sjlustre1-o1 ~]# md5sum test.1
      4ec4d0b67a2b3341795706605e0b0a28 test.1
      [root@sjlustre1-o1 ~]# md5sum test.1 > test.1.md5
      [root@sjlustre1-o1 ~]# dd if=test.1 iflag=direct of=/lustre/stry/test.1 oflag=direct bs=128k
      8+0 records in
      8+0 records out
      1048576 bytes (1.0 MB) copied, 0.319458 seconds, 3.3 MB/s

      [root@sjlustre1-o1 ~]# dd if=/lustre/stry/test.1 iflag=direct of=test.2 oflag=direct bs=128k
      8+0 records in
      8+0 records out
      1048576 bytes (1.0 MB) copied, 0.114691 seconds, 9.1 MB/s
      [root@sjlustre1-o1 ~]# md5sum test.1 test.2
      4ec4d0b67a2b3341795706605e0b0a28 test.1
      426c976b75fa3ce5b5ae22b5195f85fd test.2
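The manual check above can be wrapped in a small helper so it is easy to repeat while the array is degraded. This is only a sketch, not part of the attached patches: the function name roundtrip_check and its arguments are made up for illustration, and it assumes GNU dd and md5sum. It writes 1 MB of random data, copies it into the target directory and back with direct I/O (as in the transcript), and compares checksums.

```shell
#!/bin/sh
# roundtrip_check <target_dir> [dd_flag]
# Hypothetical helper: copies random data into <target_dir> and back,
# then compares md5 sums. dd_flag defaults to "direct" (bypass the page
# cache, as in the report); pass "" to use ordinary buffered I/O.
# Returns 0 if the data matches, 1 on mismatch, 2 on a setup or dd error.
roundtrip_check() {
    target_dir=$1
    flag=${2-direct}
    if [ -n "$flag" ]; then
        in_opt="iflag=$flag"
        out_opt="oflag=$flag"
    else
        in_opt=
        out_opt=
    fi

    work=$(mktemp -d) || return 2

    # 8 x 128k = 1 MB of random data; fullblock avoids short reads.
    dd if=/dev/urandom of="$work/test.1" iflag=fullblock bs=128k count=8 \
        2>/dev/null || { rm -rf "$work"; return 2; }

    # Write to the target, then read it back ($in_opt/$out_opt are left
    # unquoted on purpose so that an empty value adds no argument).
    dd if="$work/test.1" of="$target_dir/test.1" $out_opt bs=128k \
        2>/dev/null || { rm -rf "$work"; return 2; }
    dd if="$target_dir/test.1" $in_opt of="$work/test.2" bs=128k \
        2>/dev/null || { rm -rf "$work"; return 2; }

    sum1=$(md5sum < "$work/test.1")
    sum2=$(md5sum < "$work/test.2")
    rm -rf "$work" "$target_dir/test.1"

    [ "$sum1" = "$sum2" ]
}
```

For example, after failing a disk, run `roundtrip_check /lustre/stry; echo $?` and treat a result of 1 as a checksum mismatch (2 means the test itself could not run).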

      After some investigation, the problem was identified as two bugs in the zcopy patch.
      1) raid5 sets the UPTODATE flag on a stripe that still holds stale pointers from DIO, and then tries to copy data from these pointers during the READ phase.

      2) An issue with restoring pages from the stripe cache.

      Please verify whether this is also an issue in a RHEL5 environment (we don't have one at the moment).

      Attachments

        1. 01.fix_uptodate_flag.patch
          0.9 kB
          Alexey Lyashkov
        2. 02.switch_page.patch
          3 kB
          Alexey Lyashkov

            People

              rhenwood Richard Henwood (Inactive)
              shadow Alexey Lyashkov