
LU-12681: Data corruption due to incorrect KMS with SEL files

Details


    Description

      While testing the SEL port to the 2.12 branch, Grev found a data corruption issue.
      I checked it with the latest master on the client side and the issue is still present.
      An example:

      [4] FAILED comparison of buffer containing 8-byte ints:
      [4]   File name = /mnt/lustre/SEL/IOR.o
      [4]   In transfer 392, 256 errors between buffer indices 212992 and 213247.
      [4]   File byte offset = 5454028800:
      [4]     Expected: 0x000000045d5e330f 00000000001a0008 000000045d5e330f 00000000001a0018
      [4]     Actual:   0x0000000000000000 0000000000000000 0000000000000000 0000000000000000
      ior ERROR: data check error, aborting execution, errno 0, Success (ior.c:414)
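
      A quick sanity check on these numbers (a minimal sketch, assuming a 4 KiB page and the 8-byte ints reported by IOR; the interpretation is mine) shows that the corrupted region is exactly the first half of a page and starts on a page boundary:

      #include <stdio.h>

      int main(void)
      {
              unsigned long long off   = 5454028800ULL;   /* "File byte offset" from IOR */
              unsigned long long first = 212992;           /* first bad 8-byte index */
              unsigned long long last  = 213247;           /* last bad 8-byte index */

              printf("off %% 4096  = %llu (corruption starts on a page boundary)\n",
                     off % 4096);
              printf("bad bytes   = %llu (exactly half of a 4 KiB page)\n",
                     (last - first + 1) * 8);
              printf("first %% 512 = %llu (the bad indices also start on a page boundary)\n",
                     first % 512);
              return 0;
      }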
      

      Current state of the investigation: the KMS is not valid in some cases, so ll_prepare_partial_page() fills the full page with zeros even though part of it has already been sent to the OST.
      This quick and dirty fix resolves the issue, but the KMS problem needs to be investigated.

      @@ -598,25 +597,30 @@ static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
                      GOTO(out, result);
              }
      
      +#if 0
              /*
               * If are writing to a new page, no need to read old data.
               * The extent locking will have updated the KMS, and for our
               * purposes here we can treat it like i_size.
               */
      -       if (attr->cat_kms <= offset) {
      +       if (attr->cat_kms < offset) {
                      char *kaddr = ll_kmap_atomic(vpg->vpg_page, KM_USER0);
      
                      memset(kaddr, 0, cl_page_size(obj));
                      ll_kunmap_atomic(kaddr, KM_USER0);
      +               CDEBUG(D_INFO, "kms-skip %llu <> %llu\n", attr->cat_kms, offset);
                      GOTO(out, result = 0);
              }
      +#endif
      
      00000080:00200000:1.0:1566400964.212391:0:28647:0:(rw26.c:833:ll_write_end()) pos 3891347456, len 2048, copied 2048
      00000080:00000040:1.0:1566400964.407924:0:28647:0:(rw26.c:611:ll_prepare_partial_page()) kms-skip 3643416576 <> 3891347456
      00000080:00200000:1.0:1566400964.407925:0:28647:0:(rw26.c:833:ll_write_end()) pos 3891349504, len 2048, copied 2048
      and the brw then sends the full page to the OST:
      00000008:00000040:1.0:1566400964.653615:0:28647:0:(osc_request.c:1556:osc_brw_prep_request()) buff[1] [3891347456,966656,520]
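
      Decoding the offsets in this trace (a minimal sketch, assuming a 4 KiB client page size; the reading of the log fields is mine):

      #include <stdio.h>

      int main(void)
      {
              unsigned long long page = 4096;               /* assumed PAGE_SIZE */
              unsigned long long pos1 = 3891347456ULL;      /* first ll_write_end() */
              unsigned long long pos2 = 3891349504ULL;      /* second ll_write_end() */
              unsigned long long kms  = 3643416576ULL;      /* cat_kms in the kms-skip line */

              printf("pos1 %% page = %llu  (the page starts exactly at pos1)\n", pos1 % page);
              printf("pos2 - pos1 = %llu  (second write fills the other half of the same page)\n", pos2 - pos1);
              printf("kms < pos1  = %d   (stale KMS, so the old check zero-fills the whole page)\n", kms < pos1);
              return 0;
      }

      Read this way, the kms-skip between the two ll_write_end() calls zeroes the page that already holds the first 2048 copied bytes, and those zeros are what the brw then carries to the OST.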
      

      It is not a problem with the offset within the same page. I tried to fix this check against the real offset, but sometimes the KMS value is very strange, e.g. KMS = 3G while the offset points at 5G, so it looks like a more complex problem.
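
      For illustration only, a minimal user-space model of the failure mode described above (not Lustre code; prepare_partial_page() here is a hypothetical stand-in for the cat_kms shortcut in ll_prepare_partial_page()):

      #include <stdio.h>
      #include <string.h>

      #define PAGE_SIZE 4096

      /* hypothetical stand-in for the cat_kms check in ll_prepare_partial_page() */
      static void prepare_partial_page(char *page, unsigned long long page_off,
                                       unsigned long long kms)
      {
              if (kms <= page_off) {
                      /* page looks "new" according to KMS: zero it instead of reading */
                      memset(page, 0, PAGE_SIZE);
              }
              /* else: the real code reads the existing data from the OST */
      }

      int main(void)
      {
              char page[PAGE_SIZE];

              /* an earlier write already copied data into the first half of the page */
              memset(page, 'A', 2048);

              /* stale KMS of ~3G while the page sits at ~5G: the whole page is
               * zeroed, wiping the 2048 bytes copied above, and the zeros are
               * what later goes to the OST */
              prepare_partial_page(page, 5ULL << 30, 3ULL << 30);

              printf("byte 0 after prepare: 0x%02x (expected 'A' = 0x41)\n",
                     (unsigned char)page[0]);
              return 0;
      }

      With a correct KMS the shortcut is only taken for pages entirely beyond the known size, so nothing that was already written can be wiped.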

      Attachments

        Issue Links

          Activity

            [LU-12681] Data corruption due to incorrect KMS with SEL files

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40740/
            Subject: LU-12681 osc: wrong cache of LVB attrs, part2
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 8c74fe2c6c3f8472ddf39f6ccc5e9c3744de344d


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40739/
            Subject: LU-12681 osc: wrong cache of LVB attrs
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: d9329ccb757f06f0ff9df2a2c1b627bb29015c81


            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40740
            Subject: LU-12681 osc: wrong cache of LVB attrs, part2
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: e9580cc6f49d185e080ff4139176aa9369a4e3ca


            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40739
            Subject: LU-12681 osc: wrong cache of LVB attrs
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 5aa6767d63368e630531669e51b4c8e11e1788b8

            pjones Peter Jones added a comment -

            Landed for 2.13


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36200/
            Subject: LU-12681 osc: wrong cache of LVB attrs, part2
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 40319db5bc649adaf3dad066e2c1bb49f7f1c04a


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36199/
            Subject: LU-12681 osc: wrong cache of LVB attrs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8ac020df4592fc6e85edd75d54cb3795a4e50f8e


            pfarrell Patrick Farrell (Inactive) added a comment -

            Sorry, I meant the read required to do a partial page write.  I know what you're talking about.

            The read is of the full page, but then the write overwrites part of it.  So that's why I said a "partial page read", but that is incorrect, as you pointed out.

            My point about the corruption being limited in this case to just zeroes, not garbage data, still stands.  That's why LU-12786 doesn't match here.

            shadow Alexey Lyashkov added a comment -

            Patrick,

            you are wrong. This bug is about a partial page write. POSIX requires the page to be zeroed before it is modified,
            and Lustre does it in:

            static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
                                               struct cl_page *pg, struct file *file)
            {
                    /*
                     * If are writing to a new page, no need to read old data.
                     * The extent locking will have updated the KMS, and for our
                     * purposes here we can treat it like i_size.
                     */
                    if (attr->cat_kms <= offset) {
                            char *kaddr = ll_kmap_atomic(vpg->vpg_page, KM_USER0);

                            memset(kaddr, 0, cl_page_size(obj));
                            ll_kunmap_atomic(kaddr, KM_USER0);
                            GOTO(out, result = 0);
                    }

            This causes zeros to be sent to the OST and to the disk.
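
            For reference, the write pattern that reaches this path can be sketched from user space (a hypothetical sketch only: the path, offsets and flags are assumptions, and on its own this will not reproduce the corruption, which also needs the stale KMS that the SEL layout change triggers):

            #define _FILE_OFFSET_BITS 64
            #include <fcntl.h>
            #include <string.h>
            #include <unistd.h>

            int main(void)
            {
                    /* assumed path on an SEL-striped file; the offset echoes the trace above */
                    int fd = open("/mnt/lustre/SEL/testfile", O_WRONLY | O_CREAT, 0644);
                    char buf[2048];
                    off_t off = (off_t)3891347456ULL;       /* page-aligned */

                    if (fd < 0)
                            return 1;
                    memset(buf, 0x5a, sizeof(buf));

                    /* first half of the page */
                    if (pwrite(fd, buf, sizeof(buf), off) < 0)
                            return 1;
                    /* second half: a partial page write that goes through
                     * ll_prepare_partial_page() on the client */
                    if (pwrite(fd, buf, sizeof(buf), off + 2048) < 0)
                            return 1;

                    close(fd);
                    return 0;
            }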

            pfarrell Patrick Farrell (Inactive) added a comment -

            Ihara,

            I don't think so - This bug results mostly in zeroes in partial page reads.  In most scenarios, this is transient corruption that stays on the client, and doesn't go to disk.  (It can go to disk, but only under special scenarios.)  You have much larger regions of bad data (most of a write in all cases I saw), the data is bad on disk, and the data is not zeroes, instead it's mostly random garbage.  (It is in one case, but it's possible for unwritten regions of disk to contain zeroes rather than garbage.)

            People

              Assignee: vitaly_fertman Vitaly Fertman
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 8
