
LU-9305: Running File System Aging creates write checksum errors

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.0, Lustre 2.11.0

    Description

      My most recent reproduction of this was:
      ZFS based on 0.7.0 RC4 fs/zfs:coral-rc1-combined
      Lustre tagged release 2.9.57 (but 2.9.58 fails as well)
      CentOS 7.3, kernel 3.10.0-514.16.1.el7.x86_64

      I have personally verified that this fails on Lustre 2.8, 2.9, and the latest tagged release; on ZFS from 0.6.5 through current ZoL master; and on the most recent CentOS 7.1, 7.2, and 7.3 kernels.

      This may well be a Lustre issue; I need to try to reproduce it on raidz, without large RPCs, etc.

      On both the clients and the OSS nodes we see checksum errors such as the following while the file aging test is running:
      [ 9354.968454] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.6@o2ib inode [0x200000401:0x254:0x0] object 0x0:292 extent [117440512-125698047]: client csum de357896, server csum 5cd77893

      [ 9394.315856] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.6@o2ib inode [0x200000401:0x28c:0x0] object 0x0:320 extent [67108864-82968575]: client csum df6bd34a, server csum 8480d352
      [ 9404.371609] LustreError: 168-f: BAD WRITE CHECKSUM: lsdraid-OST0000 from 12345-192.168.1.6@o2ib inode [0x200000401:0x298:0x0] object 0x0:326 extent [67108864-74448895]: client csum 2ced4ec0, server csum 1f814ec4
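
      These messages mean that the checksum the client computed over the bulk write data does not match the one the OST recomputed over the pages it actually received, so the page contents changed somewhere between the two computations. The following is only an illustrative userspace model of that comparison; it is not Lustre code, and a simple Adler-32 stands in for whatever checksum algorithm the client and server negotiated:

      /*
       * Illustrative model of the check behind "BAD WRITE CHECKSUM":
       * the client checksums the pages it is about to send, the server
       * checksums the pages it received, and a mismatch means the data
       * was modified or torn in between.  Not Lustre code.
       */
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      static uint32_t adler32(const unsigned char *buf, size_t len)
      {
              uint32_t a = 1, b = 0;

              for (size_t i = 0; i < len; i++) {
                      a = (a + buf[i]) % 65521;
                      b = (b + a) % 65521;
              }
              return (b << 16) | a;
      }

      int main(void)
      {
              unsigned char page[4096];

              memset(page, 0xab, sizeof(page));
              uint32_t client_csum = adler32(page, sizeof(page));   /* client side */

              /* Simulate the page changing after the client computed its
               * checksum but before the server verified it. */
              page[100] ^= 0xff;
              uint32_t server_csum = adler32(page, sizeof(page));   /* server side */

              if (client_csum != server_csum)
                      printf("BAD WRITE CHECKSUM: client csum %08x, server csum %08x\n",
                             (unsigned)client_csum, (unsigned)server_csum);
              return 0;
      }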

      Attachments

        1. BasicLibs.py
          6 kB
        2. debug_info.20170406_143409_48420_wolf-3.wolf.hpdd.intel.com.tgz
          3.45 MB
        3. debug_vmalloc_lustre.patch
          6 kB
        4. debug_vmalloc_spl.patch
          14 kB
        5. debug_vmalloc.patch
          22 kB
        6. FileAger-wolf6.py
          6 kB
        7. FileAger-wolf7.py
          6 kB
        8. FileAger-wolf8.py
          6 kB
        9. FileAger-wolf9.py
          6 kB
        10. Linux_x64_Memory_Address_Mapping.pdf
          224 kB
        11. wolf-6_client.tgz
          5.67 MB


          Activity

            jay Jinshan Xiong (Inactive) added a comment - edited

            This is indeed a race condition. I wonder why I couldn't catch the race by enabling VM_BUG_ON_PAGE() in put_page_testzero().

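            (For context: put_page_testzero() is the kernel helper that drops a page reference and reports whether the count hit zero; with CONFIG_DEBUG_VM it contains a VM_BUG_ON_PAGE()-style check against the count already being zero. One plausible reason such a check can stay silent in a double-release race is sketched below as a plain userspace model, not kernel code: if the page has already been recycled and a new owner holds a reference, the stray put sees a nonzero count and simply steals that reference.)

            /*
             * Userspace model, not kernel code: why a stray extra "put" is not
             * necessarily caught by the VM_BUG_ON_PAGE()-style check in
             * put_page_testzero().  If another user has already taken a
             * reference to the recycled page, the extra put sees a nonzero
             * count, decrements it silently, and the new owner's reference is
             * lost; the data is later freed or reused under it.
             */
            #include <assert.h>
            #include <stdio.h>

            struct fake_page {
                    int refcount;
            };

            /* Model of put_page_testzero(): assert the count is not already
             * zero, then decrement and report whether it reached zero. */
            static int put_testzero(struct fake_page *p)
            {
                    assert(p->refcount != 0);   /* the VM_BUG_ON_PAGE-style check */
                    return --p->refcount == 0;
            }

            int main(void)
            {
                    struct fake_page page = { .refcount = 1 };

                    put_testzero(&page);        /* legitimate owner drops the page   */
                    page.refcount = 1;          /* page recycled, new owner takes it */
                    put_testzero(&page);        /* stray second put: count is 1, so the
                                                 * assertion never trips, yet the new
                                                 * owner's reference is now gone */
                    printf("refcount now %d (new owner's reference lost)\n",
                           page.refcount);
                    return 0;
            }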

            bzzz Alex Zhuravlev added a comment

            With https://review.whamcloud.com/27950 I can't reproduce the issue.


            gerrit Gerrit Updater added a comment

            Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: https://review.whamcloud.com/27950
            Subject: LU-9305 osd: do not release pages twice
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 12d3b3dcfbc17fc201dc9de463720e3a3a994f49


            bzzz Alex Zhuravlev added a comment

            No, I'm not familiar with the wolf cluster.


            jay Jinshan Xiong (Inactive) added a comment

            The wolf cluster should be available to use. Do you have access to wolf?


            bzzz Alex Zhuravlev added a comment

            May I ask you to run https://review.whamcloud.com/27913 on real iron, please?
            I'm not sure about the exact sequence, but my theory is that osd_bufs_put() may try to release pages that were already released by dmu_assign_arcbuf() when the blocksize mismatches.

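            (The theory above describes an ownership-transfer bug: on the blocksize-mismatch path the pages are effectively consumed by the lower layer, and osd_bufs_put() then releases them a second time, after which another thread can already be reusing them, which would explain the checksum mismatches. The sketch below is only a userspace model of that bug shape and of the kind of guard a "do not release pages twice" fix implies; it is not the osd-zfs code or the actual patch.)

            /*
             * Userspace sketch of the suspected bug shape, not the actual
             * osd-zfs code or the fix in https://review.whamcloud.com/27950.
             * One branch of the write path hands ownership of a buffer to a
             * lower layer; if the release path then frees the same buffer
             * unconditionally, it is freed twice and a concurrent user of the
             * recycled memory sees corruption.  The "consumed" flag is the
             * kind of bookkeeping a "release pages only once" fix implies.
             */
            #include <stdbool.h>
            #include <stdio.h>
            #include <stdlib.h>

            struct buf {
                    void *data;
                    bool  consumed;     /* ownership passed to the lower layer */
            };

            /* Stand-in for the path that takes ownership of the buffer. */
            static void lower_layer_consume(struct buf *b)
            {
                    free(b->data);      /* the lower layer now frees the data */
                    b->data = NULL;
                    b->consumed = true;
            }

            /* Stand-in for osd_bufs_put(): must skip consumed buffers. */
            static void bufs_put(struct buf *b)
            {
                    if (b->consumed)    /* without this check: double free */
                            return;
                    free(b->data);
                    b->data = NULL;
            }

            int main(void)
            {
                    struct buf b = { .data = malloc(4096), .consumed = false };

                    lower_layer_consume(&b);    /* e.g. the mismatch branch */
                    bufs_put(&b);               /* safe thanks to the guard */
                    printf("released once, second release skipped\n");
                    return 0;
            }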

            jay Jinshan Xiong (Inactive) added a comment

            Alex:

            That's a good finding. What would be the next step?

            bzzz Alex Zhuravlev added a comment - edited

            I added a few CDEBUG() and BUG() calls in tgt_warn_on_cksum()...

            00080000:00080000:1.0:1498681682.290436:0:13445:0:(osd_io.c:541:osd_bufs_get()) pages: ffffea00003ac3c0/4096@12582912
            00000020:00080000:1.0:1498681682.363189:0:13445:0:(tgt_handler.c:2121:tgt_warn_on_cksum()) pages: ffffea00003ac3c0/4096@12582912

            00080000:00080000:0.0:1498681682.396783:0:14108:0:(osd_io.c:541:osd_bufs_get()) pages: ffffea00003ac3c0/4096@20086784

            So tgt_warn_on_cksum() in thread 13445 did hit BUG() and, in theory, could not proceed to osd_bufs_put() to release page ffffea00003ac3c0, yet thread 14108 was able to get this same page a few cycles later.

            sarah Sarah Liu added a comment

            It is automatically generated if a node crashes. Please refer to https://wiki.hpdd.intel.com/display/TEI/Core+dump+location+for+autotest+nodes


            bzzz Alex Zhuravlev added a comment

            Thanks. Did you upload those files, or did autotest do it automatically?

            sarah Sarah Liu added a comment - edited

            Alex, please go to trevis and check the following directory; I hope it has what you are looking for:

            -sh-4.1$ ls /scratch/dumps/trevis-36vm4.trevis.hpdd.intel.com
            10.9.5.195-2017-06-22-22:11:54  10.9.5.195-2017-06-23-12:20:38
            10.9.5.195-2017-06-23-06:33:26  2016-06-26-05:38
            

            When an autotest node crashes, the core dump will be placed in /scratch/dumps/<hostname>.


            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: jsalians_intel John Salinas (Inactive)
              Votes: 1
              Watchers: 23
