[LU-11187] MMP update sometimes fails T10PI checks

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.7
    • Affects Version/s: Lustre 2.10.3
    • Labels: None
    • Severity: 2

    Description

      We have seen this before, in LU-5481. At that time we simply removed MMP from the OST, because we didn't use host failover. Our new filesystem does use host failover, and we are seeing the same error on iSER+T10PI-connected storage. The error can occur at mount time and at random times during I/O.

       [ 3520.840977] mlx5_3:mlx5_poll_one:657:(pid 0): CQN: 0xc05 Got SIGERR on key: 0x80007b0b err_type 0 err_offset 207 expected 9b3c actual a13c
      [ 3520.878451] PI error found type 0 at sector 1337928 expected 953c vs actual 9b3c
      [ 3520.900800] PI error found type 0 at sector 1337928 expected 9b3c vs actual a13c
      [ 3520.923968] blk_update_request: I/O error, dev sdai, sector 20150568
      [ 3520.943377] blk_update_request: I/O error, dev sdae, sector 20150568
      [ 3520.963067] blk_update_request: I/O error, dev dm-15, sector 20150568
      [ 3520.982436] Buffer I/O error on dev dm-15, logical block 2518821, lost async page write
      [ 3521.006511] Buffer I/O error on dev dm-15, logical block 2518822, lost async page write
      [ 3521.006558] blk_update_request: I/O error, dev dm-13, sector 20150568
      [ 3521.006559] Buffer I/O error on dev dm-13, logical block 2518821, lost async page write
      [ 3521.006563] Buffer I/O error on dev dm-13, logical block 2518822, lost async 
      
      device /dev/dm-15 mounted by lustre
      Filesystem volume name:   nbp10-OST001d
      Last mounted on:          /
      Filesystem UUID:          08b337bb-b3b1-48b0-925b-0bf5d3ba7253
      Filesystem magic number:  0xEF53
      Filesystem revision #:    1 (dynamic)
      Filesystem features:      has_journal ext_attr dir_index filetype needs_recovery extent 64bit mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota
      Filesystem flags:         signed_directory_hash 
      Default mount options:    user_xattr acl
      Filesystem state:         clean
      Errors behavior:          Continue
      Filesystem OS type:       Linux
      Inode count:              9337344
      Block count:              19122880512
      Reserved block count:     0
      Free blocks:              19120188065
      Free inodes:              9337011
      First block:              0
      Block size:               4096
      Fragment size:            4096
      Group descriptor size:    64
      Blocks per group:         32768
      Fragments per group:      32768
      Inodes per group:         16
      Inode blocks per group:   2
      Flex block group size:    64
      Filesystem created:       Fri Jul 27 10:21:56 2018
      Last mount time:          Fri Jul 27 10:44:14 2018
      Last write time:          Fri Jul 27 10:44:15 2018
      Mount count:              4
      Maximum mount count:      -1
      Last checked:             Fri Jul 27 10:21:56 2018
      Check interval:           0 (<none>)
      Lifetime writes:          7774 kB
      Reserved blocks uid:      0 (user root)
      Reserved blocks gid:      0 (group root)
      First inode:              11
      Inode size:               512
      Required extra isize:     32
      Desired extra isize:      32
      Journal inode:            8
      Default directory hash:   half_md4
      Directory Hash Seed:      2ebd542d-9757-456f-b597-43fae5c542c0
      Journal backup:           inode blocks
      MMP block number:         2518821
      MMP update interval:      5
      User quota inode:         3
      Group quota inode:        4
      

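      Note that the block with the error is the MMP block: logical block 2518821 in the buffer I/O errors matches the MMP block number reported above.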
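The match between the failing sector and the MMP block can be checked with a little arithmetic. This is a small sketch using only numbers from the log and the superblock dump. It assumes the kernel reports sectors in 512-byte units, and the helper name `sector_to_block` is invented for illustration.

```python
SECTOR_SIZE = 512          # assumption: kernel log sectors are 512-byte units
BLOCK_SIZE = 4096          # "Block size" from the superblock dump
MMP_BLOCK = 2518821        # "MMP block number" from the superblock dump
FAILED_SECTOR = 20150568   # sector from "blk_update_request: I/O error, dev dm-15"


def sector_to_block(sector, block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """Convert a device sector number to a filesystem logical block number."""
    return sector * sector_size // block_size


block = sector_to_block(FAILED_SECTOR)
print(f"sector {FAILED_SECTOR} -> block {block}, "
      f"MMP block: {block == MMP_BLOCK}")
```

With 4096-byte blocks there are eight sectors per block, so the failing sector falls exactly at the start of the MMP block.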

      Attachments

        1. dm20.hexdump
          1.76 MB
        2. trace.dat
          4.73 MB

          Activity

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34063/
            Subject: LU-11187 ldiskfs: update rhel7.6 series
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: b5ad8a06a6b092e38800987debdba5b3e1ee8b29

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34063
            Subject: LU-11187 ldiskfs: update rhel7.6 series
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 740c8b5b3b0c7419a53d84fd4d19ecffbbfd28f3

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33336/
            Subject: LU-11187 ldiskfs: don't mark mmp buffer head dirty
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: d63cd9f9795848c03c5882b76e971dfcd00433e6

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33336
            Subject: LU-11187 ldiskfs: don't mark mmp buffer head dirty
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: d11dd446facea523803d4767b69c799286ef01f4

            pjones Peter Jones added a comment -

            Landed for 2.12

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33038/
            Subject: LU-11187 ldiskfs: don't mark mmp buffer head dirty
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: dd02d32c978ad95c9e2a3703ad6be7511c257a4d

            mhanafi Mahmoud Hanafi added a comment -

            This does fix the issue in ldiskfs. We can move forward with the patch.

            dongyang Dongyang Li added a comment -

            So it worked for ldiskfs as well?

            Then I need to refresh the patch: we need to apply it to every supported distro, and I will push it upstream.

            jaylan Jay Lan (Inactive) added a comment -

            Mahmoud said that the patch worked in our environment.
            Before I cherry-pick the patch, are you sure you still want to name it "ldiskfs: add mmp debug patch"? Patch set 6 is no longer a debug patch.

            mhanafi Mahmoud Hanafi added a comment -

            The patch works on vanilla CentOS 7 with ext4. I will test ldiskfs next.

            People

              Assignee: dongyang Dongyang Li
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 8
