Details

    • 3
    • 9223372036854775807

    Description

      During upgrade process from a old version, we saw errors

      Jun 13 18:16:48 282 kernel: LustreError: 24685:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0005-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
      Jun 13 18:17:01 280 kernel: LustreError: 26388:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0003-mdtlov: invalid FID [0x0:0x2050100:0x0]
      Jun 13 18:20:30 279 kernel: LustreError: 26741:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0002-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
      Jun 13 18:26:48 282 kernel: LustreError: 24685:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0005-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
      Jun 13 18:27:01 280 kernel: LustreError: 26388:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0003-mdtlov: invalid FID [0x0:0x2050100:0x0]
      Jun 13 18:30:30 279 kernel: LustreError: 26741:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0002-mdtlov: invalid FID [0x0:0x1cf60001:0x0]
      Jun 13 19:03:39 281 kernel: LustreError: 17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
      Jun 13 19:03:41 281 kernel: LustreError: 17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
      Jun 13 19:03:43 281 kernel: LustreError: 17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
      

      Lustre logs shows the next info

      00000004:00080000:12.0:1560446017.274394:0:17588:0:(lod_dev.c:423:lod_sub_recovery_thread()) lustre02-MDT0005-osp-MDT0004 get update log failed -5, retry
      00000040:01000000:12.0:1560446017.274395:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
      00000004:00020000:12.0:1560446017.274467:0:17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
      00000004:00080000:12.0:1560446017.274468:0:17588:0:(lod_dev.c:423:lod_sub_recovery_thread()) lustre02-MDT0005-osp-MDT0004 get update log failed -5, retry
      00000040:01000000:12.0:1560446017.274469:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
      00000004:00020000:12.0:1560446017.274548:0:17588:0:(lod_dev.c:131:lod_fld_lookup()) lustre02-MDT0004-mdtlov: invalid FID [0x0:0x1d:0x0]
      00000004:00080000:12.0:1560446017.274549:0:17588:0:(lod_dev.c:423:lod_sub_recovery_thread()) lustre02-MDT0005-osp-MDT0004 get update log failed -5, retry
      00000040:01000000:12.0:1560446017.274549:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
      
      00000040:00000001:1.0:1560447502.778180:0:17588:0:(llog_osd.c:1968:llog_osd_get_cat_list()) Process entered
      00000004:00000001:1.0:1560447502.778181:0:17588:0:(osp_object.c:546:osp_attr_get()) Process entered
      00000004:00000001:1.0:1560447502.778181:0:17588:0:(osp_object.c:556:osp_attr_get()) Process leaving (rc=0 : 0 : 0)
      00000040:01000000:1.0:1560447502.778182:0:17588:0:(llog_osd.c:2027:llog_osd_get_cat_list()) cat list: disk size=192, read=32
      ....
      00000004:00000040:1.0:1560447502.778187:0:17588:0:(osp_md_object.c:1148:osp_md_read()) lustre02-MDT0005-osp-MDT0004 [0x200000009:0x5:0x0] read offset 128 size 32
      ...
      00000040:00000001:1.0:1560447502.778489:0:17588:0:(llog_osd.c:2056:llog_osd_get_cat_list()) Process leaving (rc=0 : 0 : 0)
      ...
      00000004:00000040:1.0:1560447502.778495:0:17588:0:(mdt_handler.c:5441:mdt_object_init()) object init, fid = [0x0:0x1d:0x0]
      00000004:00000010:1.0:1560447502.778495:0:17588:0:(mdd_object.c:282:mdd_object_alloc()) slab-alloced 'mdd_obj': 96 at ffff880f5ce8cc60.
      00000004:00000001:1.0:1560447502.778496:0:17588:0:(mdt_handler.c:5450:mdt_object_init()) Process leaving (rc=0 : 0 : 0)
      

      the above was a remote read of update_log from MDT0005 on 282, 32 bytes on offset 128,
      however the received data is interpreted as FID=[0x0:0x1d:0x0], invalid fid and the operation fails.

      The part of update_log from MDT0005 fetched by MDT0004 :

      $ od -X -j 128 -N 32 -A d update_log
      0000128          00000000        00000003        0000001d        00000004
      0000144          0000001d        00000000        7473756c        32306572
      0000160
      

      The issue is reproducible,  reproducer do the next steps

      • umount MDTS/OSTs/Clients
      • for every MDT debugfs -w -R "rm update_log"
      • mount MDTS in parallel
      • mount OSTs and client
      • create some files
      • repeat from the beginning
        I used 4 MDTs. I guess more MDTs is better. Issue happens not often, one time in a hour.

      The root cause for update_log corruption.
      When the first write to file happen at some offset, the data before offset is not zeroed. It happens in rare case.
      The path is osd_ldiskfs_write_record()>__ldiskfs_bread()>ldiskfs_getblk()

         if (map.m_flags & LDISKFS_MAP_NEW) {
                      J_ASSERT(create != 0);
                      J_ASSERT(handle != NULL);
      
                      /*
                       * Now that we do not always journal data, we should
                       * keep in mind whether this should always journal the
                       * new buffer as metadata.  For now, regular file
                       * writes use ldiskfs_get_block instead, so it's not a
                       * problem.
                       */
                      lock_buffer(bh);
                      BUFFER_TRACE(bh, "call get_create_access");
                      fatal = ldiskfs_journal_get_create_access(handle, bh);
                      if (!fatal && !buffer_uptodate(bh)) {
                              memset(bh->b_data, 0, inode->i_sb->s_blocksize);
                              set_buffer_uptodate(bh);
                      }
                      unlock_buffer(bh);
                      BUFFER_TRACE(bh, "call ldiskfs_handle_dirty_metadata");
                      err = ldiskfs_handle_dirty_metadata(handle, inode, bh);
                      if (!fatal)
                              fatal = err;
              } else {
       

       Some times NEW bh is uptodate, in this case data is not zeroed, and we get corruption for update_log.

      Attachments

        Issue Links

          Activity

            [LU-12593] update_log corruption

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37445/
            Subject: LU-12593 osd: up i_append_sem during errors
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: f223dd255a4bb884b6013f3b69cb24c1da6c5d27

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37445/ Subject: LU-12593 osd: up i_append_sem during errors Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: f223dd255a4bb884b6013f3b69cb24c1da6c5d27

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37406/
            Subject: LU-12593 osd: up i_append_sem during errors
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7599dd3d20d6bb4ee89634c5a76730481ca62470

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37406/ Subject: LU-12593 osd: up i_append_sem during errors Project: fs/lustre-release Branch: master Current Patch Set: Commit: 7599dd3d20d6bb4ee89634c5a76730481ca62470

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37445
            Subject: LU-12593 osd: up i_append_sem during errors
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: dd89b75768028123f38b40cd8a048d18bcc8f8cd

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37445 Subject: LU-12593 osd: up i_append_sem during errors Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: dd89b75768028123f38b40cd8a048d18bcc8f8cd

            Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/37406
            Subject: LU-12593 osd: up i_append_sem during errors
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3fe8b86ae9f2061d26e0328f39f63012849f212d

            gerrit Gerrit Updater added a comment - Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/37406 Subject: LU-12593 osd: up i_append_sem during errors Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 3fe8b86ae9f2061d26e0328f39f63012849f212d

            It looks like there is a bug in this patch pointed out by Misc Code Checks Robot that it is leaking i_alloc_sem in some error paths that may result in an OST deadlock in some cases, see my comments in https://review.whamcloud.com/#/c/36976/2/lustre/osd-ldiskfs/osd_io.c,unified

            adilger Andreas Dilger added a comment - It looks like there is a bug in this patch pointed out by Misc Code Checks Robot that it is leaking i_alloc_sem in some error paths that may result in an OST deadlock in some cases, see my comments in https://review.whamcloud.com/#/c/36976/2/lustre/osd-ldiskfs/osd_io.c,unified

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36709/
            Subject: LU-12593 osd: zeroing a freshly allocated block buffer
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: e196376e841113c47423639b0f5f09f46cdfa25c

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36709/ Subject: LU-12593 osd: zeroing a freshly allocated block buffer Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: e196376e841113c47423639b0f5f09f46cdfa25c

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36709
            Subject: LU-12593 osd: zeroing a freshly allocated block buffer
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: a468cb238833ee6f1cf854f58f3d6b0123d2be69

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36709 Subject: LU-12593 osd: zeroing a freshly allocated block buffer Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: a468cb238833ee6f1cf854f58f3d6b0123d2be69

            osd_ldiskfs_write_record() -> __ldiskfs_bread() -> ldiskfs_bread() -> ldiskfs_getblk() -> ldiskfs_map_blocks() - so far it's common, no?

            bzzz Alex Zhuravlev added a comment - osd_ldiskfs_write_record() -> __ldiskfs_bread() -> ldiskfs_bread() -> ldiskfs_getblk() -> ldiskfs_map_blocks() - so far it's common, no?

            Yes, I saw it. It does what we need, but for extents. osd_ldiskfs_write_record is based on another codepath.

            aboyko Alexander Boyko added a comment - Yes, I saw it. It does what we need, but for extents. osd_ldiskfs_write_record is based on another codepath.

            Alexandr, have you seen LDISKFS_GET_BLOCKS_ZERO ? looks like it does what we needed?

            bzzz Alex Zhuravlev added a comment - Alexandr, have you seen LDISKFS_GET_BLOCKS_ZERO ? looks like it does what we needed?

            People

              aboyko Alexander Boyko
              aboyko Alexander Boyko
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: