
    Description

      We hit an mdtest job failure caused by multiple missing OST objects during a failover/failback test (stripe count 2).

      V-1: Entering unique_dir_access...
      V-1: Entering mdtest_stat...
      08/14/2018 21:46:25: Process 25(nid00016): FAILED in mdtest_stat, unable to stat file: No such file or directory
      08/14/2018 21:46:25: Process 30(nid00045): FAILED in mdtest_stat, unable to stat file: No such file or directory

      The failure occurred because a range of objects was absent on one of the OSTs.

      A marker of the failure may be the following message on the OST after recovery:

      Aug 14 21:46:07 snx11205n004 kernel: format at ofd_dev.c:1713:ofd_create_hdl doesn't end in newline 

          Activity

            [LU-11760] formatted OST recognition change

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35951/
            Subject: LU-11760 ofd: limit num of objects to create in 1 transaction
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 963559b3087bcbb0bdd541c983085eff7feca882


            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35951
            Subject: LU-11760 ofd: limit num of objects to create in 1 transaction
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: e08f46ed1e8fc4daf8ac092ca305438ea354ec24

            pjones Peter Jones added a comment -

            Landed for 2.13


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35373/
            Subject: LU-11760 ofd: limit num of objects to create in 1 transaction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4485ee8be4cf224e2543f6344efc6e1cb295a0a7


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35388/
            Subject: Revert "LU-11760 ofd: formatted OST recognition change"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8065d44c0a2b29885ca429674ccab7785d2db08b


            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35388
            Subject: Revert "LU-11760 ofd: formatted OST recognition change"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 563277b8d728c45bd89074c07a22f1432beea344


            Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/35373
            Subject: LU-11760 ofd: limit num of objects to create in 1 transaction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e75f4977695657710a9144a31a0d149fa1925ef8


            scherementsev Sergey Cheremencev added a comment -

            "I don't mind having a change that increases the maximum number of objects created per commit, but this needs to be negotiated between the MDS and OSS at connect time, with a new OBD_CONNECT2_MAX_PRECREATE flag and an ocd_max_precreate field (probably using __u32 padding1) that passes the current OST_MAX_PRECREATE value, and use OST_MAX_PRECREATE_OLD = 20000 if the feature is not available. This also allows fixing the problem properly in older releases."

            If we also increase OST_MAX_PRECREATE on the MDS, I am afraid the MDS will ask the OST to precreate too rarely. This may cause several different problems. For example, after failover the OST would need to precreate more objects (500 000 instead of 20 000), which takes extra time.

            I suggest using the second approach, with a commit callback, and reverting https://review.whamcloud.com/33833.
            I will push a patch.

            adilger Andreas Dilger added a comment - - edited

            I think this is not a full solution to the problem, and it is causing many conf-sanity test_69 failures on OSTs that do not have 500k free inodes.

            I think there are two problems with the patch that was landed:

            • it essentially changes OST_MAX_PRECREATE on the OST side but not on the MDS, so it does not handle recovery properly
            • the root of the problem is not the number of objects created per RPC but rather the number of objects created per commit

            I don't mind having a change that increases the maximum number of objects created per commit, but this needs to be negotiated between the MDS and OSS at connect time, with a new OBD_CONNECT2_MAX_PRECREATE flag and an ocd_max_precreate field (probably using __u32 padding1) that passes the current OST_MAX_PRECREATE value, and use OST_MAX_PRECREATE_OLD = 20000 if the feature is not available. This also allows fixing the problem properly in older releases.

            Until this is fixed (and even afterward), the proper solution is to track on the OST how many objects have been created by each MDT/sequence within the current transaction (use a commit callback to reset the "created this transaction" counter to zero), and force a sync journal commit during precreate if the number exceeds OST_MAX_PRECREATE. If there are multiple MDTs creating, or the create rate is not too large, or the clients do some IO and force a transaction commit anyway, then there is no need for a forced commit. This also avoids a future bug where 500k+ precreates happen within a single commit: we are already over 150k/s creates on the MDS, and that is while processing separate RPCs, whereas the OST precreates objects in a fast local loop.

            I was going to say that the if (diff > 500000) check in ofd_create_hdl() could be modified to also check if LAST_ID < 5 or similar, to detect if this is a newly-formatted filesystem, but I realize that this case may also happen if e.g. the OST is restored from a backup or a snapshot and has an old LAST_ID but is not new. It would be useful to update the comment to reflect this.
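            To make the caveat concrete, here is a hedged sketch of the heuristic being discussed. It is not the code in ofd_create_hdl(); the enum, the function classify_gap() and the definition of diff as "ID requested by the MDS minus LAST_ID on the OST" are assumptions for illustration. The thresholds 500000 and 5 are the ones mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define GAP_LIMIT        500000ULL /* threshold from the diff check above */
#define NEW_OST_LAST_ID  5ULL      /* "LAST_ID < 5 or similar" */

enum gap_cause {
	GAP_NONE,          /* gap is within the accepted range */
	GAP_MAYBE_NEW_OST, /* large gap and LAST_ID looks freshly formatted */
	GAP_OTHER          /* large gap, but the OST has existing objects */
};

/* Classify the gap between the ID the MDS asks for and the OST's LAST_ID. */
static enum gap_cause classify_gap(uint64_t mds_next_id, uint64_t ost_last_id)
{
	uint64_t diff;

	assert(mds_next_id >= ost_last_id);
	diff = mds_next_id - ost_last_id;

	if (diff <= GAP_LIMIT)
		return GAP_NONE;
	if (ost_last_id < NEW_OST_LAST_ID)
		return GAP_MAYBE_NEW_OST;
	/* e.g. an OST restored from an old backup or snapshot */
	return GAP_OTHER;
}
```

            The last branch is the case described above: an OST restored from a backup or snapshot has a large gap but a LAST_ID well above 5, so a LAST_ID test alone cannot identify it as newly formatted.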

            pjones Peter Jones added a comment -

            Landed for 2.13


            People

              scherementsev Sergey Cheremencev
              scherementsev Sergey Cheremencev