Lustre / LU-11089

Performance improvements for lu_object locking

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.5
    • Affects Version/s: Lustre 2.12.0
    • Labels: None

    Description

      While porting the LU-6800 work upstream, the approach was poorly received because it wasn't a real improvement. Neil has created a patch series that breaks up the global lock to improve locking performance.

          Activity

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33673
            Subject: LU-11089 obd: rename lu_keys_guard to lu_context_remembered_guard
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0c790df73f57138e554326d95566de04618c1e93

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33668
            Subject: LU-11089 obd: remove lock from key register/degister
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 837c5d44d10daf04353e900939d9977feed064c5
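The patch subject above removes the lock from key register/deregister. One common way to do that, shown here only as a sketch under assumptions and not as the Lustre code, is to let each registration claim a free slot in a fixed-size key table with an atomic compare-and-swap, so no global lock is needed to stop two registrations from taking the same slot. `struct ctx_key`, `key_table`, `KEY_TABLE_SIZE`, `key_register()` and `key_deregister()` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a context key (not Lustre's struct). */
struct ctx_key {
	const char *name;
	int index;		/* slot owned by this key, or -1 if unregistered */
};

#define KEY_TABLE_SIZE 8	/* hypothetical table size */

/* Global key table. Slots are claimed and released with atomic
 * operations, so register/deregister never take a global lock. */
static _Atomic(struct ctx_key *) key_table[KEY_TABLE_SIZE];

/* Claim the first empty slot for @key. Returns 0 on success, -1 if full. */
static int key_register(struct ctx_key *key)
{
	for (int i = 0; i < KEY_TABLE_SIZE; i++) {
		struct ctx_key *expected = NULL;

		/* Succeeds only if the slot is still NULL, so two racing
		 * registrations can never both claim slot i. */
		if (atomic_compare_exchange_strong(&key_table[i], &expected, key)) {
			key->index = i;
			return 0;
		}
	}
	return -1;
}

/* Release the slot owned by @key. */
static void key_deregister(struct ctx_key *key)
{
	int i = key->index;

	assert(i >= 0 && i < KEY_TABLE_SIZE);
	/* Clear the slot and check that it really belonged to @key. */
	struct ctx_key *prev = atomic_exchange(&key_table[i], NULL);

	assert(prev == key);
	key->index = -1;
}
```

Because a slot only moves from NULL to a key through compare-and-swap, concurrent registrations stay correct without a lock; the cost is a linear scan of the table, which is cheap for a small key set that rarely changes.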

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33667
            Subject: LU-11089 obd: use wait_event_var() in lu_context_key_degister()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 37bb534e3779c4cfcf46a0206583ce3a88be69d1
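The patch subject above switches lu_context_key_degister() to a wait primitive. Assuming, as the name suggests, that deregistration must wait until every outstanding user of the key has dropped its reference, the underlying pattern is "sleep until a counter drains, and let the last user wake the sleeper". Kernel wait primitives are not available in userspace, so this sketch models the same pattern with a pthread mutex and condition variable; `struct key_users`, `KEY_USERS_INIT`, `key_get()`, `key_put()` and `key_wait_unused()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-key usage counter protected by a mutex. */
struct key_users {
	pthread_mutex_t lock;
	pthread_cond_t  drained;	/* broadcast when count drops to zero */
	int             count;		/* number of contexts using the key */
};

#define KEY_USERS_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* A context starts using the key. */
static void key_get(struct key_users *u)
{
	pthread_mutex_lock(&u->lock);
	u->count++;
	pthread_mutex_unlock(&u->lock);
}

/* A context stops using the key; the last user wakes any waiter. */
static void key_put(struct key_users *u)
{
	pthread_mutex_lock(&u->lock);
	assert(u->count > 0);
	if (--u->count == 0)
		pthread_cond_broadcast(&u->drained);
	pthread_mutex_unlock(&u->lock);
}

/* Deregistration path: sleep, without busy polling, until no one uses the key. */
static void key_wait_unused(struct key_users *u)
{
	pthread_mutex_lock(&u->lock);
	while (u->count > 0)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&u->drained, &u->lock);
	pthread_mutex_unlock(&u->lock);
}
```

Compared with repeatedly polling the counter, the waiter sleeps until the final key_put() wakes it. On older toolchains, link with -pthread.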

            sihara Shuichi Ihara added a comment -

            Patrick, sorry! I forgot to update this ticket. Here is the new ticket: LU-11624.

            paf Patrick Farrell (Inactive) added a comment -

            Ihara,

            Could you link that ticket here? I'm interested in tracking it. Our MDSes running 2.12 crash pretty reliably when we fail them over under load.

            simmonsja James A Simmons added a comment -

            Thanks. I have a patch based on the LU-8130 work that should fix this.

            sihara Shuichi Ihara added a comment -

            James, the crashing servers were not related to your patches (LU-11089); this looks like a more general problem in master. I'll open a new Jira ticket for it.
            pjones Peter Jones added a comment -

            Could we please have a separate ticket for any instances seen on 2.12 or earlier releases without James's unlanded patches applied? Is there any suggestion that this happens more frequently on 2.12 than on 2.11 and earlier releases?

            paf Patrick Farrell (Inactive) added a comment -

            Ah, I see now that none of the patches have landed. So it is definitely a pre-existing bug. Interesting.

            paf Patrick Farrell (Inactive) added a comment -

            Seen here during recovery as well. Interesting. I imagine that even if the bug was already there, the changes made it easier to hit. (That doesn't mean the changes are wrong, just that there's probably a reason we're suddenly seeing it.)
            simmonsja James A Simmons added a comment - edited

            Since the NID hash seems to be broken in general, I ported it to rhashtables. I still need to work on the /proc entries that display the hash stats. Please try it out to see whether it stops your nodes from crashing. The patch is at:

            https://review.whamcloud.com/#/c/33518

            The build breakage is only on SLES12SP3.

            People

              Assignee: simmonsja James A Simmons
              Reporter: simmonsja James A Simmons
              Votes: 1
              Watchers: 12
