Lustre / LU-7915

investigate heuristics for SPARK client getting MDS openlock

Details

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Minor

    Description

      During a discussion with LBNL about their Hadoop Spark project, they reported considerable overhead when running on Lustre, because repeatedly opening and closing the same files generated extra RPC traffic to the MDS.

      Lustre can cache opens on the client using a DLM openlock, but this is not done for regular opens by default because a cached open has extra overhead compared to an uncached one. It is currently done only for NFS opens, because knfsd repeatedly opens the same file.
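      To make the trade-off concrete, here is a deliberately simplified RPC-count model for one file that is opened and closed many times. It assumes that without the openlock every open and every close costs one MDS RPC, and that with the openlock the client sends one open RPC (which grants the lock) plus one close RPC when the lock is eventually cancelled. It ignores the lock enqueue/cancel overhead and other metadata RPCs such as getattr, which is exactly the extra cost that keeps opencache off by default. The function name mds_rpcs is hypothetical.

```python
def mds_rpcs(cycles: int, openlock: bool) -> int:
    """Hypothetical model: MDS RPCs for `cycles` open+close pairs on one file."""
    if cycles <= 0:
        return 0
    if openlock:
        # One open RPC grants the openlock; later opens and closes are served
        # from the client-side cache, and a single close RPC is sent when the
        # lock is finally cancelled.
        return 2
    # Without the openlock, every open and every close goes to the MDS.
    return 2 * cycles


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, mds_rpcs(n, openlock=False), mds_rpcs(n, openlock=True))
```

      Under these assumptions the uncached cost grows linearly with the number of open+close cycles while the cached cost stays constant, which is why a workload that reopens the same files (like the one LBNL described) is the case where caching could pay off.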

      It would be worthwhile to firstly implement a tunable to enable opencache on a per-client basis (LU-5426) and then measure the performance impact of this tunable for normal usage and for Spark.
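      One possible shape for such a heuristic is sketched below: a per-client switch that always requests the openlock (standing in for the LU-5426 tunable), plus a rule that requests it only once the same file has been opened a given number of times within a recent time window. The class name, the threshold and window_s parameters, the string file key and the sliding-window policy are all hypothetical assumptions for illustration, not the Lustre implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class OpenCacheHeuristic:
    """Hypothetical per-client policy for requesting an MDS openlock.

    Requests the openlock for a file once it has been opened `threshold`
    times within the last `window_s` seconds, or on every open when
    `always` is set (mimicking an "enable opencache" tunable).
    """

    def __init__(self, threshold: int = 3, window_s: float = 10.0,
                 always: bool = False,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.always = always
        self.clock = clock
        # Per-file timestamps of recent opens (key is a hypothetical file id).
        self._opens: Dict[str, Deque[float]] = defaultdict(deque)

    def want_openlock(self, fid: str) -> bool:
        """Record an open of `fid` and decide whether to ask for the lock."""
        if self.always:
            return True
        now = self.clock()
        hist = self._opens[fid]
        hist.append(now)
        # Forget opens that have fallen out of the time window.
        while hist and now - hist[0] > self.window_s:
            hist.popleft()
        return len(hist) >= self.threshold


if __name__ == "__main__":
    fake_now = [0.0]
    h = OpenCacheHeuristic(threshold=3, window_s=10.0, clock=lambda: fake_now[0])
    for t in (0.0, 1.0, 2.0, 100.0):
        fake_now[0] = t
        print(t, h.want_openlock("file-a"))
```

      With threshold=3 and a 10-second window, the third open within the window asks for the lock, while an open long after the burst starts counting again. Measuring the tunable "for normal usage and for Spark" would then amount to comparing MDS RPC counts and runtimes with always=True, with the threshold rule, and with opencache disabled.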

          Activity


            Andreas Dilger added a comment: Fixed via LU-10948

            Gerrit Updater added a comment:
            Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/19664
            Subject: LU-7915 mdc: add a tunable to enable opencache
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 770f30f801417bcc24b39f0f58fe0abe35bc2bcd

            Andreas Dilger added a comment: The following pages describe their testing:
            http://crd.lbl.gov/departments/computer-science/CLaSS/staff/costin-iancu/intel-parallel-computing-center-big-data-support-on-hpc-systems/
            http://crd.lbl.gov/assets/pubs_presos/spark-on-cray.pdf

            Emoly Liu added a comment:

            Can anybody provide more details about this Spark performance testing, e.g. the Spark version, benchmark tool, Lustre version, and test scale? Thanks.

            Joseph Gmitter (Inactive) added a comment (edited):

            Hi Emoly,

            Could you please have a look at measuring the performance as indicated by Andreas' final comment:

            It would be worthwhile to firstly implement a tunable to enable opencache on a per-client basis (LU-5426) and then measure the performance impact of this tunable for normal usage and for Spark.
            

            Thanks.
            Joe

            Gabriele Paciucci (Inactive) added a comment:

            Hi adilger,
            Could you please ask for more details on their Spark project? I ran benchmark tests on Spark and didn't notice this issue. I did notice extensive use of getxattr(), which is solved by the new (2.5) Lustre local xattr cache. Thank you

            People

              Assignee: Emoly Liu
              Reporter: Andreas Dilger
              Votes: 0
              Watchers: 8
