Details

    • Technical task
    • Resolution: Fixed
    • Major
    • Lustre 2.8.0
    • Lustre 2.8.0
    • 8902

    Description

      During recovery, if the replay of one update fails, all related updates may also fail in the subsequent replay. For example, client1 creates a remote directory on MDT1 whose name entry is on MDT0, and other clients then create files under this remote directory on MDT1. Suppose MDT0 fails before the name entry insertion has been committed to disk. If recovery then fails for some reason, i.e. the directory is never connected to the namespace, none of the files under this directory can be accessed. To avoid this, commit on share will be applied to cross-MDT operations: if the MDT finds that the object being updated was modified by an earlier cross-MDT operation, that cross-MDT operation must be committed first. So in the example above, the creation of the remote directory must be committed to disk before any files are created under it.
      Commit on Share (COS) will be implemented with COS locks, similar to the current local COS implementation. During a cross-MDT operation, all locks on remote objects (remote locks) will be held on the master MDT, and all remote locks will be COS locks. When these COS locks are revoked, the master MDT will sync not only itself but also the remote MDTs.
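
The dependency rule described above can be illustrated with a small toy model. This is only a sketch of the idea, not Lustre code: `ToyMDT`, `ToyCluster`, `cross_mdt_op`, and `modify` are hypothetical names invented for this example, and a plain dictionary stands in for the COS locks.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMDT:
    """Hypothetical stand-in for an MDT: in-memory vs. on-disk updates."""
    name: str
    pending: list = field(default_factory=list)    # not yet committed
    committed: list = field(default_factory=list)  # safely on disk

    def sync(self):
        # Commit everything that is still only in memory.
        self.committed.extend(self.pending)
        self.pending.clear()


class ToyCluster:
    def __init__(self, names):
        self.mdts = {n: ToyMDT(n) for n in names}
        # object -> names of MDTs holding uncommitted cross-MDT updates for it
        # (a simplified stand-in for COS locks)
        self.cos_locks = {}

    def cross_mdt_op(self, obj, updates):
        """Record one cross-MDT operation; updates maps MDT name -> update."""
        for mdt, update in updates.items():
            self.mdts[mdt].pending.append(update)
        self.cos_locks.setdefault(obj, set()).update(updates)

    def modify(self, obj, mdt, update):
        """Modify obj, first committing any earlier cross-MDT operation on it."""
        for dep in sorted(self.cos_locks.pop(obj, ())):
            self.mdts[dep].sync()
        self.mdts[mdt].pending.append(update)


cluster = ToyCluster(["MDT0", "MDT1"])
cluster.cross_mdt_op("remote_dir", {"MDT0": "insert name entry",
                                    "MDT1": "create directory object"})
cluster.modify("remote_dir", "MDT1", "create file f0")
print(cluster.mdts["MDT0"].committed)  # ['insert name entry']
```

In this model the file creation forces both halves of the remote directory creation to disk first, so a later failure of MDT0 cannot leave the directory disconnected from the namespace.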

          Activity

            [LU-3538] commit on share for cross-MDT operation.

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed for 2.8

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12530/
            Subject: LU-3538 dne: Commit-on-Sharing for DNE
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b50bb830f92e87da9bfdc84d14e4f3f78c80c9ac


            adilger Andreas Dilger added a comment - Lai, can you please post the performance results into this ticket.

            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/17304
            Subject: LU-3538 cos: reuse cos lock if compatible
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 825bf4d3c71d9c3bcd583db15e834b7839a7f258


            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/16984
            Subject: LU-3538 cos: CoS takes effect for cross-MDT op only now
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: db7a784fcb8e594896d4f624d409e6acfb700308


            adilger Andreas Dilger added a comment - The patches for this ticket need to be refreshed for landing on master for 2.8.0.
            di.wang Di Wang added a comment -

            I just ran a few tests on OpenSFS to see whether COS impacts
            performance.

            8 clients, 4 MDSs (8 MDTs) and 2 OSSs (4 OSTs)

            According to the results, performance is not impacted.

            mpirun -np 64 -machinefile /home/di.wang/machine_file
            /usr/lib64/lustre/tests/mdsrate --mknod --nfiles 1048576 --dir
            /mnt/lustre/tests --filefmt 'f%%d'
            mpirun -np 64 -machinefile /home/di.wang/machine_file
            /usr/lib64/lustre/tests/mdsrate --unlink --nfiles 1048576 --dir
            /mnt/lustre/tests --filefmt 'f%%d'

            With the patch

            Rate: 23998.42 eff 23997.34 aggr 374.96 avg client mknods/sec (total: 64
            threads 1048576 mknods 1 dirs 64 threads/dir 43.69 secs)
            0: c01 finished at Tue Sep 22 20:30:24 2015

            Rate: 7637.39 eff 7637.19 aggr 119.33 avg client unlinks/sec (total: 64
            threads 1048576 unlinks 1 dirs 64 threads/dir 137.30 secs)
            0: c01 finished at Tue Sep 22 20:33:51 2015

            Without the patch

            0: c01 starting at Tue Sep 22 22:03:45 2015
            Rate: 24108.73 eff 24106.99 aggr 376.67 avg client mknods/sec (total: 64
            threads 1048576 mknods 1 dirs 64 threads/dir 43.49 secs)
            0: c01 finished at Tue Sep 22 22:04:29 2015

            Rate: 7505.92 eff 7505.87 aggr 117.28 avg client unlinks/sec (total: 64
            threads 1048576 unlinks 1 dirs 64 threads/dir 139.70 secs)
            0: c01 finished at Tue Sep 22 22:07:54 2015
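
As a quick check on the figures above, the relative change between the two runs can be computed from the reported rates. The sketch below uses the first rate on each `Rate:` line (the value followed by `eff`); the helper name `percent_change` is made up for this illustration.

```python
def percent_change(with_patch, without_patch):
    """Relative change of the patched rate versus the unpatched rate, in percent."""
    return (with_patch - without_patch) / without_patch * 100.0


# (with patch, without patch) rates in operations/sec, copied from the runs above
rates = {
    "mknod": (23998.42, 24108.73),
    "unlink": (7637.39, 7505.92),
}

for op, (with_patch, without_patch) in rates.items():
    print(f"{op}: {percent_change(with_patch, without_patch):+.2f}%")
# mknod: -0.46%
# unlink: +1.75%
```

Both differences are under 2%. With a single run per configuration this is likely within run-to-run variation, which is consistent with the conclusion that performance is not impacted.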

            di.wang Di Wang added a comment -

            Hmm, the patch seems OK, and right now we want to see how badly it impacts performance before landing it. But I have not had time to run the performance test with the patch.

            Richard: Could you please find someone to run a performance test comparing this build https://build.hpdd.intel.com/job/lustre-reviews/34289/ (build with COS) against https://build.hpdd.intel.com/job/lustre-master/3187/ (current master)?


            simmonsja James A Simmons added a comment - Is this targeted for 2.8 or 2.9?

            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/16140
            Subject: LU-3538 cos: combo patch for DNE COS support
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 06d51df4e17491db55ddb0c6843f0af2a71d4e4a


            People

              laisiyao Lai Siyao
              di.wang Di Wang