async update cross-MDTs
(LU-3534)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Technical task | Priority: | Major |
| Reporter: | Di Wang | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne2 |
| Rank (Obsolete): | 8902 |
| Description |
|
During recovery, if the replay of one update fails, all related updates may also fail in the subsequent replay process. For example, client1 creates a remote directory on MDT1 whose name entry is on MDT0, and other clients then create files under this remote directory on MDT1. Suppose MDT0 fails before the name entry insertion has been committed to disk. If recovery then fails for some reason, i.e. the directory is never connected to the namespace, none of the files under this directory will be accessible. To avoid this, commit on share (COS) will be applied to cross-MDT operations: if an MDT finds that the object being updated was modified by a previous cross-MDT operation, that cross-MDT operation must be committed first. So in the previous example, the creation of the remote directory must be committed to disk before any files are created under it. |
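The commit-on-share rule described above can be illustrated with a toy, single-MDT model. This is only a sketch under stated assumptions, not the code from the patches on this ticket: the `Mdt` class, its fields, and the `update()` and `commit()` methods are hypothetical names invented for illustration, and a real cross-MDT commit involves more than one target.

```python
from dataclasses import dataclass, field


@dataclass
class Mdt:
    """Hypothetical toy model of one MDT; not Lustre's actual data structures."""
    name: str
    next_transno: int = 1
    last_committed: int = 0
    # object id -> transno of the last cross-MDT update that modified it
    last_cross_mdt_update: dict = field(default_factory=dict)
    commits_forced: int = 0

    def commit(self) -> None:
        # Model "flush to disk": everything executed so far becomes committed.
        self.last_committed = self.next_transno - 1

    def update(self, obj: str, cross_mdt: bool = False) -> int:
        # Commit on share: if the object was last modified by a cross-MDT
        # operation that is not yet on disk, commit it before proceeding.
        pending = self.last_cross_mdt_update.get(obj)
        if pending is not None and pending > self.last_committed:
            self.commit()
            self.commits_forced += 1
        transno = self.next_transno
        self.next_transno += 1
        if cross_mdt:
            self.last_cross_mdt_update[obj] = transno
        return transno


# Example from the description: a remote directory is created (cross-MDT),
# then a file is created under it on the same MDT.
mdt1 = Mdt("MDT1")
mkdir_transno = mdt1.update("remote_dir", cross_mdt=True)
create_transno = mdt1.update("remote_dir")
print(mdt1.last_committed >= mkdir_transno, mdt1.commits_forced)
```

In this model the second update forces a commit because the directory's creation is still uncommitted; later updates to the same object proceed without a further forced commit.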
| Comments |
| Comment by Di Wang [ 19/Sep/14 ] |
|
http://review.whamcloud.com/#/c/8356/ This is the patch I wrote a year ago; it may not work anymore, but it may give you some ideas. |
| Comment by Lai Siyao [ 10/Nov/14 ] |
|
Patch is on http://review.whamcloud.com/#/c/12530/ |
| Comment by Gerrit Updater [ 19/Dec/14 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/13139 |
| Comment by Gerrit Updater [ 04/Aug/15 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/15844 |
| Comment by Gerrit Updater [ 31/Aug/15 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/16140 |
| Comment by James A Simmons [ 17/Sep/15 ] |
|
Is this targeted for 2.8 or 2.9? |
| Comment by Di Wang [ 17/Sep/15 ] |
|
Hmm, the patch seems OK, and right now we want to see how badly it will impact performance before landing it. But I never got time to run the performance test with the patch. Richard: could you please find someone to run a performance test comparing this build https://build.hpdd.intel.com/job/lustre-reviews/34289/ (build with COS) and https://build.hpdd.intel.com/job/lustre-master/3187/ (current master)? |
| Comment by Di Wang [ 23/Sep/15 ] |
|
I just ran a few tests on OpenSFS to see whether COS has an impact.
Configuration: 8 clients, 4 MDS (8 MDTs) and 2 OSSs (4 OSTs).
According to the results, performance is not impacted.
mpirun -np 64 -machinefile /home/di.wang/machine_file
With the patch:
Rate: 23998.42 eff 23997.34 aggr 374.96 avg client mknods/sec (total: 64
Rate: 7637.39 eff 7637.19 aggr 119.33 avg client unlinks/sec (total: 64
Without the patch:
0: c01 starting at Tue Sep 22 22:03:45 2015
Rate: 7505.92 eff 7505.87 aggr 117.28 avg client unlinks/sec (total: 64 |
| Comment by Andreas Dilger [ 28/Sep/15 ] |
|
The patches for this ticket need to be refreshed for landing on master for 2.8.0. |
| Comment by Gerrit Updater [ 29/Oct/15 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/16984 |
| Comment by Gerrit Updater [ 20/Nov/15 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/17304 |
| Comment by Andreas Dilger [ 04/Dec/15 ] |
|
Lai, can you please post the performance results in this ticket? |
| Comment by Gerrit Updater [ 28/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12530/ |
| Comment by Joseph Gmitter (Inactive) [ 28/Jan/16 ] |
|
Landed for 2.8 |