[LU-21] Allow parallel modifying metadata operations on MDS Created: 06/Dec/10 Updated: 08/Dec/12 Resolved: 08/Dec/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | | |
| Epic: | performance |
| Rank (Obsolete): | 10451 |
| Description |
|
Description of the problem:

Proposed solution: Introduce an extra 'hash' property for inodebit locks. Users will pass it along with the bits they want to lock. Locks with intersecting bits and incompatible lock modes would be declared incompatible and block each other only if their hash values match, or if either lock has the special hash value of zero.

Downsides: Still, even in the absence of COS logic, the end result should be better than the current behavior, since forward progress could be made on multiple CPUs.

Alternatives:

Out of scope:

Useful information: |
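The intended compatibility rule can be illustrated with a small sketch. This is not the actual Lustre LDLM code; the struct, field names, and the simplified lock modes below are hypothetical, and the real lock manager uses a full mode-compatibility matrix rather than the placeholder check shown here.

```c
/*
 * Hypothetical sketch of the proposed rule (not the actual LDLM code):
 * two inodebits locks conflict only when their bit sets intersect, their
 * modes are incompatible, and their hash values match; a hash of zero is
 * a wildcard that conflicts with every hash.
 */
#include <stdbool.h>
#include <stdint.h>

enum lock_mode { LCK_PR, LCK_PW };      /* simplified; real LDLM has more modes */

struct ibits_lock {
	uint64_t bits;                  /* inodebit mask being locked             */
	uint64_t hash;                  /* e.g. hash of the name; 0 = "whole dir" */
	enum lock_mode mode;
};

static bool modes_compatible(enum lock_mode a, enum lock_mode b)
{
	/* Placeholder: the real lock manager uses a compatibility matrix. */
	return a == LCK_PR && b == LCK_PR;
}

static bool ibits_locks_conflict(const struct ibits_lock *a,
				 const struct ibits_lock *b)
{
	if (!(a->bits & b->bits))
		return false;           /* disjoint bits never conflict           */
	if (modes_compatible(a->mode, b->mode))
		return false;           /* compatible modes never conflict        */
	if (a->hash == 0 || b->hash == 0)
		return true;            /* zero hash blocks, and is blocked by, all */
	return a->hash == b->hash;      /* otherwise conflict only on equal hash  */
}
```

Under such a rule, two creates of different names in the same directory would take write locks with different hashes and no longer block each other, while an operation that needs the whole directory could pass hash zero and still serialize against everything.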
| Comments |
| Comment by Oleg Drokin [ 06/Dec/10 ] |
|
An additional concern I envision with such a project for 1.8 is that Oracle would be very unlikely to pick up an improvement like this for their 1.8 release.
| Comment by Liang Zhen (Inactive) [ 06/Dec/10 ] |
|
Oleg, nice to see you.

Out of scope: You probably remember I ran some metadata performance tests directly over ldiskfs several months ago. The results show the pdirops patch can help with file creation; it's a nice increase but not a dramatic boost, because I think there are still quite a lot of serialized operations inside the disk filesystem (e.g. contention on buffer_head, and contention on the dynlock that the pdirops patch itself uses). Also, file removal performance drops significantly (I think parallel removal increases the chance of disk seeks). I'm worried things could be worse on very large SMP systems. What do you think?

As you know, I'm working on an MDD dir-cache + schedulers to localize FS operations on a few cores and reduce data ping-pong (I already have a working prototype). However, we don't have such a layer on 1.8; do you think it would be possible to build a very simple layer like that on 1.8 instead of using the pdirops patch?

Liang |
| Comment by Oleg Drokin [ 06/Dec/10 ] |
|
The back-end fs modifications are out of scope simply because that is a totally different kind of work, which could be tracked and worked on separately if desired. The idea for now is to boost 1.8 performance to reach certain goals, and just allowing us to scale in the shared create/unlink case is going to significantly improve that use case, perhaps significantly enough that no further improvements in the 1.8 codebase would be necessary.

I agree that big SMP systems might be a concern; how big a concern is to be determined by testing.

As for porting the MDD layer: let's see the numbers from just allowing MDS threads to scale into ldiskfs, and then see whether we need any more performance. If we do, we can start looking into what additional measures to implement. Unfortunately, it's not like we have infinite resources to push forward multiple releases for infinite time, so let's carefully pick our priorities.
| Comment by Oleg Drokin [ 08/Dec/12 ] |
|
This is not going to be fixed in 1.8 after all. |