[LU-7861] MDS Contention during unlinks due to llog spinlock Created: 10/Mar/16 Updated: 09/Feb/17 Resolved: 22/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.5 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Matt Ezell | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: |
2.5.5-g1241c21-CHANGED-2.6.32-573.12.1.el6.atlas.x86_64 |
|
| Issue Links: |
|
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We see intermittent periods of interactive "slowness" on our production systems. Looking on the MDS, load can go quite high. Running perf, we see that osp_sync_add_rec() spends a lot of time in a spin lock. I believe this is the serialized addition of unlink records to the llog so that the OST objects will be removed.

- 29.60% 29.60% [kernel] [k] _spin_lock
   - _spin_lock
      - 77.51% osp_sync_add_rec
           osp_sync_add
      + 7.79% task_rq_lock
      + 6.68% try_to_wake_up
      + 1.32% osp_statfs
      + 1.26% kmem_cache_free
      + 1.12% cfs_percpt_lock

I used jobstats to confirm that we had at least two jobs doing a significant number of unlinks at the time. When multiple MDS threads attempt to do unlinks they serialize on this lock, spinning on the CPUs in the meantime. I believe the following code is responsible (osp_sync.c:421):

spin_lock(&d->opd_syn_lock);
d->opd_syn_changes++;
spin_unlock(&d->opd_syn_lock);

How can we improve this situation?
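Since the critical section above only increments a counter, one direction worth considering (sketched purely for illustration; this is an assumption, not necessarily what any eventual patch does) is to make the counter atomic and avoid taking opd_syn_lock on this path, e.g. declaring opd_syn_changes as an atomic_t updated with atomic_inc(). The small userspace program below is a minimal, self-contained demonstration of the difference between a spinlock-protected increment and a lock-free atomic increment; spin_counter and atomic_counter are hypothetical names and this is not Lustre code.

/*
 * Minimal userspace sketch (not Lustre code): every thread serializes on the
 * spinlock just to bump spin_counter, while atomic_counter is updated with a
 * single lock-free read-modify-write.  The kernel analogue would be something
 * like declaring opd_syn_changes as atomic_t and using atomic_inc() on it.
 *
 * Build: gcc -O2 -pthread counter_demo.c -o counter_demo
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 8
#define NITERS   1000000

static pthread_spinlock_t lock;
static long spin_counter;          /* protected by 'lock' */
static atomic_long atomic_counter; /* needs no lock */

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_spin_lock(&lock);   /* all threads contend here */
        spin_counter++;
        pthread_spin_unlock(&lock);
    }
    return NULL;
}

static void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++)
        atomic_fetch_add(&atomic_counter, 1); /* no lock taken */
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];

    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, spin_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, atomic_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);

    printf("spin_counter=%ld atomic_counter=%ld\n",
           spin_counter, (long)atomic_load(&atomic_counter));
    return 0;
}

Both variants end with the same count; the point is only that the atomic version removes the lock hold time and the bouncing of the lock word between CPUs (the counter's own cache line is still shared, of course).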
|
| Comments |
| Comment by James A Simmons [ 10/Mar/16 ] |
|
Looking at the use of opd_syn_changes, I noticed that in several places it is not protected by opd_syn_lock. |
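To illustrate the concern (a hypothetical userspace demo, not Lustre code and not a quote from osp_sync.c): if some paths update a plain counter under the spinlock while other paths touch it with no lock held, the lock no longer provides mutual exclusion for that field, and increments can be lost.

/*
 * Deliberately racy demo: half the threads increment 'changes' under a
 * spinlock, the other half increment it with no lock at all, mimicking a
 * field that is only sometimes protected.  The final count usually comes
 * out lower than expected because concurrent unlocked read-modify-writes
 * lose updates.
 *
 * Build: gcc -O2 -pthread mixed_lock_demo.c -o mixed_lock_demo
 */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 8
#define NITERS   1000000

static pthread_spinlock_t lock;
static volatile long changes;   /* stands in for a field like opd_syn_changes */

static void *locked_inc(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_spin_lock(&lock);
        changes++;
        pthread_spin_unlock(&lock);
    }
    return NULL;
}

static void *unlocked_inc(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++)
        changes++;              /* races with the locked path */
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];

    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL,
                       (i % 2) ? unlocked_inc : locked_inc, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);

    printf("changes=%ld expected=%ld\n",
           (long)changes, (long)NTHREADS * NITERS);
    return 0;
}

The usual remedies for such a mismatch are either to take the lock everywhere the field is touched or to make the field atomic, as in the sketch attached to the description.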
| Comment by Joseph Gmitter (Inactive) [ 10/Mar/16 ] |
|
Hi Alex, |
| Comment by Alex Zhuravlev [ 15/Mar/16 ] |
|
I'm trying to reproduce the case. Also, I don't think osp_sync_add_rec() is the issue itself; I'd rather suspect osp_sync_inflight_conflict(). |
| Comment by Matt Ezell [ 16/Mar/16 ] |
|
Hi Alex- b2_5_fe doesn't have osp_sync_inflight_conflict(). Let us know if there is any additional information we can provide to help. Thanks, |
| Comment by Alex Zhuravlev [ 29/Mar/16 ] |
|
Well, I can't reproduce that locally, but I've got a prototype patch which is in testing now. |
| Comment by Gerrit Updater [ 30/Mar/16 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/19211 |
| Comment by Alex Zhuravlev [ 25/Apr/16 ] |
|
Matt, can you tell us how you measured (or noticed) that slowness? The patch above should improve this specific case, as there will be less contention on that lock, but it would be great to have a reproducer. My understanding is that you were running a few jobs, two of which were doing lots of unlinks concurrently (with many clients involved), right? And then some jobs doing something different (open/create, ls, stat?) were seeing noticeably higher latency? |
| Comment by Matt Ezell [ 26/Apr/16 ] |
|
Your description of the situation is accurate. I would guess this would be hard to reproduce on a small system: with 2.5, you only get a single metadata-modifying RPC per client, so you might want to either do multiple mounts per client, set fail_loc=0x804, or try a newer server that supports more than one modifying RPC in flight per client.

I'm not sure we have a good reproducer, since we observed this due to user behavior in production. I would expect that a parallel mdtest (especially if the files are pre-created and you just use -r) would show this. |
| Comment by Gerrit Updater [ 22/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19211/ |
| Comment by Joseph Gmitter (Inactive) [ 22/Jun/16 ] |
|
Patch has landed to master for 2.9.0 |