[LU-8927] osp-syn processes contending for osq_lock drives system cpu usage > 80% Created: 08/Dec/16 Updated: 18/Sep/17 Resolved: 05/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Olaf Faaland | Assignee: | Alex Zhuravlev |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | llnl, zfs |
| Environment: | lustre-2.8.0_5.chaos-2.ch6.x86_64 |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We ran jobs which created remote directories (not striped) and then ran mdtest within them; several MDS nodes are now using >80% of their CPU time for osp-syn-* processes. There are 36 osp-syn-* processes, and they are spending almost all of their time contending for osq_lock. According to perf, the offending stack is: osq_lock |
| Comments |
| Comment by Olaf Faaland [ 08/Dec/16 ] |
|
Our stack is available to Intel engineers via the repository named "lustre-release-fe-llnl" hosted on your Gerrit server. |
| Comment by Olaf Faaland [ 08/Dec/16 ] |
|
I see that llog_cancel_rec() contains the following:

        rc = llog_declare_write_rec(env, loghandle, &llh->llh_hdr, index, th);
        if (rc < 0)
                GOTO(out_trans, rc);

        if ((llh->llh_flags & LLOG_F_ZAP_WHEN_EMPTY))
                rc = llog_declare_destroy(env, loghandle, th);

        th->th_wait_submit = 1;
        rc = dt_trans_start_local(env, dt, th);
So it seems to declare that it will destroy the llog object every time it cancels a record, as if every record is the last one. Why is that? Shouldn't it also depend on how many active records the llog contains? |
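For illustration only, a minimal sketch of the conditional declaration this question points at (not code from any landed patch), assuming llh_count reflects the header plus the live records. As the next comment explains, reading llh_count here without holding the llog lock is racy, which is why the real code declares the destroy unconditionally.

        /* hypothetical sketch, not the actual llog_cancel_rec() code:
         * only reserve credits for destroying the llog when the record
         * being cancelled could be the last live one */
        if ((llh->llh_flags & LLOG_F_ZAP_WHEN_EMPTY) &&
            llh->llh_count <= 2)
                rc = llog_declare_destroy(env, loghandle, th);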
| Comment by Alex Zhuravlev [ 09/Dec/16 ] |
|
When we declare the llog cancellation we don't know whether it will be the last record or not; otherwise we'd have to keep the llog locked from the declaration up to the transaction stop, which would kill concurrency. Newer versions of Lustre will address this problem in the osd-zfs module. |
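To make the concurrency argument concrete, here is a hedged sketch (assumed, not actual Lustre code) of what deciding "is this the last record?" at declaration time would require: the handle's lgh_lock would have to be held from the declaration all the way through dt_trans_stop(), serializing every concurrent cancel on that llog for the full length of the transaction.

        /* hypothetical sketch only: lock held across the whole transaction */
        down_write(&loghandle->lgh_lock);
        rc = llog_declare_write_rec(env, loghandle, &llh->llh_hdr, index, th);
        if (llh->llh_count == 2)        /* record being cancelled is the last one */
                rc = llog_declare_destroy(env, loghandle, th);
        rc = dt_trans_start_local(env, dt, th);
        /* ... write the cancellation, possibly destroy the llog ... */
        rc = dt_trans_stop(env, dt, th);
        up_write(&loghandle->lgh_lock); /* other cancels are blocked until here */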
| Comment by Peter Jones [ 09/Dec/16 ] |
|
Alex, could you please elaborate on the work underway in this area? Thanks, Peter |
| Comment by Olaf Faaland [ 09/Dec/16 ] |
|
Yes, please elaborate. I know there are many ways to work on this and it would be great to know the nature and scope of the fix you have in mind. I looked again and the MDTs are still working to clear llog records from jobs run about 45 hours ago (contended the entire time). I don't think we can go into production without a fix for this. |
| Comment by Alex Zhuravlev [ 13/Dec/16 ] |
|
In very few words: I've been working to make declarations with ZFS cheap. Right now they are quite expensive because the DMU API works with dnode numbers, so every time it needs to translate a dnode number into a dnode structure using the global hash table. A few patches have already landed on the master branch and were released as part of 2.9 (e.g.
This way the declarations should become mostly lockless and much cheaper. |
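As an illustration of the direction described above (a simplified sketch, not taken from the Lustre patches themselves, with tx, object, dn, offset and length assumed to be in scope), the snippet below contrasts a number-based DMU hold with the dnode-based variant that newer OpenZFS exposes. The number-based call has to resolve the object number through the global dnode hash table on every declaration, while the *_by_dnode call reuses a dnode the caller already holds.

        #include <sys/dmu.h>

        /* expensive: the DMU resolves 'object' through the global dnode
         * hash table on every declaration */
        dmu_tx_hold_write(tx, object, offset, length);

        /* cheaper: the caller passes the dnode_t it already holds, so the
         * hash lookup (and its lock) is skipped */
        dmu_tx_hold_write_by_dnode(tx, dn, offset, length);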
| Comment by Olaf Faaland [ 13/Dec/16 ] |
|
OK, thanks. In the above list of tickets, you include LU-8893. Did you mean I looked briefly at the |
| Comment by Alex Zhuravlev [ 13/Dec/16 ] |
|
you're right, I mean |
| Comment by Olaf Faaland [ 03/Jan/17 ] |
|
Alex, I applied The full set of patches above would be too much for a stable branch, I would think. So I've rewritten llog_cancel_rec() to destroy the llog in a second transaction, when that is necessary. Maybe this is a poor approach; feedback or an alternative would be welcome. In any case I've pushed it to Gerrit and will do local testing after it passes Maloo. |
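For reference, a rough sketch of the two-transaction shape described above, under the assumption that the split is "cancel first, then destroy if the llog became empty"; the actual change is the one pushed to Gerrit and may differ in detail.

        /* sketch only: transaction 1 cancels the record without declaring
         * the destroy */
        th = dt_trans_create(env, dt);
        rc = llog_declare_write_rec(env, loghandle, &llh->llh_hdr, index, th);
        rc = dt_trans_start_local(env, dt, th);
        /* ... zero the record and update the header ... */
        rc = dt_trans_stop(env, dt, th);

        /* transaction 2 runs only when the llog actually became empty */
        if ((llh->llh_flags & LLOG_F_ZAP_WHEN_EMPTY) && llh->llh_count == 1) {
                th = dt_trans_create(env, dt);
                rc = llog_declare_destroy(env, loghandle, th);
                rc = dt_trans_start_local(env, dt, th);
                rc = llog_trans_destroy(env, loghandle, th);
                rc = dt_trans_stop(env, dt, th);
        }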
| Comment by Peter Jones [ 04/Sep/17 ] |
|
I would like a level set on this ticket. All of the planned work to improve metadata performance for ZFS has now landed to master (and b2_10). Are there any specific tasks identified and remaining beyond that? |
| Comment by Olaf Faaland [ 05/Sep/17 ] |
|
This lock contention has not resulted in problems in production, and there is so much related change in 2.10 and master that it's quite possible the problem does not occur there. Closing the ticket. |