[LU-6823] Performance regression on servers with LU-5264 Created: 09/Jul/15 Updated: 10/Jul/15 Resolved: 10/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Bruno Travouillon (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: |
RHEL 6.6 w/ Bull kernel 2.6.32-504.16.2.el6.Bull.74.x86_64; 1 MDT, 480 OSTs, 5000+ clients. |
||
| Severity: | 3 |
| Description |
|
Since the introduction of
The perf report recorded during a slow-down window of the FS is attached. Here is a sample from perf.report-dso:
#
# Overhead Shared Object
# ........ ........................
#
98.08% [kernel.kallsyms]
|
--- _spin_lock
|
|--97.33%-- 0xffffffffa05c12dc
| |
| |--53.24%-- 0xffffffffa075b629
| | kthread
| | child_rip
| |
| |--45.63%-- 0xffffffffa075b676
| | kthread
| | child_rip
| |
| --1.12%-- 0xffffffffa076aa58
| kthread
| child_rip
|
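To read the sample above as absolute numbers, the nested percentages can be multiplied down the call chain. This is only a sketch: it assumes the call-graph percentages are relative to their parent entry (the three children summing to roughly 100% suggests this, but the perf invocation is not shown in the ticket). The names `absolute_share` and `CHILDREN` are illustrative.

```python
# Percentages copied from the perf sample above.
DSO_OVERHEAD = 98.08        # overhead attributed to [kernel.kallsyms]
VIA_FIRST_CALLER = 97.33    # portion of _spin_lock reached through 0xffffffffa05c12dc
CHILDREN = {
    0xffffffffa075b629: 53.24,
    0xffffffffa075b676: 45.63,
    0xffffffffa076aa58: 1.12,
}


def absolute_share(*relative_percentages):
    """Multiply parent-relative percentages into one absolute percentage.

    ASSUMPTION: each percentage is relative to the entry above it.
    """
    share = 100.0
    for pct in relative_percentages:
        share *= pct / 100.0
    return share


# Sanity check on the assumption: the children should sum to about 100%.
children_total = sum(CHILDREN.values())
print(f"children sum to {children_total:.2f}% of their parent")

for addr, pct in CHILDREN.items():
    share = absolute_share(DSO_OVERHEAD, VIA_FIRST_CALLER, pct)
    print(f"{addr:#x}: about {share:.1f}% of all samples")
```

Under that assumption, the two largest callers together account for nearly all samples, which matches the picture of service threads contending on a single spinlock.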
Callers:
crash> kmem 0xffffffffa05c12dc
ffffffffa05c12dc (t) lu_context_exit+188 [obdclass]
VM_STRUCT ADDRESS RANGE SIZE
ffff881877c91a40 ffffffffa0571000 - ffffffffa06a5000 1261568
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0055920d28 1872df3000 0 e4d1079 1 140000000000000
crash> kmem 0xffffffffa075b676
ffffffffa075b676 (t) ptlrpc_main+2806 [ptlrpc] ../debug/lustre-2.5.3.90/lustre/ptlrpc/service.c: 2356
VM_STRUCT ADDRESS RANGE SIZE
ffff881877c91400 ffffffffa06fb000 - ffffffffa0895000 1679360
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea004b685228 158b853000 0 110 1 140000000000000
crash> kmem 0xffffffffa075b629
ffffffffa075b629 (t) ptlrpc_main+2729 [ptlrpc] ../debug/lustre-2.5.3.90/lustre/ptlrpc/service.c: 2534
VM_STRUCT ADDRESS RANGE SIZE
ffff881877c91400 ffffffffa06fb000 - ffffffffa0895000 1679360
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea004b685228 158b853000 0 110 1 140000000000000
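The crash output above can be cross-checked with a little address arithmetic. The sketch below uses only values copied from that output; the helper names `function_start` and `owning_module` are illustrative and not part of crash or Lustre. It checks that each address falls inside the reported module range and that both `ptlrpc_main` call sites resolve to the same function start.

```python
# Address -> (symbol, offset) pairs copied from the crash output above.
SAMPLES = {
    0xffffffffa05c12dc: ("lu_context_exit", 188),
    0xffffffffa075b676: ("ptlrpc_main", 2806),
    0xffffffffa075b629: ("ptlrpc_main", 2729),
}

# Module -> (range start, range end, size) copied from the VM_STRUCT lines.
MODULES = {
    "obdclass": (0xffffffffa0571000, 0xffffffffa06a5000, 1261568),
    "ptlrpc": (0xffffffffa06fb000, 0xffffffffa0895000, 1679360),
}


def function_start(addr, offset):
    """Return the start address of the function for a symbol+offset pair."""
    return addr - offset


def owning_module(addr):
    """Return the name of the module whose address range contains addr."""
    for name, (start, end, _size) in MODULES.items():
        if start <= addr < end:
            return name
    return None


for addr, (symbol, offset) in SAMPLES.items():
    start = function_start(addr, offset)
    print(f"{addr:#x} = {symbol}+{offset} [{owning_module(addr)}], "
          f"function starts at {start:#x}")
```

Both `ptlrpc_main` samples land in the same function, at different offsets, which is consistent with two distinct call sites in `ptlrpc_main` reaching `lu_context_exit`.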
This _spin_lock() has been introduced by
For now, as a workaround, we removed both patches from
My guess is that this should also happen on the OSS.
Could you help us fix this? Are we missing some patches on top of 2.5.3.90? Is there any other conflicting patch?
I have attached some debug logs (dump_log.tgz) from the MDS, captured during the observed issue, in case they help. If you need further information, please let me know. |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 09/Jul/15 ] |
|
Hello Bruno, |
| Comment by Bruno Travouillon (Inactive) [ 09/Jul/15 ] |
|
Indeed, this is a duplicate of |
| Comment by Andreas Dilger [ 10/Jul/15 ] |
|
Closing as a duplicate of |