[LU-4495] client evicted on parallel append write to the shared file. Created: 16/Jan/14 Updated: 03/Jun/14 Resolved: 03/Jun/14 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexey Lyashkov | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | patch |
| Environment: |
system with 8 OSTs. |
| Attachments: |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 12299 |
| Description |
|
The client is sometimes evicted under a simple workload. |
| Comments |
| Comment by Alexey Lyashkov [ 16/Jan/14 ] |
|
logs. |
| Comment by Alexey Lyashkov [ 16/Jan/14 ] |
0x68020d373efde35c - remote handle
ldlm lock: ffff88007aa25b40
cl_lock: ffff88007b403ed0 ??
PIDs 2487 and 2485 are both doing read/write to a single file.
2487 finished fine at:
00000080:00000001:2.0:1389187621.776726:0:2485:0:(file.c:1352:ll_file_write()) Process entered
00000020:00000010:1.0:1389187621.776955:0:2487:0:(cl_lock.c:381:cl_lock_alloc()) slab-alloced 'lock': 216 at ffff88007b403ed0.
00000080:00200000:0.0:1389187621.795589:0:2487:0:(file.c:1216:ll_file_io_generic()) iot: 1, result: 133
00000020:00000010:2.0:1389187621.823858:0:2485:0:(cl_lock.c:381:cl_lock_alloc()) slab-alloced 'lock': 216 at ffff88007a95a228.
00000008:00010000:2.0:1389187621.823865:0:2485:0:(osc_lock.c:1743:osc_lock_init()) ### lock ffff88007a95a228, osc lock ffff88007000dd98, flags 0
00010000:00000010:2.0:1389187621.823949:0:2485:0:(ldlm_lock.c:477:ldlm_lock_new()) slab-alloced 'lock': 504 at ffff88007aa25b40.
00010000:00010000:1.0:1389187621.825396:0:1673:0:(ldlm_lockd.c:1696:ldlm_handle_cp_callback()) ### client completion callback handler START ns: centfs01-OST0003-osc-ffff880037b29400 lock: ffff88007aa25b40/0x68020d373efde35c lrc: 4/0,1 mode: --/PW res: [0xa2:0x0:0x0].0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20000 nid: local remote: 0xb66b896f3d8df895 expref: -99 pid: 2485 timeout: 0 lvb_type: 1
1389187621.825396 - completion (cp) callback for both parents (ffff88007a95a228 + ffff88007b403ed0)
00000020:00010000:1.0:1389187621.825466:0:1673:0:(cl_lock.c:1608:cl_lock_modify()) lock@ffff88007a95a228[2 2 0 1 1 00000000] W(2):[0, 18446744073709551615]@[0x100030000:0xa2:0x0] {
00000020:00010000:1.0:1389187621.825469:0:1673:0:(cl_lock.c:1608:cl_lock_modify()) lovsub@ffff88007ac19ba0: [1 ffff88007ce93af0 W(2):[0, 18446744073709551615]@[0x380002340:0x7f:0x0]]
00000020:00010000:1.0:1389187621.825471:0:1673:0:(cl_lock.c:1608:cl_lock_modify()) osc@ffff88007000dd98: ffff88007aa25b40 0x20000020002 0x68020d373efde35c 3 (null) size: 0 mtime: 0 atime: 0 ctime: 0 blocks: 0
00000020:00010000:1.0:1389187621.825472:0:1673:0:(cl_lock.c:1608:cl_lock_modify()) } lock@ffff88007a95a228
00000020:00010000:1.0:1389187621.825473:0:1673:0:(cl_lock.c:1608:cl_lock_modify()) -> W(2):[0, 18446744073709551615]@[0x100030000:0xa2:0x0]
00010000:00010000:1.0:1389187621.838412:0:2485:0:(ldlm_lock.c:918:ldlm_lock_decref_internal()) ### add lock into lru list ns: centfs01-OST0003-osc-ffff880037b29400 lock: ffff88007aa25b40/0x68020d373efde35c lrc: 3/0,0 mode: PW/PW res: [0xa2:0x0:0x0].0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x10020000020000 nid: local remote: 0xb66b896f3d8df895 expref: -99 pid: 2485 timeout: 0 lvb_type: 1
00000020:00000010:1.0:1389187621.839272:0:2485:0:(cl_lock.c:286:cl_lock_free()) slab-freed '(lock)': 216 at ffff88007b403ed0.
=====================
00000020:00000010:1.0:1389187621.839276:0:2485:0:(cl_lock.c:381:cl_lock_alloc()) slab-alloced 'lock': 216 at ffff88007b403ed0
00000020:00000001:1.0:1389187621.839519:0:2485:0:(cl_lock.c:1288:cl_enqueue_locked()) Process entered
connection from ldlm <> osc <> cl ?
00010000:00010000:1.0:1389187621.840735:0:1884:0:(ldlm_lockd.c:1654:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: centfs01-OST0003-osc-ffff880037b29400 lock: ffff88007aa25b40/0x68020d373efde35c lrc: 3/0,0 mode: PW/PW res: [0xa2:0x0:0x0].0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x420000020000 nid: local remote: 0xb66b896f3d8df895 expref: -99 pid: 2485 timeout: 0 lvb_type: 1
1389187621.840734 - blocking AST (bl ast) for ffff88007a95a228 (?!!) during enqueue of ffff88007b403ed0
....
00000020:00000001:1.0:1389187621.841322:0:2485:0:(cl_lock.c:953:cl_lock_state_wait()) Process entered
00000020:00010000:1.0:1389187621.841322:0:2485:0:(cl_lock.c:151:cl_lock_trace0()) state wait lock: ffff88007b403ed0@(9 ffff88007ba56ae0 1 1 0 1 1 0)(ffff88007d7c4140/0/1) at cl_lock_state_wait():959
00000020:00000001:1.0:1389187621.841324:0:2485:0:(cl_lock.c:151:cl_lock_trace0()) put mutex: ffff88007b403ed0@(9 ffff88007ba56ae0 1 1 0 1 1 0)(ffff88007d7c4140/0/1) at cl_lock_mutex_put():754
00000020:00000001:1.0:1389187772.690095:0:2485:0:(cl_lock.c:151:cl_lock_trace0()) got mutex: ffff88007b403ed0@(9 ffff88007ba56ae0 1 1 0 1 1 0)(ffff88007d7c4140/0/1) at cl_lock_mutex_tail():662
00000020:00000001:1.0:1389187772.690097:0:2485:0:(cl_lock.c:990:cl_lock_state_wait()) Process leaving (rc=0 : 0 : 0)
---
00000020:00000001:1.0:1389187772.690198:0:2485:0:(cl_lock.c:2072:cl_lock_unhold()) Process entered
00000020:00000001:1.0:1389187772.690198:0:2485:0:(cl_lock.c:900:cl_lock_hold_release()) Process entered
00000020:00010000:1.0:1389187772.690198:0:2485:0:(cl_lock.c:151:cl_lock_trace0()) hold release lock: ffff88007a95a228@(2 ffff88007ba56ae0 1 5 0 1 0 6)(ffff88007d7c4140/1/1) at cl_lock_hold_release():901
00000020:00000001:1.0:1389187772.690200:0:2485:0:(cl_lock.c:798:cl_lock_cancel0()) Process entered
00010000:00010000:1.0:1389187772.690370:0:2485:0:(ldlm_request.c:1186:ldlm_cancel_pack()) ### packing ns: centfs01-OST0003-osc-ffff880037b29400 lock: ffff88007aa25b40/0x68020d373efde35c lrc: 1/0,0 mode: --/PW res: [0xa2:0x0:0x0].0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x4c69400020000 nid: local remote: 0xb66b896f3d8df895 expref: -99 pid: 2485 timeout: 0 lvb_type: 1
00000020:00000010:1.0:1389187772.690421:0:2485:0:(cl_lock.c:286:cl_lock_free()) slab-freed '(lock)': 216 at ffff88007a95a228.
|
| Comment by Alexey Lyashkov [ 16/Jan/14 ] |
|
Per a discussion with Nikita, he pointed out that a sub-level lock in hold state prevents a blocking AST notification from being sent to the top-level lock, so the lock never wakes up. But that looks like a bug: the hold state should be set only for the short time during which the top-level lock is unable to receive notifications from its sub-locks. Jay, what do you think about it? |
| Comment by Alexey Lyashkov [ 18/Jan/14 ] |
|
With Nikita's and Jay's assistance I found the root cause of the bug. That's all - the clients deadlocked. It looks like the bug exists in any 2.x code. |
| Comment by John Forgan (Inactive) [ 18/Jan/14 ] |
|
Answers to questions posed by Peter Jones by email:
2. Which neo branch on https://github.com/Xyratex/lustre-stable contains the specific code in use? The server code is based on Lustre 2.1 and is here: https://github.com/Xyratex/lustre-stable/tree/b_neo_stable_1.3
3. Which exact version of 2.4.x is running on the clients, and are any patches applied? The issue was reported on a customer client based on 2.4, although I believe they have also recreated it on vanilla 2.4 and master. I've pinged them to answer more fully. Note Alexey's comment above on his belief that it is present in any 2.x client. |
| Comment by Patrick Farrell (Inactive) [ 18/Jan/14 ] |
|
I can confirm that we've reproduced it (Cray is the customer referenced above) on master clients running against master servers. Our original customer report was on a 2.4-derived client, but as I said, it's present in master. The logs and reproducer Alexey attached to this ticket are from our master/master test system. I wasn't able to replicate it with our reproducer on 2.2 clients, but was able to with 2.3 and master. Alexey says the bug exists in any 2.x version, so it's likely the timings are different and the reproducer misses it on 2.2. |
| Comment by Alexey Lyashkov [ 18/Jan/14 ] |
|
I confirm, but it needs a situation where a second process starts IO while one or more locks for a multi-stripe file are still in cache. So this bug will never be replicated with the Lustre default striping (1 stripe per file). |
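For illustration, a minimal reproducer along those lines might look like the sketch below. The mount path, stripe count, and loop count are assumptions, not taken from the ticket; the shared file must be striped across several OSTs first, e.g. with "lfs setstripe -c 4".

/*
 * Minimal reproducer sketch (illustrative, not the attached reproducer):
 * two processes append-write in parallel to one shared multi-stripe file.
 * Assumes the file was created with a stripe count > 1 beforehand, e.g.
 *     lfs setstripe -c 4 /mnt/lustre/shared
 */
#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHARED_FILE "/mnt/lustre/shared"	/* assumed mount point */

static void appender(void)
{
	static char buf[1 << 20];
	int i, fd;

	fd = open(SHARED_FILE, O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0)
		_exit(1);
	memset(buf, 'x', sizeof(buf));
	for (i = 0; i < 1000; i++)
		write(fd, buf, sizeof(buf));	/* parallel O_APPEND writes */
	close(fd);
	_exit(0);
}

int main(void)
{
	int i;

	/* Two writers suffice: one can still hold cached stripe locks from
	 * a previous IO when the other starts its append. */
	for (i = 0; i < 2; i++)
		if (fork() == 0)
			appender();
	while (wait(NULL) > 0)
		;
	return 0;
}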
| Comment by Patrick Farrell (Inactive) [ 21/Jan/14 ] |
|
First patch from Alexey L. |
| Comment by Patrick Farrell (Inactive) [ 21/Jan/14 ] |
|
A number of updates here. Alexey L. provided a patch for Cray to test, and we confirmed it fixes the deadlock. (We've tested on Cray's version of 2.5, and on current master.) I've attached that patch. Essentially, in lov_lock_sub_init, it removes a for loop which has this comment on it:
Here's Alexey's explanation of the deadlock, with a slight edit by me:
, client2 has no locks.
2) client1 finished an IO and put its locks into inactive state,
3) client1 enters a new IO loop (different process),
4) client2 tries to take a stripe3 lock but
5) client1 enters the lock enqueue loop to take a stripe1 lock,
If sub locks are not created in lov_lock_sub_init,
We've run our general Lustre IO performance test suite (multiple clients, various IO patterns, various stripe counts) and weren't able to measure any difference with the patch versus without it. We're currently running it through sanity and some other stress testing. Going forward, we're wondering if this is expected to be the final version of the patch. Essentially, this fixes the problem with the performance optimization by removing the optimization. As far as I can tell, the optimization basically allows better re-use of cl_locks, and as I understand it, caching & re-use of cl_locks is slated for removal in the upcoming CLIO simplification anyway. So it seems like it may be OK to leave this optimization out. Is this the plan, or is the hope for a patch that actually fixes the performance optimization rather than removing it? |
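To make the shape of the hang concrete, here is a minimal userspace model of that lock-ordering deadlock (plain pthreads, not Lustre code; all names are illustrative): each "client" holds one stripe lock while enqueuing the other, the way sub-locks created and held by lov_lock_sub_init span a second enqueue.

/*
 * Userspace model of the deadlock (NOT Lustre code; names illustrative).
 * Each thread plays a client that holds one "stripe lock" while waiting
 * for the other.
 */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t stripe1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stripe3 = PTHREAD_MUTEX_INITIALIZER;

static void *client1(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&stripe1);	/* held sub-lock from prior IO */
	sleep(1);			/* let client2 take stripe3 */
	pthread_mutex_lock(&stripe3);	/* blocks: client2 holds stripe3 */
	pthread_mutex_unlock(&stripe3);
	pthread_mutex_unlock(&stripe1);
	return NULL;
}

static void *client2(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&stripe3);	/* takes stripe3 first */
	sleep(1);
	pthread_mutex_lock(&stripe1);	/* blocks: the "blocking AST" to
					 * client1 is never served because
					 * client1's sub-lock stays held */
	pthread_mutex_unlock(&stripe1);
	pthread_mutex_unlock(&stripe3);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, client1, NULL);
	pthread_create(&t2, NULL, client2, NULL);
	pthread_join(t1, NULL);		/* never returns: ABBA deadlock */
	pthread_join(t2, NULL);
	return 0;
}

Removing the eager sub-lock creation from lov_lock_sub_init breaks this hold-while-enqueuing pattern, which is why dropping the optimization resolves the hang.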
| Comment by Alexey Lyashkov [ 23/Jan/14 ] |
|
http://review.whamcloud.com/8971 Xyratex-bug-id: CLSTR-2003, MRP-1603. |
| Comment by Cory Spitz [ 28/Mar/14 ] |
|
The patch is still pending review. Can it land? |
| Comment by Jinshan Xiong (Inactive) [ 02/Apr/14 ] |
|
Hi Cory, Patrick confirmed that the patch couldn't fix the problem, so it can't be landed. Did I miss anything here? Jinshan |
| Comment by Patrick Farrell (Inactive) [ 02/Apr/14 ] |
|
Jinshan - This is my fault, actually. I said to Cory today that it seemed like getting something landed at Intel for this was dragging on and on. I wasn't specific about it - it's on Alexey to update the patch, rather than on anyone at Intel to do something. Sorry about that! I am curious: how do you feel about the approach found in patch set 1, rather than patch sets 2/3? I.e., disable the code rather than try to fix it? That approach is working fine for Cray; Alexey was just hoping to do better. |
| Comment by Jinshan Xiong (Inactive) [ 03/Apr/14 ] |
|
Take it easy, Patrick. To be honest, I didn't look into patch v2 and v3. Anyway, can you please create a patch to comment the code out, as I mentioned in patch v1? |
| Comment by Gregoire Pichon [ 08/Apr/14 ] |
|
Hello, I would like to mention that on a Lustre 2.4.2 version we were seeing client evictions when running a test case in which tasks append-write to a single shared file. After applying patch http://review.whamcloud.com/9876
| Comment by Patrick Farrell (Inactive) [ 08/Apr/14 ] |
|
Jinshan: Actually, Alexey's original patch (commenting out code) removes exactly the same code as your patch, which has been landed to master:
Otherwise, it will cause deadlock because it essentially holds
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com> |
| Comment by Patrick Farrell (Inactive) [ 08/Apr/14 ] |
|
This is a duplicate of: |
| Comment by Jinshan Xiong (Inactive) [ 09/Apr/14 ] |
|
duplicate of |
| Comment by Jodi Levi (Inactive) [ 03/Jun/14 ] |
|
Reopening to close as duplicate as indicated in the comments. |