LU-4495

client evicted on parallel append write to a shared file.

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0
    • Environment: system with 8 OSTs
    • Severity: 3
    • 12299

    Description

      Client is sometimes evicted with a simple workload.

      Attachments

        1. 1590-ptlrpcd
          1.93 MB
        2. 1673-cp_ast
          18 kB
        3. 1884-bl_ast
          4 kB
        4. 2485-main
          1.23 MB
        5. 2487-main
          1.66 MB
        6. mpi_log.c
          0.8 kB
        7. patch
          2 kB

        Issue Links

          Activity

            [LU-4495] client evicted on parallel append write to a shared file.

            Jinshan - This is my fault, actually. I said to Cory today that it seemed like getting something landed at Intel for this was dragging on and on. I wasn't specific about it - it's on Alexey to update the patch, rather than on anyone at Intel to do something.

            Sorry about that!

            I am curious: how do you feel about the approach in patch set 1, rather than patch sets 2/3? I.e., disable the code rather than try to fix it? That approach is working fine for Cray; Alexey was just hoping to do better.

            paf Patrick Farrell (Inactive) added a comment

            Hi Cory, Patrick confirmed that the patch couldn't fix the problem, so it can't be landed. Did I miss anything here?

            Jinshan

            jay Jinshan Xiong (Inactive) added a comment

            The patch is still pending review. Can it land?

            spitzcor Cory Spitz added a comment

            http://review.whamcloud.com/8971

            Xyratex-bug-id: CLSTR-2003, MRP-1603.

            shadow Alexey Lyashkov added a comment

            A number of updates here. Alexey L. provided a patch for Cray to test, and we confirmed it fixes the deadlock. (We've tested on Cray's version of 2.5, and on current master.)

            I've attached that patch.

            Essentially, in lov_lock_sub_init, it removes a for loop which has this comment on it:

            /*
             * Then, create sub-locks. Once at least one sub-lock was created,
             * top-lock can be reached by other threads.
             */

            Here's Alexey's explanation of the deadlock, with a slight edit by me:

            1) client1 has a full set of locks for a file {stripe1, stripe2, stripe3}; client2 has no locks.

            2) client1 finishes an IO and puts its locks into the inactive state,
            so client2 is able to take 2 of the 3 locks from client1 and reference them.

            3) client1 enters a new IO loop (a different process)
            and starts to take a lock quorum - at top-lock allocation time
            it finds the stripe3 lock in cache and holds it.

            4) client2 tries to take the stripe3 lock, but
            it is now protected by the hold flag, so the BL AST is ignored
            (it just marks the lock callback-pending).
            client2 enters the wait-for-grant loop.

            5) client1 enters the lock enqueue loop to take the stripe1 lock,
            but client2 can't return that lock because it is referenced.
            ... so deadlock.
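            The interleaving above is the classic ABBA deadlock, just spread across two nodes and mediated by LDLM callbacks. As a hedged illustration only (hypothetical names, pthreads analogy, not Lustre code), a minimal single-node C sketch has the same shape:

            /* ABBA analogue of steps 2-5: each "client" holds one stripe
             * lock and blocks forever on the one the other holds. */
            #include <pthread.h>
            #include <unistd.h>

            static pthread_mutex_t stripe1 = PTHREAD_MUTEX_INITIALIZER;
            static pthread_mutex_t stripe3 = PTHREAD_MUTEX_INITIALIZER;

            static void *client1(void *arg)
            {
                    pthread_mutex_lock(&stripe3);  /* step 3: holds cached stripe3 */
                    sleep(1);                      /* let client2 reference stripe1 */
                    pthread_mutex_lock(&stripe1);  /* step 5: waits forever */
                    pthread_mutex_unlock(&stripe1);
                    pthread_mutex_unlock(&stripe3);
                    return arg;
            }

            static void *client2(void *arg)
            {
                    pthread_mutex_lock(&stripe1);  /* step 2: references stripe1 */
                    sleep(1);
                    pthread_mutex_lock(&stripe3);  /* step 4: BL AST ignored, waits */
                    pthread_mutex_unlock(&stripe3);
                    pthread_mutex_unlock(&stripe1);
                    return arg;
            }

            int main(void)
            {
                    pthread_t t1, t2;
                    pthread_create(&t1, NULL, client1, NULL);
                    pthread_create(&t2, NULL, client2, NULL);
                    pthread_join(t1, NULL);        /* never returns: deadlocked */
                    pthread_join(t2, NULL);
                    return 0;
            }

            In the real bug the "mutexes" are DLM stripe locks and the blocking acquire is the enqueue wait, but the cycle is the same: hold one resource while waiting on another that is held by a peer waiting on yours.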

            This patch removes sub-lock creation from lov_lock_sub_init.

            If sub-locks are not created in lov_lock_sub_init,
            they are created during the enqueue process instead. This avoids
            the deadlock described above by eliminating step 3.
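            For the shape of the change, here is a self-contained sketch, simplified far beyond the real lustre/lov/lov_lock.c (the actual function takes lu_env/lov_lock/cl_io arguments; the types and the lov_lock_enqueue_one helper below are illustrative stand-ins, not the Lustre API):

            #include <stdlib.h>

            struct lov_sublock { int created; };

            struct lov_lock {
                    int                 lls_nr;   /* number of stripe sub-locks */
                    struct lov_sublock *lls_sub;
            };

            /* stand-in for lov_sublock_alloc(); the real one also *holds*
             * the new sub-lock, which is what makes step 3 possible */
            static void lov_sublock_alloc(struct lov_lock *lck, int i)
            {
                    lck->lls_sub[i].created = 1;
            }

            static int lov_lock_sub_init(struct lov_lock *lck, int nr)
            {
                    lck->lls_nr  = nr;
                    lck->lls_sub = calloc(nr, sizeof(*lck->lls_sub));
                    if (lck->lls_sub == NULL)
                            return -1;
            #if 0   /* removed by the patch: eager creation of every sub-lock */
                    for (int i = 0; i < nr; i++)
                            lov_sublock_alloc(lck, i);
            #endif
                    return 0;
            }

            /* illustrative enqueue-path helper: with the patch, sub-locks are
             * created lazily here, one at a time, so no cached stripe lock is
             * held across the wait for another stripe's lock */
            static int lov_lock_enqueue_one(struct lov_lock *lck, int i)
            {
                    if (!lck->lls_sub[i].created)
                            lov_sublock_alloc(lck, i);
                    /* ... the actual DLM enqueue would happen here ... */
                    return 0;
            }

            int main(void)
            {
                    struct lov_lock lck;
                    int i;

                    if (lov_lock_sub_init(&lck, 3))
                            return 1;
                    for (i = 0; i < 3; i++)
                            lov_lock_enqueue_one(&lck, i);
                    free(lck.lls_sub);
                    return 0;
            }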

            We've run our general Lustre IO performance test suite (multiple clients, various IO patterns, various stripe counts) and weren't able to measure any difference with the patch versus without it.

            We're currently running it through sanity and some other stress testing.

            Going forward, we're wondering whether this is expected to be the final version of the patch. Essentially, it fixes the problem with this performance optimization by removing the optimization. As far as I can tell, the optimization basically allows better re-use of cl_locks, and as I understand it, caching and re-use of cl_locks is slated for removal in the upcoming CLIO simplification anyway.

            So it seems like it may be OK to leave this optimization out.

            Is that the plan, or is the hope for a patch that actually fixes the performance optimization rather than removing it?

            paf Patrick Farrell (Inactive) added a comment

            First patch from Alexey L.

            paf Patrick Farrell (Inactive) added a comment

            I confirm the call chain
            cl_lock_alloc -> lov_lock_sub_init -> lov_sublock_alloc -> ... -> cl_lock_hold
            exists as of /lov_lock.c/1.19/Wed Nov 25 02:10:38 2009//,
            so it's very close to the original CLIO code.

            But it needs a situation where a second process starts IO while one or more locks for a multi-stripe file are still in cache, so the bug will never be reproduced with Lustre's default striping (1 stripe per file).
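            For reference, setting up that precondition just means creating the shared file with more than one stripe. A minimal sketch using liblustreapi (assuming the llapi_file_create() API; the mount path is illustrative):

            /* Create a 3-stripe file so two processes appending in parallel
             * can hit the multi-stripe lock interleaving described above. */
            #include <lustre/lustreapi.h>
            #include <stdio.h>

            int main(void)
            {
                    /* stripe_size 0 and stripe_offset -1 select filesystem
                     * defaults; stripe_count 3 matches the {stripe1, stripe2,
                     * stripe3} example. */
                    int rc = llapi_file_create("/mnt/lustre/shared_file",
                                               0, -1, 3, LOV_PATTERN_RAID0);
                    if (rc < 0)
                            fprintf(stderr, "llapi_file_create: %d\n", rc);
                    return rc < 0 ? 1 : 0;
            }

            The same layout can be set from the shell with lfs setstripe -c 3 <file>; with the default one-stripe layout the deadlock window never opens, which matches the observation above.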

            shadow Alexey Lyashkov added a comment

            I can confirm that we've reproduced it (Cray is the customer referenced above) on master clients running against master servers. Our original customer report was on a 2.4-derived client, but as I said, it's present in master.

            The logs and reproducer Alexey attached to this ticket are from our master/master test system.

            I wasn't able to reproduce it with our reproducer on 2.2 clients, but was able to on 2.3 and master. Alexey says the bug exists in every 2.x version, so it's likely the timings are different and the reproducer misses it on 2.2.

            paf Patrick Farrell (Inactive) added a comment

            Answers to questions posed by Peter Jones by email:

            2. Which neo branch on https://github.com/Xyratex/lustre-stable contains the specific code in use?

            The server code is based on Lustre 2.1 and is here: https://github.com/Xyratex/lustre-stable/tree/b_neo_stable_1.3

            3. Which exact version of 2.4.x is running on the clients, and are any patches applied?

            The issue was reported on a customer client based on 2.4, although I believe they have also recreated it on vanilla 2.4 and master. I've pinged them to answer more fully. Note Alexey's comment above on his belief that it is present in any 2.x client.

            jforgan John Forgan (Inactive) added a comment

            With Nikita's and Jay's assistance, I found the root cause of the bug.
            It's a deadlock between two nodes.
            The file has several stripes; client1 has locks for the full stripe set, but a second client wants to write to the same file, so it tries to enqueue locks.
            Enqueuing those locks causes cancellation of the first two locks on client1. ...
            client1 takes lock1 and HOLDs lock5.
            But client2 has lock1 referenced and needs to take lock5, which is held on client1.
            client2 is not able to release lock1, as it is referenced at that time, and client1 ignores the BL AST request because its lock is in the hold state.

            That's all: the clients are deadlocked. The bug looks to exist in any 2.x code.

            shadow Alexey Lyashkov added a comment

            Per discussion with Nikita, he pointed out that a sub-level lock in the hold state prevents a BL AST notification from reaching the top-level lock, so the lock never wakes up. That looks like a bug, since the hold state should be set only for the short window when the top-level lock is unable to receive any notification from its sub-locks.
            So we should never have any locks in the hold state across cl_lock_state_wait(), since holding one there prevents receiving any notification from the sub-locks.
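            Rephrased as a generic sketch (pthreads analogy, hypothetical names, not the CLIO API): the anti-pattern is blocking in a wait while still holding the resource the notifier needs.

            /* The waiter blocks in the "cl_lock_state_wait()" analogue while
             * still holding the sub-lock, so the notifier (think: BL AST
             * handler), which needs that sub-lock first, can never deliver
             * the wakeup. */
            #include <pthread.h>
            #include <unistd.h>

            static pthread_mutex_t sub_lock  = PTHREAD_MUTEX_INITIALIZER;
            static pthread_mutex_t state_mtx = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  state_cv  = PTHREAD_COND_INITIALIZER;
            static int granted;

            static void *top_lock_waiter(void *arg)
            {
                    pthread_mutex_lock(&sub_lock);   /* BUG: held across the wait */
                    pthread_mutex_lock(&state_mtx);
                    while (!granted)                 /* "cl_lock_state_wait()" */
                            pthread_cond_wait(&state_cv, &state_mtx);
                    pthread_mutex_unlock(&state_mtx);
                    pthread_mutex_unlock(&sub_lock);
                    return arg;
            }

            static void *bl_ast_handler(void *arg)
            {
                    sleep(1);                        /* let the waiter win sub_lock */
                    pthread_mutex_lock(&sub_lock);   /* blocks forever */
                    pthread_mutex_lock(&state_mtx);
                    granted = 1;
                    pthread_cond_signal(&state_cv);
                    pthread_mutex_unlock(&state_mtx);
                    pthread_mutex_unlock(&sub_lock);
                    return arg;
            }

            int main(void)
            {
                    pthread_t w, h;
                    pthread_create(&w, NULL, top_lock_waiter, NULL);
                    pthread_create(&h, NULL, bl_ast_handler, NULL);
                    pthread_join(w, NULL);           /* never returns */
                    pthread_join(h, NULL);
                    return 0;
            }

            Dropping the hold before the wait (unlock sub_lock ahead of the wait loop, re-acquire after) breaks the cycle, which matches the suggested invariant: no lock in hold state across cl_lock_state_wait().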

            Jay, what do you think?

            shadow Alexey Lyashkov added a comment

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 14
