[LU-613] Lustre-Client dead-lock during binary exec() over Lustre FS Created: 19/Aug/11 Updated: 19/Nov/12 Resolved: 04/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0 |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.1.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexandre Louvet | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | Lustre 2.0.0.1 |
| Severity: | 3 |
| Rank (Obsolete): | 4784 |
| Description |
|
At Tera-100 we have experienced several Lustre-Client pseudo-hang situations where console/kernel messages and stacks started to be reported, first for one task and then for multiple others. Forced crash dumps were taken, and each time they show the same dead-lock between two tasks of the same MPI application exec()/page-faulting over the same binary residing on a Lustre FS.

The exact stacks showing the dead-lock look like:

PID: 41138 TASK: ffff8810414d0c60 CPU: 28 COMMAND: "APPN77"

Both locks involved are the "top" object's cl_object_header->coh_attr_guard spin-lock and the ll_inode_info->lli_size_sem mutex.

Here is the "story": Task 41138, from ccc_object_size_lock()->cl_object_attr_lock(), takes ...

We may encounter a new problem/scenario related to binary exec() over Lustre, or some regression introduced by patches for ...

And a quick (and dirty?) fix for this problem could be to always call ccc_prep_size() from vvp_io_fault_start() with the "vfslock" param set ... |
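The pattern described above is a classic ABBA lock-order inversion. As a purely illustrative aid, here is a minimal, self-contained userspace sketch of that shape, with pthread mutexes standing in for the coh_attr_guard spin-lock and the lli_size_sem semaphore; this is not the Lustre code, and which thread takes which lock first is an assumption for illustration only:

/* abba.c -- generic ABBA dead-lock illustration (NOT Lustre code).
 * "attr_guard" and "size_sem" are stand-ins for coh_attr_guard and
 * lli_size_sem.  Build with: cc abba.c -o abba -lpthread
 * The program normally hangs forever, like the two stuck tasks. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t attr_guard = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t size_sem   = PTHREAD_MUTEX_INITIALIZER;

static void *task_a(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&attr_guard);   /* holds "attr_guard" ...          */
        sleep(1);                          /* widen the race window           */
        pthread_mutex_lock(&size_sem);     /* ... and now waits on "size_sem" */
        pthread_mutex_unlock(&size_sem);
        pthread_mutex_unlock(&attr_guard);
        return NULL;
}

static void *task_b(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&size_sem);     /* holds "size_sem" ...              */
        sleep(1);
        pthread_mutex_lock(&attr_guard);   /* ... and now waits on "attr_guard" */
        pthread_mutex_unlock(&attr_guard);
        pthread_mutex_unlock(&size_sem);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, task_a, NULL);
        pthread_create(&b, NULL, task_b, NULL);
        pthread_join(a, NULL);             /* never returns once dead-locked */
        pthread_join(b, NULL);
        puts("no deadlock this run");
        return 0;
}

With the locks acquired in opposite orders, each thread ends up waiting on the lock the other holds, which matches the two stuck tasks seen in the crash dumps. |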
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 19/Aug/11 ] |
|
Looking at this issue. |
| Comment by Jinshan Xiong (Inactive) [ 19/Aug/11 ] |
|
According to the comment in the code, we can't hold the isize semaphore in the fault path because of deadlocking with the truncate path (bug 6077), but I don't know the exact reason. Please apply this patch as a workaround:

diff --git a/lustre/lclient/lcommon_cl.c b/lustre/lclient/lcommon_cl.c
index b083a89..1e1abab 100644
--- a/lustre/lclient/lcommon_cl.c
+++ b/lustre/lclient/lcommon_cl.c
@@ -919,10 +919,7 @@ int ccc_prep_size(const struct lu_env *env, struct cl_objec
          * is not critical that the size be correct. */
         if (cl_isize_read(inode) < kms) {
-                if (vfslock)
-                        cl_isize_write_nolock(inode, kms);
-                else
-                        cl_isize_write(inode, kms);
+                cl_isize_write_nolock(inode, kms);

                 CDEBUG(D_VFSTRACE, DFID" updating i_size "LPU64"\n",
                        PFID(lu_object_fid(&obj->co_lu)),

I'll post a new patch after I sort out those hairy locks. |
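The effect of the hunk above is that ccc_prep_size() now always updates i_size via the "nolock" helper when raising it to the known minimum size (kms). A minimal stand-in sketch of what the two helpers are assumed to differ in (taking lli_size_sem or not) follows; the names and bodies are illustrative, not the actual Lustre definitions:

/* Stand-in sketch only; not the Lustre source.  The point is that the
 * "_nolock" variant never touches the size lock, so a caller that is
 * already inside the attr-guard spin-lock cannot block on it. */
#include <pthread.h>
#include <sys/types.h>

struct fake_inode {
        pthread_mutex_t size_sem;   /* plays the role of lli_size_sem */
        off_t           i_size;
};

/* Locked variant: serializes the size update (e.g. against truncate). */
static void isize_write(struct fake_inode *inode, off_t kms)
{
        pthread_mutex_lock(&inode->size_sem);
        inode->i_size = kms;
        pthread_mutex_unlock(&inode->size_sem);
}

/* Nolock variant: relies on the caller for any needed serialization;
 * this is the path the workaround always takes. */
static void isize_write_nolock(struct fake_inode *inode, off_t kms)
{
        inode->i_size = kms;
}

Always taking the nolock path removes the second lock acquisition from the ordering described in the description, at the cost of relying on the caller for serialization of the size update. |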
| Comment by Jinshan Xiong (Inactive) [ 24/Aug/11 ] |
|
I pushed another patch at: http://review.whamcloud.com/1281 |
| Comment by Sebastien Piechurski [ 21/Sep/11 ] |
|
Any news on the approval of this patch? |
| Comment by Peter Jones [ 22/Sep/11 ] |
|
Please go ahead and try the patch in production at CEA. |
| Comment by Peter Jones [ 13/Oct/11 ] |
|
This patch has been running in production at CEA since Oct 4th without a recurrence. |
| Comment by Christopher Morrone [ 02/Nov/11 ] |
|
Do we have an ETA for when the test will be written so the fix can land? |
| Comment by Sarah Liu [ 23/Nov/11 ] |
|
After running racer with the patch http://review.whamcloud.com/#change,1281, I still see a few processes in "D" state:

root 13931 0.4 0.0 105164 856 ttyS0 S 16:06 0:12 dd if=/dev/zero of=/mnt/lustre/racer/11 bs=1k count=46840

Here is the trace: |
| Comment by Jinshan Xiong (Inactive) [ 28/Nov/11 ] |
|
From the log, those two processes were waiting for close_lock inside mdc_get_rpc_lock(), and that lock was held by process 13931, which was waiting for the response from the MDS. How long did you wait for racer to finish? In my opinion, if the MDS ran into a problem, process 13931 should return with an error and the other processes can move forward. |
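For context on why later close() callers pile up behind one stuck request: the client serializes these metadata RPCs behind a single shared lock. A rough, self-contained sketch of that serialization pattern follows; the names (client_rpc_lock, send_close_rpc, do_close) are invented for illustration and are not the real mdc functions:

/* Illustrative only: one close RPC that never gets an answer keeps the
 * shared lock held, so later callers sleep ("D" state) on the lock. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t client_rpc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stub for the blocking RPC: pretend the MDS never replies. */
static int send_close_rpc(int fd)
{
        (void)fd;
        pause();                                 /* wait forever */
        return 0;
}

static int do_close(int fd)
{
        int rc;

        pthread_mutex_lock(&client_rpc_lock);    /* ~ mdc_get_rpc_lock()        */
        rc = send_close_rpc(fd);                 /* stuck; lock never released  */
        pthread_mutex_unlock(&client_rpc_lock);  /* ~ mdc_put_rpc_lock()        */
        return rc;
}

static void *closer(void *arg)
{
        (void)arg;
        do_close(0);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, closer, NULL); /* gets stuck in the RPC */
        pthread_create(&t2, NULL, closer, NULL); /* blocks on the lock    */
        pthread_join(t1, NULL);                  /* never returns         */
        pthread_join(t2, NULL);
        return 0;
}

The second thread can only proceed once the first one's RPC completes, which is why a single unanswered close stalls every later caller. |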
| Comment by Sarah Liu [ 28/Nov/11 ] |
|
I think it was more than 15 minutes; sorry, I cannot remember the exact time since I ran this test a week ago. If it's important I can try it again. |
| Comment by Jinshan Xiong (Inactive) [ 29/Nov/11 ] |
|
Sure, please do it once again. And if you hit the same problem, please check the status of the MDS and OSS to see if something is wrong. |
| Comment by Sarah Liu [ 30/Nov/11 ] |
|
Hi Jinshan, I reran this test twice and hit the same issue once. After checking both the MDS and OSS, I cannot find anything abnormal. The first time I waited for more than 30 minutes and the second time more than 15 minutes before I saw a similar trace message. |
| Comment by Jinshan Xiong (Inactive) [ 30/Nov/11 ] |
|
I checked the nodes and found that the servers had been unmounted; this is why the file close can't finish. I can verify this is a problem with auster, since the test ran well when run manually. I'm not an expert on auster. Sarah, can you please file a bug for this? |
| Comment by Sarah Liu [ 10/Dec/11 ] |
|
I've run this test manually and it passes. |
| Comment by Peter Jones [ 04/Jan/12 ] |
|
Landed for 2.2 |