[LU-4840] Deadlock when truncating file during lfs migrate Created: 31/Mar/14 Updated: 09/Oct/16 Resolved: 14/Sep/15
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.2 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Patrick Valentin (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | cea |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 13336 |
| Description |
|
While migrating a file with "lfs migrate", if a process tries to truncate the file, the lfs migrate and truncating processes deadlock. Neither process ever finishes (unless killed), and watchdog messages report that the processes have not progressed for the last XXX seconds. Here is a reproducer:

[root@lustre24cli ~]# cat reproducer.sh
#!/bin/sh
FS=/test
FILE=${FS}/file
rm -f ${FILE}
# Create a file on OST 1 of size 512M
lfs setstripe -o 1 -c 1 ${FILE}
dd if=/dev/zero of=${FILE} bs=1M count=512
echo 3 > /proc/sys/vm/drop_caches
# Launch a migrate to OST 0 and a bit later open it for write
lfs migrate -i 0 --block ${FILE} &
sleep 2
dd if=/dev/zero of=${FILE} bs=1M count=512
Once the last dd tries to open the file, both the lfs and dd processes hang forever with these stacks:

lfs stack:
[<ffffffff8128e864>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffffa08d98dd>] ll_file_io_generic+0x29d/0x600 [lustre]
[<ffffffffa08d9d7f>] ll_file_aio_read+0x13f/0x2c0 [lustre]
[<ffffffffa08da61c>] ll_file_read+0x16c/0x2a0 [lustre]
[<ffffffff811896b5>] vfs_read+0xb5/0x1a0
[<ffffffff811897f1>] sys_read+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

dd stack:
[<ffffffffa03436fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
[<ffffffffa04779fa>] cl_lock_state_wait+0x1aa/0x320 [obdclass]
[<ffffffffa04781eb>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
[<ffffffffa0478d6e>] cl_lock_request+0x7e/0x270 [obdclass]
[<ffffffffa047e00c>] cl_io_lock+0x3cc/0x560 [obdclass]
[<ffffffffa047e242>] cl_io_loop+0xa2/0x1b0 [obdclass]
[<ffffffffa092a8c8>] cl_setattr_ost+0x208/0x2c0 [lustre]
[<ffffffffa08f8a0e>] ll_setattr_raw+0x9ce/0x1000 [lustre]
[<ffffffffa08f909b>] ll_setattr+0x5b/0xf0 [lustre]
[<ffffffff811a7348>] notify_change+0x168/0x340
[<ffffffff81187074>] do_truncate+0x64/0xa0
[<ffffffff8119bcc1>] do_filp_open+0x861/0xd20
[<ffffffff81185d39>] do_sys_open+0x69/0x140
[<ffffffff81185e50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff |
| Comments |
| Comment by Peter Jones [ 31/Mar/14 ] |
|
Bobijam, could you please comment? Thanks, Peter |
| Comment by Oleg Drokin [ 01/Apr/14 ] |
|
So I tried the script on master and it also happens there. |
| Comment by Bob Glossman (Inactive) [ 01/Apr/14 ] |
|
I can also reproduce it on master. The exact details of the lfs and dd task stacks are a little different but the deadlock is still there. |
| Comment by Zhenyu Xu [ 03/Apr/14 ] |
|
It relates to the lfs migrate implementation. In this case, lfs migrate takes the group lock; then another process on the same client tries to truncate the file, which takes the inode truncate semaphore (lli_trunc_sem), enqueues an OST lock, and waits for it to be granted. When lfs migrate reaches its read phase, it tries to take the truncate semaphore as well. The truncating process can never get its OST lock granted, since the OST cannot revoke the group lock from the lfs migrate process. Deadlock. |
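For illustration, the same AB-BA ordering can be reproduced outside Lustre with two pthread mutexes. This is a minimal standalone analogy, not Lustre code: group_lock stands in for the OST group lock, trunc_sem for lli_trunc_sem (build with cc -pthread):

/* Thread 1 mimics lfs migrate: group lock first, then the truncate
 * semaphore. Thread 2 mimics truncate: semaphore first, then a lock
 * the group-lock holder would have to release. The program never exits. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t trunc_sem = PTHREAD_MUTEX_INITIALIZER;

static void *migrate_thread(void *arg)
{
        pthread_mutex_lock(&group_lock);   /* lfs migrate holds the group lock */
        sleep(1);                          /* data copy in progress */
        pthread_mutex_lock(&trunc_sem);    /* read phase: blocks, truncate holds it */
        pthread_mutex_unlock(&trunc_sem);
        pthread_mutex_unlock(&group_lock);
        return NULL;
}

static void *truncate_thread(void *arg)
{
        pthread_mutex_lock(&trunc_sem);    /* truncate takes lli_trunc_sem */
        sleep(1);
        pthread_mutex_lock(&group_lock);   /* "OST lock": never granted, holder is stuck */
        pthread_mutex_unlock(&group_lock);
        pthread_mutex_unlock(&trunc_sem);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, migrate_thread, NULL);
        pthread_create(&t2, NULL, truncate_thread, NULL);
        pthread_join(t1, NULL);            /* never returns */
        pthread_join(t2, NULL);
        printf("unreachable\n");
        return 0;
}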
| Comment by Jinshan Xiong (Inactive) [ 03/Apr/14 ] |
|
It's a good chance to reimplement migration with an open lease. Aurelien and JC, do you have any input on this? |
| Comment by Aurelien Degremont (Inactive) [ 09/Apr/14 ] |
|
I see no objection to replacing the grouplock-based code with an open-lease mechanism. The current code used this lock because open leases did not exist at that time, and we were advised to do it this way as there was no other mechanism to protect against concurrent access. I think it will be cleaner. |
| Comment by Jinshan Xiong (Inactive) [ 10/Apr/14 ] |
|
That's true, Aurelien. Since the migration feature was implemented by CEA, would it be possible for CEA to pick it up again and reimplement it with open leases? Jinshan |
| Comment by Aurelien Degremont (Inactive) [ 17/Apr/14 ] |
|
OK, CEA will do it. |
| Comment by Henri Doreau (Inactive) [ 18/Apr/14 ] |
|
Here's a patch aimed at solving it: |
| Comment by Jinshan Xiong (Inactive) [ 22/Apr/14 ] |
|
Use file lease to implement migration. |
| Comment by Jinshan Xiong (Inactive) [ 22/Apr/14 ] |
|
Please check the attachment for the implementation of migration. The procedure is a bit like HSM release, where the close and the layout swap must be one atomic operation. You also need to periodically check that the lease is still valid in the middle of the data copy, so that copying can abort if the file is opened by another process. Please take a look at it and I'll be happy to answer questions. |
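A rough sketch of that copy loop, assuming the llapi lease helpers behave as in the patch under review (llapi_lease_get()/llapi_lease_check() names and return conventions are assumptions here, not the final API):

#include <unistd.h>
#include <lustre/lustreapi.h>

/* Copy data while periodically verifying the lease is still held; abort
 * as soon as another open breaks it. */
static int lease_protected_copy(int fd_src, int fd_dst)
{
        char buf[1024 * 1024];
        ssize_t n;

        if (llapi_lease_get(fd_src, LL_LEASE_RDLCK) < 0)
                return -1;      /* file already open elsewhere: don't migrate */

        while ((n = read(fd_src, buf, sizeof(buf))) > 0) {
                if (write(fd_dst, buf, n) != n)
                        return -1;
                /* lease broken => the file was opened by someone else */
                if (llapi_lease_check(fd_src) != LL_LEASE_RDLCK)
                        return -1;
        }
        return n < 0 ? -1 : 0;
}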
| Comment by Henri Doreau (Inactive) [ 22/Apr/14 ] |
|
Thanks Jinshan. I still have a question about when to check the data version. In which case could it fail, if we both get and check it under the file lease? Also, swap layout performs a data version check of its own. Is the following sequence flawed?
|
| Comment by Jinshan Xiong (Inactive) [ 23/Apr/14 ] |
|
We have to make `put lease', `swap layout', and `close source file' one atomic operation. Otherwise, if the source is opened for writing after `put lease' and generates some dirty pages, we get data corruption. |
| Comment by Henri Doreau (Inactive) [ 23/Apr/14 ] |
|
Cf. the new patchset. Indeed, I do see the race window between llapi_lease_put() and llapi_fswap_layouts(), but I can't see any userland API that would allow us to get rid of it. Am I missing something? You're stressing the need to do these operations atomically; do you have something in mind, like making the SWAP_LAYOUT ioctl lease-aware? |
| Comment by Jinshan Xiong (Inactive) [ 24/Apr/14 ] |
|
No, there is no user-space API ready to use.
Yes, similar. Instead of making SWAP_LAYOUT lease-aware, what we need is to make the lease SWAP_LAYOUT-aware. When we release a lease, a CLOSE RPC is sent to the MDT. We're going to pack the FID of the volatile file into that RPC, with a special bias (similar to MDS_HSM_RELEASE; see ll_close_inode_openhandle() for 'op_data->op_bias |= MDS_HSM_RELEASE', and mdt_hsm_release()), for example MDS_CLOSE_SWAP_LAYOUT, to tell the MDT that we want to close the file and swap the layouts in one step. We're going to extend `struct close_data' to carry that information, and the LL_IOC_SET_LEASE ioctl with F_UNLCK will be extended as well. I'd be fine writing a design document, but based on your questions above I think you have understood the problem and are capable of doing it yourself. |
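In other words, a sketch of the proposal (names taken from the comment above; the field layout is illustrative, not the final on-wire format, and the types come from the Lustre headers):

/* The CLOSE RPC sent when the lease is released would carry, in an
 * extended `struct close_data', the FID of the volatile file and the
 * expected data version: */
struct close_data {
        struct lustre_handle    cd_handle;        /* open handle being closed */
        struct lu_fid           cd_fid;           /* FID of the volatile file */
        __u64                   cd_data_version;  /* version to verify, 0 = skip */
        __u64                   cd_reserved[8];
};

/* Client side, by analogy with the HSM release path in
 * ll_close_inode_openhandle():
 *
 *      op_data->op_bias |= MDS_CLOSE_SWAP_LAYOUT;
 *
 * The MDT handler (cf. mdt_hsm_release()) would then swap the layouts of
 * the source and volatile files under its own lock before freeing the open
 * handle, so `put lease + swap layout + close' becomes one atomic
 * operation. */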
| Comment by Henri Doreau (Inactive) [ 04/Jun/14 ] |
|
It took me a little while, but I buckled down and just pushed a patch that follows the guidelines you gave me. In userland I've extended the swap-layouts ioctl to leverage the existing API and preserve compatibility (I was reluctant to change the parameters of the set-lease ioctl). In kernel land it's very close to hsm_release. I hope this is fine conceptually. |
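Usage from the migration tool would then look roughly like this, assuming the extended ioctl keeps the existing struct lustre_swap_layouts argument plus a new close flag (treat the SWAP_LAYOUTS_CLOSE name as an assumption pending review):

#include <sys/ioctl.h>
#include <lustre/lustre_user.h>

/* Verify the data version, swap layouts with the volatile file, release
 * the lease, and close -- all in a single CLOSE RPC. */
static int swap_layouts_and_close(int fd_src, int fd_volatile, __u64 dv_src)
{
        struct lustre_swap_layouts sl = {
                .sl_fd    = fd_volatile,
                .sl_flags = SWAP_LAYOUTS_CHECK_DV1 | SWAP_LAYOUTS_CLOSE,
                .sl_gid   = 0,
                .sl_dv1   = dv_src,   /* fail if the source changed meanwhile */
                .sl_dv2   = 0,
        };

        return ioctl(fd_src, LL_IOC_LOV_SWAP_LAYOUTS, &sl);
}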
| Comment by Henri Doreau (Inactive) [ 22/Jul/14 ] |
|
Any update on this? |
| Comment by Jinshan Xiong (Inactive) [ 22/Jul/14 ] |
|
I'll look at the patch this week; really sorry for the delay. |
| Comment by Frank Zago (Inactive) [ 06/Nov/14 ] |
|
I tested rev 12 of this patch on top of head of tree. Nothing fancy:

./llmount.sh
cd /mnt/lustre
cp /bin/ls .
~/lustre-release/lustre/utils/lfs migrate -o 0 ls

After lfs decided to output the content of ls (???) to stdout, the node proceeded to not like me and crash.

crash> bt
PID: 6463 TASK: ffff88007ac1b500 CPU: 1 COMMAND: "lfs"
#0 [ffff880012b89820] machine_kexec at ffffffff81038f3b
#1 [ffff880012b89880] crash_kexec at ffffffff810c5b62
#2 [ffff880012b89950] oops_end at ffffffff8152c8a0
#3 [ffff880012b89980] no_context at ffffffff8104a00b
#4 [ffff880012b899d0] __bad_area_nosemaphore at ffffffff8104a295
#5 [ffff880012b89a20] bad_area at ffffffff8104a3be
#6 [ffff880012b89a50] __do_page_fault at ffffffff8104ab6f
#7 [ffff880012b89b70] do_page_fault at ffffffff8152e7ee
#8 [ffff880012b89ba0] page_fault at ffffffff8152bba5
[exception RIP: ll_mdscapa_get+65]
RIP: ffffffffa08cdc91 RSP: ffff880012b89c58 RFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88000ccf9200 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88001fb0e138 RDI: ffff88007c8303d8
RBP: ffff880012b89c68 R8: 0000000000000000 R9: 0000000000000000
R10: ffff88000ccf9200 R11: 0000000000000200 R12: ffff88007c8303d8
R13: ffff88007c8303d8 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff880012b89c70] ll_prep_md_op_data at ffffffffa08a0e45 [lustre]
#10 [ffff880012b89ce0] ll_prepare_close at ffffffffa0881098 [lustre]
#11 [ffff880012b89d30] ll_close_inode_openhandle at ffffffffa088a6f2 [lustre]
#12 [ffff880012b89db0] ll_file_ioctl at ffffffffa0894fc8 [lustre]
#13 [ffff880012b89e60] vfs_ioctl at ffffffff8119e422
#14 [ffff880012b89ea0] do_vfs_ioctl at ffffffff8119e8ea
#15 [ffff880012b89f30] sys_ioctl at ffffffff8119eb41
#16 [ffff880012b89f80] system_call_fastpath at ffffffff8100b072
RIP: 0000003a522e0b37 RSP: 00007fff083e48d0 RFLAGS: 00010292
RAX: 0000000000000010 RBX: ffffffff8100b072 RCX: 00000000545ab5fe
RDX: 00007fff083e48c0 RSI: 00000000402066db RDI: 0000000000000003
RBP: 0000000000000001 R8: 000000002e1aa2e5 R9: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 000000002e1aa2e5
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
Without this patch, I still get the junk output, but not the crash. |
| Comment by Henri Doreau (Inactive) [ 06/Nov/14 ] |
|
Thanks Frank. Null pointer (sbi) dereference in ll_mdscapa_get(); fixed in patchset #13. The file content ending up on the console remains unexplained to me so far. You said it was present before; is there an open ticket for that? |
| Comment by Frank Zago (Inactive) [ 06/Nov/14 ] |
|
Thanks. The fix works, and I can migrate a file between OSTs now. Regarding the junk output, I found the bug in llapi_file_open_param(). I'll submit a patch soon. |
| Comment by Henri Doreau (Inactive) [ 07/Nov/14 ] |
|
Follow-up patch that fixes numerous issues with the first one: http://review.whamcloud.com/#/c/12616/ |
| Comment by Andreas Dilger [ 07/Nov/14 ] |
|
Henri, I agree with Frank that we should not land a patch with significant known defects, since this would break the code for anyone testing it. Please merge the patches. |
| Comment by Patrick Farrell (Inactive) [ 18/Nov/14 ] |
|
One advantage to the old approach of using group locks for migration was that it was theoretically possible to create a version of lfs migrate that could migrate a file in parallel using multiple clients. Is this still possible with the new approach? |
| Comment by Henri Doreau (Inactive) [ 19/Nov/14 ] |
|
Yes, it is still possible. Though an early version of the patch removed grouplock-protected migration, it has now been re-introduced. Migration can be either grouplock-protected and blocking (as before), or based on an exclusive open and non-blocking (it safely aborts if a concurrent process opens the file). We would need file leases to provide a notion of "group" in order to implement non-blocking parallel migration too; see the sketch of the grouplock mode below. |
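For reference, the grouplock-protected mode boils down to the existing group-lock ioctls. A minimal sketch, with error handling elided:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <lustre/lustre_user.h>

/* All cooperating processes pass the same non-zero group id, which is
 * what would allow several clients to migrate one file in parallel. */
static int copy_under_grouplock(const char *path, int gid)
{
        int rc, fd = open(path, O_RDONLY);

        if (fd < 0)
                return -1;
        rc = ioctl(fd, LL_IOC_GROUP_LOCK, gid);   /* blocks conflicting I/O */
        if (rc == 0) {
                /* ... copy file data here ... */
                rc = ioctl(fd, LL_IOC_GROUP_UNLOCK, gid);
        }
        close(fd);
        return rc;
}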
| Comment by Patrick Farrell (Inactive) [ 19/Nov/14 ] |
|
Thanks for the response, Henri. I'm glad to hear the group lock option was retained, and I see the deadlock with truncate was resolved as well. |
| Comment by Oleg Drokin [ 06/Jan/15 ] |
|
Just to draw attention to my comment in gerrit.

[ 8.832781] LNet: Service thread pid 26108 was inactive for 62.00s. The thread mig
[ 8.833657] Pid: 26108, comm: mdt00_007
[ 8.833907] Call Trace:
[ 8.834350] [<ffffffffa13be503>] ? _ldlm_lock_debug+0x2e3/0x670 [ptlrpc]
[ 8.834649] [<ffffffff81516894>] ? _spin_lock_irqsave+0x24/0x30
[ 8.834934] [<ffffffff81514231>] schedule_timeout+0x191/0x2e0
[ 8.835310] [<ffffffff81081e50>] ? process_timeout+0x0/0x10
[ 8.835629] [<ffffffffa13decf0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
[ 8.836070] [<ffffffffa13e3841>] ldlm_completion_ast+0x5e1/0x9b0 [ptlrpc]
[ 8.836267] [<ffffffff8105de00>] ? default_wake_function+0x0/0x20
[ 8.836475] [<ffffffffa13e2c8e>] ldlm_cli_enqueue_local+0x21e/0x7f0 [ptlrpc]
[ 8.836735] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 8.836950] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 8.837190] [<ffffffffa0574805>] mdt_object_local_lock+0x3c5/0xa80 [mdt]
[ 8.837391] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 8.837638] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 8.837852] [<ffffffffa0575245>] mdt_object_lock_internal+0x65/0x360 [mdt]
[ 8.838074] [<ffffffffa0575604>] mdt_object_lock+0x14/0x20 [mdt]
[ 8.838265] [<ffffffffa059328a>] mdt_reint_unlink+0x20a/0x10c0 [mdt]
[ 8.838485] [<ffffffffa120fa80>] ? lu_ucred+0x20/0x30 [obdclass]
[ 8.838676] [<ffffffffa056ad25>] ? mdt_ucred+0x15/0x20 [mdt]
[ 8.838898] [<ffffffffa05858bc>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
[ 8.839232] [<ffffffffa1434e02>] ? __req_capsule_get+0x162/0x6d0 [ptlrpc]
[ 8.839566] [<ffffffffa0589aad>] mdt_reint_rec+0x5d/0x200 [mdt]
[ 8.839881] [<ffffffffa056f5ab>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[ 8.840205] [<ffffffffa056fe0b>] mdt_reint+0x6b/0x120 [mdt]
[ 8.840550] [<ffffffffa146e85e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
[ 8.840915] [<ffffffffa141fd64>] ptlrpc_main+0xdf4/0x1940 [ptlrpc]
[ 8.841304] [<ffffffffa141ef70>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[ 8.841624] [<ffffffff81098c06>] kthread+0x96/0xa0
[ 8.841917] [<ffffffff8100c24a>] child_rip+0xa/0x20
[ 8.842199] [<ffffffff81098b70>] ? kthread+0x0/0xa0
[ 8.842492] [<ffffffff8100c240>] ? child_rip+0x0/0x20
[ 8.842997] LustreError: dumping log to /tmp/lustre-log.1420492523.26108
[ 9.015282] Pid: 9643, comm: mdt00_006
[ 9.015566] Call Trace:
[ 9.016033] [<ffffffffa13be503>] ? _ldlm_lock_debug+0x2e3/0x670 [ptlrpc]
[ 9.017088] [<ffffffff81516894>] ? _spin_lock_irqsave+0x24/0x30
[ 9.017352] [<ffffffff81514231>] schedule_timeout+0x191/0x2e0
[ 9.017687] [<ffffffff81081e50>] ? process_timeout+0x0/0x10
[ 9.018074] [<ffffffffa13decf0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
[ 9.018593] [<ffffffffa13e3841>] ldlm_completion_ast+0x5e1/0x9b0 [ptlrpc]
[ 9.018880] [<ffffffff8105de00>] ? default_wake_function+0x0/0x20
[ 9.019196] [<ffffffffa13e2c8e>] ldlm_cli_enqueue_local+0x21e/0x7f0 [ptlrpc]
[ 9.019615] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.019940] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.020220] [<ffffffffa05745fb>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
[ 9.020558] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.020860] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.021150] [<ffffffffa0575245>] mdt_object_lock_internal+0x65/0x360 [mdt]
[ 9.021477] [<ffffffffa0575604>] mdt_object_lock+0x14/0x20 [mdt]
[ 9.022987] [<ffffffffa05806fc>] mdt_getattr_name_lock+0x103c/0x1ab0 [mdt]
[ 9.023297] [<ffffffff8128863a>] ? strlcpy+0x4a/0x60
[ 9.023573] [<ffffffffa140ff84>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
[ 9.023888] [<ffffffffa14116d0>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
[ 9.024182] [<ffffffffa0581692>] mdt_intent_getattr+0x292/0x470 [mdt]
[ 9.024493] [<ffffffffa056e064>] mdt_intent_policy+0x494/0xce0 [mdt]
[ 9.024789] [<ffffffffa13c305f>] ldlm_lock_enqueue+0x12f/0x950 [ptlrpc]
[ 9.025140] [<ffffffffa10b9201>] ? cfs_hash_for_each_enter+0x1/0xa0 [libcfs]
[ 9.025454] [<ffffffffa13eedeb>] ldlm_handle_enqueue0+0x51b/0x13e0 [ptlrpc]
[ 9.025747] [<ffffffffa146dc72>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
[ 9.026616] [<ffffffffa146e85e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
[ 9.026970] [<ffffffffa141fd64>] ptlrpc_main+0xdf4/0x1940 [ptlrpc]
[ 9.027313] [<ffffffffa141ef70>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[ 9.027622] [<ffffffff81098c06>] kthread+0x96/0xa0
[ 9.028084] [<ffffffff8100c24a>] child_rip+0xa/0x20
[ 9.029247] Pid: 6818, comm: mdt01_002
[ 9.029453] Call Trace:
[ 9.029739] [<ffffffffa13be503>] ? _ldlm_lock_debug+0x2e3/0x670 [ptlrpc]
[ 9.029980] [<ffffffff81516894>] ? _spin_lock_irqsave+0x24/0x30
[ 9.030164] [<ffffffff81514231>] schedule_timeout+0x191/0x2e0
[ 9.030341] [<ffffffff81081e50>] ? process_timeout+0x0/0x10
[ 9.030579] [<ffffffffa13decf0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
[ 9.031058] [<ffffffffa13e3841>] ldlm_completion_ast+0x5e1/0x9b0 [ptlrpc]
[ 9.031335] [<ffffffff8105de00>] ? default_wake_function+0x0/0x20
[ 9.031630] [<ffffffffa13e2c8e>] ldlm_cli_enqueue_local+0x21e/0x7f0 [ptlrpc]
[ 9.032198] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.032491] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.032773] [<ffffffffa05745fb>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
[ 9.033094] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.033387] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.033675] [<ffffffffa0575245>] mdt_object_lock_internal+0x65/0x360 [mdt]
[ 9.033988] [<ffffffffa0575604>] mdt_object_lock+0x14/0x20 [mdt]
[ 9.034264] [<ffffffffa05806fc>] mdt_getattr_name_lock+0x103c/0x1ab0 [mdt]
[ 9.034544] [<ffffffff8128863a>] ? strlcpy+0x4a/0x60
[ 9.034891] [<ffffffffa140ff84>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
[ 9.035203] [<ffffffffa14116d0>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
[ 9.035495] [<ffffffffa0581692>] mdt_intent_getattr+0x292/0x470 [mdt]
[ 9.035774] [<ffffffffa056e064>] mdt_intent_policy+0x494/0xce0 [mdt]
[ 9.036192] [<ffffffffa13c305f>] ldlm_lock_enqueue+0x12f/0x950 [ptlrpc]
[ 9.036479] [<ffffffffa10b9201>] ? cfs_hash_for_each_enter+0x1/0xa0 [libcfs]
[ 9.036788] [<ffffffffa13eedeb>] ldlm_handle_enqueue0+0x51b/0x13e0 [ptlrpc]
[ 9.037136] [<ffffffffa146dc72>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
[ 9.037429] [<ffffffffa146e85e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
[ 9.037735] [<ffffffffa141fd64>] ptlrpc_main+0xdf4/0x1940 [ptlrpc]
[ 9.039938] [<ffffffffa141ef70>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[ 9.040213] [<ffffffff81098c06>] kthread+0x96/0xa0
[ 9.040456] [<ffffffff8100c24a>] child_rip+0xa/0x20
[ 9.040707] [<ffffffff81098b70>] ? kthread+0x0/0xa0
[ 9.040980] [<ffffffff8100c240>] ? child_rip+0x0/0x20
[ 9.041415] Pid: 6815, comm: mdt00_002
[ 9.041638] Call Trace:
[ 9.042070] [<ffffffffa13be503>] ? _ldlm_lock_debug+0x2e3/0x670 [ptlrpc]
[ 9.042351] [<ffffffff81516894>] ? _spin_lock_irqsave+0x24/0x30
[ 9.042615] [<ffffffff81514231>] schedule_timeout+0x191/0x2e0
[ 9.042890] [<ffffffff81081e50>] ? process_timeout+0x0/0x10
[ 9.043183] [<ffffffffa13decf0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
[ 9.043656] [<ffffffffa13e3841>] ldlm_completion_ast+0x5e1/0x9b0 [ptlrpc]
[ 9.044075] [<ffffffff8105de00>] ? default_wake_function+0x0/0x20
[ 9.044366] [<ffffffffa13e2c8e>] ldlm_cli_enqueue_local+0x21e/0x7f0 [ptlrpc]
[ 9.044677] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.045145] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.045420] [<ffffffffa05745fb>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
[ 9.045705] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.046019] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.046309] [<ffffffffa0575245>] mdt_object_lock_internal+0x65/0x360 [mdt]
[ 9.046596] [<ffffffffa0575604>] mdt_object_lock+0x14/0x20 [mdt]
[ 9.046873] [<ffffffffa05806fc>] mdt_getattr_name_lock+0x103c/0x1ab0 [mdt]
[ 9.047152] [<ffffffff8128863a>] ? strlcpy+0x4a/0x60
[ 9.047427] [<ffffffffa140ff84>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
[ 9.047736] [<ffffffffa14116d0>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
[ 9.048031] [<ffffffffa0581692>] mdt_intent_getattr+0x292/0x470 [mdt]
[ 9.048308] [<ffffffffa056e064>] mdt_intent_policy+0x494/0xce0 [mdt]
[ 9.048604] [<ffffffffa13c305f>] ldlm_lock_enqueue+0x12f/0x950 [ptlrpc]
[ 9.048918] [<ffffffffa10b9201>] ? cfs_hash_for_each_enter+0x1/0xa0 [libcfs]
[ 9.049232] [<ffffffffa13eedeb>] ldlm_handle_enqueue0+0x51b/0x13e0 [ptlrpc]
[ 9.049544] [<ffffffffa146dc72>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
[ 9.049852] [<ffffffffa146e85e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
[ 9.050160] [<ffffffffa141fd64>] ptlrpc_main+0xdf4/0x1940 [ptlrpc]
[ 9.050453] [<ffffffffa141ef70>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[ 9.050720] [<ffffffff81098c06>] kthread+0x96/0xa0
[ 9.050970] [<ffffffff8100c24a>] child_rip+0xa/0x20
[ 9.051895] Pid: 6817, comm: mdt01_001
[ 9.052114] Call Trace:
[ 9.052817] [<ffffffffa13be503>] ? _ldlm_lock_debug+0x2e3/0x670 [ptlrpc]
[ 9.053127] [<ffffffff81516894>] ? _spin_lock_irqsave+0x24/0x30
[ 9.053391] [<ffffffff81514231>] schedule_timeout+0x191/0x2e0
[ 9.053653] [<ffffffff81081e50>] ? process_timeout+0x0/0x10
[ 9.053952] [<ffffffffa13decf0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
[ 9.054484] [<ffffffffa13e3841>] ldlm_completion_ast+0x5e1/0x9b0 [ptlrpc]
[ 9.054768] [<ffffffff8105de00>] ? default_wake_function+0x0/0x20
[ 9.055087] [<ffffffffa13e2c8e>] ldlm_cli_enqueue_local+0x21e/0x7f0 [ptlrpc]
[ 9.055394] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.055697] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.056558] [<ffffffffa05745fb>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
[ 9.056893] [<ffffffffa056bbc0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
[ 9.057247] [<ffffffffa13e3260>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[ 9.057488] [<ffffffffa0575245>] mdt_object_lock_internal+0x65/0x360 [mdt]
[ 9.057687] [<ffffffffa0575604>] mdt_object_lock+0x14/0x20 [mdt]
[ 9.057949] [<ffffffffa05806fc>] mdt_getattr_name_lock+0x103c/0x1ab0 [mdt]
[ 9.058223] [<ffffffff8128863a>] ? strlcpy+0x4a/0x60
[ 9.058530] [<ffffffffa140ff84>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
[ 9.058752] [<ffffffffa14116d0>] ? lustre_swab_ldlm_reply+0x0/0x40 [ptlrpc]
[ 9.058958] [<ffffffffa0581692>] mdt_intent_getattr+0x292/0x470 [mdt]
[ 9.059174] [<ffffffffa056e064>] mdt_intent_policy+0x494/0xce0 [mdt]
[ 9.059398] [<ffffffffa13c305f>] ldlm_lock_enqueue+0x12f/0x950 [ptlrpc]
[ 9.059629] [<ffffffffa10b9201>] ? cfs_hash_for_each_enter+0x1/0xa0 [libcfs]
[ 9.059858] [<ffffffffa13eedeb>] ldlm_handle_enqueue0+0x51b/0x13e0 [ptlrpc]
[ 9.060079] [<ffffffffa146dc72>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
[ 9.060286] [<ffffffffa146e85e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
[ 9.060501] [<ffffffffa141fd64>] ptlrpc_main+0xdf4/0x1940 [ptlrpc]
[ 9.060710] [<ffffffffa141ef70>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
[ 9.060914] [<ffffffff81098c06>] kthread+0x96/0xa0
[ 9.061085] [<ffffffff8100c24a>] child_rip+0xa/0x20
[ 9.061275] [<ffffffff81098b70>] ? kthread+0x0/0xa0
[ 9.061443] [<ffffffff8100c240>] ? child_rip+0x0/0x20 |
| Comment by Andreas Dilger [ 06/Jan/15 ] |
|
Dropping this from Blocker to Critical, since it is not a new issue for 2.7.0 (it has existed since migrate was added in 2.4.0), and it only affects a subset of users of the migrate functionality, not anyone else. |
| Comment by Jinshan Xiong (Inactive) [ 06/Jan/15 ] |
|
I'm investigating this issue. |
| Comment by Jinshan Xiong (Inactive) [ 12/Jan/15 ] |
|
Please apply http://review.whamcloud.com/13344 to your tree. In my test, everything worked well after that patch was applied. |
| Comment by Frank Zago (Inactive) [ 16/Jan/15 ] |
|
Patch that adds some tests for the new API: http://review.whamcloud.com/13441/ |
| Comment by Oleg Drokin [ 20/Jan/15 ] |
|
For the record: using Jinshan's patch did not help all that much, and I was still seeing deadlocks on the MDT. |
| Comment by Jinshan Xiong (Inactive) [ 20/Jan/15 ] |
|
I couldn't reproduce the deadlock problem on MDT. Please collect a core dump when you see the deadlock issue again. |
| Comment by Andreas Dilger [ 03/Feb/15 ] |
|
Frank, Henri, Jinshan, could you please confirm that the current patch has resolved the deadlock in your testing? It may be that Oleg is hitting a second issue that is not directly related. The second question is whether you are currently running with this patch in your other testing and can confirm that it doesn't introduce other problems. |
| Comment by Jinshan Xiong (Inactive) [ 03/Feb/15 ] |
|
1. I can confirm that the original problem is fixed by this patch; |
| Comment by Henri Doreau (Inactive) [ 04/Feb/15 ] |
|
Hello, same here: I have not been able to reproduce the aforementioned issues. I can try investigating from crash dumps if Oleg has any he can share, though it's indeed harder than with a reproducer. This patch introduces no regression I'm aware of, and it fixes the original problem. If its size/complexity makes it "unlandable" I can try to split it into two subpatches (one fixing the blocking mode, another adding the non-blocking mode). I was actually not expecting it to grow that much. Edit: I have just triggered deadlocks with racer, as described by Oleg, with all RPMs installed, not simply running from the source tree. There was concurrent activity going on, but I'll try to reproduce in a non-disturbed environment. |
| Comment by Jinshan Xiong (Inactive) [ 04/Feb/15 ] |
|
Henri, that's great. Please let me know if you need any help from me. |
| Comment by Oleg Drokin [ 04/Feb/15 ] |
|
So in order to move things forward, and based on my understanding that this patch actually helps some user-reported problems (and also on the assumption that the racer problems are now possibly a separate bug), let's split the patch into two parts: the actual fix and the racer test patch. We can then land the code fix (if it otherwise does not introduce any more failures) and wait with the racer test patch until we better understand why it fails and fix those failures. I am a bit nervous about landing tests that tend to fail, as that invalidates our testing strategy quite a bit. |
| Comment by Gerrit Updater [ 06/Feb/15 ] |
|
Henri Doreau (henri.doreau@cea.fr) uploaded a new patch: http://review.whamcloud.com/13669 |
| Comment by Henri Doreau (Inactive) [ 06/Feb/15 ] |
|
Racer patch: http://review.whamcloud.com/#/c/13669 |
| Comment by Gerrit Updater [ 28/May/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/10013/ |
| Comment by Sebastien Buisson (Inactive) [ 04/Jun/15 ] |
|
Hi, now that the patch at http://review.whamcloud.com/10013 has been merged into master, is it possible to have a backport to b2_5? Thanks, |
| Comment by Andreas Dilger [ 01/Sep/15 ] |
|
Closing |
| Comment by Jean-Baptiste Riaux (Inactive) [ 26/Jul/16 ] |
|
Backport to b2_7_fe http://review.whamcloud.com/#/c/21513/ |