LU-4840: Deadlock when truncating file during lfs migrate

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.4.2
    • Severity: 3
    • 13336

    Description

      While a file is being migrated with "lfs migrate", if another process tries to truncate it, both the lfs migrate and the truncating process deadlock.

      As a result, neither process ever finishes (unless killed), and watchdog messages report that the processes made no progress for the last XXX seconds.

      Here is a reproducer:

      [root@lustre24cli ~]# cat reproducer.sh
      #!/bin/sh
      
      FS=/test
      FILE=${FS}/file
      
      rm -f ${FILE}
      # Create a file on OST 1 of size 512M
      lfs setstripe -o 1 -c 1 ${FILE}
      dd if=/dev/zero of=${FILE} bs=1M count=512
      
      echo 3 > /proc/sys/vm/drop_caches
      
      # Launch a migrate to OST 0 and a bit later open it for write
      lfs migrate -i 0 --block ${FILE} &
      sleep 2
      dd if=/dev/zero of=${FILE} bs=1M count=512 
      

      Once the second dd tries to open the file, both the lfs and dd processes hang forever with the following stacks:

      lfs stack:

      [<ffffffff8128e864>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffffa08d98dd>] ll_file_io_generic+0x29d/0x600 [lustre]
      [<ffffffffa08d9d7f>] ll_file_aio_read+0x13f/0x2c0 [lustre]
      [<ffffffffa08da61c>] ll_file_read+0x16c/0x2a0 [lustre]
      [<ffffffff811896b5>] vfs_read+0xb5/0x1a0
      [<ffffffff811897f1>] sys_read+0x51/0x90
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      dd stack:

      [<ffffffffa03436fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa04779fa>] cl_lock_state_wait+0x1aa/0x320 [obdclass]
      [<ffffffffa04781eb>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
      [<ffffffffa0478d6e>] cl_lock_request+0x7e/0x270 [obdclass]
      [<ffffffffa047e00c>] cl_io_lock+0x3cc/0x560 [obdclass]
      [<ffffffffa047e242>] cl_io_loop+0xa2/0x1b0 [obdclass]
      [<ffffffffa092a8c8>] cl_setattr_ost+0x208/0x2c0 [lustre]
      [<ffffffffa08f8a0e>] ll_setattr_raw+0x9ce/0x1000 [lustre]
      [<ffffffffa08f909b>] ll_setattr+0x5b/0xf0 [lustre]
      [<ffffffff811a7348>] notify_change+0x168/0x340
      [<ffffffff81187074>] do_truncate+0x64/0xa0
      [<ffffffff8119bcc1>] do_filp_open+0x861/0xd20
      [<ffffffff81185d39>] do_sys_open+0x69/0x140
      [<ffffffff81185e50>] sys_open+0x20/0x30
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      

          Activity


            Thanks Jinshan.

            I still have a question regarding when to check the data version. In which cases could it fail if we both get and check it under the file lease? Also, swap layout will perform a data version check. Is the following sequence flawed?

            • open source file
            • get lease
            • open volatile
            • get data version
            • copy file content
            • put lease
            • swap layout (pass in data version for check)
            • close volatile
            • close source file
            hdoreau Henri Doreau (Inactive) added a comment
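
            For illustration, here is a minimal C sketch of the sequence listed above, written against liblustreapi as I understand it. The llapi_lease_get/llapi_lease_put, llapi_get_data_version and llapi_fswap_layouts calls, the LL_LEASE_RDLCK and SWAP_LAYOUTS_CHECK_DV1 constants, and the copy_file_data helper are assumptions for this sketch, not taken from the ticket or its attachment; a plain temporary file stands in for the volatile file.

            /*
             * Sketch only: lease-protected migration sequence.
             * All llapi names and flags below are assumptions about liblustreapi.
             */
            #include <fcntl.h>
            #include <unistd.h>
            #include <lustre/lustreapi.h>

            /* hypothetical helper; a lease-checking copy loop is sketched further below */
            int copy_file_data(int src_fd, int dst_fd);

            static int migrate_with_lease(const char *src, const char *tmp)
            {
                    __u64 dv = 0;
                    int fd, vfd, rc;

                    fd = open(src, O_RDONLY);                  /* open source file */
                    if (fd < 0)
                            return -1;

                    rc = llapi_lease_get(fd, LL_LEASE_RDLCK);  /* get lease */
                    if (rc < 0)
                            goto out_src;

                    vfd = open(tmp, O_RDWR | O_CREAT, 0600);   /* stand-in for the volatile file */
                    if (vfd < 0)
                            goto out_lease;

                    rc = llapi_get_data_version(fd, &dv, 0);   /* get data version */
                    if (rc < 0)
                            goto out_vol;

                    rc = copy_file_data(fd, vfd);              /* copy file content */
                    if (rc < 0)
                            goto out_vol;

                    llapi_lease_put(fd);                       /* put lease */
                    /* swap layout, asking the MDT to verify the recorded data version */
                    rc = llapi_fswap_layouts(fd, vfd, dv, 0, SWAP_LAYOUTS_CHECK_DV1);

                    close(vfd);                                /* close volatile */
                    close(fd);                                 /* close source file */
                    return rc;

            out_vol:
                    close(vfd);
            out_lease:
                    llapi_lease_put(fd);
            out_src:
                    close(fd);
                    return rc;
            }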

            Please check the attachment for the implementation of migration.

            The procedure is a bit like HSM release, where close and swap layout should be an atomic operation. Also, you need to periodically check that the lease is still valid in the middle of the data copy, so that copying can abort if the file is opened by someone else.

            Please take a look at it and I'll be happy to answer questions.

            jay Jinshan Xiong (Inactive) added a comment
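
            A sketch of the data-copy loop with the periodic lease check described above, so the copy aborts as soon as another open revokes the lease. llapi_lease_check, its assumed return value (the lease mode still held), and the chunk size / check interval are assumptions for illustration.

            #include <errno.h>
            #include <unistd.h>
            #include <lustre/lustreapi.h>

            #define COPY_CHUNK      (1024 * 1024)
            #define CHECK_EVERY     64              /* re-check the lease every 64 chunks */

            int copy_file_data(int src_fd, int dst_fd)
            {
                    static char buf[COPY_CHUNK];
                    unsigned long chunks = 0;
                    ssize_t got, put;

                    for (;;) {
                            /* assumed to return the lease mode still held by src_fd */
                            if ((chunks++ % CHECK_EVERY) == 0 &&
                                llapi_lease_check(src_fd) != LL_LEASE_RDLCK)
                                    return -EBUSY;  /* file opened by someone else: abort */

                            got = read(src_fd, buf, sizeof(buf));
                            if (got < 0)
                                    return -errno;
                            if (got == 0)
                                    return 0;       /* EOF: copy complete */

                            put = write(dst_fd, buf, got);
                            if (put != got)         /* treat short writes as errors, for brevity */
                                    return -EIO;
                    }
            }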

            Use file lease to implement migration.

            jay Jinshan Xiong (Inactive) added a comment

            Here's a patch aimed at solving it:

            http://review.whamcloud.com/10013

            hdoreau Henri Doreau (Inactive) added a comment

            OK, CEA will do it.
            Could you confirm some aspects of open lease behaviour so we can be sure we are in line: is there only an exclusive open lease for now? When are leases revoked by concurrent access?

            adegremont Aurelien Degremont (Inactive) added a comment

            That's true, Aurelien. Since the migration was originally implemented by CEA, would it be possible for CEA to pick it up again and reimplement it with open lease?

            Jinshan

            jay Jinshan Xiong (Inactive) added a comment

            I see no objection to replacing the grouplock-based code with an open-lease mechanism. The current code uses this lock because open lease did not exist at that time, and we were advised to do it this way as there was no other mechanism to protect against concurrent access. I think it will be cleaner.

            adegremont Aurelien Degremont (Inactive) added a comment
            jay Jinshan Xiong (Inactive) added a comment - edited

            This is a good chance to reimplement migration with open lease.

            Aurelien and JC, do you have any input on this?

            bobijam Zhenyu Xu added a comment -

            This relates to the lfs migrate implementation (LU-2445): lfs migrate first takes a group lock to limit concurrent OST access from other clients, then uses VFS reads and writes to migrate the data from the source OSTs to the destination OSTs, and finally drops the group lock.

            In this case, lfs migrate gets the group lock; then the other process on the same client node tries to truncate the file, which takes the inode truncate semaphore (lli_trunc_sem), enqueues an OST lock, and waits for it to be granted.

            When lfs migrate reaches its read phase, it tries to take the truncate semaphore as well.

            The truncating process cannot get its OST lock granted, since the OST cannot revoke the group lock held by the lfs migrate process. The deadlock follows.

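            Reduced to its essentials, the ordering described above is a classic AB-BA deadlock. The sketch below is only an analogy in plain pthreads, not Lustre code: extent_lock stands in for the OST extent/group lock and trunc_sem for lli_trunc_sem, and both thread bodies are hypothetical.

            /* Analogy only: two locks taken in opposite order by two threads. */
            #include <pthread.h>
            #include <unistd.h>

            static pthread_mutex_t extent_lock = PTHREAD_MUTEX_INITIALIZER; /* group/extent lock */
            static pthread_mutex_t trunc_sem   = PTHREAD_MUTEX_INITIALIZER; /* lli_trunc_sem */

            static void *migrate_thread(void *arg)          /* plays the role of lfs migrate */
            {
                    (void)arg;
                    pthread_mutex_lock(&extent_lock);       /* group lock taken first */
                    sleep(1);                               /* ... data copy in progress ... */
                    pthread_mutex_lock(&trunc_sem);         /* read path needs trunc_sem: blocks */
                    pthread_mutex_unlock(&trunc_sem);
                    pthread_mutex_unlock(&extent_lock);
                    return NULL;
            }

            static void *truncate_thread(void *arg)         /* plays the role of the truncating process */
            {
                    (void)arg;
                    pthread_mutex_lock(&trunc_sem);         /* truncate takes lli_trunc_sem first */
                    sleep(1);
                    pthread_mutex_lock(&extent_lock);       /* then waits for the extent lock: blocks */
                    pthread_mutex_unlock(&extent_lock);
                    pthread_mutex_unlock(&trunc_sem);
                    return NULL;
            }

            int main(void)
            {
                    pthread_t t1, t2;

                    pthread_create(&t1, NULL, migrate_thread, NULL);
                    pthread_create(&t2, NULL, truncate_thread, NULL);
                    pthread_join(t1, NULL);                 /* never returns: both threads deadlock */
                    pthread_join(t2, NULL);
                    return 0;
            }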

            I can also reproduce it on master. The exact details of the lfs and dd task stacks are a little different but the deadlock is still there.

            bogl Bob Glossman (Inactive) added a comment
            green Oleg Drokin added a comment -

            So I tried the script on master and it also happens there.


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: patrick.valentin Patrick Valentin (Inactive)
              Votes: 0
              Watchers: 20
