Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.4.2
- 3
- 13336
Description
While migrating a file with "lfs migrate", if another process tries to truncate the file, both the lfs migrate process and the truncating process deadlock.
As a result, neither process ever finishes (unless killed), and the watchdog prints messages saying that the processes have not progressed for the last XXX seconds.
Here is a reproducer:
[root@lustre24cli ~]# cat reproducer.sh
#!/bin/sh
FS=/test
FILE=${FS}/file

rm -f ${FILE}

# Create a file on OST 1 of size 512M
lfs setstripe -o 1 -c 1 ${FILE}
dd if=/dev/zero of=${FILE} bs=1M count=512
echo 3 > /proc/sys/vm/drop_caches

# Launch a migrate to OST 0 and a bit later open it for write
lfs migrate -i 0 --block ${FILE} &
sleep 2
dd if=/dev/zero of=${FILE} bs=1M count=512
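When the script runs, the background lfs migrate and the final dd both hang. A quick way to confirm they are stuck in uninterruptible sleep is sketched below; the ps invocation is a generic Linux check and not part of the original report:

# Generic check (not from the original report): list the reproducer's
# lfs and dd processes; STAT "D" means uninterruptible sleep and WCHAN
# shows the kernel function each process is blocked in.
ps -C lfs,dd -o pid,stat,wchan:32,cmd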
Once the last dd tries to open the file, both the lfs and dd processes hang forever with the following stacks:
lfs stack:
[<ffffffff8128e864>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffffa08d98dd>] ll_file_io_generic+0x29d/0x600 [lustre]
[<ffffffffa08d9d7f>] ll_file_aio_read+0x13f/0x2c0 [lustre]
[<ffffffffa08da61c>] ll_file_read+0x16c/0x2a0 [lustre]
[<ffffffff811896b5>] vfs_read+0xb5/0x1a0
[<ffffffff811897f1>] sys_read+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
dd stack:
[<ffffffffa03436fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
[<ffffffffa04779fa>] cl_lock_state_wait+0x1aa/0x320 [obdclass]
[<ffffffffa04781eb>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
[<ffffffffa0478d6e>] cl_lock_request+0x7e/0x270 [obdclass]
[<ffffffffa047e00c>] cl_io_lock+0x3cc/0x560 [obdclass]
[<ffffffffa047e242>] cl_io_loop+0xa2/0x1b0 [obdclass]
[<ffffffffa092a8c8>] cl_setattr_ost+0x208/0x2c0 [lustre]
[<ffffffffa08f8a0e>] ll_setattr_raw+0x9ce/0x1000 [lustre]
[<ffffffffa08f909b>] ll_setattr+0x5b/0xf0 [lustre]
[<ffffffff811a7348>] notify_change+0x168/0x340
[<ffffffff81187074>] do_truncate+0x64/0xa0
[<ffffffff8119bcc1>] do_filp_open+0x861/0xd20
[<ffffffff81185d39>] do_sys_open+0x69/0x140
[<ffffffff81185e50>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
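Stacks like the ones above can be re-collected at any point while the deadlock is in place; a minimal sketch, assuming the kernel exposes /proc/<pid>/stack (the loop below is illustrative and not part of the original report):

# Illustrative only: dump the kernel stack of each hung lfs and dd
# process from /proc/<pid>/stack while the deadlock is in place.
for pid in $(pgrep -x lfs; pgrep -x dd); do
    echo "=== PID ${pid} ==="
    cat /proc/${pid}/stack
done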
Issue Links
- is related to
  - LU-6785 Interop 2.7.0<->master sanity test_56w: cannot swap layouts: Device or resource busy (Resolved)
  - LU-5915 racer test_1: FAIL: test_1 failed with 4 (Resolved)
  - LU-7073 racer with OST object migration hangs on cleanup (Resolved)
- is related to
  - LU-6903 racer file migration crash ASSERTION( lov->lo_type == LLT_RAID0 ) (Resolved)
Frank, Henri, Jinshan,
according to Oleg's last comments, he was still able to hit this deadlock even with the patch applied, which raises the question of whether the risk of landing such a complex patch is worthwhile at this late stage.
Could you please confirm that the current patch has resolved the deadlock in your testing? It may be that Oleg is hitting a second issue that is not directly related.
The second question is whether you are currently running with this patch in your other testing and can confirm that it does not introduce other problems.