Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • 9136

    Description

      Running the HSM stack as of July 15, 2013, I see a hang when a release is issued while a restore is still running. To reproduce, I run the following:

      #!/bin/bash
      
      export MOUNT_2=n
      export MDSCOUNT=1
      export PTLDEBUG="super inode ioctl warning dlmtrace error emerg ha rpctrace vfstrace config console"
      export DEBUG_SIZE=512
      
      hsm_root=/tmp/hsm_root
      
      rm -rf $hsm_root
      mkdir $hsm_root
      
      llmount.sh
      
      lctl conf_param lustre-MDT0000.mdt.hsm_control=enabled
      # lctl conf_param lustre-MDT0001.mdt.hsm_control=enabled
      sleep 10
      lhsmtool_posix --verbose --hsm_root=$hsm_root --bandwidth 1 lustre
      
      lctl dk > ~/hsm-0-mount.dk
      
      set -x
      cd /mnt/lustre
      lfs setstripe -c2 f0
      dd if=/dev/urandom of=f0 bs=1M count=100
      lctl dk > ~/hsm-1-dd.dk
      
      lfs hsm_archive f0
      sleep 10
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-2-archive.dk
      
      lfs hsm_release f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-3-release.dk
      
      lfs hsm_restore f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-4-restore.dk
      
      lfs hsm_release f0
      

      with the last command never returning. The stack of the MDS_CLOSE handler looks like:

      10070
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08f9551>] mdt_mfd_close+0x351/0xde0 [mdt]
      [<ffffffffa08fb372>] mdt_close+0x662/0xa60 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090c9e5>] mds_readpage_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      while the stack of the MDS_HSM_PROGRESS handler looks like:

      10065
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08cf721>] mdt_object_find_lock+0x61/0x170 [mdt]
      [<ffffffffa091dc22>] hsm_get_md_attr+0x62/0x270 [mdt]
      [<ffffffffa0923253>] mdt_hsm_update_request_state+0x4d3/0x1c20 [mdt]
      [<ffffffffa091ae6e>] mdt_hsm_coordinator_update+0x3e/0xe0 [mdt]
      [<ffffffffa090931b>] mdt_hsm_progress+0x21b/0x330 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090ca05>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      
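      For reference, the numbers above the traces (10070 and 10065) appear to be the PIDs of the stuck MDT service threads. A minimal sketch, assuming those PIDs, of how such stacks can be captured on the MDS via /proc:

      # Dump the kernel stacks of the stuck service threads; the PIDs are the
      # ones from the traces above, substitute the real ones on your MDS.
      for pid in 10070 10065; do
          echo "=== pid $pid ==="
          cat /proc/$pid/stack
      done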

      The close handler is waiting for an EX LAYOUT lock on f0, while the progress handler is waiting for a PW UPDATE lock on f0. dump_namespaces does not show that the UPDATE lock has been granted.
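
      As a rough way to double-check the lock state (a sketch, not part of the reproducer above), the namespace dump can be saved to a file and searched for f0's resource, which is named after its FID; the exact dump format varies between branches:

      lfs path2fid /mnt/lustre/f0                  # note f0's FID
      echo > /proc/fs/lustre/ldlm/dump_namespaces  # dump all LDLM namespaces to the debug log
      lctl dk /tmp/hsm-namespaces.dk               # flush the debug log to a file
      less /tmp/hsm-namespaces.dk                  # find f0's resource, compare granted vs. waiting locks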

      For reference, I'm using the following changes:

      # LU-2919 hsm: Implementation of exclusive open
      # http://review.whamcloud.com/#/c/6730
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6730/13 && git cherry-pick FETCH_HEAD
       
      # LU-1333 hsm: Add hsm_release feature.
      # http://review.whamcloud.com/#/c/6526
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/26/6526/9 && git cherry-pick FETCH_HEAD
       
      # LU-3339 mdt: HSM on disk actions record
      # http://review.whamcloud.com/#/c/6529
      # MERGED
       
      # LU-3340 mdt: HSM memory requests management
      # http://review.whamcloud.com/#/c/6530
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6530/8 && git cherry-pick FETCH_HEAD
       
      # LU-3341 mdt: HSM coordinator client interface
      # http://review.whamcloud.com/#/c/6532
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/32/6532/13 && git cherry-pick FETCH_HEAD
      # Needs rebase in sanity-hsm.sh
       
      # LU-3342 mdt: HSM coordinator agent interface
      # http://review.whamcloud.com/#/c/6534
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/34/6534/8 && git cherry-pick FETCH_HEAD
       
      # LU-3343 mdt: HSM coordinator main thread
      # http://review.whamcloud.com/#/c/6912
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/12/6912/3 && git cherry-pick FETCH_HEAD
      # lustre/mdt/mdt_internal.h
       
      # LU-3561 tests: HSM sanity test suite
      # http://review.whamcloud.com/#/c/6913/
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/13/6913/4 && git cherry-pick FETCH_HEAD
      # lustre/tests/sanity-hsm.sh
       
      # LU-3432 llite: Access to released file trigs a restore
      # http://review.whamcloud.com/#/c/6537
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/6537/11 && git cherry-pick FETCH_HEAD
       
      # LU-3363 api: HSM import uses new released pattern
      # http://review.whamcloud.com/#/c/6536
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/36/6536/8 && git cherry-pick FETCH_HEAD
       
      # LU-2062 utils: HSM Posix CopyTool
      # http://review.whamcloud.com/#/c/4737
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/4737/18 && git cherry-pick FETCH_HEAD
      
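      As an aside, a small shell helper of the following kind (hypothetical, not part of the series above) wraps the repeated fetch-and-cherry-pick pattern:

      # pick_change REF: fetch a Gerrit change by ref and cherry-pick it.
      pick_change() {
          git fetch http://review.whamcloud.com/fs/lustre-release "$1" &&
              git cherry-pick FETCH_HEAD
      }

      # Example with the first two changes from the list above:
      pick_change refs/changes/30/6730/13   # LU-2919 hsm: exclusive open
      pick_change refs/changes/26/6526/9    # LU-1333 hsm: hsm_release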

          Activity

            [LU-3601] HSM release causes running restore to hang, hangs itself

            jlevi Jodi Levi (Inactive) added a comment -

            Should Change 7148 be landed or abandoned?

            jcl jacques-charles lafoucriere added a comment -

            sanity-hsm #33 hits the same bug, but it was not designed to test concurrent access to a file during the restore phase. We also do not currently test rename/rm during a restore.

            adegremont Aurelien Degremont (Inactive) added a comment -

            We already have such a test. The sanity-hsm #33 deadlock was hitting this bug, and John's patch was fixing it. I will confirm on Monday that the latest coordinator, without John's patch, no longer triggers this deadlock, but I'm confident.

            jcl jacques-charles lafoucriere added a comment -

            We will add sanity-hsm tests for the two simple use cases. It will be safer for future changes.
            jhammond John Hammond added a comment -

            Since the removal of UPDATE lock use from the coordinator, I can no longer reproduce these issues.

            jhammond John Hammond added a comment - - edited

            A similar hang can be triggered by trying to read a file while a restore is still running. To see this, add --bandwidth=1 to the copytool options and do:

            # cd /mnt/lustre
            # dd if=/dev/urandom of=f0 bs=1M count=10
            # lfs hsm_archive f0
            # # Wait for archive to complete.
            # sleep 15
            # lfs hsm_release f0
            # lfs hsm_restore f0
            # cat f0 > /dev/null
            

            This is addressed by http://review.whamcloud.com/#/c/7148/.

            However, even with the latest version (patch set 9) of http://review.whamcloud.com/#/c/6912/ we have an easily exploited race between restore and rename which is not addressed by the change in 7148. A rename onto f0 during a restore will hang:

            cd /mnt/lustre
            dd if=/dev/urandom of=f0 bs=1M count=10
            lfs hsm_archive f0
            # Wait for archive to complete.
            sleep 15
            lfs hsm_state f0
            lfs hsm_release f0
            lfs hsm_restore f0; touch f1; sys_rename f1 f0
            
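            (sys_rename above is presumably a small test wrapper around rename(2); if it is not available, a plain mv, which uses rename(2) for a same-directory move, should exercise the same MDT rename path:)

            cd /mnt/lustre
            lfs hsm_restore f0
            touch f1
            mv f1 f0    # rename onto f0 while the restore is still in flight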

            Since this rename takes MDS_INODELOCK_FULL on f0, I doubt that the choice of using LAYOUT, UPDATE, or other in hsm_get_md_attr() matters very much. But I could be wrong.

            jhammond John Hammond added a comment -

            Please see http://review.whamcloud.com/7148 for the LDLM patch we discussed.


            jay Jinshan Xiong (Inactive) added a comment -

            I will fix the lock issue above.

            The close sounds like a real issue here; we shouldn't block the close request from finishing. Let's use the try version of mdt_object_lock() in close.
            jhammond John Hammond added a comment - - edited

            Another issue here is that it may be unsafe to access the mount point being used by the copytool, especially to perform manual HSM requests, since the MDC's cl_close_lock prevents multiple concurrent closes. In particular, we can have a releasing close block (on the EX LAYOUT lock) because a restore is running, which in turn prevents the restore from completing, because any close will block on cl_close_lock.

            jhammond John Hammond added a comment -

            Here is a simpler situation where we can get stuck (and one that is more likely to occur). Consider the following release vs. open race; assume the file F has already been archived. A rough command-line sketch follows the steps.

            1. Client R starts HSM release on file F.
            2. In lfs_hsm_request, R stats F, the MDT returns a PR LOOKUP,UPDATE,LAYOUT,PERM lock on F.
            3. In lfs_hsm_request, R opens F for path2fid, the MDT returns a CR LOOKUP,LAYOUT lock on F.
            4. In ll_hsm_release/ll_lease_open, R leases F, the MDT returns an EX OPEN lock on F.
            5. Client W tries to open F with MDS_OPEN_LOCK set, the MDT adds a CW OPEN lock to the waiting list.
            6. In ll_hsm_release, client R closes F.
            7. In mdt_hsm_release, the MDT requests a local EX LAYOUT on F. This conflicts with the PR and CR locks already held by R, the server sends blocking ASTs to R for these locks.
            8. The MDT reprocesses the waiting queue for F. Granted list contains the EX OPEN lock. The waiting list contains the CW OPEN, followed by the EX LAYOUT.
            9. As responses to the blocking ASTs come in, the resource for F is reprocessed, but since there is a blocked CW OPEN lock at the head of the waiting list, the locks after it (including the EX LAYOUT) are not considered.
            10. The EX OPEN lock times out and client R is evicted.
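
            A rough two-client sketch of the race above, purely illustrative: it assumes a second client mount (e.g. MOUNT_2=y giving /mnt/lustre and /mnt/lustre2) and that the concurrent open requests an OPEN lock, which depends on the open flags and the branch in use:

            # Client R: release the already-archived file.
            lfs hsm_release /mnt/lustre/f0 &

            # Client W: concurrently open the same file through the second mount.
            dd if=/dev/zero of=/mnt/lustre2/f0 bs=4k count=1 conv=notrunc &

            wait
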
            jhammond John Hammond added a comment -

            I believe that this situation exposes a limitation of LDLM for inodebits locks. All locks below are on f0.

            1. Starting the restore takes an EX LAYOUT lock on the server.
            2. When the releasing close RPC is sent, the client holds a PR LOOKUP|UPDATE|PERM lock.
            3. The release handler on the server blocks attempting to take an EX LAYOUT lock.
            4. When the restore completes, the update progress handler blocks attempting to take a PW UPDATE lock.
            5. The client releases the PR LOOKUP|UPDATE|PERM lock.
            6. The resource (f0) gets reprocessed, but the first waiting lock (EX LAYOUT) cannot be granted, so ldlm_process_inodebits_lock() returns LDLM_ITER_STOP, causing ldlm_reprocess_queue() to stop processing the resource. In particular, it never checks whether the PW UPDATE lock is compatible with all of the granted locks and all of the locks ahead of it in the waiting list.

            It also appears that the skip list optimizations in ldlm_inodebits_compat_queue() could be extended/improved by computing compatibility one mode-bits-bunch at a time and by granting locks in bunches.


            People

              Assignee:
              jay Jinshan Xiong (Inactive)
              Reporter:
              jhammond John Hammond
              Votes:
              0
              Watchers:
              17

              Dates

                Created:
                Updated:
                Resolved: