Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • 9136

    Description

      Running the HSM stack as of July 15, 2013, I see a hang when a release is issued while a restore is still running. To reproduce it, I run the following:

      #!/bin/bash
      
      export MOUNT_2=n
      export MDSCOUNT=1
      export PTLDEBUG="super inode ioctl warning dlmtrace error emerg ha rpctrace vfstrace config console"
      export DEBUG_SIZE=512
      
      hsm_root=/tmp/hsm_root
      
      rm -rf $hsm_root
      mkdir $hsm_root
      
      llmount.sh
      
      # Enable the HSM coordinator on MDT0000.
      lctl conf_param lustre-MDT0000.mdt.hsm_control=enabled
      # lctl conf_param lustre-MDT0001.mdt.hsm_control=enabled
      sleep 10
      # Start the POSIX copytool; the bandwidth limit keeps copies slow enough
      # that archive/restore operations stay in flight long enough to race.
      lhsmtool_posix --verbose --hsm_root=$hsm_root --bandwidth 1 lustre
      
      lctl dk > ~/hsm-0-mount.dk
      
      set -x
      cd /mnt/lustre
      lfs setstripe -c2 f0
      dd if=/dev/urandom of=f0 bs=1M count=100
      lctl dk > ~/hsm-1-dd.dk
      
      lfs hsm_archive f0
      sleep 10
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-2-archive.dk
      
      lfs hsm_release f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-3-release.dk
      
      lfs hsm_restore f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-4-restore.dk
      
      lfs hsm_release f0
      

      with the last command (the second hsm_release) never returning. The MDS_CLOSE handler looks like:

      10070
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08f9551>] mdt_mfd_close+0x351/0xde0 [mdt]
      [<ffffffffa08fb372>] mdt_close+0x662/0xa60 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090c9e5>] mds_readpage_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      while the MDS_HSM_PROGRESS handler looks like:

      10065
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08cf721>] mdt_object_find_lock+0x61/0x170 [mdt]
      [<ffffffffa091dc22>] hsm_get_md_attr+0x62/0x270 [mdt]
      [<ffffffffa0923253>] mdt_hsm_update_request_state+0x4d3/0x1c20 [mdt]
      [<ffffffffa091ae6e>] mdt_hsm_coordinator_update+0x3e/0xe0 [mdt]
      [<ffffffffa090931b>] mdt_hsm_progress+0x21b/0x330 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090ca05>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      The close handler is waiting for an EX LAYOUT lock on f0, while the
      progress handler is waiting for a PW UPDATE lock on f0. dump_namespaces does not show the UPDATE lock as granted.
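
      For anyone reproducing this, the state above can be captured with the commands below; this is just a sketch using the interfaces already shown in this ticket plus the standard /proc/<pid>/stack file, and the PIDs are the ones from the traces above (they will differ between runs).

      #!/bin/bash
      # Sketch: capture lock and thread state on the MDS node while the hang is in progress.

      # Dump all LDLM namespaces (granted and waiting locks) into the Lustre debug log.
      echo > /proc/fs/lustre/ldlm/dump_namespaces

      # Flush the debug log to a file for inspection.
      lctl dk > ~/hsm-hang.dk

      # Dump the stacks of the stuck service threads (PIDs taken from the traces above).
      for pid in 10070 10065; do
          echo "== $pid =="
          cat /proc/$pid/stack
      done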

      For reference I'm using the following changes:

      # LU-2919 hsm: Implementation of exclusive open
      # http://review.whamcloud.com/#/c/6730
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6730/13 && git cherry-pick FETCH_HEAD
       
      # LU-1333 hsm: Add hsm_release feature.
      # http://review.whamcloud.com/#/c/6526
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/26/6526/9 && git cherry-pick FETCH_HEAD
       
      # LU-3339 mdt: HSM on disk actions record
      # http://review.whamcloud.com/#/c/6529
      # MERGED
       
      # LU-3340 mdt: HSM memory requests management
      # http://review.whamcloud.com/#/c/6530
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6530/8 && git cherry-pick FETCH_HEAD
       
      # LU-3341 mdt: HSM coordinator client interface
      # http://review.whamcloud.com/#/c/6532
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/32/6532/13 && git cherry-pick FETCH_HEAD
      # Needs rebase in sanity-hsm.sh
       
      # LU-3342 mdt: HSM coordinator agent interface
      # http://review.whamcloud.com/#/c/6534
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/34/6534/8 && git cherry-pick FETCH_HEAD
       
      # LU-3343 mdt: HSM coordinator main thread
      # http://review.whamcloud.com/#/c/6912
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/12/6912/3 && git cherry-pick FETCH_HEAD
      # lustre/mdt/mdt_internal.h
       
      # LU-3561 tests: HSM sanity test suite
      # http://review.whamcloud.com/#/c/6913/
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/13/6913/4 && git cherry-pick FETCH_HEAD
      # lustre/tests/sanity-hsm.sh
       
      # LU-3432 llite: Access to released file trigs a restore
      # http://review.whamcloud.com/#/c/6537
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/6537/11 && git cherry-pick FETCH_HEAD
       
      # LU-3363 api: HSM import uses new released pattern
      # http://review.whamcloud.com/#/c/6536
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/36/6536/8 && git cherry-pick FETCH_HEAD
       
      # LU-2062 utils: HSM Posix CopyTool
      # http://review.whamcloud.com/#/c/4737
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/4737/18 && git cherry-pick FETCH_HEAD
      

          Activity

            [LU-3601] HSM release causes running restore to hang, hangs itself

            vitaly_fertman Vitaly Fertman added a comment -

            Andreas, it was hit during testing.

            process1.lock1: open|lookup, granted
            process2.lock1: layout | XXX, granted
            process3.lock1: lookup | XXX, waiting process1.lock1
            process1.lock2: layout, waiting process2.lock1
            process2.lock1: cancelled, reprocessing does not reach process1.lock2

            process1 is open by fid
            process3 is getattr

            In other words, since the 2 locks are not taken atomically, you must guarantee that nobody can enqueue a conflicting lock against the 1st one in between. Otherwise you need one of the following:

            • full reprocess
            • reordering on waiting list
            • make these 2 enqueue atomic
            • take 1 common lock with all the ibits

            By the way, why was the last option not done originally?

            As it can deadlock even without HSM, I would consider it a blocker.


            adilger Andreas Dilger added a comment -

            Andriy wrote in LU-4152:

            LU-1876 adds mdt_object_open_lock(), which acquires the lock in 2 steps for layout locks.
            A deadlock is possible since this is not atomic and ibits locks are reprocessed only until the first blocking lock is found.

            Such a situation was hit with mdt_reint_open() & mdt_intent_getattr():

            mdt_reint_open()->mdt_open_by_fid_lock() takes the first part of the lock (ibits=5),
            mdt_intent_getattr() tries to obtain a lock (ibits=17),
            and then mdt_open_by_fid_lock() tries to obtain the second part but fails due to a conflict with another layout lock, lock2. During the cancellation of lock2 only the getattr lock is reprocessed.
            http://review.whamcloud.com/#/c/7148/1 can help, but it is better to fix mdt_open_by_fid_lock().

            Andriy, was this problem actually hit during testing, or was this problem found by code inspection?

            jhammond John Hammond added a comment -

            This issue was fixed for 2.5.0 and can be closed now.

            jhammond John Hammond added a comment -

            Landed after being improved per comments on gerrit.


            jlevi Jodi Levi (Inactive) added a comment -

            Should change 7148 be landed or abandoned?


            jcl jacques-charles lafoucriere added a comment -

            sanity-hsm #33 hits the same bug, but it was not designed to test concurrent access to a file during the restore phase. We also do not currently test rename/rm during restore.


            adegremont Aurelien Degremont (Inactive) added a comment -

            We already have such a test: the sanity-hsm #33 deadlock was hitting this bug, and John's patch was fixing it. I will confirm on Monday that the latest coordinator, without John's patch, no longer triggers this deadlock, but I'm confident.


            jcl jacques-charles lafoucriere added a comment -

            We will add sanity-hsm tests for the 2 simple use cases. It will be safer for future changes.

            jhammond John Hammond added a comment -

            Since the removal of UPDATE lock use from the coordinator, I can no longer reproduce these issues.

            jhammond John Hammond added a comment - edited

            A similar hang can be triggered by trying to read a file while a restore is still running. To see this, add --bandwidth=1 to the copytool options and do:

            # cd /mnt/lustre
            # dd if=/dev/urandom of=f0 bs=1M count=10
            # lfs hsm_archive f0
            # # Wait for archive to complete.
            # sleep 15
            # lfs hsm_release f0
            # lfs hsm_restore f0
            # cat f0 > /dev/null
            

            This is addressed by http://review.whamcloud.com/#/c/7148/.

            However, even with the latest version (patch set 9) of http://review.whamcloud.com/#/c/6912/ we have an easily exploited race between restore and rename which is not addressed by the change in 7148. A rename onto the file during a restore will hang:

            cd /mnt/lustre
            dd if=/dev/urandom of=f0 bs=1M count=10
            lfs hsm_archive f0
            # Wait for archive to complete.
            sleep 15
            lfs hsm_state f0
            lfs hsm_release f0
            lfs hsm_restore f0; touch f1; sys_rename f1 f0
            

            Since this rename takes MDS_INODELOCK_FULL on f0, I doubt that the choice of using LAYOUT, UPDATE, or other in hsm_get_md_attr() matters very much. But I could be wrong.
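
            As a side note, a minimal way to observe the in-flight restore from a second shell while the rename hangs is sketched below; it assumes the lfs hsm_state/hsm_action commands from the HSM patch stack above are available, and the path is the one used in the reproducer.

            # Run from another shell while the rename is blocked.
            lfs hsm_action /mnt/lustre/f0   # expected to show the RESTORE action still running
            lfs hsm_state /mnt/lustre/f0    # shows the file's HSM flags (released, archived, ...)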

            jhammond John Hammond added a comment -

            Please see http://review.whamcloud.com/7148 for the LDLM patch we discussed.


            People

              Assignee:
              jay Jinshan Xiong (Inactive)
              Reporter:
              jhammond John Hammond
              Votes: 0
              Watchers: 17

              Dates

                Created:
                Updated:
                Resolved: