Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • 9136

    Description

      Running the HSM stack as of July 15, 2013, I see a hang when a release is issued while a restore is still running. To reproduce it, I run the following:

      #!/bin/bash
      
      export MOUNT_2=n
      export MDSCOUNT=1
      export PTLDEBUG="super inode ioctl warning dlmtrace error emerg ha rpctrace vfstrace config console"
      export DEBUG_SIZE=512
      
      hsm_root=/tmp/hsm_root
      
      rm -rf $hsm_root
      mkdir $hsm_root
      
      llmount.sh
      
      lctl conf_param lustre-MDT0000.mdt.hsm_control=enabled
      # lctl conf_param lustre-MDT0001.mdt.hsm_control=enabled
      sleep 10
      # start the copytool in the background; --bandwidth 1 keeps transfers
      # slow so the restore is still in flight when the release is issued
      lhsmtool_posix --verbose --hsm_root=$hsm_root --bandwidth 1 lustre &
      
      lctl dk > ~/hsm-0-mount.dk
      
      set -x
      cd /mnt/lustre
      lfs setstripe -c2 f0
      dd if=/dev/urandom of=f0 bs=1M count=100
      lctl dk > ~/hsm-1-dd.dk
      
      lfs hsm_archive f0
      sleep 10
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-2-archive.dk
      
      lfs hsm_release f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-3-release.dk
      
      lfs hsm_restore f0
      echo > /proc/fs/lustre/ldlm/dump_namespaces
      lctl dk > ~/hsm-4-restore.dk
      
      lfs hsm_release f0
      

      The last command never returns. The MDS_CLOSE handler looks like:

      10070
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08f9551>] mdt_mfd_close+0x351/0xde0 [mdt]
      [<ffffffffa08fb372>] mdt_close+0x662/0xa60 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090c9e5>] mds_readpage_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      while the MDS_HSM_PROGRESS handler looks like:

      10065
      [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
      [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa08cf721>] mdt_object_find_lock+0x61/0x170 [mdt]
      [<ffffffffa091dc22>] hsm_get_md_attr+0x62/0x270 [mdt]
      [<ffffffffa0923253>] mdt_hsm_update_request_state+0x4d3/0x1c20 [mdt]
      [<ffffffffa091ae6e>] mdt_hsm_coordinator_update+0x3e/0xe0 [mdt]
      [<ffffffffa090931b>] mdt_hsm_progress+0x21b/0x330 [mdt]
      [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa090ca05>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      The close handler is waiting for an EX layout lock on f0, while the
      progress handler is waiting for a PW update lock on f0. dump_namespaces
      does not show the UPDATE lock as granted to anyone.
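
      To make the lock compatibility concrete, here is a toy model of ibits
      conflicts (not the ldlm code; the bit values mirror MDS_INODELOCK_* and
      the mode table is simplified, so treat both as assumptions). Under this
      model the two waiting requests touch disjoint inode bits, so they cannot
      be blocking each other directly, which is consistent with
      dump_namespaces showing no granted UPDATE lock:

      /* toy model of ibits lock conflicts -- NOT the ldlm code */
      #include <stdbool.h>
      #include <stdio.h>

      /* bit values mirror MDS_INODELOCK_* for readability */
      #define INODELOCK_LOOKUP 0x01
      #define INODELOCK_UPDATE 0x02
      #define INODELOCK_OPEN   0x04
      #define INODELOCK_LAYOUT 0x08

      enum mode { EX, PW, PR, CR };

      /* simplified mode compatibility: EX shares with nothing,
       * PW only shares with CR, PR shares with PR/CR, CR with all but EX */
      static bool modes_compatible(enum mode a, enum mode b)
      {
              static const bool compat[4][4] = {
                      /*          EX     PW     PR     CR   */
                      /* EX */ { false, false, false, false },
                      /* PW */ { false, false, false, true  },
                      /* PR */ { false, false, true,  true  },
                      /* CR */ { false, true,  true,  true  },
              };
              return compat[a][b];
      }

      /* two ibits locks conflict only if their bits intersect AND
       * their modes are incompatible */
      static bool conflicts(enum mode ma, int ba, enum mode mb, int bb)
      {
              return (ba & bb) && !modes_compatible(ma, mb);
      }

      int main(void)
      {
              /* close handler wants EX on LAYOUT, progress handler wants PW
               * on UPDATE: disjoint bits, so no direct conflict between them */
              printf("EX|LAYOUT vs PW|UPDATE: %s\n",
                     conflicts(EX, INODELOCK_LAYOUT, PW, INODELOCK_UPDATE) ?
                     "conflict" : "no conflict");
              return 0;
      }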

      For reference, I'm using the following changes:

      # LU-2919 hsm: Implementation of exclusive open
      # http://review.whamcloud.com/#/c/6730
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6730/13 && git cherry-pick FETCH_HEAD
       
      # LU-1333 hsm: Add hsm_release feature.
      # http://review.whamcloud.com/#/c/6526
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/26/6526/9 && git cherry-pick FETCH_HEAD
       
      # LU-3339 mdt: HSM on disk actions record
      # http://review.whamcloud.com/#/c/6529
      # MERGED
       
      # LU-3340 mdt: HSM memory requests management
      # http://review.whamcloud.com/#/c/6530
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6530/8 && git cherry-pick FETCH_HEAD
       
      # LU-3341 mdt: HSM coordinator client interface
      # http://review.whamcloud.com/#/c/6532
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/32/6532/13 && git cherry-pick FETCH_HEAD
      # Needs rebase in sanity-hsm.sh
       
      # LU-3342 mdt: HSM coordinator agent interface
      # http://review.whamcloud.com/#/c/6534
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/34/6534/8 && git cherry-pick FETCH_HEAD
       
      # LU-3343 mdt: HSM coordinator main thread
      # http://review.whamcloud.com/#/c/6912
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/12/6912/3 && git cherry-pick FETCH_HEAD
      # lustre/mdt/mdt_internal.h
       
      # LU-3561 tests: HSM sanity test suite
      # http://review.whamcloud.com/#/c/6913/
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/13/6913/4 && git cherry-pick FETCH_HEAD
      # lustre/tests/sanity-hsm.sh
       
      # LU-3432 llite: Access to released file trigs a restore
      # http://review.whamcloud.com/#/c/6537
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/6537/11 && git cherry-pick FETCH_HEAD
       
      # LU-3363 api: HSM import uses new released pattern
      # http://review.whamcloud.com/#/c/6536
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/36/6536/8 && git cherry-pick FETCH_HEAD
       
      # LU-2062 utils: HSM Posix CopyTool
      # http://review.whamcloud.com/#/c/4737
      git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/4737/18 && git cherry-pick FETCH_HEAD
      

    Activity

            [LU-3601] HSM release causes running restore to hang, hangs itself

            adilger Andreas Dilger added a comment - Patch http://review.whamcloud.com/8084 was landed under this bug, but is not reported here.

            jay Jinshan Xiong (Inactive) added a comment - This is fixed in LU-4152.

            paf Patrick Farrell (Inactive) added a comment - Moving conversation about patches to LU-4152; latest is there.

            paf Patrick Farrell (Inactive) added a comment -

            Oleg - We hit it while testing NFS-exported Lustre during a large-ish test run, with tests drawn primarily from the Linux Test Project. The problem is that we don't always hit it with the same test.

            The test engineer who's been handling it thinks a way to hit it is concurrent runs of fsx-linux with different command-line options, run against an NFS export of Lustre. He's going to try to pin that down this afternoon; I'll update if he's able to be more specific.
            green Oleg Drokin added a comment - Patrick: what's your exact reproducer to hit this? We are so far unable to hit it ourselves.

            jhammond John Hammond added a comment - Links to Oleg's patches (which all reference this issue) may be found in the comments on LU-4152.

            jay Jinshan Xiong (Inactive) added a comment - Just an update - Oleg is creating a patch for this issue.

            paf Patrick Farrell (Inactive) added a comment -

            Jinshan - This was originally a Cray bug (thank you Andriy and Vitaly for bringing this up), which I've been tracking.

            I think eliminating the case where two locks are taken non-atomically is key long term. If you're planning to do that, then that sounds good. If you're planning to do it only in certain cases, are you completely sure we don't have another possible live lock?

            I'd back Vitaly's suggestion that it be a blocker. We're able to trigger it during testing of NFS export, presumably because of the open_by_fid operations caused by NFS export.

            jay Jinshan Xiong (Inactive) added a comment -

            Indeed, this is a live-lock case.

            To clarify, process1 must be writing an empty file without a layout, so the write will cause a new layout to be created.

            "btw, why was the last option not done originally?"

            The reason I did not acquire one common lock is that we would have to acquire EX mode for the layout lock, which is too strong for the lookup and open locks, since they would have to share the same DLM lock.

            Though patch 7148 can fix this problem, acquiring two locks in a row is generally bad. Therefore, I'll fix it by acquiring a single lock with EX mode for the above case; however, this lock won't be returned to the client side. As a result, the client will not cache this specific open, which is acceptable since this case is rare.

            What do you guys think?
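
            As a rough sketch of the trade-off described above (a single compound EX ibits lock closes the window between the two enqueues, but is stronger than the open/lookup part needs), here is a toy comparison; it is not Lustre code, and the mode choices (CR for the open|lookup lock, PR for a concurrent lookup) are assumptions for illustration only:

            /* toy sketch, not Lustre code: compound EX lock vs two separate locks */
            #include <stdbool.h>
            #include <stdio.h>

            #define LOOKUP 0x1
            #define OPEN   0x2
            #define LAYOUT 0x4

            enum mode { EX, PW, PR, CR };

            static bool modes_compatible(enum mode a, enum mode b)
            {
                    if (a == EX || b == EX)
                            return false;              /* EX shares with nothing  */
                    if (a == PW || b == PW)
                            return a == CR || b == CR; /* PW only shares with CR  */
                    return true;                       /* PR/CR share freely      */
            }

            static bool conflicts(enum mode ma, int ba, enum mode mb, int bb)
            {
                    return (ba & bb) && !modes_compatible(ma, mb);
            }

            int main(void)
            {
                    /* two separate enqueues: a weak CR open|lookup lock now, an EX
                     * layout lock later -- a concurrent PR lookup is still allowed,
                     * but there is a window between the two enqueues */
                    printf("CR|LOOKUP+OPEN vs PR|LOOKUP: %s\n",
                           conflicts(CR, LOOKUP | OPEN, PR, LOOKUP) ?
                           "conflict" : "no conflict");

                    /* one compound EX enqueue covering LOOKUP|OPEN|LAYOUT: no
                     * window, but the same concurrent lookup must now wait */
                    printf("EX|LOOKUP+OPEN+LAYOUT vs PR|LOOKUP: %s\n",
                           conflicts(EX, LOOKUP | OPEN | LAYOUT, PR, LOOKUP) ?
                           "conflict" : "no conflict");
                    return 0;
            }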

            vitaly_fertman Vitaly Fertman added a comment -

            Andreas, it was hit during testing.

            process1.lock1: open|lookup, granted
            process2.lock1: layout | XXX, granted
            process3.lock1: lookup | XXX, waiting on process1.lock1
            process1.lock2: layout, waiting on process2.lock1
            process2.lock1: cancelled, reprocessing does not reach process1.lock2

            process1 is open by fid
            process3 is getattr

            In other words, since the two locks are taken non-atomically, you must guarantee that nobody can enqueue a lock conflicting with the first lock in between. Otherwise you need one of the following (a toy simulation of the live lock follows this comment):

            • full reprocess
            • reordering on the waiting list
            • making these two enqueues atomic
            • taking one common lock with all the ibits

            btw, why was the last option not done originally?

            As it can deadlock without HSM, I would consider it a blocker.
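
            To make the sequence above concrete, here is a small single-resource simulation; it is not the ldlm code, and the only queue behaviour it models is the one described in the comment: reprocessing after the cancel walks the waiting list in order and stops at the first lock that still cannot be granted, so process1.lock2 is never reached even though nothing conflicts with it any more. The lock modes are illustrative assumptions; what matters is that process3's request conflicts with process1's granted lock:

            /* toy single-resource lock queue, NOT the ldlm code */
            #include <stdbool.h>
            #include <stdio.h>

            #define LOOKUP 0x1
            #define OPEN   0x2
            #define LAYOUT 0x4

            enum mode  { EX, PR };
            enum state { GRANTED, WAITING, CANCELLED };

            struct lock {
                    const char *name;
                    enum mode   mode;
                    int         bits;
                    enum state  state;
            };

            static bool conflicts(const struct lock *a, const struct lock *b)
            {
                    if (!(a->bits & b->bits))
                            return false;                  /* disjoint ibits: no conflict */
                    return a->mode == EX || b->mode == EX; /* simplified mode table       */
            }

            /* a waiter can be granted only if it is compatible with every lock
             * that is granted or queued ahead of it (FIFO fairness) */
            static bool grantable(struct lock *q[], int idx)
            {
                    for (int i = 0; i < idx; i++)
                            if (q[i]->state != CANCELLED && conflicts(q[i], q[idx]))
                                    return false;
                    return true;
            }

            /* reprocess after a cancel: grant waiters in order, stop at the
             * first one that is still blocked (the behaviour described above) */
            static void reprocess(struct lock *q[], int n)
            {
                    for (int i = 0; i < n; i++) {
                            if (q[i]->state != WAITING)
                                    continue;
                            if (!grantable(q, i)) {
                                    printf("reprocess stops at %s\n", q[i]->name);
                                    return;
                            }
                            q[i]->state = GRANTED;
                            printf("%s granted\n", q[i]->name);
                    }
            }

            int main(void)
            {
                    struct lock p1l1 = { "process1.lock1 (open|lookup)", PR, LOOKUP | OPEN, GRANTED };
                    struct lock p2l1 = { "process2.lock1 (layout)",      EX, LAYOUT,        GRANTED };
                    struct lock p3l1 = { "process3.lock1 (lookup)",      EX, LOOKUP,        WAITING };
                    struct lock p1l2 = { "process1.lock2 (layout)",      EX, LAYOUT,        WAITING };
                    struct lock *queue[] = { &p1l1, &p2l1, &p3l1, &p1l2 };

                    p2l1.state = CANCELLED;  /* process2 cancels its layout lock */
                    reprocess(queue, 4);
                    /* prints "reprocess stops at process3.lock1 (lookup)":
                     * process1.lock2 stays waiting although nothing conflicts
                     * with it, process1 never drops lock1, process3 never gets
                     * its lookup lock -- the live lock */
                    return 0;
            }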

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: jhammond John Hammond
              Votes: 0
              Watchers: 17

              Dates

                Created:
                Updated:
                Resolved: