HSM _not only_ small fixes and to do list goes here
(LU-3647)
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Technical task | Priority: | Major |
| Reporter: | John Hammond | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM |
| Issue Links: | |
| Rank (Obsolete): | 9136 |
| Description |
|
Running the HSM stack as of July 15 2013, I see a hang when a release is issued while a restore is still running. To reproduce I run the following:

    #!/bin/bash
    export MOUNT_2=n
    export MDSCOUNT=1
    export PTLDEBUG="super inode ioctl warning dlmtrace error emerg ha rpctrace vfstrace config console"
    export DEBUG_SIZE=512

    hsm_root=/tmp/hsm_root
    rm -rf $hsm_root
    mkdir $hsm_root

    llmount.sh

    lctl conf_param lustre-MDT0000.mdt.hsm_control=enabled
    # lctl conf_param lustre-MDT0001.mdt.hsm_control=enabled
    sleep 10

    lhsmtool_posix --verbose --hsm_root=$hsm_root --bandwidth 1 lustre
    lctl dk > ~/hsm-0-mount.dk

    set -x
    cd /mnt/lustre
    lfs setstripe -c2 f0
    dd if=/dev/urandom of=f0 bs=1M count=100
    lctl dk > ~/hsm-1-dd.dk

    lfs hsm_archive f0
    sleep 10
    echo > /proc/fs/lustre/ldlm/dump_namespaces
    lctl dk > ~/hsm-2-archive.dk

    lfs hsm_release f0
    echo > /proc/fs/lustre/ldlm/dump_namespaces
    lctl dk > ~/hsm-3-release.dk

    lfs hsm_restore f0
    echo > /proc/fs/lustre/ldlm/dump_namespaces
    lctl dk > ~/hsm-4-restore.dk

    lfs hsm_release f0

with the last command never returning. The MDS_CLOSE handler looks like:

    10070
    [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
    [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
    [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
    [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
    [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
    [<ffffffffa08f9551>] mdt_mfd_close+0x351/0xde0 [mdt]
    [<ffffffffa08fb372>] mdt_close+0x662/0xa60 [mdt]
    [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
    [<ffffffffa090c9e5>] mds_readpage_handle+0x15/0x20 [mdt]
    [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
    [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
    [<ffffffff81096936>] kthread+0x96/0xa0
    [<ffffffff8100c0ca>] child_rip+0xa/0x20
    [<ffffffffffffffff>] 0xffffffffffffffff

while the MDS_HSM_PROGRESS handler looks like:

    10065
    [<ffffffffa0f9866e>] cfs_waitq_wait+0xe/0x10 [libcfs]
    [<ffffffffa124826a>] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
    [<ffffffffa1247920>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
    [<ffffffffa08cee3b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
    [<ffffffffa08cf6b4>] mdt_object_lock+0x14/0x20 [mdt]
    [<ffffffffa08cf721>] mdt_object_find_lock+0x61/0x170 [mdt]
    [<ffffffffa091dc22>] hsm_get_md_attr+0x62/0x270 [mdt]
    [<ffffffffa0923253>] mdt_hsm_update_request_state+0x4d3/0x1c20 [mdt]
    [<ffffffffa091ae6e>] mdt_hsm_coordinator_update+0x3e/0xe0 [mdt]
    [<ffffffffa090931b>] mdt_hsm_progress+0x21b/0x330 [mdt]
    [<ffffffffa08d2c07>] mdt_handle_common+0x647/0x16d0 [mdt]
    [<ffffffffa090ca05>] mds_regular_handle+0x15/0x20 [mdt]
    [<ffffffffa12813d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
    [<ffffffffa128275d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
    [<ffffffff81096936>] kthread+0x96/0xa0
    [<ffffffff8100c0ca>] child_rip+0xa/0x20
    [<ffffffffffffffff>] 0xffffffffffffffff

The close handler is waiting on an EX layout lock on f0, while the MDS_HSM_PROGRESS handler is itself blocked in mdt_object_lock() on f0 (second stack above).

For reference I'm using the following changes:

    # LU-2919 hsm: Implementation of exclusive open
    # http://review.whamcloud.com/#/c/6730
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6730/13 && git cherry-pick FETCH_HEAD

    # LU-1333 hsm: Add hsm_release feature.
    # http://review.whamcloud.com/#/c/6526
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/26/6526/9 && git cherry-pick FETCH_HEAD

    # LU-3339 mdt: HSM on disk actions record
    # http://review.whamcloud.com/#/c/6529
    # MERGED

    # LU-3340 mdt: HSM memory requests management
    # http://review.whamcloud.com/#/c/6530
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/30/6530/8 && git cherry-pick FETCH_HEAD

    # LU-3341 mdt: HSM coordinator client interface
    # http://review.whamcloud.com/#/c/6532
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/32/6532/13 && git cherry-pick FETCH_HEAD
    # Needs rebase in sanity-hsm.sh

    # LU-3342 mdt: HSM coordinator agent interface
    # http://review.whamcloud.com/#/c/6534
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/34/6534/8 && git cherry-pick FETCH_HEAD

    # LU-3343 mdt: HSM coordinator main thread
    # http://review.whamcloud.com/#/c/6912
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/12/6912/3 && git cherry-pick FETCH_HEAD
    # lustre/mdt/mdt_internal.h

    # LU-3561 tests: HSM sanity test suite
    # http://review.whamcloud.com/#/c/6913/
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/13/6913/4 && git cherry-pick FETCH_HEAD
    # lustre/tests/sanity-hsm.sh

    # LU-3432 llite: Access to released file trigs a restore
    # http://review.whamcloud.com/#/c/6537
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/6537/11 && git cherry-pick FETCH_HEAD

    # LU-3363 api: HSM import uses new released pattern
    # http://review.whamcloud.com/#/c/6536
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/36/6536/8 && git cherry-pick FETCH_HEAD

    # LU-2062 utils: HSM Posix CopyTool
    # http://review.whamcloud.com/#/c/4737
    git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/37/4737/18 && git cherry-pick FETCH_HEAD |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 17/Jul/13 ] |
|
Hi JC, can you please refresh your coordinator patches on top of the release patches? I tried to make a build but unfortunately hit some merge errors. Thanks. |
| Comment by jacques-charles lafoucriere [ 17/Jul/13 ] |
|
I'm not sure a coordinator refresh will be enough. I can push our latest full branch based on LU-1333-v10. Is that OK for you? |
| Comment by John Hammond [ 17/Jul/13 ] |
|
I believe that this situation exposes a limitation of LDLM for inodebits locks. All of the locks involved are on f0.
It also appears that the skip list optimizations in ldlm_inodebits_compat_queue() could be extended/improved by computing compatibility one mode/bits group at a time and by granting locks in groups. |
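For reference, a minimal sketch of how the lock queues on f0 can be inspected, reusing the namespace-dump commands from the reproducer in the description (the FID handling is illustrative, not part of the original report):

    # Dump every LDLM namespace (granted and waiting queues) into the debug log,
    # then pull the debug buffer and filter for f0's resource.
    fid=$(lfs path2fid /mnt/lustre/f0 | tr -d '[]')
    echo > /proc/fs/lustre/ldlm/dump_namespaces
    lctl dk > /tmp/ldlm-dump.dk
    grep "$fid" /tmp/ldlm-dump.dk   # shows which locks on f0 are granted vs. waiting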
| Comment by John Hammond [ 26/Jul/13 ] |
|
Here is a simpler situation where we can get stuck. (It is also more likely to occur.) Consider the following release vs open race. Assume the file F has already been archived.
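The exact interleaving is not reproduced here; a rough sketch of exercising a release racing with an open of an already-archived file (F below is a placeholder name, and the commands mirror the reproducer in the description) is:

    cd /mnt/lustre
    # F has already been archived (lfs hsm_archive F completed earlier).
    lfs hsm_release F &
    cat F > /dev/null &     # the open/read races against the release
    wait
    lfs hsm_state F         # inspect the resulting HSM state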
|
| Comment by John Hammond [ 26/Jul/13 ] |
|
Another issue here is that it may be unsafe to access the mount point being used by the copytool. Especially to perform manual HSM requests, since the MDC's cl_close_lock will prevent multiple concurrent closes. In particular we can have a releasing close block (on EX LAYOUT) because a restore is running, which prevents the restore from being completed, because any close will block on cl_close_lock. |
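One way to avoid that sharing, as a sketch only (the MGS NID, fsname, and mount points below are illustrative, and it assumes lhsmtool_posix is pointed at the dedicated mount), is to give the copytool its own client mount and keep manual HSM requests on the original one:

    # Second client mount reserved for the copytool.
    mkdir -p /mnt/lustre-ct
    mount -t lustre mgsnode@tcp:/lustre /mnt/lustre-ct
    lhsmtool_posix --verbose --hsm_root=/tmp/hsm_root /mnt/lustre-ct &

    # Manual HSM requests go through the original mount point.
    lfs hsm_archive /mnt/lustre/f0
    lfs hsm_release /mnt/lustre/f0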
| Comment by Jinshan Xiong (Inactive) [ 26/Jul/13 ] |
|
I will fix the lock issue above. The close sounds like a real issue here; we shouldn't block the close request from finishing. Let's use a try version of mdt_object_lock() in close. |
| Comment by John Hammond [ 27/Jul/13 ] |
|
Please see http://review.whamcloud.com/7148 for the LDLM patch we discussed. |
| Comment by John Hammond [ 30/Jul/13 ] |
|
A similar hang can be triggered by trying to read a file while a restore is still running. To see this, add --bandwidth=1 to the copytool options and do:

    # cd /mnt/lustre
    # dd if=/dev/urandom of=f0 bs=1M count=10
    # lfs hsm_archive f0
    # # Wait for the archive to complete.
    # sleep 15
    # lfs hsm_release f0
    # lfs hsm_restore f0
    # cat f0 > /dev/null

This is addressed by http://review.whamcloud.com/#/c/7148/. However, even with the latest version (patch set 9) of http://review.whamcloud.com/#/c/6912/ we have an easily exploited race between restore and rename which is not addressed by the change in 7148. A rename onto the file during a restore will hang:

    cd /mnt/lustre
    dd if=/dev/urandom of=f0 bs=1M count=10
    lfs hsm_archive f0
    # Wait for the archive to complete.
    sleep 15
    lfs hsm_state f0
    lfs hsm_release f0
    lfs hsm_restore f0; touch f1; sys_rename f1 f0

Since this rename takes MDS_INODELOCK_FULL on f0, I doubt that the choice of using LAYOUT, UPDATE, or other in hsm_get_md_attr() matters very much. But I could be wrong. |
| Comment by John Hammond [ 01/Aug/13 ] |
|
Since the removal of UPDATE lock use from the coordinator, I can no longer reproduce these issues. |
| Comment by jacques-charles lafoucriere [ 02/Aug/13 ] |
|
We will add sanity-hsm tests for the two simple use cases. That will be safer for future changes. |
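A sketch of what such a test could look like, assuming the usual Lustre test-framework and sanity-hsm.sh conventions ($LFS, $DIR/$tdir/$tfile, error, wait_request_state, run_test) and a copytool already running with a bandwidth limit; the test number, timeout, and details are illustrative:

    test_33b() {
        mkdir -p $DIR/$tdir
        local f=$DIR/$tdir/$tfile
        dd if=/dev/urandom of=$f bs=1M count=10 || error "dd to $f failed"
        local fid=$($LFS path2fid $f)

        $LFS hsm_archive $f || error "archive of $f failed"
        wait_request_state $fid ARCHIVE SUCCEED
        $LFS hsm_release $f || error "release of $f failed"

        # The copytool bandwidth limit makes the restore slow; a concurrent
        # read must complete instead of hanging on the layout lock.
        $LFS hsm_restore $f
        timeout 120 cat $f > /dev/null || error "read during restore hung"
        wait_request_state $fid RESTORE SUCCEED
    }
    run_test 33b "concurrent read completes while restore is running"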
| Comment by Aurelien Degremont (Inactive) [ 03/Aug/13 ] |
|
We already have such a test: the sanity-hsm test_33 deadlock was hitting this bug, and John's patch fixed it. I will confirm on Monday that the latest coordinator, without John's patch, no longer triggers this deadlock, but I'm confident it does not. |
| Comment by jacques-charles lafoucriere [ 04/Aug/13 ] |
|
sanity-hsm test_33 hits the same bug, but it was not designed to test concurrent access to a file during the restore phase. Today we also do not test rename/rm during a restore. |
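A minimal check along the lines of the rename reproducer above (file names are illustrative and a bandwidth-limited copytool is assumed, as in the earlier comments) could be:

    cd /mnt/lustre
    lfs hsm_restore f0          # restore is slow due to the copytool bandwidth limit
    # Metadata operations that should complete rather than hang while the restore runs:
    touch f1 && mv f1 f0        # rename onto the file being restored
    rm -f f0                    # or unlink it outright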
| Comment by Jodi Levi (Inactive) [ 21/Oct/13 ] |
|
Should change 7148 be landed or abandoned? |
| Comment by John Hammond [ 21/Oct/13 ] |
|
Landed after being improved per comments on gerrit. |
| Comment by John Hammond [ 21/Oct/13 ] |
|
This issue was fixed for 2.5.0 and can be closed now. |
| Comment by Andreas Dilger [ 28/Oct/13 ] |
|
Andriy wrote in
Andriy, was this problem actually hit during testing, or was this problem found by code inspection? |
| Comment by Vitaly Fertman [ 28/Oct/13 ] |
|
Andreas, it was hit during testing.

    process1.lock1: open|lookup, granted
    process1 is open by fid

In other words, as the 2 locks are not taken atomically, you must guarantee that nobody can enqueue a conflicting lock against the 1st lock in between. Otherwise you need either:
By the way, why was the last option not done originally? As it can deadlock without HSM, I would consider this a blocker. |
| Comment by Jinshan Xiong (Inactive) [ 28/Oct/13 ] |
|
Indeed, this is a livelock case. To clarify, process1 must be writing to an empty file that has no layout, so the write will cause a new layout to be created.
The reason I did not acquire one common lock is that we would have to acquire the layout lock in EX mode, which is too strong for the lookup and open locks since they have to share the same DLM lock. Though patch 7148 can fix this problem, acquiring 2 locks in a row is generally bad. Therefore, I'll fix it by acquiring one lock in EX mode for the above case; however, this lock won't be returned to the client side. As a result, the process will not cache this specific open, which is acceptable since it will happen rarely. What do you think? |
| Comment by Patrick Farrell (Inactive) [ 28/Oct/13 ] |
|
Jinshan - This was originally a Cray bug (thank you Andriy and Vitaly for bringing this up), which I've been tracking. I think eliminating the case where 2 locks are taken non-atomically is key long term. If you're planning to do that, then that sounds good. I'd back Vitaly's suggestion that it be a blocker. We're able to trigger it during testing of NFS export, presumably because of the open_by_fid operations caused by NFS export. |
| Comment by Jinshan Xiong (Inactive) [ 28/Oct/13 ] |
|
Just an update: Oleg is creating a patch for this issue. |
| Comment by John Hammond [ 29/Oct/13 ] |
|
Links to Oleg's patches (which all reference this issue) may be found in the comments on |
| Comment by Oleg Drokin [ 29/Oct/13 ] |
|
Patrick: what's your exact reproducer to hit this? We have so far been unable to hit it ourselves. |
| Comment by Patrick Farrell (Inactive) [ 29/Oct/13 ] |
|
Oleg - We hit it while testing NFS exported Lustre during a large-ish test run, with tests drawn primarily from the Linux Test Project. The problem is we don't always hit it with the same test. The test engineer who's been handling it thinks a way to hit it is concurrent runs of fsx-linux with different command line options. Those are being run against an NFS export of Lustre. |
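A rough sketch of that kind of setup (export options, paths, and fsx-linux parameters are illustrative, not the exact Cray configuration):

    # On the Lustre client acting as NFS server:
    echo "/mnt/lustre *(rw,no_root_squash,fsid=1)" >> /etc/exports
    exportfs -ra

    # On the NFS client:
    mount -t nfs nfs-server:/mnt/lustre /mnt/nfs-lustre
    cd /mnt/nfs-lustre
    # Concurrent fsx-linux runs (from LTP) with different command line options.
    fsx-linux -N 10000 -l 1048576 file1 &
    fsx-linux -N 10000 -l 4194304 -o 8192 file2 &
    wait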
| Comment by Patrick Farrell (Inactive) [ 29/Oct/13 ] |
|
Moving conversation about patches to |
| Comment by Jinshan Xiong (Inactive) [ 13/Nov/13 ] |
|
this is fixed in |
| Comment by Andreas Dilger [ 10/Feb/14 ] |
|
Patch http://review.whamcloud.com/8084 was landed under this bug, but is not reported here. |