
[LU-8010] lfs hsm command hangs up after lfs hsm_cancel

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Severity: 3
    • Labels: HSM

    Description

      When an hsm_restore request is canceled, the following lfs hsm commands hang.
      To reproduce, set up HSM and do the following
      (we found 2 patterns; they seem to hang at different points):

      (1)

      # lfs hsm_state /lustre/file
      /lustre/file: (0x0000000d) released exists archived, archive_id:1
      # lfs hsm_restore /lustre/file
      # lfs hsm_cancel /lustre/file
      # lfs hsm_restore /lustre/file (quickly after lfs hsm_cancel)
      # lfs hsm_restore /lustre/file  *hang up
      

      The call trace is as follows:

      PID: 9550   TASK: ffff880c33f86040  CPU: 13  COMMAND: "lfs"
       #0 [ffff880c1ca8f568] schedule at ffffffff81539170
       #1 [ffff880c1ca8f640] schedule_timeout at ffffffff8153a042
       #2 [ffff880c1ca8f6f0] ldlm_completion_ast at ffffffffa0847fc9 [ptlrpc]
       #3 [ffff880c1ca8f7a0] ldlm_cli_enqueue_fini at ffffffffa08420e6 [ptlrpc]
       #4 [ffff880c1ca8f840] ldlm_cli_enqueue at ffffffffa08429a1 [ptlrpc]
       #5 [ffff880c1ca8f8f0] mdc_enqueue at ffffffffa0a6e8aa [mdc]
       #6 [ffff880c1ca8fa40] lmv_enqueue at ffffffffa0a25bfb [lmv]
       #7 [ffff880c1ca8fac0] ll_layout_refresh_locked at ffffffffa0b084f6 [lustre]
       #8 [ffff880c1ca8fc00] ll_layout_refresh at ffffffffa0b09159 [lustre]
       #9 [ffff880c1ca8fc50] vvp_io_init at ffffffffa0b5227f [lustre]
      #10 [ffff880c1ca8fcc0] cl_io_init0 at ffffffffa06a8e78 [obdclass]
      #11 [ffff880c1ca8fd00] cl_io_init at ffffffffa06abdf4 [obdclass]
      #12 [ffff880c1ca8fd40] cl_glimpse_size0 at ffffffffa0b4bc05 [lustre]
      #13 [ffff880c1ca8fda0] ll_getattr at ffffffffa0b07cb8 [lustre]
      #14 [ffff880c1ca8fe40] vfs_getattr at ffffffff81197c61
      #15 [ffff880c1ca8fe80] vfs_fstatat at ffffffff81197cf4
      #16 [ffff880c1ca8fee0] vfs_lstat at ffffffff81197d9e
      #17 [ffff880c1ca8fef0] sys_newlstat at ffffffff81197dc4
      #18 [ffff880c1ca8ff80] tracesys at ffffffff8100b2e8 (via system_call)
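
      For reference, reproducer (1) can be wrapped in a small script. The following is only a sketch, assuming an HSM copytool is already running against archive 1 and that the file has already been archived and released (hsm_state shows 0x0000000d); the path is a placeholder.

      #!/bin/bash
      # Sketch of reproducer (1); assumes a copytool is running and that
      # $FILE is already archived and released.
      FILE=/lustre/file

      lfs hsm_state   "$FILE"
      lfs hsm_restore "$FILE"
      lfs hsm_cancel  "$FILE"
      lfs hsm_restore "$FILE"   # issued immediately after the cancel
      lfs hsm_restore "$FILE"   # this one hangs waiting on the layout lock

      As noted above, the second restore apparently needs to be issued quickly after the cancel for the hang to trigger.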
      

      (2)

      # lfs hsm_state /lustre/file
      /lustre/file: (0x0000000d) released exists archived, archive_id:1
      # lfs hsm_restore /lustre/file
      <MDS Immediate reset>
      <MDS recover and HSM setup>
      # lfs hsm_action /lustre/file
      /lustre/file: RESTORE waiting (from 0 to EOF)
      # lfs hsm_state /lustre/file
      /lustre/file: (0x0000000d) released exists archived, archive_id:1
      # lfs hsm_cancel /lustre/file
      # lfs hsm_restore /lustre/file   *hang up
      

      The call trace is as follows:

      PID: 3731   TASK: ffff880c35087520  CPU: 13  COMMAND: "lfs"
       #0 [ffff880c327df598] schedule at ffffffff81539170
       #1 [ffff880c327df670] schedule_timeout at ffffffff8153a042
       #2 [ffff880c327df720] ptlrpc_set_wait at ffffffffa0860c41 [ptlrpc]
       #3 [ffff880c327df7e0] ptlrpc_queue_wait at ffffffffa0861301 [ptlrpc]
       #4 [ffff880c327df800] mdc_ioc_hsm_request at ffffffffa0a65138 [mdc]
       #5 [ffff880c327df830] mdc_iocontrol at ffffffffa0a66df9 [mdc]
       #6 [ffff880c327df950] obd_iocontrol at ffffffffa0a16aa5 [lmv]
       #7 [ffff880c327df9a0] lmv_iocontrol at ffffffffa0a2d9dc [lmv]
       #8 [ffff880c327dfb90] obd_iocontrol at ffffffffa0af2d15 [lustre]
       #9 [ffff880c327dfbe0] ll_dir_ioctl at ffffffffa0af9ae8 [lustre]
      #10 [ffff880c327dfe60] vfs_ioctl at ffffffff811a7972
      #11 [ffff880c327dfea0] do_vfs_ioctl at ffffffff811a7b14
      #12 [ffff880c327dff30] sys_ioctl at ffffffff811a8091
      #13 [ffff880c327dff80] tracesys at ffffffff8100b2e8 (via system_call)
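
      As an aside, the state word printed by lfs hsm_state in both reproducers, 0x0000000d, is simply the OR of the per-file HSM flag bits. Below is a minimal bash helper to decode such a value; the bit values are assumed to match enum hsm_states in lustre_user.h, so verify them against your source tree.

      # bash; decode an lfs hsm_state flag word into flag names
      decode_hsm_flags() {
          local flags=$(( $1 ))
          (( flags & 0x01 )) && printf 'exists '
          (( flags & 0x02 )) && printf 'dirty '
          (( flags & 0x04 )) && printf 'released '
          (( flags & 0x08 )) && printf 'archived '
          (( flags & 0x10 )) && printf 'norelease '
          (( flags & 0x20 )) && printf 'noarchive '
          (( flags & 0x40 )) && printf 'lost '
          echo
      }

      decode_hsm_flags 0x0000000d   # prints: exists released archived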
      


        Activity

          [LU-8010] lfs hsm command hangs up after lfs hsm_cancel
          pjones Peter Jones added a comment -

          Landed for 2.9


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19710/
          Subject: LU-8010 mdt: fix orphan layout_lock cases for restore
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c13ddec8e1cd3a63c16e08f28749771200b92f1b

          takamura Tatsushi Takamura added a comment -

          Sorry for the late reply. I have checked with your new patch and neither reproducer triggered a hang-up.
          Thanks

          bfaccini Bruno Faccini (Inactive) added a comment -

          Right, I double-checked and reproducer #2 is another condition, leading to the same consequence (an orphan layout lock), that must be fixed separately. In a few words, my understanding is that a restore that is in progress when an MDT failure occurs must be set back to the WAITING state so it can be replayed (if legal) upon CDT restart. This allows it to be cancelled without leaving the associated layout locked.
          I have pushed a new version/patch-set; let's see how it works.
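
          A rough way to check that behaviour after an MDT restart is sketched below; the mdt.*.hsm.actions parameter name and the MDT0000 target name are assumptions for this setup, so adjust them as needed.

          # On the MDS: after the CDT restarts, the restore should be back in WAITING
          lctl get_param -n mdt.lustre-MDT0000.hsm.actions
          # On the client:
          lfs hsm_action /lustre/file      # expect: RESTORE waiting (from 0 to EOF)
          lfs hsm_cancel /lustre/file
          lfs hsm_restore /lustre/file     # should no longer hang with the patch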

          takamura Tatsushi Takamura added a comment -

          Hello Bruno,

          I have confirmed that the command hang-up from reproducer (1) is resolved, but the restore command still hangs up using reproducer (2).
          Could you check again?

          bfaccini Bruno Faccini (Inactive) added a comment -

          Hello Tatsushi,
          My local testing, using the reproducers you detailed, with my patch did not trigger a new hang situation.
          Can you give it a try at your site?

          gerrit Gerrit Updater added a comment -

          Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/19710
          Subject: LU-8010 mdt: unlock layout when failing restore
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 656abdd8a342902e67dc77a86e30fb0e0a30c8da

          bfaccini Bruno Faccini (Inactive) added a comment -

          In both reproducer cases/scenarios, the root cause is the layout_lock not being released, leading to hung threads on both the Client and MDS sides during handling of HSM requests on the same file.

          I think I have found a hole in the HSM code that can lead to a layout_lock taken for a restore not being released. I will push a tentative patch to fix it.
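
          While the hang is in progress, the stuck layout lock can usually be seen from the LDLM state and the hung threads' stacks. The commands below are only a debugging sketch; the namespace name is taken from the MDS log excerpt in the next comment, so adapt it to your filesystem.

          # Turn up DLM tracing and capture state while the lfs command is stuck
          lctl set_param debug=+dlmtrace
          lctl get_param ldlm.namespaces.mdt-lustre-MDT0000_UUID.lock_count
          lctl dk /tmp/lustre-dk.$(date +%s)    # dump the Lustre kernel debug log
          echo t > /proc/sysrq-trigger          # dump all task stacks to dmesg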

          takamura Tatsushi Takamura added a comment -

          The syslog messages differ in the 2 patterns.

          (1)
          The following are possibly related messages in the MDS syslog.

          Apr 14 13:31:04 mds kernel: Lustre: lustre-MDT0000: Client 4e2ff398-da98-562d-99e7-cd38a8709ba2 (at 192.168.128.85@o2ib) reconnecting
          Apr 14 13:31:04 mds kernel: Lustre: lustre-MDT0000: Connection restored to ce03607c-8712-5fed-7d04-6d296439e868 (at 192.168.128.85@o2ib)
          

          The following are possibly related messages in the Client (where I ran the lfs command) syslog.

          Apr 14 13:31:04 clinet kernel: Lustre: lustre-MDT0000-mdc-ffff880c35677c00: Connection to lustre-MDT0000 (at 192.168.128.81@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Apr 14 13:31:04 clinet kernel: Lustre: Skipped 1 previous similar message
          Apr 14 13:31:04 clinet kernel: LustreError: 6021:0:(ldlm_request.c:125:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1460607964, 300s ago), entering recovery for lustre-MDT0000_UUID@192.168.128.81@o2ib ns: lustre-MDT0000-mdc-ffff880c35677c00 lock: ffff880c3471dac0/0x241d687a0e4c12e9 lrc: 4/1,0 mode: --/CR res: [0x200000bd1:0x1:0x0].0x0 bits 0x8 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x936186c7472a77bf expref: -99 pid: 6021 timeout: 0 lvb_type: 3
          Apr 14 13:31:04 clinet kernel: Lustre: lustre-MDT0000-mdc-ffff880c35677c00: Connection restored to 192.168.128.81@o2ib (at 192.168.128.81@o2ib)
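
          The "enqueued at" epoch value in the LustreError line above can be correlated with the wall-clock syslog timestamps using GNU date, for example:

          date -d @1460607964   # print the ldlm enqueue time in local time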
          

          (2)
          The following are possibly related messages in the MDS syslog.

          Apr 14 14:16:09 mds kernel: LNet: Service thread pid 3763 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
          Apr 14 14:16:09 mds kernel: Pid: 3763, comm: mdt02_004
          Apr 14 14:16:09 mds kernel:
          Apr 14 14:16:09 mds kernel: Call Trace:
          Apr 14 14:16:09 mds kernel: [<ffffffff8153a042>] schedule_timeout+0x192/0x2e0
          Apr 14 14:16:09 mds kernel: [<ffffffff81089be0>] ? process_timeout+0x0/0x10
          Apr 14 14:16:09 mds kernel: [<ffffffffa08b9e30>] ? ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08be989>] ldlm_completion_ast+0x609/0x9b0 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
          Apr 14 14:16:09 mds kernel: [<ffffffffa08bdd8e>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08be380>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1025130>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa10318e4>] mdt_object_local_lock+0x3a4/0xb00 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1025130>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08be380>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa06cf791>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
          Apr 14 14:16:09 mds kernel: [<ffffffffa10323b3>] mdt_object_lock_internal+0x63/0x310 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1032874>] mdt_object_lock+0x14/0x20 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa1032a2e>] mdt_object_find_lock+0x5e/0x170 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa106a683>] mdt_hsm_add_actions+0x973/0x12b0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffff8129957a>] ? strlcpy+0x4a/0x60
          Apr 14 14:16:09 mds kernel: [<ffffffffa103d50f>] ? ucred_set_jobid+0x5f/0x70 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa106116c>] mdt_hsm_request+0x60c/0x9b0 [mdt]
          Apr 14 14:16:09 mds kernel: [<ffffffffa094ec2c>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffffa08fbc61>] ptlrpc_main+0xd21/0x1800 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffff815391be>] ? thread_return+0x4e/0x7d0
          Apr 14 14:16:09 mds kernel: [<ffffffffa08faf40>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
          Apr 14 14:16:09 mds kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
          Apr 14 14:16:09 mds kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
          Apr 14 14:16:09 mds kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
          Apr 14 14:16:09 mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
          Apr 14 14:16:09 mds kernel:
          Apr 14 14:16:09 mds kernel: LustreError: dumping log to /tmp/lustre-log.1460610969.3763
          Apr 14 14:17:49 mds kernel: LustreError: 3763:0:(ldlm_request.c:106:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1460610768, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-lustre-MDT0000_UUID lock: ffff880c18f31980/0x605de39e648b48a7 lrc: 3/0,1 mode: --/EX res: [0x200000402:0x1:0x0].0x0 bits 0x8 rrc: 3 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 3763 timeout: 0 lvb_type: 0
          Apr 14 14:17:49 mds kernel: LustreError: dumping log to /tmp/lustre-log.1460611069.3763
          

          There are no related messages in the Client (where I ran the lfs command) syslog.


          bfaccini Bruno Faccini (Inactive) added a comment -

          Thanks for this report and the reproducers. I am presently working on a similar platform to reproduce.
          Did you notice any possibly related messages in the Server/MDS, Client, or Agent syslogs?

          People

            Assignee: bfaccini Bruno Faccini (Inactive)
            Reporter: takamura Tatsushi Takamura
            Votes: 0
            Watchers: 11
