Details
Bug
Resolution: Fixed
Major
Lustre 2.8.0
3
HSM
Description
When an hsm_restore request is canceled, the next lfs hsm command hangs.
To reproduce, set up HSM and run the following
(we found two patterns; they appear to hang at different points):
(1)
# lfs hsm_state /lustre/file
/lustre/file: (0x0000000d) released exists archived, archive_id:1
# lfs hsm_restore /lustre/file
# lfs hsm_cancel /lustre/file
# lfs hsm_restore /lustre/file (quickly after lfs hsm_cancel)
# lfs hsm_restore /lustre/file
*hang up
The call trace is as follows:
PID: 9550 TASK: ffff880c33f86040 CPU: 13 COMMAND: "lfs"
 #0 [ffff880c1ca8f568] schedule at ffffffff81539170
 #1 [ffff880c1ca8f640] schedule_timeout at ffffffff8153a042
 #2 [ffff880c1ca8f6f0] ldlm_completion_ast at ffffffffa0847fc9 [ptlrpc]
 #3 [ffff880c1ca8f7a0] ldlm_cli_enqueue_fini at ffffffffa08420e6 [ptlrpc]
 #4 [ffff880c1ca8f840] ldlm_cli_enqueue at ffffffffa08429a1 [ptlrpc]
 #5 [ffff880c1ca8f8f0] mdc_enqueue at ffffffffa0a6e8aa [mdc]
 #6 [ffff880c1ca8fa40] lmv_enqueue at ffffffffa0a25bfb [lmv]
 #7 [ffff880c1ca8fac0] ll_layout_refresh_locked at ffffffffa0b084f6 [lustre]
 #8 [ffff880c1ca8fc00] ll_layout_refresh at ffffffffa0b09159 [lustre]
 #9 [ffff880c1ca8fc50] vvp_io_init at ffffffffa0b5227f [lustre]
#10 [ffff880c1ca8fcc0] cl_io_init0 at ffffffffa06a8e78 [obdclass]
#11 [ffff880c1ca8fd00] cl_io_init at ffffffffa06abdf4 [obdclass]
#12 [ffff880c1ca8fd40] cl_glimpse_size0 at ffffffffa0b4bc05 [lustre]
#13 [ffff880c1ca8fda0] ll_getattr at ffffffffa0b07cb8 [lustre]
#14 [ffff880c1ca8fe40] vfs_getattr at ffffffff81197c61
#15 [ffff880c1ca8fe80] vfs_fstatat at ffffffff81197cf4
#16 [ffff880c1ca8fee0] vfs_lstat at ffffffff81197d9e
#17 [ffff880c1ca8fef0] sys_newlstat at ffffffff81197dc4
#18 [ffff880c1ca8ff80] tracesys at ffffffff8100b2e8 (via system_call)
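The command sequence of pattern (1) can be scripted so that the hang is detected instead of blocking the terminal. The following is only a rough sketch, not part of the original report: the variables LFS, FILE and HANG_TIMEOUT and the functions run_step and reproduce_pattern1 are hypothetical names introduced here, and the script assumes that timeout(1) from GNU coreutils is available and that HSM is already set up with a running copytool. If the hung process is in uninterruptible sleep, timeout may not be able to terminate it, so the sketch may itself block in that case.

```shell
#!/bin/sh
# Sketch of reproduction pattern (1). LFS, FILE and HANG_TIMEOUT are
# hypothetical knobs introduced for this example only.
LFS=${LFS:-lfs}
FILE=${FILE:-/lustre/file}
HANG_TIMEOUT=${HANG_TIMEOUT:-60}

# Run one lfs subcommand under timeout(1). Exit status 124 means the
# command did not finish within HANG_TIMEOUT seconds.
run_step() {
    timeout "$HANG_TIMEOUT" "$LFS" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "HANG: $LFS $* did not return within ${HANG_TIMEOUT}s"
    fi
    return "$rc"
}

# Same order of commands as pattern (1); stops at the first failing step.
reproduce_pattern1() {
    run_step hsm_state "$FILE" || return
    run_step hsm_restore "$FILE" || return
    run_step hsm_cancel "$FILE" || return
    run_step hsm_restore "$FILE" || return   # issued right after the cancel
    run_step hsm_restore "$FILE"             # the call reported to hang
}
```

Calling reproduce_pattern1 on a client with the file in the released state runs the steps in order; a "HANG:" line marks the step that did not return. Because the second hsm_restore must follow hsm_cancel quickly, the steps are issued back to back with no delay.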
(2)
# lfs hsm_state /lustre/file
/lustre/file: (0x0000000d) released exists archived, archive_id:1
# lfs hsm_restore /lustre/file
<MDS Immediate reset>
<MDS recover and HSM setup>
# lfs hsm_action /lustre/file
/lustre/file: RESTORE waiting (from 0 to EOF)
# lfs hsm_state /lustre/file
/lustre/file: (0x0000000d) released exists archived, archive_id:1
# lfs hsm_cancel /lustre/file
# lfs hsm_restore /lustre/file
*hang up
The call trace is as follows:
PID: 3731 TASK: ffff880c35087520 CPU: 13 COMMAND: "lfs"
 #0 [ffff880c327df598] schedule at ffffffff81539170
 #1 [ffff880c327df670] schedule_timeout at ffffffff8153a042
 #2 [ffff880c327df720] ptlrpc_set_wait at ffffffffa0860c41 [ptlrpc]
 #3 [ffff880c327df7e0] ptlrpc_queue_wait at ffffffffa0861301 [ptlrpc]
 #4 [ffff880c327df800] mdc_ioc_hsm_request at ffffffffa0a65138 [mdc]
 #5 [ffff880c327df830] mdc_iocontrol at ffffffffa0a66df9 [mdc]
 #6 [ffff880c327df950] obd_iocontrol at ffffffffa0a16aa5 [lmv]
 #7 [ffff880c327df9a0] lmv_iocontrol at ffffffffa0a2d9dc [lmv]
 #8 [ffff880c327dfb90] obd_iocontrol at ffffffffa0af2d15 [lustre]
 #9 [ffff880c327dfbe0] ll_dir_ioctl at ffffffffa0af9ae8 [lustre]
#10 [ffff880c327dfe60] vfs_ioctl at ffffffff811a7972
#11 [ffff880c327dfea0] do_vfs_ioctl at ffffffff811a7b14
#12 [ffff880c327dff30] sys_ioctl at ffffffff811a8091
#13 [ffff880c327dff80] tracesys at ffffffff8100b2e8 (via system_call)