[LU-8345] hsm release hung Created: 29/Jun/16 Updated: 29/Mar/22 Resolved: 29/Mar/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Robert Read (Inactive) | Assignee: | John Hammond |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 7.2 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Testing HSM release, restore, and then release again with ~1000 files in a directory (submitting HSM requests in batches of 50 each). During the second round of releases, a call to llapi_hsm_request hung and was unkillable. No other activity was happening on the filesystem, and the agent and movers were running on different mountpoints. I extracted this stack with "echo t > /proc/sysrq-trigger":

[5182373.816335] lhsm S ffff8806e4a80000 0 7849 6455 0x00000080
[5182373.819456] ffff880e74bb36d8 0000000000000086 ffff8806e4a80000 ffff880e74bb3fd8
[5182373.822816] ffff880e74bb3fd8 ffff880e74bb3fd8 ffff8806e4a80000 ffff880e73c81600
[5182373.826120] ffff8806e4a80000 0000000000000000 ffffffffa06a7910 ffff8806e4a80000
[5182373.829475] Call Trace:
[5182373.831077] [<ffffffffa06a7910>] ? ldlm_completion_ast_async+0x300/0x300 [ptlrpc]
[5182373.834253] [<ffffffff8163aab9>] schedule+0x29/0x70
[5182373.836587] [<ffffffffa06a818d>] ldlm_completion_ast+0x62d/0x910 [ptlrpc]
[5182373.839514] [<ffffffffa06959a0>] ? ldlm_resource_add_lock+0x70/0x1b0 [ptlrpc]
[5182373.842620] [<ffffffff810b8c10>] ? wake_up_state+0x20/0x20
[5182373.845227] [<ffffffffa06a9918>] ldlm_cli_enqueue_fini+0x938/0xdb0 [ptlrpc]
[5182373.848256] [<ffffffffa06aa045>] ldlm_cli_enqueue+0x2b5/0x890 [ptlrpc]
[5182373.851121] [<ffffffffa06a7b60>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc]
[5182373.854442] [<ffffffffa0949260>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
[5182373.857856] [<ffffffffa0374f7f>] mdc_enqueue+0x2bf/0x17b0 [mdc]
[5182373.860550] [<ffffffffa08d35f9>] ? lmv_lock_match+0x239/0x5f0 [lmv]
[5182373.863428] [<ffffffffa08e4091>] lmv_enqueue+0x161/0x600 [lmv]
[5182373.866060] [<ffffffffa0927475>] ll_layout_refresh_locked+0x325/0xd90 [lustre]
[5182373.869133] [<ffffffffa0949260>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
[5182373.872528] [<ffffffffa06a7b60>] ? ldlm_expired_completion_wait+0x250/0x250 [ptlrpc]
[5182373.875880] [<ffffffffa092808f>] ll_layout_refresh+0x1af/0x290 [lustre]
[5182373.878790] [<ffffffffa096ddbf>] vvp_io_init+0x3df/0x4c0 [lustre]
[5182373.881587] [<ffffffffa03cf277>] ? cfs_hash_find_or_add+0xa7/0x1b0 [libcfs]
[5182373.884661] [<ffffffffa04ed168>] cl_io_init0.isra.15+0x88/0x160 [obdclass]
[5182373.887699] [<ffffffffa09218aa>] ? ll_data_version+0x5a/0x320 [lustre]
[5182373.890671] [<ffffffffa04ed335>] cl_io_init+0x55/0xd0 [obdclass]
[5182373.893491] [<ffffffffa092194f>] ll_data_version+0xff/0x320 [lustre]
[5182373.896356] [<ffffffffa0921bff>] ll_hsm_release+0x8f/0x320 [lustre]
[5182373.899152] [<ffffffffa0912f1b>] ll_dir_ioctl+0x498b/0x6be0 [lustre]
[5182373.902057] [<ffffffff811e9401>] ? terminate_walk+0x51/0x60
[5182373.904657] [<ffffffff811ebce5>] ? do_last+0x635/0x1270
[5182373.907161] [<ffffffff811ee602>] ? path_openat+0xc2/0x490
[5182373.909729] [<ffffffff8119259c>] ? tlb_flush_mmu.part.54+0x6c/0xc0
[5182373.912608] [<ffffffff811efdcb>] ? do_filp_open+0x4b/0xb0
[5182373.915175] [<ffffffff811f1e85>] do_vfs_ioctl+0x2e5/0x4c0
[5182373.917713] [<ffffffff811fca67>] ? __fd_install+0x47/0x60
[5182373.920314] [<ffffffff811f2101>] SyS_ioctl+0xa1/0xc0
[5182373.922786] [<ffffffff81645b09>] system_call_fastpath+0x16/0x1b |
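For anyone trying to reproduce the batching pattern described above, the following is a minimal sketch only. The report says the requests went through llapi_hsm_request directly; this sketch swaps in the `lfs hsm_release` and `lfs hsm_restore` commands as an assumption, and the function names `batches`, `build_commands`, and `run_round`, along with the timeout value, are hypothetical and do not come from the ticket.

```python
import subprocess
from typing import Iterator, List, Sequence

BATCH_SIZE = 50  # batch size quoted in the report


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_commands(action: str, paths: Sequence[str],
                   size: int = BATCH_SIZE) -> List[List[str]]:
    """Build one `lfs <action>` argv per batch of paths.

    Assumption: the lfs CLI stands in for the direct llapi_hsm_request
    calls used in the original test.
    """
    if action not in ("hsm_release", "hsm_restore"):
        raise ValueError("unsupported action: " + action)
    return [["lfs", action, *batch] for batch in batches(paths, size)]


def run_round(action: str, paths: Sequence[str],
              timeout_s: float = 300.0) -> None:
    """Submit every batch in order, raising on the first failure.

    The timeout (hypothetical value) only helps while the child can still
    be signalled; it may not return if the process is truly unkillable.
    """
    for argv in build_commands(action, paths):
        subprocess.run(argv, check=True, timeout=timeout_s)
```

With 1000 paths, `build_commands("hsm_release", paths)` yields 20 commands of 50 files each. The release, restore, release sequence from the report would be three `run_round` calls; waiting for the restores to finish before the second release is left to the caller.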
| Comments |
| Comment by Evan D. Chen (Inactive) [ 01/Jul/16 ] |
|
John, can you take a look at this? |