[LU-12167] conf-sanity test_90c: transaction blocked hang on OST umount Created: 07/Apr/19 Updated: 04/Jul/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Alex Zhuravlev |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for wangshilong <wshilong@ddn.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/30cc74a4-5923-11e9-8e92-52540065bddc test_90c failed with the following error: Timeout occurred after 379 mins, last suite running was conf-sanity, restarting cluster to continue tests Looks ZFS transaction blocked which hang on OST umount 19270.120645] Pid: 1384, comm: ll_ost00_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Mon Mar 25 21:14:10 UTC 2019 [19270.122330] Call Trace: [19270.122837] [<ffffffffc042f2d5>] cv_wait_common+0x125/0x150 [spl] [19270.123988] [<ffffffffc042f315>] __cv_wait+0x15/0x20 [spl] [19270.124988] [<ffffffffc05a62bf>] txg_wait_synced+0xef/0x140 [zfs] [19270.126335] [<ffffffffc11d39db>] osd_trans_stop+0x53b/0x5e0 [osd_zfs] [19270.127563] [<ffffffffc132ed45>] ofd_trans_stop+0x25/0x60 [ofd] [19270.128692] [<ffffffffc1333335>] ofd_destroy+0x2c5/0x960 [ofd] [19270.129759] [<ffffffffc132b5a4>] ofd_destroy_by_fid+0x1f4/0x4a0 [ofd] [19270.130919] [<ffffffffc1321677>] ofd_destroy_hdl+0x267/0x970 [ofd] [19270.132027] [<ffffffffc0fed06a>] tgt_request_handle+0x91a/0x15c0 [ptlrpc] [19270.133698] [<ffffffffc0f906ae>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc] [19270.135099] [<ffffffffc0f941ac>] ptlrpc_main+0xbbc/0x2090 [ptlrpc] [19270.136246] [<ffffffff9d4c1c31>] kthread+0xd1/0xe0 [19270.137169] [<ffffffff9db74c37>] ret_from_fork_nospec_end+0x0/0x39 [19270.138304] [<ffffffffffffffff>] 0xffffffffffffffff [19270.139282] Pid: 1381, comm: ll_ost00_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Mon Mar 25 21:14:10 UTC 2019 [19270.141084] Call Trace: [19270.141612] [<ffffffffc042f2d5>] cv_wait_common+0x125/0x150 [spl] [19270.143092] [<ffffffffc042f315>] __cv_wait+0x15/0x20 [spl] [19270.144111] [<ffffffffc05a62bf>] txg_wait_synced+0xef/0x140 [zfs] [19270.145275] [<ffffffffc11d39db>] osd_trans_stop+0x53b/0x5e0 [osd_zfs] [19270.146454] [<ffffffffc132ed45>] ofd_trans_stop+0x25/0x60 [ofd] [19270.147545] [<ffffffffc1333335>] ofd_destroy+0x2c5/0x960 [ofd] [19270.148616] [<ffffffffc132b5a4>] ofd_destroy_by_fid+0x1f4/0x4a0 [ofd] [19270.149771] [<ffffffffc1321677>] ofd_destroy_hdl+0x267/0x970 [ofd] [19270.150910] [<ffffffffc0fed06a>] tgt_request_handle+0x91a/0x15c0 [ptlrpc] [19270.152179] [<ffffffffc0f906ae>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc] [19270.153593] [<ffffffffc0f941ac>] ptlrpc_main+0xbbc/0x2090 [ptlrpc] [19270.154744] [<ffffffff9d4c1c31>] kthread+0xd1/0xe0 [19270.155633] [<ffffffff9db74c37>] ret_from_fork_nospec_end+0x0/0x39 [19270.156767] [<ffffffffffffffff>] 0xffffffffffffffff [19270.157685] Pid: 1393, comm: ll_ost00_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Mon Mar 25 21:14:10 UTC 2019 [19270.159379] Call Trace: [19270.159829] [<ffffffffc042f2d5>] cv_wait_common+0x125/0x150 [spl] [19270.160932] [<ffffffffc042f315>] __cv_wait+0x15/0x20 [spl] [19270.161933] [<ffffffffc05a62bf>] txg_wait_synced+0xef/0x140 [zfs] [19270.163056] [<ffffffffc11d39db>] osd_trans_stop+0x53b/0x5e0 [osd_zfs] [19270.164205] [<ffffffffc132ed45>] ofd_trans_stop+0x25/0x60 [ofd] [19270.165278] [<ffffffffc1333335>] ofd_destroy+0x2c5/0x960 [ofd] [19270.166331] [<ffffffffc132b5a4>] ofd_destroy_by_fid+0x1f4/0x4a0 [ofd] [19270.167492] [<ffffffffc1321677>] ofd_destroy_hdl+0x267/0x970 [ofd] [19270.168600] [<ffffffffc0fed06a>] tgt_request_handle+0x91a/0x15c0 [ptlrpc] [19270.169846] [<ffffffffc0f906ae>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc] [19270.171223] [<ffffffffc0f941ac>] ptlrpc_main+0xbbc/0x2090 [ptlrpc] [19270.172365] [<ffffffff9d4c1c31>] kthread+0xd1/0xe0 [19270.173243] [<ffffffff9db74c37>] ret_from_fork_nospec_end+0x0/0x39 [19270.174349] [<ffffffffffffffff>] 0xffffffffffffffff .... VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Wang Shilong (Inactive) [ 07/Apr/19 ] |
|
Check following OST backtrace. https://testing.whamcloud.com/test_logs/3373a93e-5923-11e9-8e92-52540065bddc/show_text |
| Comment by Wang Shilong (Inactive) [ 07/Apr/19 ] |
|
Looks dmu objection allocation blocked on mutex acquired: [22795.946424] Call Trace: [22795.946868] [<ffffffff9d4d31b6>] __cond_resched+0x26/0x30 [22795.947820] [<ffffffff9db67eea>] _cond_resched+0x3a/0x50 [22795.948754] [<ffffffff9db65e92>] mutex_lock+0x12/0x2f [22795.949668] [<ffffffffc055d5b3>] dnode_check_slots_free+0x63/0x100 [zfs] [22795.950844] [<ffffffffc055e4b6>] dnode_hold_impl+0x4e6/0xc40 [zfs] [22795.951938] [<ffffffffc054b905>] ? dmu_object_next+0x95/0x140 [zfs] [22795.953063] [<ffffffffc054bb9f>] dmu_object_alloc_dnsize+0x1ef/0x3e0 [zfs] [22795.954262] [<ffffffffc11ddba2>] __osd_object_create+0x82/0x170 [osd_zfs] [22795.955441] [<ffffffffc11ddf0d>] osd_mkreg+0x7d/0x210 [osd_zfs] [22795.956484] [<ffffffffc11d9506>] osd_create+0x316/0xaf0 [osd_zfs] [22795.957577] [<ffffffffc133028c>] ofd_precreate_objects+0xeec/0x1aa0 [ofd] [22795.958765] [<ffffffffc13221f4>] ofd_create_hdl+0x474/0x2100 [ofd] [22795.959861] [<ffffffffc0c0a8d4>] ? libcfs_log_return+0x24/0x30 [libcfs] [22795.961032] [<ffffffffc0f84613>] ? lustre_pack_reply_v2+0x183/0x280 [ptlrpc] [22795.962281] [<ffffffffc0fed06a>] tgt_request_handle+0x91a/0x15c0 [ptlrpc] [22795.963464] [<ffffffffc0c0df17>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [22795.964627] [<ffffffffc0f906ae>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc] [22795.965966] [<ffffffffc0f941ac>] ptlrpc_main+0xbbc/0x2090 [ptlrpc] [22795.967048] [<ffffffff9d4d0880>] ? finish_task_switch+0x50/0x1c0 [22795.968125] [<ffffffffc0f935f0>] ? ptlrpc_register_service+0xfa0/0xfa0 [ptlrpc] [22795.969395] [<ffffffff9d4c1c31>] kthread+0xd1/0xe0 [22795.970237] [<ffffffff9d4c1b60>] ? insert_kthread_work+0x40/0x40 [22795.971279] [<ffffffff9db74c37>] ret_from_fork_nospec_begin+0x21/0x21 [22795.972400] [<ffffffff9d4c1b60>] ? insert_kthread_work+0x40/0x40 [22795.973453] ll_ost00_004 D ffff9f40243a30c0 0 1372 2 0x00000080 |
| Comment by Alex Zhuravlev [ 08/Apr/19 ] |
|
this looks likeĀ https://github.com/zfsonlinux/zfs/issues/8426 |
| Comment by Wang Shilong (Inactive) [ 08/Apr/19 ] |
|
So problem will be solved once ZFS is upgraded to 0.8 |
| Comment by Sebastien Buisson [ 19/Nov/19 ] |
|
Probably the same problem hit in https://testing.whamcloud.com/test_sets/6ccd175e-0a5a-11ea-8e77-52540065bddc, but during conf-sanity test_76c and with a slightly different stack trace: [22615.838841] Pid: 9453, comm: ll_ost00_006 3.10.0-957.27.2.el7_lustre.x86_64 #1 SMP Fri Nov 8 17:23:54 UTC 2019 [22615.838842] Call Trace: [22615.838866] [<ffffffffc05272d5>] cv_wait_common+0x125/0x150 [spl] [22615.838870] [<ffffffffc0527315>] __cv_wait+0x15/0x20 [spl] [22615.838996] [<ffffffffc07d84d3>] txg_wait_open+0xc3/0x110 [zfs] [22615.839014] [<ffffffffc078ddca>] dmu_tx_wait+0x3aa/0x3c0 [zfs] [22615.839031] [<ffffffffc078de72>] dmu_tx_assign+0x92/0x490 [zfs] [22615.839075] [<ffffffffc137dfd9>] osd_trans_start+0x199/0x440 [osd_zfs] [22615.839136] [<ffffffffc14c906c>] ofd_precreate_objects+0xa1c/0x1df0 [ofd] [22615.839140] [<ffffffffc14bc024>] ofd_create_hdl+0x474/0x2100 [ofd] [22615.839518] [<ffffffffc11afeea>] tgt_request_handle+0x97a/0x1620 [ptlrpc] [22615.839556] [<ffffffffc1156a66>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [22615.839590] [<ffffffffc115a59c>] ptlrpc_main+0xbac/0x1540 [ptlrpc] [22615.839594] [<ffffffffb7ac2e81>] kthread+0xd1/0xe0 [22615.839597] [<ffffffffb8177c37>] ret_from_fork_nospec_end+0x0/0x39 [22615.839605] [<ffffffffffffffff>] 0xffffffffffffffff [22615.862159] Pid: 30965, comm: ll_ost00_004 3.10.0-957.27.2.el7_lustre.x86_64 #1 SMP Fri Nov 8 17:23:54 UTC 2019 [22615.863851] Call Trace: [22615.864279] [<ffffffffc05272d5>] cv_wait_common+0x125/0x150 [spl] [22615.865584] [<ffffffffc0527315>] __cv_wait+0x15/0x20 [spl] [22615.866771] [<ffffffffc07d84d3>] txg_wait_open+0xc3/0x110 [zfs] [22615.867885] [<ffffffffc078ddca>] dmu_tx_wait+0x3aa/0x3c0 [zfs] [22615.868958] [<ffffffffc078de72>] dmu_tx_assign+0x92/0x490 [zfs] [22615.870026] [<ffffffffc137dfd9>] osd_trans_start+0x199/0x440 [osd_zfs] [22615.871199] [<ffffffffc14c906c>] ofd_precreate_objects+0xa1c/0x1df0 [ofd] [22615.872413] [<ffffffffc14bc024>] ofd_create_hdl+0x474/0x2100 [ofd] [22615.873538] [<ffffffffc11afeea>] tgt_request_handle+0x97a/0x1620 [ptlrpc] [22615.874798] [<ffffffffc1156a66>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [22615.876165] [<ffffffffc115a59c>] ptlrpc_main+0xbac/0x1540 [ptlrpc] [22615.877303] [<ffffffffb7ac2e81>] kthread+0xd1/0xe0 [22615.878183] [<ffffffffb8177c37>] ret_from_fork_nospec_end+0x0/0x39 [22615.879298] [<ffffffffffffffff>] 0xffffffffffffffff |
| Comment by Andreas Dilger [ 04/Jul/20 ] |
|
+1 on master with the same stack as Sebastien: |