Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.14.0
-
None
-
ZFS and single MDS
-
3
-
9223372036854775807
Description
sanity-sec test_18 hangs. It looks like this issue started on 15 MAY 2020 with https://testing.whamcloud.com/test_sets/8514ff93-a5ac-45f0-9584-dff4d4e42eae for Lustre 2.13.53.163.
Looking at the suite_log for https://testing.whamcloud.com/test_sets/6875529f-ecc2-432e-9f20-e8810a08448c, the last thing we see is
CMD: trevis-201vm1.trevis.whamcloud.com runas -u60003 -g60003 -G60003 lfs quota -q /mnt/lustre running as uid/gid/euid/egid 60003/60003/60003/60003, groups: 60003 [lfs] [quota] [-q] [/mnt/lustre] CMD: trevis-201vm2 chmod 000 /mnt/lustre/d18.sanity-sec CMD: trevis-201vm2 chmod 001 /mnt/lustre/d18.sanity-sec CMD: trevis-201vm2 chmod 002 /mnt/lustre/d18.sanity-sec CMD: trevis-201vm2 chmod 003 /mnt/lustre/d18.sanity-sec CMD: trevis-201vm2 chmod 004 /mnt/lustre/d18.sanity-sec CMD: trevis-201vm2 chmod 005 /mnt/lustre/d18.sanity-sec CMD: trevis-201vm2 chmod 006 /mnt/lustre/d18.sanity-sec
Looking at the MDS (vm4) console log, we see the hang
[32726.826140] Lustre: mdt00_003: service thread pid 819155 was inactive for 41.957 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [32726.826328] Pid: 816983, comm: mdt_rdpg00_000 4.18.0-240.1.1.el8_lustre.x86_64 #1 SMP Wed Dec 9 21:09:51 UTC 2020 [32726.829004] Lustre: Skipped 1 previous similar message [32726.830561] Call Trace TBD: [32726.831789] Pid: 819155, comm: mdt00_003 4.18.0-240.1.1.el8_lustre.x86_64 #1 SMP Wed Dec 9 21:09:51 UTC 2020 [32726.833277] Call Trace TBD: [32808.745683] INFO: task txg_quiesce:816666 blocked for more than 120 seconds. [32808.746826] Tainted: P OE --------- - - 4.18.0-240.1.1.el8_lustre.x86_64 #1 [32808.748143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [32808.749328] txg_quiesce D 0 816666 2 0x80004080 [32808.750184] Call Trace: [32808.750601] __schedule+0x2a6/0x700 [32808.751160] schedule+0x38/0xa0 [32808.751671] cv_wait_common+0xfb/0x130 [spl] [32808.752325] ? finish_wait+0x80/0x80 [32808.752958] txg_quiesce_thread+0x2ac/0x3a0 [zfs] [32808.753741] ? txg_sync_thread+0x480/0x480 [zfs] [32808.754470] ? __thread_exit+0x20/0x20 [spl] [32808.755136] thread_generic_wrapper+0x6f/0x80 [spl] [32808.755892] kthread+0x112/0x130 [32808.756397] ? kthread_flush_work_fn+0x10/0x10 [32808.757084] ret_from_fork+0x35/0x40 [32808.757657] INFO: task mdt_rdpg00_000:816983 blocked for more than 120 seconds. [32808.758764] Tainted: P OE --------- - - 4.18.0-240.1.1.el8_lustre.x86_64 #1 [32808.760077] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [32808.761260] mdt_rdpg00_000 D 0 816983 2 0x80004080 [32808.762114] Call Trace: [32808.762502] __schedule+0x2a6/0x700 [32808.763046] schedule+0x38/0xa0 [32808.763536] rwsem_down_write_slowpath+0x32d/0x4e0 [32808.764303] ? dmu_tx_callback_register+0x57/0x80 [zfs] [32808.765124] qti_lqes_write_lock+0x87/0x190 [lquota] [32808.765896] qmt_dqacq0+0x4eb/0x2380 [lquota] [32808.766566] ? lqe_locate_find+0x546/0x7c0 [lquota] [32808.767316] ? qmt_dqacq+0x668/0x790 [lquota] [32808.767996] qmt_dqacq+0x668/0x790 [lquota] [32808.768669] mdt_quota_dqacq+0x59/0x120 [mdt] [32808.769391] tgt_request_handle+0xc78/0x1910 [ptlrpc] [32808.770205] ptlrpc_server_handle_request+0x31a/0xba0 [ptlrpc] [32808.771130] ptlrpc_main+0xba4/0x14a0 [ptlrpc] [32808.771910] ? __schedule+0x2ae/0x700 [32808.772528] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc] [32808.773381] kthread+0x112/0x130 [32808.773908] ? kthread_flush_work_fn+0x10/0x10 [32808.774601] ret_from_fork+0x35/0x40 [32808.775157] INFO: task qmt_reba_lustre:817021 blocked for more than 120 seconds. [32808.776291] Tainted: P OE --------- - - 4.18.0-240.1.1.el8_lustre.x86_64 #1 [32808.777627] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [32808.778806] qmt_reba_lustre D 0 817021 2 0x80004080 [32808.779652] Call Trace: [32808.780033] __schedule+0x2a6/0x700 [32808.780586] schedule+0x38/0xa0 [32808.781080] rwsem_down_write_slowpath+0x32d/0x4e0 [32808.781827] ? cfs_hash_bd_lookup_intent+0x2a/0xc0 [libcfs] [32808.782688] qmt_reba_thread+0x6ca/0x9b0 [lquota] [32808.783407] ? qmt_glimpse_lock.isra.19+0xfb0/0xfb0 [lquota] [32808.784269] kthread+0x112/0x130 [32808.784777] ? kthread_flush_work_fn+0x10/0x10 [32808.785452] ret_from_fork+0x35/0x40 [32808.786011] INFO: task mdt00_003:819155 blocked for more than 120 seconds. [32808.787045] Tainted: P OE --------- - - 4.18.0-240.1.1.el8_lustre.x86_64 #1 [32808.788345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [32808.789522] mdt00_003 D 0 819155 2 0x80004080 [32808.790365] Call Trace: [32808.790760] __schedule+0x2a6/0x700 [32808.791290] ? __wake_up_common+0x7a/0x190 [32808.791922] schedule+0x38/0xa0 [32808.792406] io_schedule+0x12/0x40 [32808.792941] cv_wait_common+0xaf/0x130 [spl] [32808.793592] ? finish_wait+0x80/0x80 [32808.794184] txg_wait_synced_impl+0xc2/0x110 [zfs] [32808.794967] txg_wait_synced+0xc/0x40 [zfs] [32808.795619] osd_trans_stop+0x51f/0x560 [osd_zfs] [32808.796374] top_trans_stop+0x9f/0xd30 [ptlrpc] [32808.797089] ? mdd_acl_chmod+0x185/0x4b0 [mdd] [32808.797791] lod_trans_stop+0x9b/0x340 [lod] [32808.798453] ? mdd_attr_set+0x54a/0xc20 [mdd] [32808.799129] mdd_trans_stop+0x29/0x17f [mdd] [32808.799808] mdt_reint_setattr+0xc7a/0x12a0 [mdt] [32808.800565] ? nodemap_get_from_exp+0xea/0x270 [ptlrpc] [32808.801372] mdt_reint_rec+0x11f/0x250 [mdt] [32808.802044] mdt_reint_internal+0x498/0x780 [mdt] [32808.802778] mdt_reint+0x5e/0x100 [mdt] [32808.803400] tgt_request_handle+0xc78/0x1910 [ptlrpc] [32808.804212] ptlrpc_server_handle_request+0x31a/0xba0 [ptlrpc] [32808.805135] ptlrpc_main+0xba4/0x14a0 [ptlrpc] [32808.805824] ? __schedule+0x2ae/0x700 [32808.806412] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc] [32808.808415] kthread+0x112/0x130 [32808.808950] ? kthread_flush_work_fn+0x10/0x10 [32808.809648] ret_from_fork+0x35/0x40
Logs for more hangs are at
https://testing.whamcloud.com/test_sets/a8822b08-9e99-4320-8750-fbfffad657a5
https://testing.whamcloud.com/test_sets/713a62aa-3393-4b43-8178-87398949b6c0
https://testing.whamcloud.com/test_sets/a801aef1-a4d8-4470-a54e-49c953f2c246