Details
Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.2
Environment: RHEL7.6 servers running ZFS 0.8.1
Severity: 3
Rank (Obsolete): 9223372036854775807
Description
Our production system ran into the following lockup on the MDS server:
[Fri Jul 19 13:54:38 2019] Lustre: f2-MDT0001: haven't heard from client 612e6326-dce7-70db-0049-d0bf81057df3 (at 10.10.33.4@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8aabc86c2400, cur 1563559146 expire 1563558996 last 1563558919
[Fri Jul 19 13:55:31 2019] Lustre: f2-MDT0001: Connection restored to 612e6326-dce7-70db-0049-d0bf81057df3 (at 10.10.33.4@o2ib2)
[Fri Jul 19 13:55:31 2019] Lustre: Skipped 1 previous similar message
[Fri Jul 19 14:16:43 2019] INFO: task mdt02_001:34191 blocked for more than 120 seconds.
[Fri Jul 19 14:16:43 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Jul 19 14:16:43 2019] mdt02_001 D ffff8ab22689e180 0 34191 2 0x00000000
[Fri Jul 19 14:16:43 2019] Call Trace:
[Fri Jul 19 14:16:43 2019] [<ffffffffa6767152>] ? mutex_lock+0x12/0x2f
[Fri Jul 19 14:16:43 2019] [<ffffffffa6768ed9>] schedule+0x29/0x70
[Fri Jul 19 14:16:43 2019] [<ffffffffa676a7c5>] rwsem_down_write_failed+0x225/0x3a0
[Fri Jul 19 14:16:43 2019] [<ffffffffa6387257>] call_rwsem_down_write_failed+0x17/0x30
[Fri Jul 19 14:16:43 2019] [<ffffffffa676820d>] down_write+0x2d/0x3d
[Fri Jul 19 14:16:43 2019] [<ffffffffc19c0417>] lod_qos_statfs_update+0x97/0x2b0 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc19c25ba>] lod_qos_prep_create+0x16a/0x1890 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc0edbeab>] ? dbuf_read+0x41b/0x5c0 [zfs]
[Fri Jul 19 14:16:43 2019] [<ffffffffc17151d1>] ? qsd_op_begin+0xb1/0x4b0 [lquota]
[Fri Jul 19 14:16:43 2019] [<ffffffffc1760d0a>] ? osd_declare_quota+0x29a/0x450 [osd_zfs]
[Fri Jul 19 14:16:43 2019] [<ffffffffc19c3ef5>] lod_prepare_create+0x215/0x2e0 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc19b3e0e>] lod_declare_striped_create+0x1ee/0x980 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc19c44bf>] ? lod_sub_declare_create+0xdf/0x210 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc19b86e4>] lod_declare_create+0x204/0x590 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc12e5489>] ? lu_context_refill+0x19/0x50 [obdclass]
[Fri Jul 19 14:16:43 2019] [<ffffffffc1a2ec32>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
[Fri Jul 19 14:16:43 2019] [<ffffffffc1a1e6bc>] mdd_declare_create+0x4c/0xcb0 [mdd]
[Fri Jul 19 14:16:43 2019] [<ffffffffc1a22827>] mdd_create+0x897/0x14b0 [mdd]
[Fri Jul 19 14:16:43 2019] [<ffffffffc18c1f60>] mdt_reint_open+0x19d0/0x27d0 [mdt]
[Fri Jul 19 14:16:43 2019] [<ffffffffc12f92b8>] ? upcall_cache_get_entry+0x218/0x8b0 [obdclass]
[Fri Jul 19 14:16:43 2019] [<ffffffffc18b4fa3>] mdt_reint_rec+0x83/0x210 [mdt]
[Fri Jul 19 14:16:43 2019] [<ffffffffc18931b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[Fri Jul 19 14:16:43 2019] [<ffffffffc189f706>] ? mdt_intent_fixup_resent+0x36/0x220 [mdt]
[Fri Jul 19 14:16:43 2019] [<ffffffffc189f972>] mdt_intent_open+0x82/0x3a0 [mdt]
[Fri Jul 19 14:16:43 2019] [<ffffffffc12c4129>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[Fri Jul 19 14:16:43 2019] [<ffffffffc189da18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[Fri Jul 19 14:16:43 2019] [<ffffffffc189f8f0>] ? mdt_intent_fixup_resent+0x220/0x220 [mdt]
[Fri Jul 19 14:16:43 2019] [<ffffffffc14cfd26>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc0d63033>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
[Fri Jul 19 14:16:43 2019] [<ffffffffc0d667be>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
[Fri Jul 19 14:16:43 2019] [<ffffffffc14f8587>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc15206d0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc15806c2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc158501a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc1560a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc0d57bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[Fri Jul 19 14:16:43 2019] [<ffffffffc152a80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc1527695>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffa60d6b72>] ? default_wake_function+0x12/0x20
[Fri Jul 19 14:16:43 2019] [<ffffffffa60cbc0b>] ? __wake_up_common+0x5b/0x90
[Fri Jul 19 14:16:43 2019] [<ffffffffc152e13c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffc152d640>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[Fri Jul 19 14:16:43 2019] [<ffffffffa60c1da1>] kthread+0xd1/0xe0
[Fri Jul 19 14:16:43 2019] [<ffffffffa60c1cd0>] ? insert_kthread_work+0x40/0x40
[Fri Jul 19 14:16:43 2019] [<ffffffffa6775c1d>] ret_from_fork_nospec_begin+0x7/0x21
[Fri Jul 19 14:16:43 2019] [<ffffffffa60c1cd0>] ? insert_kthread_work+0x40/0x40
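For readers triaging the trace above: frames marked with `?` are addresses the kernel found on the stack but could not confirm as part of the call chain, so it can help to filter them out before reading the blocking path. The following is a minimal sketch of that filtering, assuming the RHEL7-style frame format shown in this log. The helper name `reliable_frames` and the regular expression are illustrative only and are not part of any Lustre or kernel tooling.

```python
import re

# Matches one frame of a RHEL7-style kernel call trace, for example:
#   [<ffffffffc19c0417>] lod_qos_statfs_update+0x97/0x2b0 [lod]
# The optional "? " prefix marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) for each frame not marked '?', innermost first.

    Frames without a module suffix are reported as belonging to "kernel".
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append((m.group("sym"), m.group("mod") or "kernel"))
    return frames


# First eight frames copied from the trace in this report.
sample = """\
[Fri Jul 19 14:16:43 2019] [<ffffffffa6767152>] ? mutex_lock+0x12/0x2f
[Fri Jul 19 14:16:43 2019] [<ffffffffa6768ed9>] schedule+0x29/0x70
[Fri Jul 19 14:16:43 2019] [<ffffffffa676a7c5>] rwsem_down_write_failed+0x225/0x3a0
[Fri Jul 19 14:16:43 2019] [<ffffffffa6387257>] call_rwsem_down_write_failed+0x17/0x30
[Fri Jul 19 14:16:43 2019] [<ffffffffa676820d>] down_write+0x2d/0x3d
[Fri Jul 19 14:16:43 2019] [<ffffffffc19c0417>] lod_qos_statfs_update+0x97/0x2b0 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc19c25ba>] lod_qos_prep_create+0x16a/0x1890 [lod]
[Fri Jul 19 14:16:43 2019] [<ffffffffc0edbeab>] ? dbuf_read+0x41b/0x5c0 [zfs]
"""

for sym, mod in reliable_frames(sample):
    print(f"{sym} [{mod}]")
```

Applied to the trace in this report, the filtered frames show mdt02_001 sleeping in `down_write` called from `lod_qos_statfs_update`, reached through the `mdt_reint_open` create path. The trace alone does not show which task holds the semaphore.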