Lustre / LU-12606

mutex lock up in dbuf_read()

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.13.0, Lustre 2.12.2
    • RHEL7.6 servers running ZFS 0.8.1
    • 3

    Description

      Our production system ran into the following lockup on the MDS server:

      [Fri Jul 19 13:54:38 2019] Lustre: f2-MDT0001: haven't heard from client 612e6326-dce7-70db-0049-d0bf81057df3 (at 10.10.33.4@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8aabc86c2400, cur 1563559146 expire 1563558996 last 1563558919

      [Fri Jul 19 13:55:31 2019] Lustre: f2-MDT0001: Connection restored to 612e6326-dce7-70db-0049-d0bf81057df3 (at 10.10.33.4@o2ib2)

      [Fri Jul 19 13:55:31 2019] Lustre: Skipped 1 previous similar message

      [Fri Jul 19 14:16:43 2019] INFO: task mdt02_001:34191 blocked for more than 120 seconds.

      [Fri Jul 19 14:16:43 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

      [Fri Jul 19 14:16:43 2019] mdt02_001       D ffff8ab22689e180     0 34191      2 0x00000000

      [Fri Jul 19 14:16:43 2019] Call Trace:

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa6767152>] ? mutex_lock+0x12/0x2f

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa6768ed9>] schedule+0x29/0x70

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa676a7c5>] rwsem_down_write_failed+0x225/0x3a0

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa6387257>] call_rwsem_down_write_failed+0x17/0x30

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa676820d>] down_write+0x2d/0x3d

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc19c0417>] lod_qos_statfs_update+0x97/0x2b0 [lod]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc19c25ba>] lod_qos_prep_create+0x16a/0x1890 [lod]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc0edbeab>] ? dbuf_read+0x41b/0x5c0 [zfs]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc17151d1>] ? qsd_op_begin+0xb1/0x4b0 [lquota]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc1760d0a>] ? osd_declare_quota+0x29a/0x450 [osd_zfs]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc19c3ef5>] lod_prepare_create+0x215/0x2e0 [lod]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc19b3e0e>] lod_declare_striped_create+0x1ee/0x980 [lod]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc19c44bf>] ? lod_sub_declare_create+0xdf/0x210 [lod]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc19b86e4>] lod_declare_create+0x204/0x590 [lod]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc12e5489>] ? lu_context_refill+0x19/0x50 [obdclass]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc1a2ec32>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc1a1e6bc>] mdd_declare_create+0x4c/0xcb0 [mdd]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc1a22827>] mdd_create+0x897/0x14b0 [mdd]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc18c1f60>] mdt_reint_open+0x19d0/0x27d0 [mdt]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc12f92b8>] ? upcall_cache_get_entry+0x218/0x8b0 [obdclass]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc18b4fa3>] mdt_reint_rec+0x83/0x210 [mdt]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc18931b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc189f706>] ? mdt_intent_fixup_resent+0x36/0x220 [mdt]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc189f972>] mdt_intent_open+0x82/0x3a0 [mdt]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc12c4129>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc189da18>] mdt_intent_policy+0x2e8/0xd00 [mdt]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc189f8f0>] ? mdt_intent_fixup_resent+0x220/0x220 [mdt]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc14cfd26>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc0d63033>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc0d667be>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc14f8587>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc15206d0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc15806c2>] tgt_enqueue+0x62/0x210 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc158501a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc1560a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc0d57bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc152a80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc1527695>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa60d6b72>] ? default_wake_function+0x12/0x20

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa60cbc0b>] ? __wake_up_common+0x5b/0x90

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc152e13c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffc152d640>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa60c1da1>] kthread+0xd1/0xe0

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa60c1cd0>] ? insert_kthread_work+0x40/0x40

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa6775c1d>] ret_from_fork_nospec_begin+0x7/0x21

      [Fri Jul 19 14:16:43 2019]  [<ffffffffa60c1cd0>] ? insert_kthread_work+0x40/0x40
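
      When reading a trace like the one above, it can help to separate the frames the kernel unwinder reports as reliable from those prefixed with "?", which are addresses found on the stack that may be left over from earlier calls. The following is a minimal sketch, not part of the original report, assuming the dmesg line format shown above; the names FRAME_RE and reliable_frames are hypothetical helpers introduced for illustration. Run on this trace, it would show that the dbuf_read frame carries the "?" marker, while the unmarked frames lead from lod_qos_statfs_update into down_write.

```python
import re

# Matches call-trace lines in the format seen in this report, e.g.
# "[Fri Jul 19 14:16:43 2019]  [<ffffffffa6768ed9>] schedule+0x29/0x70"
# Assumption: address in "[<...>]", optional "? " marker, symbol+offset/size,
# and an optional "[module]" suffix.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            # Frames without a module suffix belong to the core kernel.
            frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# A few lines copied from the trace in this report.
sample = """\
[Fri Jul 19 14:16:43 2019]  [<ffffffffa6767152>] ? mutex_lock+0x12/0x2f
[Fri Jul 19 14:16:43 2019]  [<ffffffffa6768ed9>] schedule+0x29/0x70
[Fri Jul 19 14:16:43 2019]  [<ffffffffa676820d>] down_write+0x2d/0x3d
[Fri Jul 19 14:16:43 2019]  [<ffffffffc19c0417>] lod_qos_statfs_update+0x97/0x2b0 [lod]
[Fri Jul 19 14:16:43 2019]  [<ffffffffc0edbeab>] ? dbuf_read+0x41b/0x5c0 [zfs]
"""

for symbol, module in reliable_frames(sample):
    print(f"{symbol} [{module}]")
```

      This only filters the text of the log; it does not identify which thread holds the semaphore, which would need a crash dump or the stacks of the other blocked tasks.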

       

          People

            bzzz Alex Zhuravlev
            simmonsja James A Simmons