
circular locking dependency for lod QOS code

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.17.0
    • System running RHEL8 debug kernel
    • 3
    • 9223372036854775807

    Description

      Testing the lod QOS code produces the lockdep warning below. It looks like we are trying to take the ltd QOS semaphore while holding the ldo_layout_mutex mutex.

       

      [ 1232.031551] ======================================================
      [ 1232.033770] WARNING: possible circular locking dependency detected
      [ 1232.035514] 4.18.0rh8.10-debug #5 Tainted: G O -------- - -
      [ 1232.038158] ------------------------------------------------------
      [ 1232.040160] tgt_recover_0/22674 is trying to acquire lock:
      [ 1232.041995] ffff94e34d3ca7b8 (&ltd->ltd_rw_sem){+++}{3:3}, at: lod_getref+0x25/0x60 [lod]
      [ 1232.044821]
      [ 1232.044821] but task is already holding lock:
      [ 1232.047118] ffff94e46a9138c8 (&lod_obj->ldo_layout_mutex){..}{3:3}, at: lod_use_defined_striping+0x5a/0xce0 [lod]
      [ 1232.050812]
      [ 1232.050812] which lock already depends on the new lock.
      [ 1232.050812]
      [ 1232.053902]
      [ 1232.053902] the existing dependency chain (in reverse order) is:
      [ 1232.056101]
      [ 1232.056101] -> #2 (&lod_obj->ldo_layout_mutex){.+.}-{3:3}:
      [ 1232.058460]        __lock_acquire+0x655/0xee0
      [ 1232.060016]        lock_acquire+0x16a/0x540
      [ 1232.061345]        __mutex_lock+0xd0/0x1000
      [ 1232.062766]        mutex_lock_nested+0x27/0x30
      [ 1232.064414]        lod_obj_for_each_stripe+0x61/0x400 [lod]
      [ 1232.066372]        lod_check_and_reserve_ost.isra.15+0x595/0xe90 [lod]
      [ 1232.068813]        lod_ost_alloc_rr+0x515/0xce0 [lod]
      [ 1232.070812]        lod_qos_prep_create+0x135e/0x1c10 [lod]
      [ 1232.072522]        lod_prepare_create+0x202/0x470 [lod]
      [ 1232.073805]        lod_declare_striped_create+0x270/0xf60 [lod]
      [ 1232.075718]        lod_declare_create+0x3d4/0x9c0 [lod]
      [ 1232.077600]        mdd_declare_create_object_internal+0x107/0x4a0 [mdd]
      [ 1232.079851]        mdd_declare_create_object.isra.25+0x55/0xc40 [mdd]
      [ 1232.082136]        mdd_declare_create+0x59/0x410 [mdd]
      [ 1232.083632]        mdd_create+0x5bd/0x1d00 [mdd]
      [ 1232.084935]        mdt_reint_open+0x423b/0x4550 [mdt]
      [ 1232.086414]        mdt_reint_rec+0x139/0x2c0 [mdt]
      [ 1232.088160]        mdt_reint_internal+0x6a0/0xbf0 [mdt]
      [ 1232.090057]        mdt_intent_open+0x180/0x5b0 [mdt]
      [ 1232.091711]        mdt_intent_opc.constprop.43+0x153/0xfb0 [mdt]
      [ 1232.093520]        mdt_intent_policy+0x14b/0x670 [mdt]
      [ 1232.095316]        ldlm_lock_enqueue+0x43c/0xcd0 [ptlrpc]
      [ 1232.097566]        ldlm_handle_enqueue+0x408/0x2290 [ptlrpc]
      [ 1232.099916]        tgt_enqueue+0xd0/0x300 [ptlrpc]
      [ 1232.101868]        tgt_handle_request0+0x137/0xaf0 [ptlrpc]
      [ 1232.104090]        tgt_request_handle+0x351/0x1c10 [ptlrpc]
      [ 1232.106173]        ptlrpc_server_handle_request+0x374/0x1320 [ptlrpc]
      [ 1232.108611]        ptlrpc_main+0xd2a/0x1450 [ptlrpc]
      [ 1232.109805]        kthread+0x1d7/0x210
      [ 1232.110550]        ret_from_fork+0x24/0x30
      [ 1232.111481] -> #1 (&ltd->ltd_qos.lq_rw_sem){++++}-{3:3}:
      [ 1232.113706]        __lock_acquire+0x655/0xee0
      [ 1232.115207]        lock_acquire+0x16a/0x540
      [ 1232.116717]        down_write+0x61/0x3e0
      [ 1232.118113]        lod_qos_calc_rr.isra.10+0x149/0x6a0 [lod]
      [ 1232.120343]        lod_ost_alloc_rr+0x1bf/0xce0 [lod]
      [ 1232.122265]        lod_qos_prep_create+0x135e/0x1c10 [lod]
      [ 1232.124022]        lod_prepare_create+0x202/0x470 [lod]
      [ 1232.125524]        lod_declare_striped_create+0x270/0xf60 [lod]
      [ 1232.127404]        lod_declare_create+0x3d4/0x9c0 [lod]
      [ 1232.129189]        mdd_declare_create_object_internal+0x107/0x4a0 [mdd]
      [ 1232.131737]        mdd_declare_create_object.isra.25+0x55/0xc40 [mdd]
      [ 1232.133990]        mdd_declare_create+0x59/0x410 [mdd]
      [ 1232.136007]        mdd_create+0x5bd/0x1d00 [mdd]
      [ 1232.137797]        mdt_reint_open+0x423b/0x4550 [mdt]
      [ 1232.139782]        mdt_reint_rec+0x139/0x2c0 [mdt]
      [ 1232.141663]        mdt_reint_internal+0x6a0/0xbf0 [mdt]
      [ 1232.143590]        mdt_intent_open+0x180/0x5b0 [mdt]
      [ 1232.145273]        mdt_intent_opc.constprop.43+0x153/0xfb0 [mdt]
      [ 1232.146797]        mdt_intent_policy+0x14b/0x670 [mdt]
      [ 1232.148832]        ldlm_lock_enqueue+0x43c/0xcd0 [ptlrpc]
      [ 1232.150679]        ldlm_handle_enqueue+0x408/0x2290 [ptlrpc]
      [ 1232.152726]        tgt_enqueue+0xd0/0x300 [ptlrpc]
      [ 1232.154290]        tgt_handle_request0+0x137/0xaf0 [ptlrpc]
      [ 1232.156353]        tgt_request_handle+0x351/0x1c10 [ptlrpc]
      [ 1232.158508]        ptlrpc_server_handle_request+0x374/0x1320 [ptlrpc]
      [ 1232.160174]        ptlrpc_main+0xd2a/0x1450 [ptlrpc]
      [ 1232.162146]        kthread+0x1d7/0x210
      [ 1232.163387]        ret_from_fork+0x24/0x30
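
      To make the reported cycle easier to follow, here is a minimal Python sketch of the kind of check lockdep performs. It is not Lustre or kernel code. The lock names are taken from the trace above. The edge direction ("outer lock -> lock taken under it") and the assumption that the cut-off #0 entry is the ltd_rw_sem acquisition in lod_getref are a reading of the trace, not something the report states outright. The helper names find_path and check_acquire are hypothetical.

```python
from collections import defaultdict


def find_path(edges, src, dst):
    """Return a list of locks leading from src to dst along recorded edges, or None."""
    stack = [(src, [src])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in edges[node]:
            stack.append((nxt, path + [nxt]))
    return None


def check_acquire(edges, held, wanted):
    """Return the cycle closed by taking `wanted` while holding `held`, or None."""
    # A cycle exists if `held` is already reachable from `wanted`.
    path = find_path(edges, wanted, held)
    if path is None:
        return None
    return path + [wanted]


# Assumed reading of the trace: each edge is "outer lock -> lock taken under it".
edges = defaultdict(list)
edges["ltd_rw_sem"].append("lq_rw_sem")        # chain entry #1 (lod_qos_calc_rr)
edges["lq_rw_sem"].append("ldo_layout_mutex")  # chain entry #2 (lod_obj_for_each_stripe)

# The recovery thread holds ldo_layout_mutex and calls lod_getref.
cycle = check_acquire(edges, held="ldo_layout_mutex", wanted="ltd_rw_sem")
print(" -> ".join(cycle))
```

      Under these assumptions the create path orders the locks ltd_rw_sem, then lq_rw_sem, then ldo_layout_mutex, while the recovery path takes ltd_rw_sem with ldo_layout_mutex already held, which closes the loop.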

          Activity

            [LU-18658] circular locking dependency for lod QOS code

            simmonsja James A Simmons added a comment -

            We are doing both!!! RHEL8 reports fewer issues than RHEL9, so we are focusing on resolving the bugs shared by both. Once RHEL8 is done we will work on RHEL9 issues.
            pjones Peter Jones added a comment -

            James

            Why are we using a RHEL8 debug kernel rather than a RHEL9 debug kernel for master?

            Peter


            People

              pjones Peter Jones
              simmonsja James A Simmons
              Votes: 0
              Watchers: 2
