  Lustre / LU-14621

Broken lock-transaction ordering in MDS code

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Upstream
    • Labels: None
    • Severity: 3

    Description

      There are a few places where a local lock is taken before the transaction starts, which breaks the transaction-then-locks rule:

       lbug_with_loc.cold.6+0x18/0x18 [libcfs]
       ? osd_trans_start+0x2f1/0x5a0 [osd_ldiskfs]
       osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
       top_trans_start+0x40c/0x940 [ptlrpc]
       ? mdd_orphan_declare_delete+0x176/0x5c0 [mdd]
       mdd_orphan_cleanup_thread+0xaa1/0x18f0 [mdd]
       ? mdd_orphan_declare_delete+0x5c0/0x5c0 [mdd]
       kthread+0x11a/0x130
      
      [<0>] libcfs_call_trace+0x76/0xa0 [libcfs]
      [<0>] lbug_with_loc+0x3e/0x80 [libcfs]
      [<0>] osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
      [<0>] top_trans_start+0x40c/0x940 [ptlrpc]
      [<0>] mdd_swap_layouts+0x12f5/0x2350 [mdd]
      [<0>] mdt_swap_layouts+0x40e/0x9a0 [mdt]
      
      [<0>] osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
      [<0>] top_trans_start+0x40c/0x940 [ptlrpc]
      [<0>] mdd_xattr_set+0x18c7/0x2e50 [mdd]
      [<0>] mdt_close_handle_layouts+0xe23/0x1160 [mdt]
      [<0>] mdt_mfd_close+0x5af/0x3110 [mdt]
      [<0>] mdt_close_internal+0xfd/0x230 [mdt]
      [<0>] mdt_close+0x60a/0x840 [mdt]
      
      Call Trace:
      [<0>] libcfs_call_trace+0x76/0xa0 [libcfs]
      [<0>] lbug_with_loc+0x3e/0x80 [libcfs]
      [<0>] osd_trans_start+0x2fd/0x5a0 [osd_ldiskfs]
      [<0>] __lfsck_layout_update_pfid+0x15f/0x580 [lfsck]
      [<0>] lfsck_layout_slave_in_notify_local+0x4ed/0x710 [lfsck]
      [<0>] lfsck_in_notify_local+0x81/0x3f0 [lfsck]
      [<0>] ofd_inconsistency_verification_main+0x1f2/0xa70 [ofd]
      

      It is trivial to reproduce with the following patch, which asserts that the thread holds no read or write locks when osd_trans_start() is called:

      index 04984f5d9f..b91b7c73e5 100644
      --- a/lustre/osd-ldiskfs/osd_handler.c
      +++ b/lustre/osd-ldiskfs/osd_handler.c
      @@ -1913,6 +1913,9 @@ static int osd_trans_start(const struct lu_env *env, struct dt_device *d,
       
              ENTRY;
       
      +       LASSERT(oti->oti_w_locks == 0);
      +       LASSERT(oti->oti_r_locks == 0);
      +
              LASSERT(current->journal_info == NULL);
       
              oh = container_of(th, struct osd_thandle, ot_super);
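
The ordering rule that the assertions check can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not Lustre code: every `sketch_*` name is hypothetical, only the counter names `oti_w_locks` and `oti_r_locks` are taken from the patch above, and the start function returns -1 where the patched `osd_trans_start()` would hit `LASSERT`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread OSD state. Only the two
 * counter names come from the patch; the rest is illustrative. */
struct sketch_thread_info {
	int  oti_w_locks;	/* write locks held by this thread */
	int  oti_r_locks;	/* read locks held by this thread */
	bool in_trans;		/* true between trans start and stop */
};

static void sketch_write_lock(struct sketch_thread_info *oti)
{
	oti->oti_w_locks++;
}

static void sketch_write_unlock(struct sketch_thread_info *oti)
{
	assert(oti->oti_w_locks > 0);
	oti->oti_w_locks--;
}

/* Returns 0 on success, or -1 in the case where the debugging patch
 * would trip LASSERT (a lock is already held at transaction start). */
static int sketch_trans_start(struct sketch_thread_info *oti)
{
	if (oti->oti_w_locks != 0 || oti->oti_r_locks != 0)
		return -1;
	oti->in_trans = true;
	return 0;
}

static void sketch_trans_stop(struct sketch_thread_info *oti)
{
	oti->in_trans = false;
}

/* Broken order: the lock is taken first, then the transaction starts. */
static int sketch_lock_then_trans(struct sketch_thread_info *oti)
{
	int rc;

	sketch_write_lock(oti);
	rc = sketch_trans_start(oti);	/* fails: a write lock is held */
	if (rc == 0)
		sketch_trans_stop(oti);
	sketch_write_unlock(oti);
	return rc;
}

/* Expected order: start the transaction, then lock, modify, unlock, stop. */
static int sketch_trans_then_lock(struct sketch_thread_info *oti)
{
	int rc = sketch_trans_start(oti);

	if (rc != 0)
		return rc;
	sketch_write_lock(oti);
	/* ... the object would be modified here ... */
	sketch_write_unlock(oti);
	sketch_trans_stop(oti);
	return 0;
}
```

In this model, `sketch_lock_then_trans()` mirrors the ordering seen in the traces above and is rejected, while `sketch_trans_then_lock()` follows the transaction-then-locks rule and succeeds.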
      

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: bzzz Alex Zhuravlev