Lustre / LU-5994

DT transaction start and object lock ordering

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 16718

    Description

      lfsck_layout_slave_conditional_destroy() and lfsck_layout_slave_repair_pfid() both call dt_trans_start_local() while holding a DT write lock on an object. This was found by code inspection.
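
      For reference, the ordering in question looks roughly like the sketch below (illustrative only, using the generic dt_* wrappers; the dt_declare_* steps and error handling of the real LFSCK call sites are omitted):

      /* Order used by the two LFSCK routines above: lock first, then transaction. */
      dt_write_lock(env, obj, 0);              /* DT write lock taken on the object */
      th = dt_trans_create(env, dev);
      rc = dt_trans_start_local(env, dev, th); /* transaction started while the lock is held */
      /* ... modify obj ... */
      dt_trans_stop(env, dev, th);
      dt_write_unlock(env, obj);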

      I also ran with the following:

      diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.c
      index 80d1e67..ed15b0c 100644
      --- a/lustre/osd-ldiskfs/osd_handler.c
      +++ b/lustre/osd-ldiskfs/osd_handler.c
      @@ -1058,6 +1058,8 @@ int osd_trans_start(const struct lu_env *env, struct dt_device *d,
               if (!IS_ERR(jh)) {
                       oh->ot_handle = jh;
                       LASSERT(oti->oti_txns == 0);
      +               LASSERT(oti->oti_w_locks == 0);
      +               LASSERT(oti->oti_r_locks == 0);
                       lu_context_init(&th->th_ctx, th->th_tags);
                       lu_context_enter(&th->th_ctx);
      

      Running sanity-lfsck.sh I was able to trigger a crash from lfsck_layout_slave_repair_pfid():

      [ 1827.711965] Lustre: DEBUG MARKER: == sanity-lfsck test 19b: OST-object inconsistency self repair == 11:01:32 (1417798892)
      [ 1827.813194] Lustre: DEBUG MARKER: cancel_lru_locks osc start
      [ 1827.838214] Lustre: *** cfs_fail_loc=1611, val=0***
      [ 1827.839821] Lustre: Skipped 3 previous similar messages
      [ 1827.877070] Lustre: DEBUG MARKER: cancel_lru_locks osc stop
      [ 1827.957214] LustreError: 21862:0:(osd_handler.c:1061:osd_trans_start()) ASSERTION( oti->oti_w_locks == 0 ) failed:
      [ 1827.960100] LustreError: 21862:0:(osd_handler.c:1061:osd_trans_start()) LBUG
      [ 1827.961698] Pid: 21862, comm: inconsistency_v
      [ 1827.962700]
      [ 1827.962702] Call Trace:
      [ 1827.963692]  [<ffffffffa052f8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [ 1827.965393]  [<ffffffffa052fec7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [ 1827.966913]  [<ffffffffa0d98809>] osd_trans_start+0x389/0x730 [osd_ldiskfs]
      [ 1827.968553]  [<ffffffffa0c44ed2>] lfsck_layout_slave_in_notify+0x982/0xcf0 [lfsck]
      [ 1827.969963]  [<ffffffffa0c0cd37>] lfsck_in_notify+0xf7/0x5a0 [lfsck]
      [ 1827.971155]  [<ffffffffa0fe6037>] ofd_inconsistency_verification_main+0x367/0xdb0 [ofd]
      [ 1827.972639]  [<ffffffff8105e59d>] ? finish_task_switch+0x7d/0x110
      [ 1827.973774]  [<ffffffff8105e568>] ? finish_task_switch+0x48/0x110
      [ 1827.974896]  [<ffffffff81061d90>] ? default_wake_function+0x0/0x20
      [ 1827.976039]  [<ffffffffa0fe5cd0>] ? ofd_inconsistency_verification_main+0x0/0xdb0 [ofd]
      [ 1827.977539]  [<ffffffff8109e856>] kthread+0x96/0xa0
      [ 1827.978448]  [<ffffffff8100c30a>] child_rip+0xa/0x20
      [ 1827.979368]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
      [ 1827.980507]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
      [ 1827.981704]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
      [ 1827.982787]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
      [ 1827.983912]
      

      I also found that OFD does this in enough places that I added

       || strstr(current->comm, "ll_ost") != NULL

      to the assertions above. Should OFD be following the same order?
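
      For clarity, the relaxed check described above would look roughly like the following inside osd_trans_start() (a sketch of the local debugging change, not a proposed patch; "ll_ost" matches the names of the OST service threads):

      /* No DT locks should be held when the transaction starts, except on
       * OST service threads, which currently take the object lock first. */
      LASSERT(oti->oti_w_locks == 0 ||
              strstr(current->comm, "ll_ost") != NULL);
      LASSERT(oti->oti_r_locks == 0 ||
              strstr(current->comm, "ll_ost") != NULL);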

      Attachments

        Issue Links

          Activity


            yong.fan nasf (Inactive) added a comment -
            It will be fixed via LU-10048.

            yong.fan nasf (Inactive) added a comment -
            Part of the issues with the order of starting a transaction and locking an object will be handled via the patch http://review.whamcloud.com/23689.

            yong.fan nasf (Inactive) added a comment -
            Then I would say that it will not be a small change. We need to unify the dt_lock and transaction ordering on both the MDT and the OST, which may affect not only scalability but may also introduce potential deadlocks. Currently, the dt_lock/transaction order used by LFSCK matches both the MDT and the OST (different functions use different orders).
            Since Lustre 2.7 feature landing has already closed, we can resolve this known technical debt in the next release.

            bzzz Alex Zhuravlev added a comment -
            Well, I think it'd be very good to have the same order, of course. As for the MDS: basically we use dt_write_lock() to control nlink+/nlink-, IIRC. If we take the lock outside of a transaction, then basically all operations in a directory are serialized. Actually, both OSDs protect nlink internally, and I wonder whether that wouldn't be enough. Originally this "local" locking was introduced in 2.0 to support a mode where the LDLM lock is taken per node, not per thread. Something to take into account, I guess.

            yong.fan nasf (Inactive) added a comment -
            If we follow the local filesystem convention, we should take the dt_lock before the transaction is started, right? But as a distributed filesystem, what is the potential trouble if we change the MDS-side order to match the OST side? And what should we do for this ticket? We have kept these inconsistent orders for a long time.

            bzzz Alex Zhuravlev added a comment -
            Yes, this is known technical debt. We always wanted a locks-then-transaction order (this is what the majority of filesystems do) and implemented OFD this way, but that would serialize directory operations on the MDS.

            yong.fan nasf (Inactive) added a comment -
            Hm, I remember now why LFSCK uses such a transaction/dt_lock order. Normally, that is on the MDT, we start the transaction first and then take the dt_lock; but on the OST we use the reverse order, for example in ofd_object_destroy().
            I do not know why the OFD behaves differently from the MDT.
            Alex, what's your suggestion for that?
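
            A condensed sketch of the two orders discussed in this thread (illustrative only, using the generic dt_* wrappers; dt_declare_* steps and error handling omitted, not the literal MDT/OFD code):

            /* MDT-style order: transaction first, then the object lock. */
            th = dt_trans_create(env, dev);
            /* ... dt_declare_* calls ... */
            rc = dt_trans_start(env, dev, th);
            dt_write_lock(env, obj, 0);
            /* ... apply updates ... */
            dt_write_unlock(env, obj);
            dt_trans_stop(env, dev, th);

            /* OST-style order (e.g. ofd_object_destroy()): object lock first. */
            dt_write_lock(env, obj, 0);
            th = dt_trans_create(env, dev);
            /* ... dt_declare_* calls ... */
            rc = dt_trans_start(env, dev, th);
            /* ... apply updates ... */
            dt_trans_stop(env, dev, th);
            dt_write_unlock(env, obj);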

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: jhammond John Hammond
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: