Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13093

osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 LBUG

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0
    • Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0
    • None
    • 3
    • 9223372036854775807

    Description

      During racer the next LBUG happened

      00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1
      00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
      00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds())   attr_set: 1/1/8, xattr_set: 2/274/0
      00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds())   write: 0/0/0, punch: 0/0/0, quota 6/6/0
      00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
      00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
      00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG
      
       #2 [ffff99728ef238a0] panic at ffffffffa5b5d55b
       #3 [ffff99728ef23920] lbug_with_loc at ffffffffc0aa48cb [libcfs]
       #4 [ffff99728ef23950] osd_attr_set at ffffffffc11f6f5e [osd_ldiskfs]
       #5 [ffff99728ef239b8] ofd_write_attr_set at ffffffffc135c292 [ofd]
       #6 [ffff99728ef23a28] ofd_commitrw_write at ffffffffc135c99d [ofd]
       #7 [ffff99728ef23ab8] ofd_commitrw at ffffffffc1360dac [ofd]
       #8 [ffff99728ef23b30] tgt_brw_write at ffffffffc0fba34c [ptlrpc]
       #9 [ffff99728ef23ca8] tgt_request_handle at ffffffffc0fbe0ba [ptlrpc]
      
      

      After adding some debug I've found that vfs_dq_transfer spent 7-8 credits during this failure. Here is a full log with a race, one thread do brw_write another do set_attr

      00000100:00100000:1.0:1576770158.667759:0:13105:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89779e486050 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost_io00_011:ed527c8c-2be3-96ff-ed07-99e9220c53c7+30:9976:x1653362876819568:12345-192.168.111.36@tcp:4:cat.0
      00080000:00080000:0.0:1576770158.714330:0:13105:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 0
      00000001:00080000:0.0:1576770158.714333:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1
      00000001:00080000:0.0:1576770158.714334:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1
      00000001:00080000:0.0:1576770158.714337:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1
      00000001:00080000:0.0:1576770158.714338:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2
      00000001:00080000:0.0:1576770158.714339:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2
      00000001:00080000:0.0:1576770158.714340:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2
      00080000:00080000:0.0:1576770158.714341:0:13105:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 0
      00000001:00080000:0.0:1576770158.714341:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1
      00000001:00080000:0.0:1576770158.714342:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1
      00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1
      00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2
      00000001:00080000:0.0:1576770158.714344:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2
      00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2
      00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1
      00000001:00080000:0.0:1576770158.714346:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=1
      00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1
      00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2
      00000001:00080000:0.0:1576770158.714348:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2
      00000001:00080000:0.0:1576770158.714349:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2
      00000100:00100000:1.0:1576770158.734168:0:13086:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-192.168.111.33@tcp, seq: 202067
      00000100:00100000:1.0:1576770158.734170:0:13086:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+17:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0
      00080000:00080000:1.0:1576770158.734179:0:13086:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 1892
      00000001:00080000:1.0:1576770158.734179:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 1
      00000001:00080000:1.0:1576770158.734180:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=1
      00000001:00080000:1.0:1576770158.734181:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1
      00000001:00080000:1.0:1576770158.734182:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1
      00000001:00080000:1.0:1576770158.734183:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 2
      00000001:00080000:1.0:1576770158.734184:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=2
      00000001:00080000:1.0:1576770158.734215:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2
      00000001:00080000:1.0:1576770158.734216:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2
      00080000:00080000:1.0:1576770158.734218:0:13086:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 23392
      00000001:00080000:1.0:1576770158.734219:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 1
      00000001:00080000:1.0:1576770158.734221:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=1
      00000001:00080000:1.0:1576770158.734222:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1
      00000001:00080000:1.0:1576770158.734223:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1
      00000001:00080000:1.0:1576770158.734224:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 2
      00000001:00080000:1.0:1576770158.734225:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=2
      00000001:00080000:1.0:1576770158.734257:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2
      00000001:00080000:1.0:1576770158.734259:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2
      00000001:00080000:1.0:1576770158.734278:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2
      00000001:00080000:1.0:1576770158.734280:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2
      00000001:00080000:1.0:1576770158.734281:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2
      00000001:00080000:1.0:1576770158.734282:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2
      00080000:00080000:1.0:1576770158.734285:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction  512
      00080000:00080000:1.0:1576770158.734286:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction  512
      00080000:00080000:1.0:1576770158.734288:0:13086:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 0 -> 1892 gid 0 -> 23392
      00080000:00080000:1.0:1576770158.734311:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction  512
      00080000:00080000:1.0:1576770158.734313:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction  512
      00080000:00080000:1.0:1576770158.734317:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction  511
      00002000:00010000:1.0:1576770158.734328:0:13086:0:(ofd_lvb.c:307:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb size from disk: 10642 -> 10642
      00002000:00010000:1.0:1576770158.734330:0:13086:0:(ofd_lvb.c:314:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb mtime from disk: 1576770157 -> 1576770157
      00002000:00010000:1.0:1576770158.734331:0:13086:0:(ofd_lvb.c:320:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb atime from disk: 0 -> 0
      00002000:00010000:1.0:1576770158.734333:0:13086:0:(ofd_lvb.c:326:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb ctime from disk: 1576770158 -> 1576770158
      00002000:00010000:1.0:1576770158.734334:0:13086:0:(ofd_lvb.c:332:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb blocks from disk: 0 -> 0
      00000100:00100000:1.0:1576770158.734368:0:13086:0:(service.c:2277:ptlrpc_server_handle_request()) Handled RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+18:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0 Request processed in 196us (383us total) trans 4295022387 rc 0/0
      00000100:00100000:1.0:1576770158.734373:0:13086:0:(nrs_fifo.c:241:nrs_fifo_req_stop()) NRS stop fifo request from 12345-192.168.111.33@tcp, seq: 202067
      00080000:00080000:0.0:1576770158.744271:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction  291
      00080000:00080000:0.0:1576770158.744272:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction  291
      00080000:00080000:0.0:1576770158.744273:0:13105:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 1892 -> 0 gid 23392 -> 0
      00080000:00080000:0.0:1576770158.744394:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction  284
      00080000:00080000:0.0:1576770158.744395:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction  284
      00080000:00080000:0.0:1576770158.744397:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction  283
      00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1
      00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
      00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds())   attr_set: 1/1/8, xattr_set: 2/274/0
      00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds())   write: 0/0/0, punch: 0/0/0, quota 6/6/0
      00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
      00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
      00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG
      

      So here, brw_write reserved credit for uid 0 -> uid 0 quota. But after OST_SETATTR happened and changed uid from 0 to 1892. brw_write continued to set attr and failed because not enough credits for 1892 -> 0 transfer.
      It looks like locking fot setattr with ofd_read_lock is broken since commit dcb0f2f03f7 LU-10048 ofd: take local locks within transaction.

      Attachments

        Issue Links

          Activity

            [LU-13093] osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 LBUG

            well, OST_WRITE may and should change credentials as this is the way to initialize UID/GID correctly given objects are precreated with no knowledge who will be using them.

            bzzz Alex Zhuravlev added a comment - well, OST_WRITE may and should change credentials as this is the way to initialize UID/GID correctly given objects are precreated with no knowledge who will be using them.

            It looks bug don't related to the credits at all, but it related to the brw internal state.

            based on Alex comment early
            >> ofd_write_attrset(pid 13105) do declaration for 0 -> 0, then another thread(pid 13086) changed uid, and then ofd_write_attrset do ofd_read_lock, setattr.

            it mean - brw should skip any attribute changes during write, but it looks, brw tries to restore original owner (as setattr executed).

            shadow Alexey Lyashkov added a comment - It looks bug don't related to the credits at all, but it related to the brw internal state. based on Alex comment early >> ofd_write_attrset(pid 13105) do declaration for 0 -> 0, then another thread(pid 13086) changed uid, and then ofd_write_attrset do ofd_read_lock, setattr. it mean - brw should skip any attribute changes during write, but it looks, brw tries to restore original owner (as setattr executed).

            yes, I'm aware about credits a bit. in regular setups the journal is really huge as writes (especially fragmented) require more credits and normally number of threads is high as well. where possible it does makes sense to optimize credits, of course (e.g. LU-4611), but we can't change locking - DoM is the point where we had to unify it with MDT's order.

            bzzz Alex Zhuravlev added a comment - yes, I'm aware about credits a bit. in regular setups the journal is really huge as writes (especially fragmented) require more credits and normally number of threads is high as well. where possible it does makes sense to optimize credits, of course (e.g. LU-4611 ), but we can't change locking - DoM is the point where we had to unify it with MDT's order.

            not really except a tiny local setup...

            Alex, having too-large credit reservation hurts concurrency by consuming all journal credits and blocking other threads in start_this_handle(), unless journal is huge. Having a very large journal can consume a lot of RAM, which is also not good. Better to have more efficient journal declarations if possible.

            I guess the other question here is why the BRW write clobbers the UID from setattr? That is a problem even more important than the credit calculation, as it may leave objects with the wrong ownership == wrong quota.

            adilger Andreas Dilger added a comment - not really except a tiny local setup... Alex, having too-large credit reservation hurts concurrency by consuming all journal credits and blocking other threads in start_this_handle() , unless journal is huge. Having a very large journal can consume a lot of RAM, which is also not good. Better to have more efficient journal declarations if possible. I guess the other question here is why the BRW write clobbers the UID from setattr? That is a problem even more important than the credit calculation, as it may leave objects with the wrong ownership == wrong quota.

            not really except a tiny local setup...

            bzzz Alex Zhuravlev added a comment - not really except a tiny local setup...

            set_attr for 0 -> 1892 reserved ~290 credits, it is too much to reserve this number every time

            aboyko Alexander Boyko added a comment - set_attr for 0 -> 1892 reserved ~290 credits, it is too much to reserve this number every time

            yes, there was a similar issue recently. in general we can't rely on locking anymore as we take locks after declaration (MDD does that from the very beginning).

            bzzz Alex Zhuravlev added a comment - yes, there was a similar issue recently. in general we can't rely on locking anymore as we take locks after declaration (MDD does that from the very beginning).

            At old version ofd_preprw_write takes ofd_read_lock and it protects declare+setattr from OST_SETATTR. Now both operation are separate.

            aboyko Alexander Boyko added a comment - At old version ofd_preprw_write takes ofd_read_lock and it protects declare+setattr from OST_SETATTR. Now both operation are separate.

            we'd calculate credits like both UIDs aren't 0 w/o optimization which how it was long before.

            bzzz Alex Zhuravlev added a comment - we'd calculate credits like both UIDs aren't 0 w/o optimization which how it was long before.

            this is actually not the locking broken but rather credits calculation (in optimization part)..

            Could you clarify, if we reserve credits for a moment1, and execute operation at moment2, how this relates to calculation optimization? We do it only once per thread.

            aboyko Alexander Boyko added a comment - this is actually not the locking broken but rather credits calculation (in optimization part).. Could you clarify, if we reserve credits for a moment1, and execute operation at moment2, how this relates to calculation optimization? We do it only once per thread.
                 th = ofd_trans_create(env, ofd);
                    if (IS_ERR(th))
                            GOTO(out, rc = PTR_ERR(th));
            
                    if (la->la_valid) {
                            rc = dt_declare_attr_set(env, dt_obj, la, th);
                            if (rc)
                                    GOTO(out_tx, rc);
                    }
            
                    rc = dt_declare_xattr_set(env, dt_obj, &info->fti_buf,
                                    XATTR_NAME_FID, 0, th);
                    if (rc)
                            GOTO(out_tx, rc);
            
                    /* We don't need a transno for this operation which will be re-executed
                     * anyway when the OST_WRITE (with a transno assigned) is replayed */
                    rc = dt_trans_start_local(env, ofd->ofd_osd , th);
                    if (rc)
                            GOTO(out_tx, rc);
                                                                                         <----------------- here another thread call osd_attr_set and changed uid/gid
                    ofd_read_lock(env, ofd_obj);
            
                    /* set uid/gid/projid */
                    if (la->la_valid) {
                            rc = dt_attr_set(env, dt_obj, la, th);
                            if (rc)
                                    GOTO(out_unlock, rc);
                    }
            
            
             
            aboyko Alexander Boyko added a comment - th = ofd_trans_create(env, ofd); if (IS_ERR(th)) GOTO(out, rc = PTR_ERR(th)); if (la->la_valid) { rc = dt_declare_attr_set(env, dt_obj, la, th); if (rc) GOTO(out_tx, rc); } rc = dt_declare_xattr_set(env, dt_obj, &info->fti_buf, XATTR_NAME_FID, 0, th); if (rc) GOTO(out_tx, rc); /* We don't need a transno for this operation which will be re-executed * anyway when the OST_WRITE (with a transno assigned) is replayed */ rc = dt_trans_start_local(env, ofd->ofd_osd , th); if (rc) GOTO(out_tx, rc); <----------------- here another thread call osd_attr_set and changed uid/gid ofd_read_lock(env, ofd_obj); /* set uid/gid/projid */ if (la->la_valid) { rc = dt_attr_set(env, dt_obj, la, th); if (rc) GOTO(out_unlock, rc); }

            People

              bzzz Alex Zhuravlev
              aboyko Alexander Boyko
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: