[LU-13093] osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 LBUG Created: 20/Dec/19 Updated: 20/Jun/21 Resolved: 28/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Boyko | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
During racer the next LBUG happened 00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds()) attr_set: 1/1/8, xattr_set: 2/274/0 00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 6/6/0 00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG #2 [ffff99728ef238a0] panic at ffffffffa5b5d55b #3 [ffff99728ef23920] lbug_with_loc at ffffffffc0aa48cb [libcfs] #4 [ffff99728ef23950] osd_attr_set at ffffffffc11f6f5e [osd_ldiskfs] #5 [ffff99728ef239b8] ofd_write_attr_set at ffffffffc135c292 [ofd] #6 [ffff99728ef23a28] ofd_commitrw_write at ffffffffc135c99d [ofd] #7 [ffff99728ef23ab8] ofd_commitrw at ffffffffc1360dac [ofd] #8 [ffff99728ef23b30] tgt_brw_write at ffffffffc0fba34c [ptlrpc] #9 [ffff99728ef23ca8] tgt_request_handle at ffffffffc0fbe0ba [ptlrpc] After adding some debug I've found that vfs_dq_transfer spent 7-8 credits during this failure. Here is a full log with a race, one thread do brw_write another do set_attr 00000100:00100000:1.0:1576770158.667759:0:13105:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89779e486050 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost_io00_011:ed527c8c-2be3-96ff-ed07-99e9220c53c7+30:9976:x1653362876819568:12345-192.168.111.36@tcp:4:cat.0 00080000:00080000:0.0:1576770158.714330:0:13105:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 0 00000001:00080000:0.0:1576770158.714333:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:0.0:1576770158.714334:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1 00000001:00080000:0.0:1576770158.714337:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:0.0:1576770158.714338:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:0.0:1576770158.714339:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2 00000001:00080000:0.0:1576770158.714340:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00080000:00080000:0.0:1576770158.714341:0:13105:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 0 00000001:00080000:0.0:1576770158.714341:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:0.0:1576770158.714342:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1 00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:0.0:1576770158.714344:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2 00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1 00000001:00080000:0.0:1576770158.714346:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=1 00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1 00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000001:00080000:0.0:1576770158.714348:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2 00000001:00080000:0.0:1576770158.714349:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000100:00100000:1.0:1576770158.734168:0:13086:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-192.168.111.33@tcp, seq: 202067 00000100:00100000:1.0:1576770158.734170:0:13086:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+17:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0 00080000:00080000:1.0:1576770158.734179:0:13086:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 1892 00000001:00080000:1.0:1576770158.734179:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 1 00000001:00080000:1.0:1576770158.734180:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=1 00000001:00080000:1.0:1576770158.734181:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:1.0:1576770158.734182:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1 00000001:00080000:1.0:1576770158.734183:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 2 00000001:00080000:1.0:1576770158.734184:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=2 00000001:00080000:1.0:1576770158.734215:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:1.0:1576770158.734216:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2 00080000:00080000:1.0:1576770158.734218:0:13086:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 23392 00000001:00080000:1.0:1576770158.734219:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 1 00000001:00080000:1.0:1576770158.734221:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=1 00000001:00080000:1.0:1576770158.734222:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:1.0:1576770158.734223:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1 00000001:00080000:1.0:1576770158.734224:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 2 00000001:00080000:1.0:1576770158.734225:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=2 00000001:00080000:1.0:1576770158.734257:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:1.0:1576770158.734259:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2 00000001:00080000:1.0:1576770158.734278:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:1.0:1576770158.734280:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:1.0:1576770158.734281:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000001:00080000:1.0:1576770158.734282:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2 00080000:00080000:1.0:1576770158.734285:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734286:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734288:0:13086:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 0 -> 1892 gid 0 -> 23392 00080000:00080000:1.0:1576770158.734311:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734313:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734317:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 511 00002000:00010000:1.0:1576770158.734328:0:13086:0:(ofd_lvb.c:307:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb size from disk: 10642 -> 10642 00002000:00010000:1.0:1576770158.734330:0:13086:0:(ofd_lvb.c:314:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb mtime from disk: 1576770157 -> 1576770157 00002000:00010000:1.0:1576770158.734331:0:13086:0:(ofd_lvb.c:320:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb atime from disk: 0 -> 0 00002000:00010000:1.0:1576770158.734333:0:13086:0:(ofd_lvb.c:326:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb ctime from disk: 1576770158 -> 1576770158 00002000:00010000:1.0:1576770158.734334:0:13086:0:(ofd_lvb.c:332:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb blocks from disk: 0 -> 0 00000100:00100000:1.0:1576770158.734368:0:13086:0:(service.c:2277:ptlrpc_server_handle_request()) Handled RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+18:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0 Request processed in 196us (383us total) trans 4295022387 rc 0/0 00000100:00100000:1.0:1576770158.734373:0:13086:0:(nrs_fifo.c:241:nrs_fifo_req_stop()) NRS stop fifo request from 12345-192.168.111.33@tcp, seq: 202067 00080000:00080000:0.0:1576770158.744271:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 291 00080000:00080000:0.0:1576770158.744272:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 291 00080000:00080000:0.0:1576770158.744273:0:13105:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 1892 -> 0 gid 23392 -> 0 00080000:00080000:0.0:1576770158.744394:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 284 00080000:00080000:0.0:1576770158.744395:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 284 00080000:00080000:0.0:1576770158.744397:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 283 00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds()) attr_set: 1/1/8, xattr_set: 2/274/0 00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 6/6/0 00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG So here, brw_write reserved credit for uid 0 -> uid 0 quota. But after OST_SETATTR happened and changed uid from 0 to 1892. brw_write continued to set attr and failed because not enough credits for 1892 -> 0 transfer. |
| Comments |
| Comment by Alex Zhuravlev [ 20/Dec/19 ] |
|
what kernel version do you use? |
| Comment by Alex Zhuravlev [ 20/Dec/19 ] |
|
this is actually not the locking broken but rather credits calculation (in optimization part).. |
| Comment by Alexander Boyko [ 20/Dec/19 ] |
|
ofd_write_attrset(pid 13105) do declaration for 0 -> 0, then another thread(pid 13086) changed uid, and then ofd_write_attrset do ofd_read_lock, setattr. 3.10.0-957.1.3957.1.3.x3.4.160.x86_64 |
| Comment by Alexander Boyko [ 20/Dec/19 ] |
th = ofd_trans_create(env, ofd);
if (IS_ERR(th))
GOTO(out, rc = PTR_ERR(th));
if (la->la_valid) {
rc = dt_declare_attr_set(env, dt_obj, la, th);
if (rc)
GOTO(out_tx, rc);
}
rc = dt_declare_xattr_set(env, dt_obj, &info->fti_buf,
XATTR_NAME_FID, 0, th);
if (rc)
GOTO(out_tx, rc);
/* We don't need a transno for this operation which will be re-executed
* anyway when the OST_WRITE (with a transno assigned) is replayed */
rc = dt_trans_start_local(env, ofd->ofd_osd , th);
if (rc)
GOTO(out_tx, rc);
<----------------- here another thread call osd_attr_set and changed uid/gid
ofd_read_lock(env, ofd_obj);
/* set uid/gid/projid */
if (la->la_valid) {
rc = dt_attr_set(env, dt_obj, la, th);
if (rc)
GOTO(out_unlock, rc);
}
|
| Comment by Alexander Boyko [ 20/Dec/19 ] |
Could you clarify, if we reserve credits for a moment1, and execute operation at moment2, how this relates to calculation optimization? We do it only once per thread. |
| Comment by Alex Zhuravlev [ 20/Dec/19 ] |
|
we'd calculate credits like both UIDs aren't 0 w/o optimization which how it was long before. |
| Comment by Alexander Boyko [ 20/Dec/19 ] |
|
At old version ofd_preprw_write takes ofd_read_lock and it protects declare+setattr from OST_SETATTR. Now both operation are separate. |
| Comment by Alex Zhuravlev [ 20/Dec/19 ] |
|
yes, there was a similar issue recently. in general we can't rely on locking anymore as we take locks after declaration (MDD does that from the very beginning). |
| Comment by Alexander Boyko [ 20/Dec/19 ] |
|
set_attr for 0 -> 1892 reserved ~290 credits, it is too much to reserve this number every time |
| Comment by Alex Zhuravlev [ 20/Dec/19 ] |
|
not really |
| Comment by Andreas Dilger [ 20/Dec/19 ] |
Alex, having too-large credit reservation hurts concurrency by consuming all journal credits and blocking other threads in start_this_handle(), unless journal is huge. Having a very large journal can consume a lot of RAM, which is also not good. Better to have more efficient journal declarations if possible. I guess the other question here is why the BRW write clobbers the UID from setattr? That is a problem even more important than the credit calculation, as it may leave objects with the wrong ownership == wrong quota. |
| Comment by Alex Zhuravlev [ 21/Dec/19 ] |
|
yes, I'm aware about credits a bit. in regular setups the journal is really huge as writes (especially fragmented) require more credits and normally number of threads is high as well. where possible it does makes sense to optimize credits, of course (e.g. |
| Comment by Alexey Lyashkov [ 27/Dec/19 ] |
|
It looks bug don't related to the credits at all, but it related to the brw internal state. based on Alex comment early it mean - brw should skip any attribute changes during write, but it looks, brw tries to restore original owner (as setattr executed). |
| Comment by Alex Zhuravlev [ 27/Dec/19 ] |
|
well, OST_WRITE may and should change credentials as this is the way to initialize UID/GID correctly given objects are precreated with no knowledge who will be using them. |
| Comment by Alexey Lyashkov [ 27/Dec/19 ] |
|
yes, but this change only for the precreated objects aka inode version == 0. while any setattr will change an inode version (including a first write), so this change should be NOP once inode version greater than zero. |
| Comment by Andreas Dilger [ 27/Dec/19 ] |
|
We don't use the version to check this, but rather the SUID/SGID bits on the inode, which are set on the inode at precreate time, and cleared when the UID/GID are set. Since there is no lock held across the declare and transaction, it makes sense to reverify these flags in the BRW code under lock before setattr is called. The OST_SETATTR code should also lock the inode at this point for exclusion. That avoids one clobbering the other in a race. |
| Comment by Gerrit Updater [ 31/Dec/19 ] |
|
Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/37117 |
| Comment by Gerrit Updater [ 28/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37117/ |
| Comment by Peter Jones [ 28/Jan/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 12/Apr/21 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43278 |
| Comment by Alex Zhuravlev [ 14/Apr/21 ] |
|
still hitting this with racer... |