Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0
-
None
-
3
-
9223372036854775807
Description
During racer the next LBUG happened
00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds()) attr_set: 1/1/8, xattr_set: 2/274/0 00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 6/6/0 00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG #2 [ffff99728ef238a0] panic at ffffffffa5b5d55b #3 [ffff99728ef23920] lbug_with_loc at ffffffffc0aa48cb [libcfs] #4 [ffff99728ef23950] osd_attr_set at ffffffffc11f6f5e [osd_ldiskfs] #5 [ffff99728ef239b8] ofd_write_attr_set at ffffffffc135c292 [ofd] #6 [ffff99728ef23a28] ofd_commitrw_write at ffffffffc135c99d [ofd] #7 [ffff99728ef23ab8] ofd_commitrw at ffffffffc1360dac [ofd] #8 [ffff99728ef23b30] tgt_brw_write at ffffffffc0fba34c [ptlrpc] #9 [ffff99728ef23ca8] tgt_request_handle at ffffffffc0fbe0ba [ptlrpc]
After adding some debug I've found that vfs_dq_transfer spent 7-8 credits during this failure. Here is a full log with a race, one thread do brw_write another do set_attr
00000100:00100000:1.0:1576770158.667759:0:13105:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89779e486050 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost_io00_011:ed527c8c-2be3-96ff-ed07-99e9220c53c7+30:9976:x1653362876819568:12345-192.168.111.36@tcp:4:cat.0 00080000:00080000:0.0:1576770158.714330:0:13105:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 0 00000001:00080000:0.0:1576770158.714333:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:0.0:1576770158.714334:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1 00000001:00080000:0.0:1576770158.714337:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:0.0:1576770158.714338:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:0.0:1576770158.714339:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2 00000001:00080000:0.0:1576770158.714340:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00080000:00080000:0.0:1576770158.714341:0:13105:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 0 00000001:00080000:0.0:1576770158.714341:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:0.0:1576770158.714342:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1 00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:0.0:1576770158.714344:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2 00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1 00000001:00080000:0.0:1576770158.714346:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=1 00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1 00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000001:00080000:0.0:1576770158.714348:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2 00000001:00080000:0.0:1576770158.714349:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000100:00100000:1.0:1576770158.734168:0:13086:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-192.168.111.33@tcp, seq: 202067 00000100:00100000:1.0:1576770158.734170:0:13086:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+17:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0 00080000:00080000:1.0:1576770158.734179:0:13086:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 1892 00000001:00080000:1.0:1576770158.734179:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 1 00000001:00080000:1.0:1576770158.734180:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=1 00000001:00080000:1.0:1576770158.734181:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:1.0:1576770158.734182:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1 00000001:00080000:1.0:1576770158.734183:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 2 00000001:00080000:1.0:1576770158.734184:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=2 00000001:00080000:1.0:1576770158.734215:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:1.0:1576770158.734216:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2 00080000:00080000:1.0:1576770158.734218:0:13086:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 23392 00000001:00080000:1.0:1576770158.734219:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 1 00000001:00080000:1.0:1576770158.734221:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=1 00000001:00080000:1.0:1576770158.734222:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:1.0:1576770158.734223:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1 00000001:00080000:1.0:1576770158.734224:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 2 00000001:00080000:1.0:1576770158.734225:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=2 00000001:00080000:1.0:1576770158.734257:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:1.0:1576770158.734259:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2 00000001:00080000:1.0:1576770158.734278:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:1.0:1576770158.734280:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:1.0:1576770158.734281:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000001:00080000:1.0:1576770158.734282:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2 00080000:00080000:1.0:1576770158.734285:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734286:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734288:0:13086:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 0 -> 1892 gid 0 -> 23392 00080000:00080000:1.0:1576770158.734311:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734313:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734317:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 511 00002000:00010000:1.0:1576770158.734328:0:13086:0:(ofd_lvb.c:307:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb size from disk: 10642 -> 10642 00002000:00010000:1.0:1576770158.734330:0:13086:0:(ofd_lvb.c:314:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb mtime from disk: 1576770157 -> 1576770157 00002000:00010000:1.0:1576770158.734331:0:13086:0:(ofd_lvb.c:320:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb atime from disk: 0 -> 0 00002000:00010000:1.0:1576770158.734333:0:13086:0:(ofd_lvb.c:326:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb ctime from disk: 1576770158 -> 1576770158 00002000:00010000:1.0:1576770158.734334:0:13086:0:(ofd_lvb.c:332:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb blocks from disk: 0 -> 0 00000100:00100000:1.0:1576770158.734368:0:13086:0:(service.c:2277:ptlrpc_server_handle_request()) Handled RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+18:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0 Request processed in 196us (383us total) trans 4295022387 rc 0/0 00000100:00100000:1.0:1576770158.734373:0:13086:0:(nrs_fifo.c:241:nrs_fifo_req_stop()) NRS stop fifo request from 12345-192.168.111.33@tcp, seq: 202067 00080000:00080000:0.0:1576770158.744271:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 291 00080000:00080000:0.0:1576770158.744272:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 291 00080000:00080000:0.0:1576770158.744273:0:13105:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 1892 -> 0 gid 23392 -> 0 00080000:00080000:0.0:1576770158.744394:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 284 00080000:00080000:0.0:1576770158.744395:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 284 00080000:00080000:0.0:1576770158.744397:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 283 00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds()) attr_set: 1/1/8, xattr_set: 2/274/0 00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 6/6/0 00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG
So here, brw_write reserved credit for uid 0 -> uid 0 quota. But after OST_SETATTR happened and changed uid from 0 to 1892. brw_write continued to set attr and failed because not enough credits for 1892 -> 0 transfer.
It looks like locking fot setattr with ofd_read_lock is broken since commit dcb0f2f03f7 LU-10048 ofd: take local locks within transaction.
Attachments
Issue Links
- is related to
-
LU-7021 osd_trans_exec_check()) LBUG for ofd_write_attr_set
-
- Resolved
-
yes, I'm aware about credits a bit. in regular setups the journal is really huge as writes (especially fragmented) require more credits and normally number of threads is high as well. where possible it does makes sense to optimize credits, of course (e.g.
LU-4611), but we can't change locking - DoM is the point where we had to unify it with MDT's order.