Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0
-
None
-
3
-
9223372036854775807
Description
During racer the next LBUG happened
00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds()) attr_set: 1/1/8, xattr_set: 2/274/0 00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 6/6/0 00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG #2 [ffff99728ef238a0] panic at ffffffffa5b5d55b #3 [ffff99728ef23920] lbug_with_loc at ffffffffc0aa48cb [libcfs] #4 [ffff99728ef23950] osd_attr_set at ffffffffc11f6f5e [osd_ldiskfs] #5 [ffff99728ef239b8] ofd_write_attr_set at ffffffffc135c292 [ofd] #6 [ffff99728ef23a28] ofd_commitrw_write at ffffffffc135c99d [ofd] #7 [ffff99728ef23ab8] ofd_commitrw at ffffffffc1360dac [ofd] #8 [ffff99728ef23b30] tgt_brw_write at ffffffffc0fba34c [ptlrpc] #9 [ffff99728ef23ca8] tgt_request_handle at ffffffffc0fbe0ba [ptlrpc]
After adding some debug I've found that vfs_dq_transfer spent 7-8 credits during this failure. Here is a full log with a race, one thread do brw_write another do set_attr
00000100:00100000:1.0:1576770158.667759:0:13105:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89779e486050 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost_io00_011:ed527c8c-2be3-96ff-ed07-99e9220c53c7+30:9976:x1653362876819568:12345-192.168.111.36@tcp:4:cat.0 00080000:00080000:0.0:1576770158.714330:0:13105:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 0 00000001:00080000:0.0:1576770158.714333:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:0.0:1576770158.714334:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1 00000001:00080000:0.0:1576770158.714337:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:0.0:1576770158.714338:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:0.0:1576770158.714339:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2 00000001:00080000:0.0:1576770158.714340:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00080000:00080000:0.0:1576770158.714341:0:13105:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 0 00000001:00080000:0.0:1576770158.714341:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:0.0:1576770158.714342:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1 00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:0.0:1576770158.714343:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:0.0:1576770158.714344:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2 00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:0.0:1576770158.714345:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1 00000001:00080000:0.0:1576770158.714346:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=1 00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 1 00000001:00080000:0.0:1576770158.714347:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000001:00080000:0.0:1576770158.714348:0:13105:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2 00000001:00080000:0.0:1576770158.714349:0:13105:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000100:00100000:1.0:1576770158.734168:0:13086:0:(nrs_fifo.c:179:nrs_fifo_req_get()) NRS start fifo request from 12345-192.168.111.33@tcp, seq: 202067 00000100:00100000:1.0:1576770158.734170:0:13086:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+17:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0 00080000:00080000:1.0:1576770158.734179:0:13086:0:(osd_handler.c:2777:osd_declare_attr_set()) declare uid 0 -> 1892 00000001:00080000:1.0:1576770158.734179:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 1 00000001:00080000:1.0:1576770158.734180:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=1 00000001:00080000:1.0:1576770158.734181:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 1 00000001:00080000:1.0:1576770158.734182:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=1 00000001:00080000:1.0:1576770158.734183:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 1892 type 0 res 2 00000001:00080000:1.0:1576770158.734184:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=1892 type=0 res=2 00000001:00080000:1.0:1576770158.734215:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:1.0:1576770158.734216:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=0 res=2 00080000:00080000:1.0:1576770158.734218:0:13086:0:(osd_handler.c:2785:osd_declare_attr_set()) declare gid 0 -> 23392 00000001:00080000:1.0:1576770158.734219:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 1 00000001:00080000:1.0:1576770158.734221:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=1 00000001:00080000:1.0:1576770158.734222:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 1 00000001:00080000:1.0:1576770158.734223:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=1 00000001:00080000:1.0:1576770158.734224:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 23392 type 1 res 2 00000001:00080000:1.0:1576770158.734225:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 73 credits for quota uid=23392 type=1 res=2 00000001:00080000:1.0:1576770158.734257:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:1.0:1576770158.734259:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=1 res=2 00000001:00080000:1.0:1576770158.734278:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 0 res 2 00000001:00080000:1.0:1576770158.734280:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 1 res 2 00000001:00080000:1.0:1576770158.734281:0:13086:0:(osd_quota.c:576:osd_declare_qid()) uid 0 type 2 res 2 00000001:00080000:1.0:1576770158.734282:0:13086:0:(osd_quota.c:614:osd_declare_qid()) adding 1 credits for quota uid=0 type=2 res=2 00080000:00080000:1.0:1576770158.734285:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734286:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734288:0:13086:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 0 -> 1892 gid 0 -> 23392 00080000:00080000:1.0:1576770158.734311:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734313:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 512 00080000:00080000:1.0:1576770158.734317:0:13086:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 512, transaction 511 00002000:00010000:1.0:1576770158.734328:0:13086:0:(ofd_lvb.c:307:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb size from disk: 10642 -> 10642 00002000:00010000:1.0:1576770158.734330:0:13086:0:(ofd_lvb.c:314:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb mtime from disk: 1576770157 -> 1576770157 00002000:00010000:1.0:1576770158.734331:0:13086:0:(ofd_lvb.c:320:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb atime from disk: 0 -> 0 00002000:00010000:1.0:1576770158.734333:0:13086:0:(ofd_lvb.c:326:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb ctime from disk: 1576770158 -> 1576770158 00002000:00010000:1.0:1576770158.734334:0:13086:0:(ofd_lvb.c:332:ofd_lvbo_update()) res: [0x280000400:0xe81:0x0] updating lvb blocks from disk: 0 -> 0 00000100:00100000:1.0:1576770158.734368:0:13086:0:(service.c:2277:ptlrpc_server_handle_request()) Handled RPC req@ffff89778bdd2880 pname:cluuid+ref:pid:xid:nid:opc:job ll_ost00_008:lustre-MDT0001-mdtlov_UUID+18:11480:x1653362832391888:12345-192.168.111.33@tcp:2:osp-syn-1-1.0 Request processed in 196us (383us total) trans 4295022387 rc 0/0 00000100:00100000:1.0:1576770158.734373:0:13086:0:(nrs_fifo.c:241:nrs_fifo_req_stop()) NRS stop fifo request from 12345-192.168.111.33@tcp, seq: 202067 00080000:00080000:0.0:1576770158.744271:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 291 00080000:00080000:0.0:1576770158.744272:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 291 00080000:00080000:0.0:1576770158.744273:0:13105:0:(osd_handler.c:2933:osd_quota_transfer()) executing vfs_dq_transfer inode 293 uid 1892 -> 0 gid 23392 -> 0 00080000:00080000:0.0:1576770158.744394:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 284 00080000:00080000:0.0:1576770158.744395:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 284 00080000:00080000:0.0:1576770158.744397:0:13105:0:(osd_handler.c:2978:osd_show_credits()) opcode 0: before 291, transaction 283 00080000:00000400:0.0:1576770158.744398:0:13105:0:(osd_internal.h:1288:osd_trans_exec_check()) lustre-OST0001: opcode 0: used 8, used now 8, reserved 1 00080000:00000400:0.0:1576770158.750040:0:13105:0:(osd_handler.c:1801:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 00080000:00000400:0.0:1576770158.755956:0:13105:0:(osd_handler.c:1808:osd_trans_dump_creds()) attr_set: 1/1/8, xattr_set: 2/274/0 00080000:00000400:0.0:1576770158.760212:0:13105:0:(osd_handler.c:1818:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 6/6/0 00080000:00000400:0.0:1576770158.764684:0:13105:0:(osd_handler.c:1825:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 00080000:00000400:0.0:1576770158.768719:0:13105:0:(osd_handler.c:1832:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 00080000:00040000:0.0:1576770158.772504:0:13105:0:(osd_internal.h:1291:osd_trans_exec_check()) LBUG
So here, brw_write reserved credit for uid 0 -> uid 0 quota. But after OST_SETATTR happened and changed uid from 0 to 1892. brw_write continued to set attr and failed because not enough credits for 1892 -> 0 transfer.
It looks like locking fot setattr with ofd_read_lock is broken since commit dcb0f2f03f7 LU-10048 ofd: take local locks within transaction.
Attachments
Issue Links
- is related to
-
LU-7021 osd_trans_exec_check()) LBUG for ofd_write_attr_set
-
- Resolved
-
still hitting this with racer...