[LU-5794] improve credit checks in osd-ldiskfs Created: 23/Oct/14 Updated: 08/Nov/16 Resolved: 10/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
ldiskfs |
||
| Issue Links: |
|
||||||||||||||||
| Rank (Obsolete): | 16258 | ||||||||||||||||
| Description |
|
the current schema can't detect how the credits are consumed while a transaction is going. so, it's hard to understand which specific operation consumes too much. also, the schema doesn't support rollback operations properly which may cause false assertions. |
| Comments |
| Comment by Alex Zhuravlev [ 23/Oct/14 ] |
|
the proposed patch - http://review.whamcloud.com/#/c/12169/ - tracks how many credits every operation has used and asserts when too many credits were consumed. it reuses the existing rollback table of the operations to allow zero-credits reverse operations (e.g. create |
| Comment by Andreas Dilger [ 24/Oct/14 ] |
23:14:23:Lustre: DEBUG MARKER: == recovery-small test 110g: drop reply during migration == 05:14:10 (1412745250) 23:14:23:Lustre: 7852:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 4, used now 4, reserved 1 23:14:23:Lustre: 7852:0:(osd_handler.c:985:osd_trans_dump_creds()) create: 1/4/0, destroy: 0/0/0 23:14:23:Lustre: 7852:0:(osd_handler.c:992:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 1/64/0 23:14:23:Lustre: 7852:0:(osd_handler.c:1002:osd_trans_dump_creds()) write: 2/4/0, punch: 0/0/0, quota 2/2/0 23:14:23:Lustre: 7852:0:(osd_handler.c:1009:osd_trans_dump_creds()) insert: 3/50/0, delete: 1/1/4 23:14:23:Lustre: 7852:0:(osd_handler.c:1016:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 23:14:23:LustreError: 7852:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG 23:14:23:Pid: 7852, comm: mdt_out00_002 23:14:23: 23:14:23:Call Trace: 23:14:23: [<ffffffffa0760895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 23:14:23: [<ffffffffa0760e97>] lbug_with_loc+0x47/0xb0 [libcfs] 23:14:23: [<ffffffffa1396dfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs] 23:14:23: [<ffffffffa0dff316>] out_obj_index_delete+0x156/0x1f0 [ptlrpc] 23:14:23: [<ffffffffa0e061f8>] out_tx_index_delete_exec+0x28/0x170 [ptlrpc] 23:14:23: [<ffffffffa0dfea2a>] out_tx_end+0xda/0x5c0 [ptlrpc] 23:14:23: [<ffffffffa0e05480>] out_handle+0x7c0/0xe50 [ptlrpc] 23:14:23: [<ffffffffa0dfba9e>] tgt_request_handle+0x71e/0xb10 [ptlrpc] 23:14:23: [<ffffffffa0dab384>] ptlrpc_main+0xe64/0x1990 [ptlrpc] 23:14:23: [<ffffffff8109abf6>] kthread+0x96/0xa0 |
| Comment by Andreas Dilger [ 24/Oct/14 ] |
01:41:14:Lustre: DEBUG MARKER: == sanityn test 40a: pdirops: create vs others ================ 07:40:48 (1413358848) 01:41:14:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000145 01:41:14:LustreError: 7196:0:(fail.c:132:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 sleeping for 15000ms 01:41:14:LustreError: 7196:0:(fail.c:136:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 awake 01:41:14:Lustre: 7195:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 6, used now 6, reserved 1 01:41:14:Lustre: 7195:0:(osd_handler.c:985:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 01:41:14:Lustre: 7195:0:(osd_handler.c:992:osd_trans_dump_creds()) attr_set: 2/2/0, xattr_set: 1/64/0 01:41:14:Lustre: 7195:0:(osd_handler.c:1002:osd_trans_dump_creds()) write: 2/4/0, punch: 0/0/0, quota 2/2/0 01:41:14:Lustre: 7195:0:(osd_handler.c:1009:osd_trans_dump_creds()) insert: 0/0/0, delete: 1/1/6 01:41:14:Lustre: 7195:0:(osd_handler.c:1016:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 1/1/0 01:41:14:LustreError: 7195:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG 01:41:14:Pid: 7195, comm: mdt_out00_002 01:41:14: 01:41:14:Call Trace: 01:41:14: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 01:41:14: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs] 01:41:14: [<ffffffffa0d2cdfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs] 01:41:14: [<ffffffffa0884316>] out_obj_index_delete+0x156/0x1f0 [ptlrpc] 01:41:14: [<ffffffffa088b1f8>] out_tx_index_delete_exec+0x28/0x170 [ptlrpc] 01:41:14: [<ffffffffa0883a2a>] out_tx_end+0xda/0x5c0 [ptlrpc] 01:41:14: [<ffffffffa088a480>] out_handle+0x7c0/0xe50 [ptlrpc] 01:41:14: [<ffffffffa0880a9e>] tgt_request_handle+0x71e/0xb10 [ptlrpc] 01:41:14: [<ffffffffa0830384>] ptlrpc_main+0xe64/0x1990 [ptlrpc] |
| Comment by Andreas Dilger [ 24/Oct/14 ] |
18:47:27:Lustre: DEBUG MARKER: == sanity-lfsck test 22a: LFSCK can repair unmatched pairs (1) == 00:47:21 (1412729241) 18:47:27:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x161e 18:47:27:Lustre: *** cfs_fail_loc=161e, val=0*** 18:47:27:Lustre: Skipped 3 previous similar messages 18:47:27:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0 18:47:27:Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -A -r 18:47:27:Lustre: 21068:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 4, used now 4, reserved 1 18:47:27:LustreError: 21068:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG 18:47:27:Pid: 21068, comm: lfsck_namespace 18:47:27: 18:47:27:Call Trace: 18:47:27: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 18:47:27: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs] 18:47:27: [<ffffffffa0d2cdfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs] 18:47:27: [<ffffffffa0fbe2d7>] lod_index_delete+0xa7/0x140 [lod] 18:47:27: [<ffffffffa0e50005>] dt_delete+0x95/0x1a0 [lfsck] 18:47:27: [<ffffffffa0e56182>] lfsck_namespace_repair_unmatched_pairs+0x6e2/0x840 [lfsck] 18:47:27: [<ffffffffa0e56939>] lfsck_namespace_dsd_single+0x659/0x7e0 [lfsck] 18:47:27: [<ffffffffa0e57309>] lfsck_namespace_double_scan_dir+0x849/0x1590 [lfsck] 18:47:27: [<ffffffffa0e5cb44>] lfsck_namespace_double_scan_one+0x2f4/0x1250 [lfsck] 18:47:27: [<ffffffffa0e5e278>] lfsck_namespace_assistant_handler_p2+0x7d8/0xa50 [lfsck] 18:47:27: [<ffffffffa0e4b9ed>] lfsck_assistant_engine+0x130d/0x1c50 [lfsck] |
| Comment by Andreas Dilger [ 24/Oct/14 ] |
04:05:40:Lustre: DEBUG MARKER: == sanity test 1: mkdir; remkdir; rmdir ================================ 10:05:29 (1412589929) 04:05:40:Lustre: cli-ctl-lustre-MDT0001: Allocated super-sequence [0x0000000440000400-0x0000000480000400):1:mdt] 04:05:40:Lustre: 3448:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 6, used now 6, reserved 1 04:05:40:Lustre: 3448:0:(osd_handler.c:985:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 04:05:40:Lustre: 3448:0:(osd_handler.c:992:osd_trans_dump_creds()) attr_set: 2/2/0, xattr_set: 1/64/0 04:05:40:Lustre: 3448:0:(osd_handler.c:1002:osd_trans_dump_creds()) write: 2/4/0, punch: 0/0/0, quota 2/2/0 04:05:40:Lustre: 3448:0:(osd_handler.c:1009:osd_trans_dump_creds()) insert: 0/0/0, delete: 1/1/6 04:05:40:Lustre: 3448:0:(osd_handler.c:1016:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 1/1/0 04:05:40:LustreError: 3448:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG 04:05:40:Pid: 3448, comm: mdt_out00_001 04:05:40: 04:05:40:Call Trace: 04:05:40: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 04:05:40: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs] 04:05:41: [<ffffffffa0d2cdfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs] 04:05:41: [<ffffffffa0884316>] out_obj_index_delete+0x156/0x1f0 [ptlrpc] 04:05:41: [<ffffffffa088b1f8>] out_tx_index_delete_exec+0x28/0x170 [ptlrpc] 04:05:41: [<ffffffffa0883a2a>] out_tx_end+0xda/0x5c0 [ptlrpc] 04:05:41: [<ffffffffa088a480>] out_handle+0x7c0/0xe50 [ptlrpc] 04:05:41: [<ffffffffa0880a9e>] tgt_request_handle+0x71e/0xb10 [ptlrpc] 04:05:41: [<ffffffffa0830384>] ptlrpc_main+0xe64/0x1990 [ptlrpc] |
| Comment by Alex Zhuravlev [ 25/Oct/14 ] |
|
if I understand correctly, this is an delete in the index where an agent inode needs to be removed. |
| Comment by Gerrit Updater [ 17/Nov/14 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/12169 |
| Comment by Gerrit Updater [ 17/Nov/14 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/12169 |
| Comment by Gerrit Updater [ 10/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12169/ |
| Comment by Peter Jones [ 10/Jul/15 ] |
|
Landed for 2.8 |