[LU-5794] improve credit checks in osd-ldiskfs Created: 23/Oct/14  Updated: 08/Nov/16  Resolved: 10/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.7.0
Fix Version/s: Lustre 2.8.0

Type: Improvement Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None
Environment:

ldiskfs


Issue Links:
Duplicate
Related
is related to LU-5770 wrong tx credit calculations in mdd_d... Resolved
is related to LU-4528 osd_trans_exec_op()) ASSERTION( oti->... Resolved
Rank (Obsolete): 16258

 Description   

the current schema can't detect how the credits are consumed while a transaction is going. so, it's hard to understand which specific operation consumes too much. also, the schema doesn't support rollback operations properly which may cause false assertions.



 Comments   
Comment by Alex Zhuravlev [ 23/Oct/14 ]

the proposed patch - http://review.whamcloud.com/#/c/12169/ - tracks how many credits every operation has used and asserts when too many credits were consumed. it reuses the existing rollback table of the operations to allow zero-credits reverse operations (e.g. create > destroy, insert>delete, etc).

Comment by Andreas Dilger [ 24/Oct/14 ]
23:14:23:Lustre: DEBUG MARKER: == recovery-small test 110g: drop reply during migration == 05:14:10 (1412745250)
23:14:23:Lustre: 7852:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 4, used now 4, reserved 1
23:14:23:Lustre: 7852:0:(osd_handler.c:985:osd_trans_dump_creds())   create: 1/4/0, destroy: 0/0/0
23:14:23:Lustre: 7852:0:(osd_handler.c:992:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 1/64/0
23:14:23:Lustre: 7852:0:(osd_handler.c:1002:osd_trans_dump_creds())   write: 2/4/0, punch: 0/0/0, quota 2/2/0
23:14:23:Lustre: 7852:0:(osd_handler.c:1009:osd_trans_dump_creds())   insert: 3/50/0, delete: 1/1/4
23:14:23:Lustre: 7852:0:(osd_handler.c:1016:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
23:14:23:LustreError: 7852:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG
23:14:23:Pid: 7852, comm: mdt_out00_002
23:14:23:
23:14:23:Call Trace:
23:14:23: [<ffffffffa0760895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
23:14:23: [<ffffffffa0760e97>] lbug_with_loc+0x47/0xb0 [libcfs]
23:14:23: [<ffffffffa1396dfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs]
23:14:23: [<ffffffffa0dff316>] out_obj_index_delete+0x156/0x1f0 [ptlrpc]
23:14:23: [<ffffffffa0e061f8>] out_tx_index_delete_exec+0x28/0x170 [ptlrpc]
23:14:23: [<ffffffffa0dfea2a>] out_tx_end+0xda/0x5c0 [ptlrpc]
23:14:23: [<ffffffffa0e05480>] out_handle+0x7c0/0xe50 [ptlrpc]
23:14:23: [<ffffffffa0dfba9e>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
23:14:23: [<ffffffffa0dab384>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
23:14:23: [<ffffffff8109abf6>] kthread+0x96/0xa0
Comment by Andreas Dilger [ 24/Oct/14 ]
01:41:14:Lustre: DEBUG MARKER: == sanityn test 40a: pdirops: create vs others ================ 07:40:48 (1413358848)
01:41:14:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000145
01:41:14:LustreError: 7196:0:(fail.c:132:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 sleeping for 15000ms
01:41:14:LustreError: 7196:0:(fail.c:136:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 awake
01:41:14:Lustre: 7195:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 6, used now 6, reserved 1
01:41:14:Lustre: 7195:0:(osd_handler.c:985:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
01:41:14:Lustre: 7195:0:(osd_handler.c:992:osd_trans_dump_creds())   attr_set: 2/2/0, xattr_set: 1/64/0
01:41:14:Lustre: 7195:0:(osd_handler.c:1002:osd_trans_dump_creds())   write: 2/4/0, punch: 0/0/0, quota 2/2/0
01:41:14:Lustre: 7195:0:(osd_handler.c:1009:osd_trans_dump_creds())   insert: 0/0/0, delete: 1/1/6
01:41:14:Lustre: 7195:0:(osd_handler.c:1016:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 1/1/0
01:41:14:LustreError: 7195:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG
01:41:14:Pid: 7195, comm: mdt_out00_002
01:41:14:
01:41:14:Call Trace:
01:41:14: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
01:41:14: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs]
01:41:14: [<ffffffffa0d2cdfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs]
01:41:14: [<ffffffffa0884316>] out_obj_index_delete+0x156/0x1f0 [ptlrpc]
01:41:14: [<ffffffffa088b1f8>] out_tx_index_delete_exec+0x28/0x170 [ptlrpc]
01:41:14: [<ffffffffa0883a2a>] out_tx_end+0xda/0x5c0 [ptlrpc]
01:41:14: [<ffffffffa088a480>] out_handle+0x7c0/0xe50 [ptlrpc]
01:41:14: [<ffffffffa0880a9e>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
01:41:14: [<ffffffffa0830384>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
Comment by Andreas Dilger [ 24/Oct/14 ]
18:47:27:Lustre: DEBUG MARKER: == sanity-lfsck test 22a: LFSCK can repair unmatched pairs (1) == 00:47:21 (1412729241)
18:47:27:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x161e
18:47:27:Lustre: *** cfs_fail_loc=161e, val=0***
18:47:27:Lustre: Skipped 3 previous similar messages
18:47:27:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0
18:47:27:Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -A -r
18:47:27:Lustre: 21068:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 4, used now 4, reserved 1
18:47:27:LustreError: 21068:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG
18:47:27:Pid: 21068, comm: lfsck_namespace
18:47:27:
18:47:27:Call Trace:
18:47:27: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
18:47:27: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs]
18:47:27: [<ffffffffa0d2cdfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs]
18:47:27: [<ffffffffa0fbe2d7>] lod_index_delete+0xa7/0x140 [lod]
18:47:27: [<ffffffffa0e50005>] dt_delete+0x95/0x1a0 [lfsck]
18:47:27: [<ffffffffa0e56182>] lfsck_namespace_repair_unmatched_pairs+0x6e2/0x840 [lfsck]
18:47:27: [<ffffffffa0e56939>] lfsck_namespace_dsd_single+0x659/0x7e0 [lfsck]
18:47:27: [<ffffffffa0e57309>] lfsck_namespace_double_scan_dir+0x849/0x1590 [lfsck]
18:47:27: [<ffffffffa0e5cb44>] lfsck_namespace_double_scan_one+0x2f4/0x1250 [lfsck]
18:47:27: [<ffffffffa0e5e278>] lfsck_namespace_assistant_handler_p2+0x7d8/0xa50 [lfsck]
18:47:27: [<ffffffffa0e4b9ed>] lfsck_assistant_engine+0x130d/0x1c50 [lfsck]
Comment by Andreas Dilger [ 24/Oct/14 ]
04:05:40:Lustre: DEBUG MARKER: == sanity test 1: mkdir; remkdir; rmdir ================================ 10:05:29 (1412589929)
04:05:40:Lustre: cli-ctl-lustre-MDT0001: Allocated super-sequence [0x0000000440000400-0x0000000480000400):1:mdt]
04:05:40:Lustre: 3448:0:(osd_internal.h:1048:osd_trans_exec_check()) op 9: used 6, used now 6, reserved 1
04:05:40:Lustre: 3448:0:(osd_handler.c:985:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
04:05:40:Lustre: 3448:0:(osd_handler.c:992:osd_trans_dump_creds())   attr_set: 2/2/0, xattr_set: 1/64/0
04:05:40:Lustre: 3448:0:(osd_handler.c:1002:osd_trans_dump_creds())   write: 2/4/0, punch: 0/0/0, quota 2/2/0
04:05:40:Lustre: 3448:0:(osd_handler.c:1009:osd_trans_dump_creds())   insert: 0/0/0, delete: 1/1/6
04:05:40:Lustre: 3448:0:(osd_handler.c:1016:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 1/1/0
04:05:40:LustreError: 3448:0:(osd_internal.h:1050:osd_trans_exec_check()) LBUG
04:05:40:Pid: 3448, comm: mdt_out00_001
04:05:40:
04:05:40:Call Trace:
04:05:40: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
04:05:40: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs]
04:05:41: [<ffffffffa0d2cdfc>] osd_index_ea_delete+0x9cc/0xd80 [osd_ldiskfs]
04:05:41: [<ffffffffa0884316>] out_obj_index_delete+0x156/0x1f0 [ptlrpc]
04:05:41: [<ffffffffa088b1f8>] out_tx_index_delete_exec+0x28/0x170 [ptlrpc]
04:05:41: [<ffffffffa0883a2a>] out_tx_end+0xda/0x5c0 [ptlrpc]
04:05:41: [<ffffffffa088a480>] out_handle+0x7c0/0xe50 [ptlrpc]
04:05:41: [<ffffffffa0880a9e>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
04:05:41: [<ffffffffa0830384>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
Comment by Alex Zhuravlev [ 25/Oct/14 ]

if I understand correctly, this is an delete in the index where an agent inode needs to be removed.

Comment by Gerrit Updater [ 17/Nov/14 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/12169
Subject: LU-5794 osd: additional checks to verify credits calculation
Project: fs/lustre-release
Branch: master
Current Patch Set: 13
Commit: 6e77a93dca837d9f395bced9262ad0de2e2c81f7

Comment by Gerrit Updater [ 17/Nov/14 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/12169
Subject: LU-5794 osd: additional checks to verify credits calculation
Project: fs/lustre-release
Branch: master
Current Patch Set: 14
Commit: 3db6478913dac65b7bb8c9585f0ab56c39e1b192

Comment by Gerrit Updater [ 10/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12169/
Subject: LU-5794 osd: additional checks to verify credits calculation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 92d28a7f35bcdc718e2bdcec48bcdeacd0e8f4fe

Comment by Peter Jones [ 10/Jul/15 ]

Landed for 2.8

Generated at Sat Feb 10 01:54:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.