[LU-4798] Too many transaction credits (28288 > 25600) Created: 21/Mar/14  Updated: 24/Mar/14  Resolved: 24/Mar/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Sebastien Buisson (Inactive)
Assignee: Niu Yawei (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-4611 too many transaction credits (32279 >... Resolved
Severity: 3
Rank (Obsolete): 13203

 Description   

Hi,

When working with files striped across a large number of OSTs, CEA sees messages like the following:

Lustre: 12572:0:(osd_handler.c:833:osd_trans_start()) scratch3-MDT0000: too many transaction credits (28288 > 25600)
Lustre: 12572:0:(osd_handler.c:840:osd_trans_start()) create: 160/4000, delete: 2/35, destroy: 1/25
Lustre: 12572:0:(osd_handler.c:845:osd_trans_start()) attr_set: 2/2, xattr_set: 161/2254
Lustre: 12572:0:(osd_handler.c:852:osd_trans_start()) write: 1282/17948, punch: 320/1280, quota 4/4
Lustre: 12572:0:(osd_handler.c:857:osd_trans_start()) insert: 161/2736, delete: 1/25
Lustre: 12572:0:(osd_handler.c:862:osd_trans_start()) ref_add: 1/1, ref_del: 3/3
Pid: 12572, comm: mdt01_005

Call Trace:
 [<ffffffffa041b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0c3b31e>] osd_trans_start+0x65e/0x680 [osd_ldiskfs]
 [<ffffffffa0d3e309>] lod_trans_start+0x1b9/0x250 [lod]
 [<ffffffffa08d6357>] mdd_trans_start+0x17/0x20 [mdd]
 [<ffffffffa08cb3be>] mdd_unlink+0x41e/0xe30 [mdd]
 [<ffffffffa0cc2da8>] mdo_unlink+0x18/0x50 [mdt]
 [<ffffffffa0cc6280>] mdt_reint_unlink+0x820/0x1010 [mdt]
 [<ffffffffa0cc2aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0ca7c73>] mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa0ca7f74>] mdt_reint+0x44/0xe0 [mdt]
 [<ffffffffa0cacc27>] mdt_handle_common+0x647/0x16d0 [mdt]
 [<ffffffffa0788b9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
 [<ffffffffa0ce6835>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa07983b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [<ffffffffa041c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa042dd9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa078f719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
 [<ffffffffa079974e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
 [<ffffffffa0798c80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffffa0798c80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffffa0798c80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
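
For triage, the per-operation breakdown in the messages above can be tabulated to see which operation types account for most of the requested credits. The following is only a minimal sketch, not part of Lustre: it assumes each `name: N/M` pair reads as "N declared operations / M credits", and the helper name `parse_credit_dump` is hypothetical. The log text is copied from this report.

```python
import re

# Log lines copied from the report above (call trace omitted).
LOG = """\
Lustre: 12572:0:(osd_handler.c:833:osd_trans_start()) scratch3-MDT0000: too many transaction credits (28288 > 25600)
Lustre: 12572:0:(osd_handler.c:840:osd_trans_start()) create: 160/4000, delete: 2/35, destroy: 1/25
Lustre: 12572:0:(osd_handler.c:845:osd_trans_start()) attr_set: 2/2, xattr_set: 161/2254
Lustre: 12572:0:(osd_handler.c:852:osd_trans_start()) write: 1282/17948, punch: 320/1280, quota 4/4
Lustre: 12572:0:(osd_handler.c:857:osd_trans_start()) insert: 161/2736, delete: 1/25
Lustre: 12572:0:(osd_handler.c:862:osd_trans_start()) ref_add: 1/1, ref_del: 3/3
"""

# Matches the summary line: "(requested > limit)".
TOTAL_RE = re.compile(r"too many transaction credits \((\d+) > (\d+)\)")
# Matches "name: N/M"; the colon is optional because the log prints "quota 4/4".
PAIR_RE = re.compile(r"(\w+):? (\d+)/(\d+)")


def parse_credit_dump(log):
    """Return (requested, limit, entries) parsed from an osd_trans_start dump.

    entries is a list of (name, ops, credits) tuples in log order. A list is
    used rather than a dict because "delete" appears twice in the dump.
    Assumption: the first number of each pair is an operation count and the
    second is the credits declared for it.
    """
    match = TOTAL_RE.search(log)
    if match is None:
        raise ValueError("no 'too many transaction credits' line found")
    requested, limit = int(match.group(1)), int(match.group(2))

    entries = []
    for line in log.splitlines():
        # Drop the "Lustre: pid:0:(file:line:func()) " prefix before matching.
        _, sep, body = line.partition("()) ")
        if not sep:
            continue
        for name, ops, credits in PAIR_RE.findall(body):
            entries.append((name, int(ops), int(credits)))
    return requested, limit, entries


requested, limit, entries = parse_credit_dump(LOG)
print(f"requested {requested}, limit {limit}, excess {requested - limit}")
# List the operation types from most to fewest credits.
for name, ops, credits in sorted(entries, key=lambda e: e[2], reverse=True):
    print(f"{name:>10}: {ops:5d} ops, {credits:6d} credits")
```

On this dump the `write` entry (1282/17948) is by far the largest single contributor, and the request exceeds the limit by 2688 credits.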

This issue looks very similar to LU-4611. Once a proper fix is identified there, we would need it backported to b2_4.

Thanks,
Sebastien.



 Comments   
Comment by Andreas Dilger [ 21/Mar/14 ]

Please see http://review.whamcloud.com/9258

Comment by Peter Jones [ 22/Mar/14 ]

Niu

Could you please confirm whether this is a duplicate of LU-4611 and port the patch to b2_4?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 24/Mar/14 ]

Yes, it's a duplicate of LU-4611. I'll backport the fix once it has landed on master.
