[LU-4479] Credit error with unlink for large number OSTs Created: 14/Jan/14  Updated: 20/Jun/16  Resolved: 20/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Alexey Lyashkov Assignee: Niu Yawei (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Environment:

OSTCOUNT=40 sh sanity.sh


Issue Links:
Duplicate
duplicates LU-4611 too many transaction credits (32279 >... Resolved
Severity: 3
Rank (Obsolete): 12270

 Description   

Lustre: lustre-OST0027: deleting orphan objects from 0x0:216 to 0x0:257
Lustre: 23291:0:(osd_handler.c:897:osd_trans_start()) lustre-MDT0000: too many transaction credits (7166 > 1024)
Lustre: 23291:0:(osd_handler.c:904:osd_trans_start()) create: 40/1000, delete: 2/35, destroy: 1/25
Lustre: 23291:0:(osd_handler.c:909:osd_trans_start()) attr_set: 2/2, xattr_set: 41/574
Lustre: 23291:0:(osd_handler.c:916:osd_trans_start()) write: 322/4508, punch: 80/320, quota 2/2
Lustre: 23291:0:(osd_handler.c:921:osd_trans_start()) insert: 41/696, delete: 1/25
Lustre: 23291:0:(osd_handler.c:926:osd_trans_start()) ref_add: 1/1, ref_del: 3/3
Pid: 23291, comm: mdt00_004

Call Trace:
[<ffffffffa03ea8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0c1f393>] osd_trans_start+0x653/0x670 [osd_ldiskfs]
[<ffffffffa0e26529>] lod_trans_start+0x1b9/0x250 [lod]
[<ffffffffa0d1b187>] mdd_trans_start+0x17/0x20 [mdd]
[<ffffffffa0d0ff6f>] mdd_unlink+0x42f/0xd20 [mdd]
[<ffffffffa0d77d91>] ? mdt_reint_unlink+0x871/0x10f0 [mdt]
[<ffffffffa0d74738>] mdo_unlink+0x18/0x50 [mdt]
[<ffffffffa0d77dc8>] mdt_reint_unlink+0x8a8/0x10f0 [mdt]
[<ffffffffa0d74431>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0d5ad53>] mdt_reint_internal+0x4c3/0x780 [mdt]
[<ffffffffa0d5b59b>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa081b995>] tgt_request_handle+0x245/0xad0 [ptlrpc]
[<ffffffffa07cc261>] ptlrpc_main+0xce1/0x1960 [ptlrpc]
[<ffffffffa07cb580>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
[<ffffffff810996b6>] kthread+0x96/0xa0
[<ffffffff8100c1ca>] child_rip+0xa/0x20
[<ffffffff81534750>] ? _spin_unlock_irq+0x30/0x40
[<ffffffff8100bb10>] ? restore_args+0x0/0x30
[<ffffffff81099620>] ? kthread+0x0/0xa0
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20s



 Comments   
Comment by Andreas Dilger [ 14/Jan/14 ]

Note that this debug message is only printed when transaction credit checking is enabled during test-framework test runs (i.e. llmount.sh and friends). This debugging is not enabled by default for normal users.

The core problem is that your MDT journal is too small for so many OSTs - only 16 MB. At the same time, 7166 blocks is probably far more than what is actually needed for transaction credits on a small MDT filesystem.

Two options exist to fix this :

  • improve credit calculations to avoid overhead that is counted multiple times (e.g. superblock, extra llog files, and such), but I don't think that will remove 6000+ blocks from the transaction. The benefit is that could actually improve MDS performance, since it will not request as many extra blocks, and avoid committing/checkpointing transactions early.
  • fix test-framework.sh to format a larger journal when there are so many OSTs. This is more like real life, but doesn't fix the root problem of such large transactions (they only get worse with more MDTs)
Comment by Alexey Lyashkov [ 12/Feb/14 ]

It's don't only limited to the unlink

Lustre: 26062:0:(osd_handler.c:926:osd_trans_start())   ref_add: 0/0, ref_del: 0/0
Pid: 26062, comm: mdt01_005

Call Trace:
 [<ffffffffa03ea8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0c1f393>] osd_trans_start+0x653/0x670 [osd_ldiskfs]
 [<ffffffffa0e26529>] lod_trans_start+0x1b9/0x250 [lod]
 [<ffffffffa0d1b187>] mdd_trans_start+0x17/0x20 [mdd]
 [<ffffffffa0d0ecc9>] mdd_create+0x929/0x17a0 [mdd]
 [<ffffffffa0c27cb7>] ? osd_xattr_get+0x97/0x2e0 [osd_ldiskfs]
 [<ffffffffa0d7f968>] mdo_create+0x18/0x50 [mdt]
 [<ffffffffa0d89bda>] mdt_reint_open+0x12aa/0x2120 [mdt]
 [<ffffffff8153480b>] ? _spin_unlock+0x2b/0x40
 [<ffffffffa04073fc>] ? upcall_cache_get_entry+0x3dc/0x8a0 [libcfs]
 [<ffffffffa05aab00>] ? lu_ucred_global_init+0x20/0x30 [obdclass]
 [<ffffffffa0d74431>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0d5ad53>] mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa0d5b205>] mdt_intent_reint+0x1f5/0x520 [mdt]
 [<ffffffffa0d59929>] mdt_intent_policy+0x499/0xca0 [mdt]
 [<ffffffffa0771502>] ldlm_lock_enqueue+0x2f2/0x8b0 [ptlrpc]
 [<ffffffffa079acff>] ldlm_handle_enqueue0+0x4ff/0x10d0 [ptlrpc]
 [<ffffffffa081b5e2>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
 [<ffffffffa081b995>] tgt_request_handle+0x245/0xad0 [ptlrpc]
 [<ffffffffa07cc261>] ptlrpc_main+0xce1/0x1960 [ptlrpc]
 [<ffffffffa07cb580>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
 [<ffffffff810996b6>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff81534750>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8100bb10>] ? restore_args+0x0/0x30
 [<ffffffff81099620>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20


Pid: 10821, comm: ldlm_elt

Call Trace:
 [<ffffffffa03ea8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0c1f393>] osd_trans_start+0x653/0x670 [osd_ldiskfs]
 [<ffffffffa0e26529>] lod_trans_start+0x1b9/0x250 [lod]
 [<ffffffffa0d1b187>] mdd_trans_start+0x17/0x20 [mdd]
 [<ffffffffa0d02ec0>] mdd_close+0x250/0xb80 [mdd]
 [<ffffffffa0d83269>] mdt_mfd_close+0x349/0x1bb0 [mdt]
 [<ffffffffa058a16b>] ? keys_fini+0xcb/0x110 [obdclass]
 [<ffffffffa058a1dd>] ? lu_context_fini+0x2d/0xb0 [obdclass]
 [<ffffffffa0d56551>] ? mdt_ctxt_add_dirty_flag+0x141/0x190 [mdt]
 [<ffffffffa0d5693a>] mdt_obd_disconnect+0x39a/0x530 [mdt]
 [<ffffffffa0554140>] class_fail_export+0x250/0x560 [obdclass]
 [<ffffffffa07959f4>] expired_lock_main+0x4a4/0x870 [ptlrpc]
 [<ffffffff81064650>] ? default_wake_function+0x0/0x20
 [<ffffffffa0795550>] ? expired_lock_main+0x0/0x870 [ptlrpc]
 [<ffffffff810996b6>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff81534750>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8100bb10>] ? restore_args+0x0/0x30
 [<ffffffff81099620>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

it's global problem in lustre.

Comment by Andreas Dilger [ 12/Feb/14 ]

See comments for overall issues in transaction accounting in LU-4611. I think there are a number of cases where the OSD operations are accounting for transaction credits that are only useful for the OST or the MDT, but not the reverse.

create: 40/1000, delete: 2/35, destroy: 1/25
attr_set: 2/2, xattr_set: 41/574
write: 322/4508, punch: 80/320, quota 2/2
insert: 41/696, delete: 1/25
ref_add: 1/1, ref_del: 3/3

It looks like one obvious fix for the unlink transaction accounting here is to avoid registering for punch operations when the inode has no blocks allocated. That would avoid accounting for punch operations on the MDT objects, which almost never have blocks allocated except for widely-striped files.

Similarly, it isn't necessary that there are credits for create (for llog destroy records) reserved for every unlink, only when there is no current llog file for that OSP or if the current llog is close to being full.

I think this can be fixed at the same time as LU-4611, but I think that some of the changes are unlink specific so I'm keeping this bug separate for now.

Comment by James A Simmons [ 09/Sep/15 ]

Just hit this in my testing. It was with sanity test 300k

Comment by Gu Zheng (Inactive) [ 16/Jun/16 ]

Hmm...hit similar problem on our local testing (racer.sh on rhel7.2):
Jun 15 12:00:31 localhost kernel: Lustre: 30911:0:(osd_handler.c:1092:osd_trans_start()) lustre-MDT0001: too many transaction credits (533 > 1024)
Jun 15 12:00:31 localhost kernel: Lustre: 30911:0:(osd_handler.c:1097:osd_trans_start()) create: 6/24, destroy: 0/0
Jun 15 12:00:31 localhost kernel: Lustre: 30911:0:(osd_handler.c:1102:osd_trans_start()) attr_set: 2/2, xattr_set: 5/57
Jun 15 12:00:31 localhost kernel: Lustre: 30911:0:(osd_handler.c:1109:osd_trans_start()) write: 2/4, punch: 0/0, quota 2/2
Jun 15 12:00:31 localhost kernel: Lustre: 30911:0:(osd_handler.c:1114:osd_trans_start()) insert: 18/240, delete: 0/0
Jun 15 12:00:31 localhost kernel: Lustre: 30911:0:(osd_handler.c:1119:osd_trans_start()) ref_add: 4/4, ref_del: 0/0
Jun 15 12:00:31 localhost kernel: Pid: 30911, comm: mdt00_002
Jun 15 12:00:31 localhost kernel: #012Call Trace:
Jun 15 12:00:31 localhost kernel: [<ffffffffa08bb853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
Jun 15 12:00:31 localhost kernel: [<ffffffffa0f49c92>] osd_trans_start+0x642/0x670 [osd_ldiskfs]
Jun 15 12:00:31 localhost kernel: [<ffffffffa1133537>] lod_trans_start+0x67/0xf0 [lod]
Jun 15 12:00:31 localhost kernel: [<ffffffffa11be5a4>] mdd_trans_start+0x14/0x20 [mdd]
Jun 15 12:00:31 localhost kernel: [<ffffffffa11a864d>] mdd_create+0x9ed/0x12a0 [mdd]
Jun 15 12:00:31 localhost kernel: [<ffffffffa109f2ce>] mdt_reint_create+0xb3e/0xc70 [mdt]
Jun 15 12:00:31 localhost kernel: [<ffffffffa0b0937e>] ? lu_ucred+0x1e/0x30 [obdclass]
Jun 15 12:00:31 localhost kernel: [<ffffffffa108ff35>] ? mdt_ucred+0x15/0x20 [mdt]
Jun 15 12:00:31 localhost kernel: [<ffffffffa1090211>] ? mdt_root_squash+0x21/0x410 [mdt]
Jun 15 12:00:31 localhost kernel: [<ffffffffa0d4c0bf>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
Jun 15 12:00:31 localhost kernel: [<ffffffffa10a0ce0>] mdt_reint_rec+0x80/0x210 [mdt]
Jun 15 12:00:31 localhost kernel: [<ffffffffa1082699>] mdt_reint_internal+0x5d9/0xb40 [mdt]
Jun 15 12:00:31 localhost kernel: [<ffffffffa108cc27>] mdt_reint+0x67/0x140 [mdt]
Jun 15 12:00:31 localhost kernel: [<ffffffffa0db583b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
Jun 15 12:00:31 localhost kernel: [<ffffffffa0d5896b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
Jun 15 12:00:31 localhost kernel: [<ffffffffa0d55a38>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
Jun 15 12:00:31 localhost kernel: [<ffffffff810af018>] ? __wake_up_common+0x58/0x90
Jun 15 12:00:31 localhost kernel: [<ffffffffa0d5c290>] ptlrpc_main+0xc00/0x1f50 [ptlrpc]
Jun 15 12:00:31 localhost kernel: [<ffffffffa0d5b690>] ? ptlrpc_main+0x0/0x1f50 [ptlrpc]
Jun 15 12:00:31 localhost kernel: [<ffffffff810a5acf>] kthread+0xcf/0xe0
Jun 15 12:00:31 localhost kernel: [<ffffffff810a5a00>] ? kthread+0x0/0xe0
Jun 15 12:00:31 localhost kernel: [<ffffffff81646018>] ret_from_fork+0x58/0x90
Jun 15 12:00:31 localhost kernel: [<ffffffff810a5a00>] ? kthread+0x0/0xe0
Jun 15 12:00:31 localhost kernel:

Lustre is based on 2.7.15.3, with additional patch, 1 MGS, 2 OSTs, 2 Clients

Comment by Niu Yawei (Inactive) [ 20/Jun/16 ]

Hmm...hit similar problem on our local testing (racer.sh on rhel7.2):
Jun 15 12:00:31 localhost kernel: Lustre: 30911:0:(osd_handler.c:1092:osd_trans_start()) lustre-MDT0001: too many transaction credits (533 > 1024)

This looks a tiny testing system with 1024 maximum journal buffers, it's not surprise to see such kind of warning messages. In the latest master code, such messages have been suppressed for tiny systems.

Comment by Niu Yawei (Inactive) [ 20/Jun/16 ]

dup of LU-4611.

Generated at Sat Feb 10 01:43:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.