[LU-5250] OSSes with LU-4611: hitting J_ASSERT_JH(jh, handle->h_buffer_credits > 0) - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.2
Labels:
None

Severity:
3
Rank (Obsolete):
14647

Description

Since pulling ~~LU-4611~~ in to Cray's b2_5, we have begun hitting this assertion in jbd2_journal_dirty_metadata (kernel: fs/jdb2/transaction.c):
J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

This bug kills an OSS, then the OSS hits it on startup after that. If we back off to a version without ~~LU-4611~~ and start the OSS, it works fine. Then we can go back to a version with ~~LU-4611~~ and start successfully. I assume the OSS is attempting to re-do the problematic operation each time, which is why starting once with an old version clears things up.

We're hitting these problems down a setattr & quota related path:
> [exception RIP: jbd2_journal_dirty_metadata+268]
> RIP: ffffffffa02cc86c RSP: ffff88087be375e0 RFLAGS: 00010246
> RAX: ffff8806485b3bc0 RBX: ffff8806f520d588 RCX: ffff88084223bcf8
> RDX: 0000000000000000 RSI: ffff88084223bcf8 RDI: 0000000000000000
> RBP: ffff88087be37600 R8: f010000000000000 R9: f79fde5390e73e02
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801eb760748
> R13: ffff88084223bcf8 R14: ffff88086b22d800 R15: 0000000000000c00
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #4 [ffff88087be37608] __ldiskfs_handle_dirty_metadata at ffffffffa02ee0bb [ldiskfs]
> #5 [ffff88087be37648] ldiskfs_quota_write at ffffffffa0324b95 [ldiskfs]
> #6 [ffff88087be376b8] write_blk at ffffffff811e44ae
> #7 [ffff88087be376c8] remove_tree at ffffffff811e4da1
> #8 [ffff88087be37738] remove_tree at ffffffff811e4bf8
> #9 [ffff88087be377a8] remove_tree at ffffffff811e4bf8
> #10 [ffff88087be37818] qtree_delete_dquot at ffffffff811e4fe3
> #11 [ffff88087be37838] qtree_release_dquot at ffffffff811e501f
> #12 [ffff88087be37848] v2_release_dquot at ffffffff811e3cc0
> #13 [ffff88087be37858] dquot_release at ffffffff811df8e5
> #14 [ffff88087be37898] ldiskfs_release_dquot at ffffffffa03235be [ldiskfs]
> #15 [ffff88087be378b8] dqput at ffffffff811e0489
> #16 [ffff88087be378e8] dquot_transfer at ffffffff811e3253
> #17 [ffff88087be379c8] vfs_dq_transfer at ffffffff811dfc0c
> #18 [ffff88087be379e8] osd_quota_transfer at ffffffffa0ba98a5 [osd_ldiskfs]
> #19 [ffff88087be37a58] osd_attr_set at ffffffffa0bbcb8a [osd_ldiskfs]
> #20 [ffff88087be37ab8] dt_attr_set.clone.2 at ffffffffa083a969 [ofd]
> #21 [ffff88087be37ac8] ofd_attr_set at ffffffffa083e472 [ofd]
> #22 [ffff88087be37b28] ofd_setattr at ffffffffa082fe68 [ofd]
> #23 [ffff88087be37bb8] ost_setattr at ffffffffa06461fb [ost]
> #24 [ffff88087be37c18] ost_handle at ffffffffa06491fd [ost]
> #25 [ffff88087be37d68] ptlrpc_server_handle_request at ffffffffa06df4d5 [ptlrpc]
> #26 [ffff88087be37e48] ptlrpc_main at ffffffffa06e083d [ptlrpc]
> #27 [ffff88087be37ee8] kthread at ffffffff81096136
> #28 [ffff88087be37f48] kernel_thread at ffffffff8100c0ca
> #0 [ffff88087be37400] die at ffffffff8100f18b

Looking in to ~~LU-4611~~, I think I've found the issue, in osd_declare_xattr_set
http://review.whamcloud.com/#/c/10407/2/lustre/osd-ldiskfs/osd_handler.c,cm

	/* optimistic optimization: LMA is set first and usually fit inode */
	if (strcmp(name, XATTR_NAME_LMA) == 0) {
		if (dt_object_exists(dt))
			credits = 0;
		else
			credits = 1;
	} else if (strcmp(name, XATTR_NAME_VERSION) == 0) {

Specifically, the "credits = 0" optimization for XATTR_NAME_LMA. There doesn't appear to be any special handling for this, so I think this is the path that eventually ends at our assertion.

If this is the problem, my question is, what's the correct number of credits here? Is it just 1, or some other number?

One other note of concern here.
This comment in osd_handler.c above where the osd credits are declared says some strange things about quotas, as does the name "...credits_noquota".

Now that quota accounting is on by default, what does this mean and does it relate to this issue?
/**

Note: we do not count into QUOTA here.
If we mount with --data_journal we may need more.
*/
const int osd_dto_credits_noquota[DTO_NR] = {

Attachments

Issue Links

is related to

LU-5040 kernel BUG at fs/jbd2/transaction.c:1033

Resolved

Activity

People

Assignee:: WC Triage

Reporter:: Patrick Farrell (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 24/Jun/14 8:35 PM

Updated:: 10/Oct/21 9:27 PM

Resolved:: 10/Oct/21 9:27 PM