Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5250

OSSes with LU-4611: hitting J_ASSERT_JH(jh, handle->h_buffer_credits > 0)

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.6.0, Lustre 2.5.2
    • None
    • 3
    • 14647

    Description

      Since pulling LU-4611 in to Cray's b2_5, we have begun hitting this assertion in jbd2_journal_dirty_metadata (kernel: fs/jdb2/transaction.c):
      J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

      This bug kills an OSS, then the OSS hits it on startup after that. If we back off to a version without LU-4611 and start the OSS, it works fine. Then we can go back to a version with LU-4611 and start successfully. I assume the OSS is attempting to re-do the problematic operation each time, which is why starting once with an old version clears things up.

      We're hitting these problems down a setattr & quota related path:
      > [exception RIP: jbd2_journal_dirty_metadata+268]
      > RIP: ffffffffa02cc86c RSP: ffff88087be375e0 RFLAGS: 00010246
      > RAX: ffff8806485b3bc0 RBX: ffff8806f520d588 RCX: ffff88084223bcf8
      > RDX: 0000000000000000 RSI: ffff88084223bcf8 RDI: 0000000000000000
      > RBP: ffff88087be37600 R8: f010000000000000 R9: f79fde5390e73e02
      > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801eb760748
      > R13: ffff88084223bcf8 R14: ffff88086b22d800 R15: 0000000000000c00
      > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
      > #4 [ffff88087be37608] __ldiskfs_handle_dirty_metadata at ffffffffa02ee0bb [ldiskfs]
      > #5 [ffff88087be37648] ldiskfs_quota_write at ffffffffa0324b95 [ldiskfs]
      > #6 [ffff88087be376b8] write_blk at ffffffff811e44ae
      > #7 [ffff88087be376c8] remove_tree at ffffffff811e4da1
      > #8 [ffff88087be37738] remove_tree at ffffffff811e4bf8
      > #9 [ffff88087be377a8] remove_tree at ffffffff811e4bf8
      > #10 [ffff88087be37818] qtree_delete_dquot at ffffffff811e4fe3
      > #11 [ffff88087be37838] qtree_release_dquot at ffffffff811e501f
      > #12 [ffff88087be37848] v2_release_dquot at ffffffff811e3cc0
      > #13 [ffff88087be37858] dquot_release at ffffffff811df8e5
      > #14 [ffff88087be37898] ldiskfs_release_dquot at ffffffffa03235be [ldiskfs]
      > #15 [ffff88087be378b8] dqput at ffffffff811e0489
      > #16 [ffff88087be378e8] dquot_transfer at ffffffff811e3253
      > #17 [ffff88087be379c8] vfs_dq_transfer at ffffffff811dfc0c
      > #18 [ffff88087be379e8] osd_quota_transfer at ffffffffa0ba98a5 [osd_ldiskfs]
      > #19 [ffff88087be37a58] osd_attr_set at ffffffffa0bbcb8a [osd_ldiskfs]
      > #20 [ffff88087be37ab8] dt_attr_set.clone.2 at ffffffffa083a969 [ofd]
      > #21 [ffff88087be37ac8] ofd_attr_set at ffffffffa083e472 [ofd]
      > #22 [ffff88087be37b28] ofd_setattr at ffffffffa082fe68 [ofd]
      > #23 [ffff88087be37bb8] ost_setattr at ffffffffa06461fb [ost]
      > #24 [ffff88087be37c18] ost_handle at ffffffffa06491fd [ost]
      > #25 [ffff88087be37d68] ptlrpc_server_handle_request at ffffffffa06df4d5 [ptlrpc]
      > #26 [ffff88087be37e48] ptlrpc_main at ffffffffa06e083d [ptlrpc]
      > #27 [ffff88087be37ee8] kthread at ffffffff81096136
      > #28 [ffff88087be37f48] kernel_thread at ffffffff8100c0ca
      > #0 [ffff88087be37400] die at ffffffff8100f18b

      Looking in to LU-4611, I think I've found the issue, in osd_declare_xattr_set
      http://review.whamcloud.com/#/c/10407/2/lustre/osd-ldiskfs/osd_handler.c,cm

      	/* optimistic optimization: LMA is set first and usually fit inode */
      	if (strcmp(name, XATTR_NAME_LMA) == 0) {
      		if (dt_object_exists(dt))
      			credits = 0;
      		else
      			credits = 1;
      	} else if (strcmp(name, XATTR_NAME_VERSION) == 0) {
      

      Specifically, the "credits = 0" optimization for XATTR_NAME_LMA. There doesn't appear to be any special handling for this, so I think this is the path that eventually ends at our assertion.

      If this is the problem, my question is, what's the correct number of credits here? Is it just 1, or some other number?

      One other note of concern here.
      This comment in osd_handler.c above where the osd credits are declared says some strange things about quotas, as does the name "...credits_noquota".

      Now that quota accounting is on by default, what does this mean and does it relate to this issue?
      /**

      • Note: we do not count into QUOTA here.
      • If we mount with --data_journal we may need more.
        */
        const int osd_dto_credits_noquota[DTO_NR] = {

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              paf Patrick Farrell
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: