
[LU-5250] OSSes with LU-4611: hitting J_ASSERT_JH(jh, handle->h_buffer_credits > 0)

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Affects Versions: Lustre 2.6.0, Lustre 2.5.2

    Description

      Since pulling LU-4611 into Cray's b2_5, we have begun hitting this assertion in jbd2_journal_dirty_metadata (kernel: fs/jbd2/transaction.c):
      J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

      This bug kills an OSS, and the OSS then hits it again on startup. If we back off to a version without LU-4611 and start the OSS, it works fine; we can then go back to a version with LU-4611 and start successfully. I assume the OSS attempts to redo the problematic operation each time it starts, which is why starting once with an old version clears things up.

      We're hitting these problems down a setattr- and quota-related path:
      > #0 [ffff88087be37400] die at ffffffff8100f18b
      > [exception RIP: jbd2_journal_dirty_metadata+268]
      > RIP: ffffffffa02cc86c RSP: ffff88087be375e0 RFLAGS: 00010246
      > RAX: ffff8806485b3bc0 RBX: ffff8806f520d588 RCX: ffff88084223bcf8
      > RDX: 0000000000000000 RSI: ffff88084223bcf8 RDI: 0000000000000000
      > RBP: ffff88087be37600 R8: f010000000000000 R9: f79fde5390e73e02
      > R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801eb760748
      > R13: ffff88084223bcf8 R14: ffff88086b22d800 R15: 0000000000000c00
      > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
      > #4 [ffff88087be37608] __ldiskfs_handle_dirty_metadata at ffffffffa02ee0bb [ldiskfs]
      > #5 [ffff88087be37648] ldiskfs_quota_write at ffffffffa0324b95 [ldiskfs]
      > #6 [ffff88087be376b8] write_blk at ffffffff811e44ae
      > #7 [ffff88087be376c8] remove_tree at ffffffff811e4da1
      > #8 [ffff88087be37738] remove_tree at ffffffff811e4bf8
      > #9 [ffff88087be377a8] remove_tree at ffffffff811e4bf8
      > #10 [ffff88087be37818] qtree_delete_dquot at ffffffff811e4fe3
      > #11 [ffff88087be37838] qtree_release_dquot at ffffffff811e501f
      > #12 [ffff88087be37848] v2_release_dquot at ffffffff811e3cc0
      > #13 [ffff88087be37858] dquot_release at ffffffff811df8e5
      > #14 [ffff88087be37898] ldiskfs_release_dquot at ffffffffa03235be [ldiskfs]
      > #15 [ffff88087be378b8] dqput at ffffffff811e0489
      > #16 [ffff88087be378e8] dquot_transfer at ffffffff811e3253
      > #17 [ffff88087be379c8] vfs_dq_transfer at ffffffff811dfc0c
      > #18 [ffff88087be379e8] osd_quota_transfer at ffffffffa0ba98a5 [osd_ldiskfs]
      > #19 [ffff88087be37a58] osd_attr_set at ffffffffa0bbcb8a [osd_ldiskfs]
      > #20 [ffff88087be37ab8] dt_attr_set.clone.2 at ffffffffa083a969 [ofd]
      > #21 [ffff88087be37ac8] ofd_attr_set at ffffffffa083e472 [ofd]
      > #22 [ffff88087be37b28] ofd_setattr at ffffffffa082fe68 [ofd]
      > #23 [ffff88087be37bb8] ost_setattr at ffffffffa06461fb [ost]
      > #24 [ffff88087be37c18] ost_handle at ffffffffa06491fd [ost]
      > #25 [ffff88087be37d68] ptlrpc_server_handle_request at ffffffffa06df4d5 [ptlrpc]
      > #26 [ffff88087be37e48] ptlrpc_main at ffffffffa06e083d [ptlrpc]
      > #27 [ffff88087be37ee8] kthread at ffffffff81096136
      > #28 [ffff88087be37f48] kernel_thread at ffffffff8100c0ca

      Looking into LU-4611, I think I've found the issue, in osd_declare_xattr_set:
      http://review.whamcloud.com/#/c/10407/2/lustre/osd-ldiskfs/osd_handler.c,cm

      	/* optimistic optimization: LMA is set first and usually fit inode */
      	if (strcmp(name, XATTR_NAME_LMA) == 0) {
      		if (dt_object_exists(dt))
      			credits = 0;
      		else
      			credits = 1;
      	} else if (strcmp(name, XATTR_NAME_VERSION) == 0) {
      

      Specifically, the "credits = 0" optimization for XATTR_NAME_LMA. There doesn't appear to be any special handling for this case, so I think this is the path that eventually ends in our assertion.

      If this is the problem, my question is: what's the correct number of credits here? Is it just 1, or some other number?
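
      For illustration only, here's the kind of minimal change I'm asking about, sketched against the snippet quoted above (not a confirmed fix; as the comments in this ticket note, LMA may not even be the culprit):

      	/* Sketch only: always reserve at least one credit for the LMA
      	 * update, even when the object already exists, since rewriting an
      	 * in-inode xattr still dirties the inode block and every dirtied
      	 * block consumes one of handle->h_buffer_credits. */
      	if (strcmp(name, XATTR_NAME_LMA) == 0)
      		credits = 1;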

      One other note of concern: the comment in osd_handler.c above where the osd credits are declared says some strange things about quotas, as does the name "...credits_noquota".

      Now that quota accounting is on by default, what does this mean, and does it relate to this issue?
      /**
       * Note: we do not count into QUOTA here.
       * If we mount with --data_journal we may need more.
       */
      const int osd_dto_credits_noquota[DTO_NR] = {
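
      For what it's worth, my reading (a rough sketch; "quota_extra_credits" below is a made-up name, not a Lustre symbol) is that the "noquota" table only covers the blocks of the object itself, and any quota-file blocks the operation may dirty have to be added on top when the transaction is declared. The declared total becomes the jbd2 handle's budget, and each dirtied metadata block consumes one credit, so an undeclared quota update is exactly what would trip the assertion:

      	/* Sketch only: quota_extra_credits is illustrative, not a real
      	 * symbol.  The declared total becomes the handle's credit budget. */
      	int credits = osd_dto_credits_noquota[DTO_ATTR_SET_BASE];

      	credits += quota_extra_credits;	/* blocks the quota files may dirty */

      	handle = jbd2_journal_start(journal, credits);
      	/* each __ldiskfs_handle_dirty_metadata() call then decrements
      	 * handle->h_buffer_credits; reaching zero trips
      	 * J_ASSERT_JH(jh, handle->h_buffer_credits > 0) */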

          Activity


            niu Niu Yawei (Inactive) added a comment -

            > Niu, is this now a duplicate of LU-5777?

            Yes, it's likely a dup of LU-5777. You need both the fix of LU-5777 and that of LU-5040.

            haasken Ryan Haasken added a comment -

            Niu, is this now a duplicate of LU-5777?

            haasken Ryan Haasken added a comment -

            Unfortunately, but not unexpectedly, the steps above do not reproduce the bug. Here's what I did on a freshly formatted file system:

            [root@centclient06 ~]# useradd test1
            [root@centclient06 ~]# useradd test2
            [root@centclient06 ~]# cd /mnt/centss08/
            [root@centclient06 centss08]# su test1
            [test1@centclient06 centss08]$ dd of=test1file if=/dev/zero count=16384
            16384+0 records in
            16384+0 records out
            8388608 bytes (8.4 MB) copied, 0.246592 s, 34.0 MB/s
            [test1@centclient06 centss08]$ exit
            exit
            [root@centclient06 centss08]# ls -lh
            total 8.0M
            -rw-rw-r-- 1 test1 test1 8.0M Oct 29 05:17 test1file
            [root@centclient06 centss08]# chown test2:test2 test1file 
            [root@centclient06 centss08]# ls -lh
            total 8.0M
            -rw-rw-r-- 1 test2 test2 8.0M Oct 29 05:17 test1file
            

            There was no OSS crash on this chown.

            [root@centclient06 testdir]# lfs quota -u test1 /mnt/centss06
            Disk quotas for user test1 (uid 501):
                 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
              /mnt/centss06       0       0       0       -       0       0       0       -
            [root@centclient06 testdir]# lfs quota -u test2 /mnt/centss07
            open /mnt/centss07 failed: No such file or directory (2)
            [root@centclient06 testdir]# lfs quota -u test2 /mnt/centss06
            Disk quotas for user test2 (uid 502):
                 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
              /mnt/centss06    8196       0       0       -       1       0       0       -
            

            haasken Ryan Haasken added a comment -

            Niu, your comment seems to suggest a potential sequence of steps to reproduce this problem:

            1. Create a new user so that the user has no files on the Lustre file system. Do not set quota limits for this user.
            2. Use that user to create and write some blocks to a file.
            3. Change the owner/group of the file created in the previous step.

            Does that sound like it should reproduce the problem? I'll try it when I get a chance. Are you still thinking that this bug is different from LU-5040?


            niu Niu Yawei (Inactive) added a comment -

            See my comment from LU-5612:

            The stack trace (the crash in the dqput path) is indeed different from LU-5040; it probably reveals another problem: when changing owner/group, if the original owner/group has no limits and the current inode is the last file for the original user/group, the quota entry could be deleted, which requires additional journal credits. This sounds quite rare compared with LU-5040; I'll try to work out a fix in LU-5250. Thanks for bringing this to my attention.
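
            To sketch what that implies for the declaration path (illustrative names only; the real fix is the combination of LU-5777 and LU-5040 noted above): deleting the old owner's entry can make remove_tree() rewrite or free a block per quota-tree level plus the leaf holding the entry, so roughly that many extra credits would have to be declared for a chown:

            	/* Sketch only: names below are illustrative, not Lustre symbols. */
            	#define QTREE_DEPTH_GUESS	4	/* assumed quota-tree depth */

            	static int chown_dquot_delete_credits(void)
            	{
            		/* one block per tree level that remove_tree() may modify
            		 * or free, plus the leaf data block holding the entry */
            		return QTREE_DEPTH_GUESS + 1;
            	}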

            haasken Ryan Haasken added a comment -

            ofd_attr_set calls ofd_trans_start, which calls dt_declare_record_write, which calls osd_declare_write. The patch for LU-4611 makes significant changes to osd_declare_write. Is there possibly a bug in the credits calculation in that function which causes this assertion failure?


            bzzz Alex Zhuravlev added a comment -

            LMA is supposed to fit in the inode body (not an external block). ofd_attr_set() calls into osd_declare_attr_set(), which counts that inode with credits = 1 (a single block). I don't think LMA is an issue here. I'd rather check the credits for the quota transfer.


            paf Patrick Farrell (Inactive) added a comment -

            Ah, it looks like I've misread here. The call from ofd_attr_set is done with XATTR_NAME_FID; XATTR_NAME_LMA seems to be limited to the local_storage case.

            The rest of the information above remains unchanged.


            paf Patrick Farrell (Inactive) added a comment -

            I can make a dump of this available if requested.


            People

              Assignee: wc-triage WC Triage
              Reporter: paf Patrick Farrell (Inactive)