Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.1
-
None
-
3
-
12616
Description
We are getting odd quota-related errors on the MDT, shown below. This is a newly created filesystem.
Filesystem volume name: nbp9-MDT0000
Last mounted on: /
Filesystem UUID: 4615f09e-ac04-44de-a4d1-b463f280d6da
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 644251648
Block count: 322122752
Reserved block count: 0
Free blocks: 241421671
Free inodes: 644239829
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1024
Blocks per group: 16384
Fragments per group: 16384
Inodes per group: 32768
Inode blocks per group: 4096
Flex block group size: 16
Filesystem created: Wed Feb 5 15:31:14 2014
Last mount time: Mon Feb 10 08:45:54 2014
Last write time: Mon Feb 10 08:45:54 2014
Mount count: 5
Maximum mount count: -1
Last checked: Wed Feb 5 15:31:14 2014
Check interval: 0 (<none>)
Lifetime writes: 307 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 512
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: f50d0c82-4edf-4e98-94ef-69ed3ad456d0
Journal backup: inode blocks
User quota inode: 3
Group quota inode: 4
nbp9-mds /var/log # tunefs.lustre /dev/mapper/nbp9--vg-mdt9
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: nbp9-MDT0000
Index: 0
Lustre FS: nbp9
Mount type: ldiskfs
Flags: 0x1 (MDT )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.151.26.5@o2ib lov.stripesize=1048576 lov.stripecount=4
Feb 10 07:37:42 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:828:osd_trans_start()) nbp9-MDT0000: too many transaction credits (32279 > 25600)
Feb 10 07:37:42 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:835:osd_trans_start()) create: 170/4250, delete: 0/0, destroy: 0/0
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:840:osd_trans_start()) attr_set: 2/2, xattr_set: 172/2395
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:847:osd_trans_start()) write: 1523/21322, punch: 338/1352, quota 4/52
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:852:osd_trans_start()) insert: 171/2906, delete: 0/0
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:857:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
Feb 10 07:37:43 nbp9-mds kernel: Pid: 5858, comm: mdt02_043
Feb 10 07:37:43 nbp9-mds kernel:
Feb 10 07:37:43 nbp9-mds kernel: Call Trace:
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0514895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0cfc41e>] osd_trans_start+0x65e/0x680 [osd_ldiskfs]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0e68309>] lod_trans_start+0x1b9/0x250 [lod]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0773117>] mdd_trans_start+0x17/0x20 [mdd]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa076712a>] mdd_create+0x91a/0x1790 [mdd]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0d03937>] ? osd_xattr_get+0x97/0x2d0 [osd_ldiskfs]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0ddf2a2>] mdt_reint_open+0x1362/0x20e0 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa053185e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa07fadcc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0dc9981>] mdt_reint_rec+0x41/0xe0 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0daeb03>] mdt_reint_internal+0x4c3/0x780 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0daf090>] mdt_intent_reint+0x1f0/0x530 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0dacf3e>] mdt_intent_policy+0x39e/0x720 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa07b2831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa07d91ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0dad3c6>] mdt_enqueue+0x46/0xe0 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0db3ad7>] mdt_handle_common+0x647/0x16d0 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0ded615>] mds_regular_handle+0x15/0x20 [mdt]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa080b3c8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa05155de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0526d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa0802729>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffff81055813>] ? __wake_up+0x53/0x70
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa080c75e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa080bc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa080bc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffffa080bc90>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Feb 10 07:37:43 nbp9-mds kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Feb 10 07:37:43 nbp9-mds kernel:
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:828:osd_trans_start()) nbp9-MDT0000: too many transaction credits (32279 > 25600)
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:835:osd_trans_start()) create: 170/4250, delete: 0/0, destroy: 0/0
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:840:osd_trans_start()) attr_set: 2/2, xattr_set: 172/2395
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:847:osd_trans_start()) write: 1523/21322, punch: 338/1352, quota 4/52
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:852:osd_trans_start()) insert: 171/2906, delete: 0/0
Feb 10 07:37:43 nbp9-mds kernel: Lustre: 5858:0:(osd_handler.c:857:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:828:osd_trans_start()) nbp9-MDT0000: too many transaction credits (32279 > 25600)
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:828:osd_trans_start()) Skipped 2 previous similar messages
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:835:osd_trans_start()) create: 170/4250, delete: 0/0, destroy: 0/0
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:835:osd_trans_start()) Skipped 2 previous similar messages
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:840:osd_trans_start()) attr_set: 2/2, xattr_set: 172/2395
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:840:osd_trans_start()) Skipped 2 previous similar messages
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:847:osd_trans_start()) write: 1523/21322, punch: 338/1352, quota 4/52
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:847:osd_trans_start()) Skipped 2 previous similar messages
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:852:osd_trans_start()) insert: 171/2906, delete: 0/0
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:852:osd_trans_start()) Skipped 2 previous similar messages
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:857:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
Feb 10 07:38:48 nbp9-mds kernel: Lustre: 5791:0:(osd_handler.c:857:osd_trans_start()) Skipped 2 previous similar messages
Feb 10 07:43:07 nbp9-mds kernel: Lustre: 5860:0:(osd_handler.c:828:osd_trans_start()) nbp9-MDT0000: too many transaction credits (32279 > 25600)
Feb 10 07:43:07 nbp9-mds kernel: Lustre: 5860:0:(osd_handler.c:835:osd_trans_start()) create: 170/4250, delete: 0/0, destroy: 0/0
Feb 10 07:43:07 nbp9-mds kernel: Lustre: 5860:0:(osd_handler.c:840:osd_trans_start()) attr_set: 2/2, xattr_set: 172/2395
Feb 10 07:43:07 nbp9-mds kernel: Lustre: 5860:0:(osd_handler.c:847:osd_trans_start()) write: 1523/21322, punch: 338/1352, quota 4/52
Feb 10 07:43:07 nbp9-mds kernel: Lustre: 5860:0:(osd_handler.c:852:osd_trans_start()) insert: 171/2906, delete: 0/0
Feb 10 07:43:07 nbp9-mds kernel: Lustre: 5860:0:(osd_handler.c:857:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
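To see where the 32279 credits come from, the breakdown in the osd_trans_start() messages can be summed per operation type. The sketch below assumes each "N/M" pair means "N declared operations / M credits"; that reading is an inference from the fact that the second numbers add up exactly to the reported total, not something the log states. The helper name parse_credits is made up for illustration.

```python
import re

# Credit breakdown copied from the osd_trans_start() messages in the log.
LOG = """\
too many transaction credits (32279 > 25600)
create: 170/4250, delete: 0/0, destroy: 0/0
attr_set: 2/2, xattr_set: 172/2395
write: 1523/21322, punch: 338/1352, quota 4/52
insert: 171/2906, delete: 0/0
ref_add: 0/0, ref_del: 0/0
"""


def parse_credits(text):
    """Return (requested, limit, [(name, ops, credits), ...]).

    Assumes "name: N/M" means N operations and M credits (an inference).
    """
    m = re.search(r"\((\d+) > (\d+)\)", text)
    requested, limit = int(m.group(1)), int(m.group(2))
    # "quota 4/52" is printed without a colon, so the colon is optional.
    pairs = re.findall(r"(\w+):? (\d+)/(\d+)", text)
    return requested, limit, [(n, int(o), int(c)) for n, o, c in pairs]


if __name__ == "__main__":
    requested, limit, items = parse_credits(LOG)
    total = sum(credits for _, _, credits in items)
    print(f"sum of per-op credits: {total} (reported {requested}, limit {limit})")
    # List the operation types that actually declared credits, largest first.
    for name, ops, credits in sorted(items, key=lambda t: -t[2]):
        if credits:
            print(f"{name:10s} ops={ops:5d} credits={credits:6d}")
```

Under that assumption the per-operation credits sum to exactly 32279, with the write declarations (21322) accounting for most of it. Dropping the punch credits alone would leave 30927, which is still above the 25600 limit.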
I don't think we need to assume that the full number of threads is holding its full transaction credits at the same time. There will always be some fraction of threads that are outside a transaction at any given moment. That said, reducing the credit count by any amount will help to avoid premature transaction commits and checkpoints.
For the llog punch, it surprises me that we would need to reserve extra credits to truncate data that was just written. Any llog record can be at most 2 blocks long, so we shouldn't need more than {2*(bitmap + GDT to free), 1 block to overwrite} + inode = 5 blocks to truncate a single llog record. If we are already reserving this much for the write (I assume so), why reserve extra for the truncate?
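The 5-block bound can be spelled out as below. This reads the braces as alternatives (either free the record's two blocks, touching one bitmap and one GDT block each, or overwrite a single block in place), which is an interpretation of the comment and not a statement about the actual ldiskfs credit accounting.

```latex
\[
\underbrace{\max\bigl(\,2 \times (1_{\text{bitmap}} + 1_{\text{GDT}}),\; 1_{\text{overwrite}}\bigr)}_{=\,4}
\;+\; 1_{\text{inode}} \;=\; 5 \text{ blocks per llog record}
\]
```

For comparison, the log reports punch: 338/1352. If that pair means 338 declared punches and 1352 credits, each punch currently declares 1352 / 338 = 4 credits.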
At one point, some explicit declares were added to handle the error cases, but the new accounting also allows "undo" updates for any declared update, for error-handling purposes. I would hope that means we can get rid of the explicit punch calls entirely. Is it possible to get rid of the other explicit undo declarations as well (ref_del, delete)?