Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4494

sanity-scrub test 4: lustre-MDT0001: too many transaction credits (309 > 256)

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.6.0

    • Lustre branch: master
      MDSCOUNT=4
    • 3
    • 12298

    Description

      While testing patch http://review.whamcloud.com/8523 on master branch with MDSCOUNT=4, the following issues occurred on one of the MDTs during create:

      Lustre: DEBUG MARKER: mkdir -p /mnt/mds4; mount -t lustre -o user_xattr,acl                                /dev/lvm-Role_MDS/P4 /mnt/mds4^M
      LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: ^M
      LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: ^M
      Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u^M
      Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P4 2>/dev/null^M
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20^M
      Lustre: DEBUG MARKER: Using TIMEOUT=20^M
      Lustre: cli-ctl-lustre-MDT0001: Allocated super-sequence [0x0000000440000400-0x0000000480000400):1:mdt]^M
      Lustre: 32030:0:(osd_handler.c:872:osd_trans_start()) lustre-MDT0001: too many transaction credits (309 > 256)
      Lustre: 32030:0:(osd_handler.c:879:osd_trans_start())   create: 2/50, delete: 0/0, destroy: 0/0
      Lustre: 32030:0:(osd_handler.c:884:osd_trans_start())   attr_set: 2/2, xattr_set: 4/43
      Lustre: 32030:0:(osd_handler.c:891:osd_trans_start())   write: 11/154, punch: 2/8, quota 2/2
      Lustre: 32030:0:(osd_handler.c:896:osd_trans_start())   insert: 3/50, delete: 0/0
      Lustre: 32030:0:(osd_handler.c:901:osd_trans_start())   ref_add: 0/0, ref_del: 0/0
      Pid: 32030, comm: mdt00_002
      
      Call Trace:
       [<ffffffffa0903895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0a95a43>] osd_trans_start+0x653/0x670 [osd_ldiskfs]
       [<ffffffffa0dfc579>] lod_trans_start+0x1b9/0x250 [lod]
       [<ffffffffa0e72317>] mdd_trans_start+0x17/0x20 [mdd]
       [<ffffffffa0e65dd9>] mdd_create+0x929/0x17a0 [mdd]
       [<ffffffffa0d52fa8>] mdo_create+0x18/0x50 [mdt]
       [<ffffffffa0d5d037>] mdt_reint_open+0x12a7/0x2120 [mdt]
       [<ffffffffa0d47a91>] mdt_reint_rec+0x41/0xe0 [mdt]
       [<ffffffffa0d2ddb3>] mdt_reint_internal+0x4c3/0x780 [mdt]
       [<ffffffffa0d2e265>] mdt_intent_reint+0x1f5/0x520 [mdt]
       [<ffffffffa0d2c989>] mdt_intent_policy+0x499/0xca0 [mdt]
       [<ffffffffa0527509>] ldlm_lock_enqueue+0x359/0x920 [ptlrpc]
       [<ffffffffa0550edf>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
       [<ffffffffa05cb492>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
       [<ffffffffa05cdfea>] tgt_handle_request0+0x2ea/0x1490 [ptlrpc]
       [<ffffffffa05cf5ca>] tgt_request_handle+0x43a/0x980 [ptlrpc]
       [<ffffffffa0582725>] ptlrpc_main+0xd25/0x1970 [ptlrpc]
      

      And then the nodes were re-provisioned by autotest.

      Maloo report: https://maloo.whamcloud.com/test_sets/12c1c02a-7e6f-11e3-925a-52540035b04c

      Attachments

        Issue Links

          Activity

            [LU-4494] sanity-scrub test 4: lustre-MDT0001: too many transaction credits (309 > 256)
            adilger Andreas Dilger made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            Closing as a duplicate of LU-4611, which shows the problem much more significantly with a large number of stripes.

            adilger Andreas Dilger added a comment - Closing as a duplicate of LU-4611 , which shows the problem much more significantly with a large number of stripes.
            adilger Andreas Dilger made changes -
            Description Original: While testing patch http://review.whamcloud.com/8523 on master branch with MDSCOUNT=4, the following issues occurred on one of the MDTs:

            {noformat}
            Lustre: DEBUG MARKER: mkdir -p /mnt/mds4; mount -t lustre -o user_xattr,acl /dev/lvm-Role_MDS/P4 /mnt/mds4^M
            LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: ^M
            LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: ^M
            Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u^M
            Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P4 2>/dev/null^M
            Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20^M
            Lustre: DEBUG MARKER: Using TIMEOUT=20^M
            Lustre: cli-ctl-lustre-MDT0001: Allocated super-sequence [0x0000000440000400-0x0000000480000400):1:mdt]^M
            Lustre: Skipped 8 previous similar messages^M
            Lustre: 26727:0:(osd_handler.c:872:osd_trans_start()) lustre-MDT0001: too many transaction credits (309 > 256)^M
            Lustre: 26727:0:(osd_handler.c:872:osd_trans_start()) Skipped 665 previous similar messages^M
            Lustre: 26727:0:(osd_handler.c:879:osd_trans_start()) create: 2/50, delete: 0/0, destroy: 0/0^M
            Lustre: 26727:0:(osd_handler.c:879:osd_trans_start()) Skipped 665 previous similar messages^M
            Lustre: 26727:0:(osd_handler.c:884:osd_trans_start()) attr_set: 2/2, xattr_set: 4/43^M
            Lustre: 26727:0:(osd_handler.c:884:osd_trans_start()) Skipped 665 previous similar messages^M
            Lustre: 26727:0:(osd_handler.c:891:osd_trans_start()) write: 11/154, punch: 2/8, quota 2/2^M
            Lustre: 26727:0:(osd_handler.c:891:osd_trans_start()) Skipped 665 previous similar messages^M
            Lustre: 26727:0:(osd_handler.c:896:osd_trans_start()) insert: 3/50, delete: 0/0^M
            Lustre: 26727:0:(osd_handler.c:896:osd_trans_start()) Skipped 665 previous similar messages^M
            Lustre: 26727:0:(osd_handler.c:901:osd_trans_start()) ref_add: 0/0, ref_del: 0/0^M
            Lustre: 26727:0:(osd_handler.c:901:osd_trans_start()) Skipped 665 previous similar messages^M
            SysRq : Show State^M
              task PC stack pid father^M
            init S 0000000000000001 0 1 0 0x00000000^M
             ffff88007e4b5908 0000000000000086 0000000000000000 ffff88007e4b58cc^M
             0000000000000000 ffff88007f823240 ffff880002216740 0000000000000400^M
             ffff88007e4b3ab8 ffff88007e4b5fd8 000000000000fb88 ffff88007e4b3ab8^M
            Call Trace:^M
            ......
            {noformat}

            And then the nodes were re-provisioned by autotest.

            Maloo report: https://maloo.whamcloud.com/test_sets/12c1c02a-7e6f-11e3-925a-52540035b04c
            New: While testing patch http://review.whamcloud.com/8523 on master branch with MDSCOUNT=4, the following issues occurred on one of the MDTs during create:

            {noformat}
            Lustre: DEBUG MARKER: mkdir -p /mnt/mds4; mount -t lustre -o user_xattr,acl /dev/lvm-Role_MDS/P4 /mnt/mds4^M
            LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: ^M
            LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: ^M
            Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u^M
            Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P4 2>/dev/null^M
            Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20^M
            Lustre: DEBUG MARKER: Using TIMEOUT=20^M
            Lustre: cli-ctl-lustre-MDT0001: Allocated super-sequence [0x0000000440000400-0x0000000480000400):1:mdt]^M
            Lustre: 32030:0:(osd_handler.c:872:osd_trans_start()) lustre-MDT0001: too many transaction credits (309 > 256)
            Lustre: 32030:0:(osd_handler.c:879:osd_trans_start()) create: 2/50, delete: 0/0, destroy: 0/0
            Lustre: 32030:0:(osd_handler.c:884:osd_trans_start()) attr_set: 2/2, xattr_set: 4/43
            Lustre: 32030:0:(osd_handler.c:891:osd_trans_start()) write: 11/154, punch: 2/8, quota 2/2
            Lustre: 32030:0:(osd_handler.c:896:osd_trans_start()) insert: 3/50, delete: 0/0
            Lustre: 32030:0:(osd_handler.c:901:osd_trans_start()) ref_add: 0/0, ref_del: 0/0
            Pid: 32030, comm: mdt00_002

            Call Trace:
             [<ffffffffa0903895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
             [<ffffffffa0a95a43>] osd_trans_start+0x653/0x670 [osd_ldiskfs]
             [<ffffffffa0dfc579>] lod_trans_start+0x1b9/0x250 [lod]
             [<ffffffffa0e72317>] mdd_trans_start+0x17/0x20 [mdd]
             [<ffffffffa0e65dd9>] mdd_create+0x929/0x17a0 [mdd]
             [<ffffffffa0d52fa8>] mdo_create+0x18/0x50 [mdt]
             [<ffffffffa0d5d037>] mdt_reint_open+0x12a7/0x2120 [mdt]
             [<ffffffffa0d47a91>] mdt_reint_rec+0x41/0xe0 [mdt]
             [<ffffffffa0d2ddb3>] mdt_reint_internal+0x4c3/0x780 [mdt]
             [<ffffffffa0d2e265>] mdt_intent_reint+0x1f5/0x520 [mdt]
             [<ffffffffa0d2c989>] mdt_intent_policy+0x499/0xca0 [mdt]
             [<ffffffffa0527509>] ldlm_lock_enqueue+0x359/0x920 [ptlrpc]
             [<ffffffffa0550edf>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
             [<ffffffffa05cb492>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
             [<ffffffffa05cdfea>] tgt_handle_request0+0x2ea/0x1490 [ptlrpc]
             [<ffffffffa05cf5ca>] tgt_request_handle+0x43a/0x980 [ptlrpc]
             [<ffffffffa0582725>] ptlrpc_main+0xd25/0x1970 [ptlrpc]
            {noformat}

            And then the nodes were re-provisioned by autotest.

            Maloo report: https://maloo.whamcloud.com/test_sets/12c1c02a-7e6f-11e3-925a-52540035b04c
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-4611 [ LU-4611 ]
            green Oleg Drokin added a comment -

            Well, it almost looks like one of the filesystems was formatted with 4M journal size (256 blocks max transactions).
            So then when you try to do a create it fails.

            Could it just be that mdscount=4 overwhelms the journal size by doing a larger transaction than in normal case for some sort of a special purpose small filesystem?
            If so, this woul likely be visible on any test run run in such conditions and we probably can do nothing about it other than changing the fs journal size used for this test

            green Oleg Drokin added a comment - Well, it almost looks like one of the filesystems was formatted with 4M journal size (256 blocks max transactions). So then when you try to do a create it fails. Could it just be that mdscount=4 overwhelms the journal size by doing a larger transaction than in normal case for some sort of a special purpose small filesystem? If so, this woul likely be visible on any test run run in such conditions and we probably can do nothing about it other than changing the fs journal size used for this test
            yujian Jian Yu created issue -

            People

              wc-triage WC Triage
              yujian Jian Yu
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: