LU-4798: Too many transaction credits (28288 > 25600)

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.4.2
    • None
    • 3
    • 13203

    Description

      Hi,

      When working with files striped across a large number of OSTs, CEA sees messages like the following on the MDS:

      Lustre: 12572:0:(osd_handler.c:833:osd_trans_start()) scratch3-MDT0000: too many transaction credits (28288 > 25600)
      Lustre: 12572:0:(osd_handler.c:840:osd_trans_start()) create: 160/4000, delete: 2/35, destroy: 1/25
      Lustre: 12572:0:(osd_handler.c:845:osd_trans_start()) attr_set: 2/2, xattr_set: 161/2254
      Lustre: 12572:0:(osd_handler.c:852:osd_trans_start()) write: 1282/17948, punch: 320/1280, quota 4/4
      Lustre: 12572:0:(osd_handler.c:857:osd_trans_start()) insert: 161/2736, delete: 1/25
      Lustre: 12572:0:(osd_handler.c:862:osd_trans_start()) ref_add: 1/1, ref_del: 3/3
      Pid: 12572, comm: mdt01_005
      
      Call Trace:
       [<ffffffffa041b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0c3b31e>] osd_trans_start+0x65e/0x680 [osd_ldiskfs]
       [<ffffffffa0d3e309>] lod_trans_start+0x1b9/0x250 [lod]
       [<ffffffffa08d6357>] mdd_trans_start+0x17/0x20 [mdd]
       [<ffffffffa08cb3be>] mdd_unlink+0x41e/0xe30 [mdd]
       [<ffffffffa0cc2da8>] mdo_unlink+0x18/0x50 [mdt]
       [<ffffffffa0cc6280>] mdt_reint_unlink+0x820/0x1010 [mdt]
       [<ffffffffa0cc2aa1>] mdt_reint_rec+0x41/0xe0 [mdt]
       [<ffffffffa0ca7c73>] mdt_reint_internal+0x4c3/0x780 [mdt]
       [<ffffffffa0ca7f74>] mdt_reint+0x44/0xe0 [mdt]
       [<ffffffffa0cacc27>] mdt_handle_common+0x647/0x16d0 [mdt]
       [<ffffffffa0788b9c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
       [<ffffffffa0ce6835>] mds_regular_handle+0x15/0x20 [mdt]
       [<ffffffffa07983b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
       [<ffffffffa041c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa042dd9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
       [<ffffffffa078f719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
       [<ffffffffa079974e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
       [<ffffffffa0798c80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffffa0798c80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffffa0798c80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
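
To see which operations dominate the declared credits, the breakdown lines above can be parsed and ranked. The following is only a minimal sketch: it assumes each field reads `operation: count/credits`, and the names `parse_credits` and `summarize` are hypothetical helpers, not part of Lustre. Note that naively summing every listed field gives 28313, which is 25 more than the reported 28288. The difference equals the second `delete: 1/25` entry, whose values are identical to `destroy: 1/25`, so that field may be the same counter printed under a different label. That is a guess based on the arithmetic, not something this ticket confirms.

```python
import re

# Breakdown lines copied from the osd_trans_start() messages above.
LOG = """\
create: 160/4000, delete: 2/35, destroy: 1/25
attr_set: 2/2, xattr_set: 161/2254
write: 1282/17948, punch: 320/1280, quota 4/4
insert: 161/2736, delete: 1/25
ref_add: 1/1, ref_del: 3/3
"""

# Assumed field layout: "name: count/credits". The colon is optional
# because the log prints "quota 4/4" without one.
FIELD = re.compile(r"(\w+):?\s+(\d+)/(\d+)")


def parse_credits(text):
    """Return a list of (operation, op_count, credits) in log order."""
    return [(name, int(ops), int(credits))
            for name, ops, credits in FIELD.findall(text)]


def summarize(text, reported_total):
    """Print fields ranked by credits and return the sum of all credits."""
    fields = parse_credits(text)
    ranked = sorted(fields, key=lambda f: f[2], reverse=True)
    for name, ops, credits in ranked:
        share = 100.0 * credits / reported_total
        print(f"{name:10s} ops={ops:5d} credits={credits:6d} ({share:4.1f}%)")
    return sum(credits for _, _, credits in fields)


if __name__ == "__main__":
    total = summarize(LOG, reported_total=28288)
    print("sum of listed credits:", total)
```

In this breakdown the `write` declarations account for by far the largest share of the credits (17948), followed by `create`, `insert` and `xattr_set`, each with roughly 160 declared operations.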
      

      This issue looks very similar to LU-4611. Once a proper fix is identified there, we would need it backported to b2_4.

      Thanks,
      Sebastien.

            People

              Assignee: Niu Yawei (Inactive)
              Reporter: Sebastien Buisson (Inactive)