Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7045

NOT enough transaction credits for osp_write_local_file()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.8.0
    • Lustre 2.8.0
    • None
    • RHEL 7.1 client and server
    • 3
    • 9223372036854775807

    Description

      Test Lustre master branch on RHEL 7.1 client and server, conf-sanity test 81 hung and syslog on MDS showed that:

      Jun 10 16:28:25 shadow-15vm8 kernel: ------------[ cut here ]------------
      Jun 10 16:28:25 shadow-15vm8 kernel: WARNING: at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.54/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
      Jun 10 16:28:25 shadow-15vm8 kernel: Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) sha512_generic dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw parport_pc virtio_balloon i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm ata_piix libata i2c_core floppy [last unloaded: libcfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: 
      Jun 10 16:28:25 shadow-15vm8 kernel: CPU: 1 PID: 18882 Comm: llog_process_th Tainted: GF       W  O--------------   3.10.0-229.4.2.el7_lustre.g1fee634.x86_64 #1
      Jun 10 16:28:25 shadow-15vm8 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      Jun 10 16:28:25 shadow-15vm8 kernel: 0000000000000000 000000005e4d00bc ffff88006d6a77f0 ffffffff816050da
      Jun 10 16:28:25 shadow-15vm8 kernel: ffff88006d6a7828 ffffffff8106e34b ffff88000083fc98 ffff88007b5877e0
      Jun 10 16:28:25 shadow-15vm8 kernel: ffff88005c362708 ffffffffa062745c 0000000000000325 ffff88006d6a7838
      Jun 10 16:28:25 shadow-15vm8 kernel: Call Trace:
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff816050da>] dump_stack+0x19/0x1b
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05d26b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05e30c1>] ldiskfs_getblk+0x131/0x200 [ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05e31b7>] ldiskfs_bread+0x27/0xc0 [ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0bca919>] osd_ldiskfs_write_record+0x169/0x350 [osd_ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0bcabdf>] osd_write+0xdf/0x200 [osd_ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0765399>] dt_record_write+0x39/0x120 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0eed261>] osp_write_local_file.isra.16+0x284/0x30c [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0eed7ca>] osp_last_used_init+0x1e7/0xaa6 [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0ecef84>] osp_init0.isra.17+0x1b44/0x2060 [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0ecf549>] osp_device_alloc+0xa9/0x1a0 [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa074b6c4>] obd_setup+0x114/0x2a0 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa074f430>] class_setup+0x2a0/0x820 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0753b0c>] class_process_config+0x224c/0x2e50 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0571967>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0754f6f>] class_config_llog_handler+0x85f/0x16e0 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071945a>] llog_process_thread+0x7aa/0xe90 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071a4f0>] ? llog_backup+0x500/0x500 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071a53c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8109739f>] kthread+0xcf/0xe0
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
      Jun 10 16:28:25 shadow-15vm8 kernel: ---[ end trace ddd376db706c5663 ]---
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs: ldiskfs_getblk:805: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs error (device dm-0): ldiskfs_getblk:805: inode #132: block 16827: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 967, credits 6/0, errcode -28
      Jun 10 16:28:25 shadow-15vm8 kernel: Aborting journal on device dm-0-8.
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs (dm-0): Remounting filesystem read-only
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osd_io.c:1645:osd_ldiskfs_write_record()) dm-0: error reading offset 84152 (block 20): rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs error (device dm-0) in osd_trans_stop:1052: error 28
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18458:0:(osd_handler.c:829:osd_trans_commit_cb()) transaction @0xffff8800700b43c0 commit error: 2
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osd_handler.c:1055:osd_trans_stop()) lustre-MDT0000-osd: failed to stop transaction: rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osp_dev.c:283:osp_init_last_objid()) lustre-OST2917-osc-MDT0000: can't initialize lov_objid: rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osp_dev.c:369:osp_last_used_init()) lustre-OST2917-osc-MDT0000: Can not get ids -28 from old objid!
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(obd_config.c:558:class_setup()) setup lustre-OST2917-osc-MDT0000 failed (-28)
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(obd_config.c:1601:class_config_llog_handler()) MGC10.1.4.179@tcp: cfg command failed: rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: Lustre:    cmd=cf003 0:lustre-OST2917-osc-MDT0000  1:lustre-OST2917_UUID  2:10.1.4.178@tcp  
      
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18477:0:(mgc_request.c:527:do_requeue()) failed processing log: -28
      

      The Maloo instances:
      https://testing.hpdd.intel.com/sub_tests/ebb78d16-1044-11e5-8fc4-5254006e85c2

      Attachments

        Issue Links

          Activity

            People

              yong.fan nasf (Inactive)
              yong.fan nasf (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: