Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7045

NOT enough transaction credits for osp_write_local_file()

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.8.0
    • Lustre 2.8.0
    • None
    • RHEL 7.1 client and server
    • 3
    • 9223372036854775807

    Description

      Test Lustre master branch on RHEL 7.1 client and server, conf-sanity test 81 hung and syslog on MDS showed that:

      Jun 10 16:28:25 shadow-15vm8 kernel: ------------[ cut here ]------------
      Jun 10 16:28:25 shadow-15vm8 kernel: WARNING: at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.54/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
      Jun 10 16:28:25 shadow-15vm8 kernel: Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) sha512_generic dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw parport_pc virtio_balloon i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm ata_piix libata i2c_core floppy [last unloaded: libcfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: 
      Jun 10 16:28:25 shadow-15vm8 kernel: CPU: 1 PID: 18882 Comm: llog_process_th Tainted: GF       W  O--------------   3.10.0-229.4.2.el7_lustre.g1fee634.x86_64 #1
      Jun 10 16:28:25 shadow-15vm8 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      Jun 10 16:28:25 shadow-15vm8 kernel: 0000000000000000 000000005e4d00bc ffff88006d6a77f0 ffffffff816050da
      Jun 10 16:28:25 shadow-15vm8 kernel: ffff88006d6a7828 ffffffff8106e34b ffff88000083fc98 ffff88007b5877e0
      Jun 10 16:28:25 shadow-15vm8 kernel: ffff88005c362708 ffffffffa062745c 0000000000000325 ffff88006d6a7838
      Jun 10 16:28:25 shadow-15vm8 kernel: Call Trace:
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff816050da>] dump_stack+0x19/0x1b
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05d26b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05e30c1>] ldiskfs_getblk+0x131/0x200 [ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05e31b7>] ldiskfs_bread+0x27/0xc0 [ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0bca919>] osd_ldiskfs_write_record+0x169/0x350 [osd_ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0bcabdf>] osd_write+0xdf/0x200 [osd_ldiskfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0765399>] dt_record_write+0x39/0x120 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0eed261>] osp_write_local_file.isra.16+0x284/0x30c [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0eed7ca>] osp_last_used_init+0x1e7/0xaa6 [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0ecef84>] osp_init0.isra.17+0x1b44/0x2060 [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0ecf549>] osp_device_alloc+0xa9/0x1a0 [osp]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa074b6c4>] obd_setup+0x114/0x2a0 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa074f430>] class_setup+0x2a0/0x820 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0753b0c>] class_process_config+0x224c/0x2e50 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0571967>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0754f6f>] class_config_llog_handler+0x85f/0x16e0 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071945a>] llog_process_thread+0x7aa/0xe90 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071a4f0>] ? llog_backup+0x500/0x500 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071a53c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass]
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8109739f>] kthread+0xcf/0xe0
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
      Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
      Jun 10 16:28:25 shadow-15vm8 kernel: ---[ end trace ddd376db706c5663 ]---
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs: ldiskfs_getblk:805: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs error (device dm-0): ldiskfs_getblk:805: inode #132: block 16827: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 967, credits 6/0, errcode -28
      Jun 10 16:28:25 shadow-15vm8 kernel: Aborting journal on device dm-0-8.
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs (dm-0): Remounting filesystem read-only
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osd_io.c:1645:osd_ldiskfs_write_record()) dm-0: error reading offset 84152 (block 20): rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs error (device dm-0) in osd_trans_stop:1052: error 28
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18458:0:(osd_handler.c:829:osd_trans_commit_cb()) transaction @0xffff8800700b43c0 commit error: 2
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osd_handler.c:1055:osd_trans_stop()) lustre-MDT0000-osd: failed to stop transaction: rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osp_dev.c:283:osp_init_last_objid()) lustre-OST2917-osc-MDT0000: can't initialize lov_objid: rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osp_dev.c:369:osp_last_used_init()) lustre-OST2917-osc-MDT0000: Can not get ids -28 from old objid!
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(obd_config.c:558:class_setup()) setup lustre-OST2917-osc-MDT0000 failed (-28)
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(obd_config.c:1601:class_config_llog_handler()) MGC10.1.4.179@tcp: cfg command failed: rc = -28
      Jun 10 16:28:25 shadow-15vm8 kernel: Lustre:    cmd=cf003 0:lustre-OST2917-osc-MDT0000  1:lustre-OST2917_UUID  2:10.1.4.178@tcp  
      
      Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18477:0:(mgc_request.c:527:do_requeue()) failed processing log: -28
      

      The Maloo instances:
      https://testing.hpdd.intel.com/sub_tests/ebb78d16-1044-11e5-8fc4-5254006e85c2

      Attachments

        Issue Links

          Activity

            [LU-7045] NOT enough transaction credits for osp_write_local_file()

            Landed for 2.8

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed for 2.8

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16330/
            Subject: LU-7045 osd: enough credits for single indirect block write
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 85c6c099d83b19480dd4160de57e7ffac5b312af

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16330/ Subject: LU-7045 osd: enough credits for single indirect block write Project: fs/lustre-release Branch: master Current Patch Set: Commit: 85c6c099d83b19480dd4160de57e7ffac5b312af

            I mean that the non-extents block allocation method. In our case, the patch fixes the improper logic that was used for optimising single indirect block.

            yong.fan nasf (Inactive) added a comment - I mean that the non-extents block allocation method. In our case, the patch fixes the improper logic that was used for optimising single indirect block.

            triple indirect? the patch changes single-indirect logic only.

            bzzz Alex Zhuravlev added a comment - triple indirect? the patch changes single-indirect logic only.

            Debug patch shows that when the issue happened, the "extents" feature is disabled on the MDT. So the block allocation follows the traditional triple indirect blocks mechanism. The patch http://review.whamcloud.com/16330 should have fixed the issue.

            yong.fan nasf (Inactive) added a comment - Debug patch shows that when the issue happened, the "extents" feature is disabled on the MDT. So the block allocation follows the traditional triple indirect blocks mechanism. The patch http://review.whamcloud.com/16330 should have fixed the issue.

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/16330
            Subject: LU-7045 osd: enough credits for single indirect block write
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 95e2c1d58959af73dd6c08357e398f5593d7d4aa

            gerrit Gerrit Updater added a comment - Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/16330 Subject: LU-7045 osd: enough credits for single indirect block write Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 95e2c1d58959af73dd6c08357e398f5593d7d4aa
            yujian Jian Yu added a comment -

            Hi Nasf,

            Could you please create a patch for master branch? Thank you.

            yujian Jian Yu added a comment - Hi Nasf, Could you please create a patch for master branch? Thank you.

            I'd rather think it was osd_calc_bkmap_credits() returning (1 + 2), then osd_declare_write() adding 1, so 4 and the remaining 2 came from osd_declare_inode_qid()

            bzzz Alex Zhuravlev added a comment - I'd rather think it was osd_calc_bkmap_credits() returning (1 + 2), then osd_declare_write() adding 1, so 4 and the remaining 2 came from osd_declare_inode_qid()
            yong.fan nasf (Inactive) added a comment - - edited

            I am also confused considering the ldiskfs_new_inode(). But if "extents" was not enabled, how to explain the declared credits is "6", not "10"?

            yong.fan nasf (Inactive) added a comment - - edited I am also confused considering the ldiskfs_new_inode(). But if "extents" was not enabled, how to explain the declared credits is "6", not "10"?
            bzzz Alex Zhuravlev added a comment - - edited

            ldiskfs_new_inode():

            if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_EXTENTS)) {
            /* set extent flag only for directory, file and normal symlink*/
            if (S_ISDIR(mode) || S_ISREG(mode) || S_ISLNK(mode)) {
            ldiskfs_set_inode_flag(inode, LDISKFS_INODE_EXTENTS);

            bzzz Alex Zhuravlev added a comment - - edited ldiskfs_new_inode(): if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_EXTENTS)) { /* set extent flag only for directory, file and normal symlink*/ if (S_ISDIR(mode) || S_ISREG(mode) || S_ISLNK(mode)) { ldiskfs_set_inode_flag(inode, LDISKFS_INODE_EXTENTS);

            People

              yong.fan nasf (Inactive)
              yong.fan nasf (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: