[LU-7045] NOT enough transaction credits for osp_write_local_file() Created: 26/Aug/15 Updated: 14/Oct/15 Resolved: 14/Oct/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 7.1 client and server |
||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
Test Lustre master branch on RHEL 7.1 client and server, conf-sanity test 81 hung and syslog on MDS showed that: Jun 10 16:28:25 shadow-15vm8 kernel: ------------[ cut here ]------------ Jun 10 16:28:25 shadow-15vm8 kernel: WARNING: at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.54/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]() Jun 10 16:28:25 shadow-15vm8 kernel: Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) sha512_generic dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw parport_pc virtio_balloon i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm ata_piix libata i2c_core floppy [last unloaded: libcfs] Jun 10 16:28:25 shadow-15vm8 kernel: Jun 10 16:28:25 shadow-15vm8 kernel: CPU: 1 PID: 18882 Comm: llog_process_th Tainted: GF W O-------------- 3.10.0-229.4.2.el7_lustre.g1fee634.x86_64 #1 Jun 10 16:28:25 shadow-15vm8 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 Jun 10 16:28:25 shadow-15vm8 kernel: 0000000000000000 000000005e4d00bc ffff88006d6a77f0 ffffffff816050da Jun 10 16:28:25 shadow-15vm8 kernel: ffff88006d6a7828 ffffffff8106e34b ffff88000083fc98 ffff88007b5877e0 Jun 10 16:28:25 shadow-15vm8 kernel: ffff88005c362708 ffffffffa062745c 0000000000000325 ffff88006d6a7838 Jun 10 16:28:25 shadow-15vm8 kernel: Call Trace: Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff816050da>] dump_stack+0x19/0x1b Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0 Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20 Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05d26b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05e30c1>] ldiskfs_getblk+0x131/0x200 [ldiskfs] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa05e31b7>] ldiskfs_bread+0x27/0xc0 [ldiskfs] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0bca919>] osd_ldiskfs_write_record+0x169/0x350 [osd_ldiskfs] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0bcabdf>] osd_write+0xdf/0x200 [osd_ldiskfs] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0765399>] dt_record_write+0x39/0x120 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0eed261>] osp_write_local_file.isra.16+0x284/0x30c [osp] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0eed7ca>] osp_last_used_init+0x1e7/0xaa6 [osp] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0ecef84>] osp_init0.isra.17+0x1b44/0x2060 [osp] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0ecf549>] osp_device_alloc+0xa9/0x1a0 [osp] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa074b6c4>] obd_setup+0x114/0x2a0 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa074f430>] class_setup+0x2a0/0x820 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0753b0c>] class_process_config+0x224c/0x2e50 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0571967>] ? libcfs_debug_msg+0x57/0x80 [libcfs] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa0754f6f>] class_config_llog_handler+0x85f/0x16e0 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071945a>] llog_process_thread+0x7aa/0xe90 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071a4f0>] ? llog_backup+0x500/0x500 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffffa071a53c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass] Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff8109739f>] kthread+0xcf/0xe0 Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140 Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0 Jun 10 16:28:25 shadow-15vm8 kernel: [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140 Jun 10 16:28:25 shadow-15vm8 kernel: ---[ end trace ddd376db706c5663 ]--- Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs: ldiskfs_getblk:805: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs error (device dm-0): ldiskfs_getblk:805: inode #132: block 16827: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 967, credits 6/0, errcode -28 Jun 10 16:28:25 shadow-15vm8 kernel: Aborting journal on device dm-0-8. Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs (dm-0): Remounting filesystem read-only Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osd_io.c:1645:osd_ldiskfs_write_record()) dm-0: error reading offset 84152 (block 20): rc = -28 Jun 10 16:28:25 shadow-15vm8 kernel: LDISKFS-fs error (device dm-0) in osd_trans_stop:1052: error 28 Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18458:0:(osd_handler.c:829:osd_trans_commit_cb()) transaction @0xffff8800700b43c0 commit error: 2 Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osd_handler.c:1055:osd_trans_stop()) lustre-MDT0000-osd: failed to stop transaction: rc = -28 Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osp_dev.c:283:osp_init_last_objid()) lustre-OST2917-osc-MDT0000: can't initialize lov_objid: rc = -28 Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(osp_dev.c:369:osp_last_used_init()) lustre-OST2917-osc-MDT0000: Can not get ids -28 from old objid! Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(obd_config.c:558:class_setup()) setup lustre-OST2917-osc-MDT0000 failed (-28) Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18882:0:(obd_config.c:1601:class_config_llog_handler()) MGC10.1.4.179@tcp: cfg command failed: rc = -28 Jun 10 16:28:25 shadow-15vm8 kernel: Lustre: cmd=cf003 0:lustre-OST2917-osc-MDT0000 1:lustre-OST2917_UUID 2:10.1.4.178@tcp Jun 10 16:28:25 shadow-15vm8 kernel: LustreError: 18477:0:(mgc_request.c:527:do_requeue()) failed processing log: -28 The Maloo instances: |
| Comments |
| Comment by nasf (Inactive) [ 08/Sep/15 ] |
|
Here are some failure instances: LDISKFS-fs error (device dm-0): ldiskfs_getblk:805: inode #132: block 16827: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 967, credits 6/0, errcode -28 LDISKFS-fs error (device dm-0): ldiskfs_getblk:810: inode #123: block 17157: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 1083, credits 6/0, errcode -28 LDISKFS-fs error (device dm-0): ldiskfs_getblk:810: inode #123: block 17148: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 1083, credits 6/0, errcode -28 LDISKFS-fs error (device dm-0): ldiskfs_getblk:810: inode #133: block 16800: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 1083, credits 6/0, errcode -28 The directly reason for the failure is that when the the ldiskfs_bread() wanted to allocate new block for osp_write_local_file(), it found that all the declared 6 credits have been exhausted. static ssize_t osd_declare_write(const struct lu_env *env, struct dt_object *dt, const struct lu_buf *buf, loff_t _pos, struct thandle *handle) { ... if (osd_extents_enabled(sb, inode)) { /* * many concurrent threads may grow tree by the time * our transaction starts. so, consider 2 is a min depth * for every level we may need to allocate a new block * and take some entries from the old one. so, 3 blocks * to allocate (bitmap, gd, itself) + old block - 4 per * level. */ depth = inode != NULL ? ext_depth(inode) : 0; depth = max(depth, 1) + 1; credits = depth; /* if not append, then split may need to modify * existing blocks moving entries into the new ones */ if (_pos == -1) credits += depth; /* blocks to store data: bitmap,gd,itself */ credits += blocks * 3; } else { credits = osd_calc_bkmap_credits(sb, inode, size, _pos, blocks); } /* if inode is created as part of the transaction, * then it's counted already by the creation method */ if (inode != NULL) credits++; ... } Generally, for OSP on the MDT, we will NOT enable "extents" feature, so the logic path should be: osd_declare_write() => osd_calc_bkmap_credits(). According to the block# (to be modified), we can know the write offset, and then we can know how the osd_declare_write() will prepare the transaction credits: it should be (1 + 2) * 3 + 1 = 10. But according to the failure log, the declared credits is 6, such value was calculated via the logic path with "extents" feature enable! That is conflict with our former conclusion for the OSP on the MDT. So I suspect that the "extents" feature has been enabled by wrong for some unknown reason. I will make a patch to verify that. |
| Comment by Alex Zhuravlev [ 08/Sep/15 ] |
|
whether extents feature is enabled on MDT or not shouldn't make credits calculation wrong. this logic is per-inode for the reason. |
| Comment by nasf (Inactive) [ 08/Sep/15 ] |
|
If the "extents" is disabled, then the credits for osp_write_local_file() at the block 16xxx (or 17xxx) should be 10, otherwise, the credits will be 6. According to the error message, it is 6, so the "extents" feature has been enabled when declare the credits. But on the MDT, it should NOT. What I suspect is that: from the inode view, the "extents" is NOT enabled, so when allocate the block, it used traditional triple-indirect blocks mechanism; but from the super block view, it is enabled, so when declare the credits, it used "extents" mode. Such difference will cause the credits trouble. Currently, I cannot find which code will cause such confused logic, but I will make patch to verify whether such case exists or not. |
| Comment by Alex Zhuravlev [ 08/Sep/15 ] |
|
extents is per-inode and once inode is allocated osd_extents_enabled() uses that specific inode to learn the exact method: if (inode != NULL) { if (LDISKFS_I(inode)->i_flags & LDISKFS_EXTENTS_FL) return 1; } else if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_EXTENTS)) { return 1; } |
| Comment by nasf (Inactive) [ 08/Sep/15 ] |
|
But we do not know whether the "inode" is NULL or not when declare. It can be NULL. |
| Comment by Alex Zhuravlev [ 08/Sep/15 ] |
|
ldiskfs_new_inode():
|
| Comment by nasf (Inactive) [ 08/Sep/15 ] |
|
I am also confused considering the ldiskfs_new_inode(). But if "extents" was not enabled, how to explain the declared credits is "6", not "10"? |
| Comment by Alex Zhuravlev [ 08/Sep/15 ] |
|
I'd rather think it was osd_calc_bkmap_credits() returning (1 + 2), then osd_declare_write() adding 1, so 4 and the remaining 2 came from osd_declare_inode_qid() |
| Comment by Jian Yu [ 09/Sep/15 ] |
|
Hi Nasf, Could you please create a patch for master branch? Thank you. |
| Comment by Gerrit Updater [ 09/Sep/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/16330 |
| Comment by nasf (Inactive) [ 10/Sep/15 ] |
|
Debug patch shows that when the issue happened, the "extents" feature is disabled on the MDT. So the block allocation follows the traditional triple indirect blocks mechanism. The patch http://review.whamcloud.com/16330 should have fixed the issue. |
| Comment by Alex Zhuravlev [ 10/Sep/15 ] |
|
triple indirect? the patch changes single-indirect logic only. |
| Comment by nasf (Inactive) [ 10/Sep/15 ] |
|
I mean that the non-extents block allocation method. In our case, the patch fixes the improper logic that was used for optimising single indirect block. |
| Comment by Gerrit Updater [ 14/Oct/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16330/ |
| Comment by Joseph Gmitter (Inactive) [ 14/Oct/15 ] |
|
Landed for 2.8 |