[LU-6722] sanity-lfsck test_1a: FAIL: (3) Fail to start LFSCK for namespace! Created: 15/Jun/15  Updated: 21/Apr/17  Resolved: 26/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0, Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: Jian Yu Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre Build: https://build.hpdd.intel.com/job/lustre-master/3059
Distro/Arch: RHEL7.1/x86_64


Issue Links:
Duplicate
is duplicated by LU-6726 ost-pools test 18 hung Resolved
is duplicated by LU-6435 ldiskfs bug in __ldiskfs_handle_dirty... Resolved
is duplicated by LU-6739 EL7 mds-survey test_1: mds-survey failed Resolved
is duplicated by LU-6740 racer test_1: test failed to respond ... Resolved
is duplicated by LU-6752 sanity-lfsck test_15c: fail to repair... Resolved
Related
is related to LU-8983 kernel BUG at fs/jbd2/transaction.c:1... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-lfsck test 1a failed as follows:

CMD: shadow-5vm12 /usr/sbin/lctl set_param fail_loc=0x1501
CMD: shadow-5vm12 /usr/sbin/lctl set_param fail_loc=0
10.1.4.53@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr 0 0
CMD: shadow-5vm1.shadow.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client shadow-5vm1.shadow.whamcloud.com /mnt/lustre (opts:)
CMD: shadow-5vm1.shadow.whamcloud.com lsof -t /mnt/lustre
CMD: shadow-5vm1.shadow.whamcloud.com umount  /mnt/lustre 2>&1
CMD: shadow-5vm12 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r
shadow-5vm12: Fail to start LFSCK: Read-only file system
 sanity-lfsck test_1a: @@@@@@ FAIL: (3) Fail to start LFSCK for namespace! 

Maloo report: https://testing.hpdd.intel.com/test_sets/8520310e-108d-11e5-a2d3-5254006e85c2



 Comments   
Comment by Oleg Drokin [ 16/Jun/15 ]

Right before the remount r/o we are getting a transaction credits overflow:

[18669.823920] Lustre: 2374:0:(osd_handler.c:918:osd_trans_start()) lustre-MDT0000: too many transaction credits (185 > 256)
[18669.828228] Lustre: 2374:0:(osd_handler.c:923:osd_trans_start())   create: 2/8, destroy: 1/4
[18669.831572] Lustre: 2374:0:(osd_handler.c:928:osd_trans_start())   attr_set: 2/2, xattr_set: 2/15
[18669.835192] Lustre: 2374:0:(osd_handler.c:935:osd_trans_start())   write: 7/29, punch: 0/0, quota 2/2
[18669.836831] Lustre: 2374:0:(osd_handler.c:940:osd_trans_start())   insert: 4/67, delete: 2/5
[18669.838251] Lustre: 2374:0:(osd_handler.c:945:osd_trans_start())   ref_add: 1/1, ref_del: 2/2
[18669.839654] Pid: 2374, comm: mdt00_001
[18669.840329] 
Call Trace:
[18669.841044]  [<ffffffffa0620843>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[18669.842210]  [<ffffffffa0ba8af2>] osd_trans_start+0x642/0x670 [osd_ldiskfs]
[18669.861939]  [<ffffffffa0a2362e>] top_trans_start+0x60e/0x830 [ptlrpc]
[18669.872487]  [<ffffffffa0e1b421>] lod_trans_start+0x31/0x40 [lod]
[18669.873091]  [<ffffffffa0ea9394>] mdd_trans_start+0x14/0x20 [mdd]
[18669.873892]  [<ffffffffa0e99235>] mdd_unlink+0x4b5/0xa50 [mdd]
[18669.874503]  [<ffffffffa0763eb2>] ? dt_version_get+0x72/0x1f0 [obdclass]
[18669.875302]  [<ffffffffa0d6a41b>] mdt_reint_unlink+0xa7b/0x11c0 [mdt]
[18669.876120]  [<ffffffffa07811ae>] ? lu_ucred+0x1e/0x30 [obdclass]
[18669.884730]  [<ffffffffa0d6d920>] mdt_reint_rec+0x80/0x210 [mdt]
[18669.885320]  [<ffffffffa0d511ac>] mdt_reint_internal+0x58c/0x780 [mdt]
[18669.886148]  [<ffffffffa0d5a597>] mdt_reint+0x67/0x140 [mdt]
[18669.886709]  [<ffffffffa0a10235>] tgt_request_handle+0x6d5/0x1060 [ptlrpc]
[18669.887388]  [<ffffffffa09c020b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[18669.888540]  [<ffffffffa09bdd88>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[18669.889192]  [<ffffffff810a9662>] ? default_wake_function+0x12/0x20
[18669.890231]  [<ffffffff810a0898>] ? __wake_up_common+0x58/0x90
[18669.890908]  [<ffffffffa09c43f8>] ptlrpc_main+0xaf8/0x1ea0 [ptlrpc]
[18669.891495]  [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
[18669.892495]  [<ffffffffa09c3900>] ? ptlrpc_main+0x0/0x1ea0 [ptlrpc]
[18669.893083]  [<ffffffff8109739f>] kthread+0xcf/0xe0
[18669.893725]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
[18669.894181]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[18669.894869]  [<ffffffff810972d0>] ? kthread+0x0/0xe0

IT's strange that the claim is that 185 is bigger than 256 when it's not.

And then we get:

[18680.472086] Lustre: DEBUG MARKER: /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r
[18680.583847] ------------[ cut here ]------------
[18680.585330] WARNING: at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.54/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
[18680.607767] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm floppy ib_sa ib_mad i2c_piix4 virtio_balloon ib_core ppdev pcspkr serio_raw ib_addr parport_pc parport ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt 8139too drm_kms_helper ttm virtio_pci virtio_ring virtio drm 8139cp mii ata_piix libata i2c_core [last unloaded: obdecho]

[18680.614887] CPU: 1 PID: 3261 Comm: lctl Tainted: GF       W  O--------------   3.10.0-229.4.2.el7_lustre.g1fee634.x86_64 #1
[18680.615845] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[18680.616350]  0000000000000000 0000000069f92699 ffff8800649af7c8 ffffffff816050da
[18680.617033]  ffff8800649af800 ffffffff8106e34b ffff8800610cb410 ffff88007bbe34b0
[18680.617731]  0000000000000000 ffffffffa05bd9c0 00000000000013f2 ffff8800649af810
[18680.618420] Call Trace:
[18680.618645]  [<ffffffff816050da>] dump_stack+0x19/0x1b
[18680.619098]  [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
[18680.619622]  [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20
[18680.620131]  [<ffffffffa05686b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[18680.620825]  [<ffffffffa057eec4>] ? ldiskfs_dirty_inode+0x54/0x60 [ldiskfs]
[18680.621456]  [<ffffffffa058b556>] ldiskfs_free_blocks+0x5e6/0xb90 [ldiskfs]
[18680.622082]  [<ffffffffa057fe95>] ldiskfs_xattr_release_block+0x275/0x330 [ldiskfs]
[18680.622733]  [<ffffffffa05830cb>] ldiskfs_xattr_delete_inode+0x2bb/0x300 [ldiskfs]
[18680.623388]  [<ffffffffa057d9f5>] ldiskfs_evict_inode+0x1b5/0x610 [ldiskfs]
[18680.623976]  [<ffffffff811e23d7>] evict+0xa7/0x170
[18680.624410]  [<ffffffff811e2c15>] iput+0xf5/0x180
[18680.624822]  [<ffffffffa0ba7b23>] osd_object_delete+0x1d3/0x300 [osd_ldiskfs]
[18680.625477]  [<ffffffffa075f04d>] lu_object_free.isra.30+0x9d/0x1a0 [obdclass]
[18680.626111]  [<ffffffffa075f212>] lu_object_put+0xc2/0x320 [obdclass]
[18680.626676]  [<ffffffffa075f486>] lu_object_put_nocache+0x16/0x20 [obdclass]
[18680.627307]  [<ffffffffa0742674>] local_object_unlink+0x374/0xc10 [obdclass]
[18680.627913]  [<ffffffffa0c92afd>] lfsck_namespace_load_one_trace_file.isra.66+0x2d/0x70 [lfsck]
[18680.628680]  [<ffffffffa0c988ad>] lfsck_namespace_reset+0x14d/0x510 [lfsck]
[18680.629338]  [<ffffffffa0c85a76>] lfsck_start+0x2f6/0x1410 [lfsck]
[18680.629876]  [<ffffffff811acba3>] ? __kmalloc+0x1f3/0x230
[18680.630394]  [<ffffffffa05b0002>] ? ldiskfs_move_extents+0x532/0x990 [ldiskfs]
[18680.631222]  [<ffffffff811abe3e>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
[18680.632151]  [<ffffffffa0edb060>] ? osp_key_init+0x20/0x140 [osp]
[18680.632764]  [<ffffffffa0e840a9>] mdd_iocontrol+0x89/0xa60 [mdd]
[18680.633317]  [<ffffffffa076150f>] ? lu_context_init+0xff/0x260 [obdclass]
[18680.633961]  [<ffffffffa0d5843d>] mdt_iocontrol+0x1cd/0x990 [mdt]
[18680.634521]  [<ffffffffa072b6cc>] class_handle_ioctl+0x1b3c/0x22b0 [obdclass]
[18680.635200]  [<ffffffff812693a8>] ? security_capable+0x18/0x20
[18680.635747]  [<ffffffffa07115e2>] obd_class_ioctl+0xd2/0x170 [obdclass]
[18680.636341]  [<ffffffff811da2c5>] do_vfs_ioctl+0x2e5/0x4c0
[18680.636848]  [<ffffffff811e4e17>] ? __fd_install+0x47/0x60
[18680.637336]  [<ffffffff811da541>] SyS_ioctl+0xa1/0xc0
[18680.637813]  [<ffffffff81615029>] system_call_fastpath+0x16/0x1b
[18680.638346] ---[ end trace 3f466fc704c541d0 ]---
[18680.638793] LDISKFS-fs: ldiskfs_free_blocks:5106: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[18680.639699] LDISKFS: jbd2_journal_dirty_metadata failed: handle type 5 started at line 240, credits 3/0, errcode -28
[18680.640631] LDISKFS-fs error (device dm-0) in ldiskfs_free_blocks:5118: error 28
[18680.660378] Aborting journal on device dm-0-8.
[18680.672709] LDISKFS-fs (dm-0): Remounting filesystem read-only
[18680.673249] LDISKFS-fs warning (device dm-0): ldiskfs_evict_inode:274: couldn't extend journal (err -30)
[18680.674068] LDISKFS-fs error (device dm-0) in ldiskfs_evict_inode:303: error 28
[18680.674093] LustreError: 2353:0:(osd_handler.c:829:osd_trans_commit_cb()) transaction @0xffff88007b6430c0 commit error: 2
Comment by Andreas Dilger [ 16/Jun/15 ]

How often is this failing?

Comment by Andreas Dilger [ 16/Jun/15 ]

The reason it is complaining is because of a change to the underlying JBD2 transaction limits:

static int osd_param_is_not_sane(const struct osd_device *dev,
                                 const struct thandle *th)
{               
        struct osd_thandle *oh = container_of(th, typeof(*oh), ot_super);
                
        return oh->ot_credits > osd_transaction_size(dev);
}

#ifdef LDISKFS_HT_MISC
# define osd_transaction_size(dev) (osd_journal(dev)->j_max_transaction_buffers / 2)
#else
# define osd_transaction_size(dev) (osd_journal(dev)->j_max_transaction_buffers)
#endif

static int osd_trans_start(const struct lu_env *env, struct dt_device *d,
                           struct thandle *th)
{
        :
        :
                CWARN("%.16s: too many transaction credits (%d > %d)\n",
                      LDISKFS_SB(osd_sb(dev))->s_es->s_volume_name,
                      oh->ot_credits,   
                      osd_journal(dev)->j_max_transaction_buffers);

The CERROR() message is bad, and should use osd_transaction_size(dev) instead of accessing j_max_transaction_buffers directly. This was missed from http://review.whamcloud.com/10376 originally.

I expect that LDISKFS_HT_MISC is only true for RHEL 7 (the original upstream patch 8f7d89f36 is in 3.10 and later), and this is cutting the maximum transaction size in half. I don't think that this will be a problem in real usage, only if the journal size is very small, since this is a single-transaction limit.

An easy solution would be to increase the minimum journal size from 4MB to 8MB for RHEL7. Does the test environment specify the journal size explicitly? It seems that the default journal size should be 4096 blocks for filesystems over 128MB (according to mke2fs ext2fs_default_journal_size(). According to lustre/tests/cfg/local.sh the default OSTSIZE and MDSSIZE are 200MB.

Comment by Sarah Liu [ 18/Jun/15 ]

this issue caused many failures of EL7 client/server

https://testing.hpdd.intel.com/test_sessions/9d7790da-135d-11e5-b4b0-5254006e85c2

Comment by nasf (Inactive) [ 18/Jun/15 ]

The http://review.whamcloud.com/10376 mainly affects the the OSD-ldiskfs to complain that someone declare too much credits, but it is not fatal. The root reason for this trouble is inside ldiskfs itself because of not consider quota modification when ldiskfs_xattr_delete_inode().

Comment by Gerrit Updater [ 18/Jun/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/15334
Subject: LU-6722 ldiskfs: more credits to destroy inode with large EA
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4f54dc215328cc9dbc2cab1236db84de29afe6ad

Comment by Andreas Dilger [ 18/Jun/15 ]

Fan Yong,
there is also a generic issue that RHEL7 only allows transactions 1/2 the size of RHEL6, so in addition to your patch to fix the transaction credits for setxattr, there also needs to be a second patch to double the minimum journal size from 4MB to 8MB when running on kernels 3.10 and later.

Comment by Gerrit Updater [ 19/Jun/15 ]

Deleted irrelevant comment.

Comment by Gerrit Updater [ 26/Jun/15 ]

nevermind

Comment by Gerrit Updater [ 03/Jul/15 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/15401/
Subject: LU-6722 jbd: double minimum journal size for RHEL7
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 15d2f58cd493a73f860a4415cac0da48c932e72a

Comment by Gerrit Updater [ 05/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15334/
Subject: LU-6722 ldiskfs: declare credits for quota when destroy inode
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e065d04c3cdcf40ae4d4fd4aee0b2548aca4d045

Comment by Peter Jones [ 05/Jul/15 ]

Landed for 2.8

Comment by James A Simmons [ 29/Jul/15 ]

We are going to need a patch for SLES12 now that it is supported.

Comment by Jian Yu [ 25/Aug/15 ]

On master branch, in the following report:
https://testing.hpdd.intel.com/sub_tests/ebb78d16-1044-11e5-8fc4-5254006e85c2

Dmesg on MDS 1 (shadow-15vm8):

[12947.228164] ------------[ cut here ]------------
[12947.228652] WARNING: at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el7/ib_stack/inkernel/BUILD/BUILD/lustre-2.7.54/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
[12947.230373] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) sha512_generic dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw parport_pc virtio_balloon i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too virtio_pci 8139cp virtio_ring virtio mii drm ata_piix libata i2c_core floppy [last unloaded: libcfs]

[12947.237641] CPU: 1 PID: 18882 Comm: llog_process_th Tainted: GF       W  O--------------   3.10.0-229.4.2.el7_lustre.g1fee634.x86_64 #1
[12947.238718] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[12947.239582]  0000000000000000 000000005e4d00bc ffff88006d6a77f0 ffffffff816050da
[12947.240672]  ffff88006d6a7828 ffffffff8106e34b ffff88000083fc98 ffff88007b5877e0
[12947.241691]  ffff88005c362708 ffffffffa062745c 0000000000000325 ffff88006d6a7838
[12947.242719] Call Trace:
[12947.243043]  [<ffffffff816050da>] dump_stack+0x19/0x1b
[12947.243689]  [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
[12947.244632]  [<ffffffff8106e49a>] warn_slowpath_null+0x1a/0x20
[12947.245577]  [<ffffffffa05d26b2>] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[12947.246666]  [<ffffffffa05e30c1>] ldiskfs_getblk+0x131/0x200 [ldiskfs]
[12947.247508]  [<ffffffffa05e31b7>] ldiskfs_bread+0x27/0xc0 [ldiskfs]
[12947.248258]  [<ffffffffa0bca919>] osd_ldiskfs_write_record+0x169/0x350 [osd_ldiskfs]
[12947.248974]  [<ffffffffa0bcabdf>] osd_write+0xdf/0x200 [osd_ldiskfs]
[12947.249558]  [<ffffffffa0765399>] dt_record_write+0x39/0x120 [obdclass]
[12947.250194]  [<ffffffffa0eed261>] osp_write_local_file.isra.16+0x284/0x30c [osp]
[12947.250845]  [<ffffffffa0eed7ca>] osp_last_used_init+0x1e7/0xaa6 [osp]
[12947.251417]  [<ffffffffa0ecef84>] osp_init0.isra.17+0x1b44/0x2060 [osp]
[12947.252042]  [<ffffffffa0ecf549>] osp_device_alloc+0xa9/0x1a0 [osp]
[12947.252606]  [<ffffffffa074b6c4>] obd_setup+0x114/0x2a0 [obdclass]
[12947.253147]  [<ffffffffa074f430>] class_setup+0x2a0/0x820 [obdclass]
[12947.253735]  [<ffffffffa0753b0c>] class_process_config+0x224c/0x2e50 [obdclass]
[12947.254369]  [<ffffffffa0571967>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[12947.254979]  [<ffffffffa0754f6f>] class_config_llog_handler+0x85f/0x16e0 [obdclass]
[12947.255645]  [<ffffffffa071945a>] llog_process_thread+0x7aa/0xe90 [obdclass]
[12947.256291]  [<ffffffffa071a4f0>] ? llog_backup+0x500/0x500 [obdclass]
[12947.256883]  [<ffffffffa071a53c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass]
[12947.257551]  [<ffffffff8109739f>] kthread+0xcf/0xe0
[12947.258000]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[12947.258565]  [<ffffffff81614f7c>] ret_from_fork+0x7c/0xb0
[12947.259066]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[12947.259689] ---[ end trace ddd376db706c5663 ]---
[12947.260092] LDISKFS-fs: ldiskfs_getblk:805: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[12947.260934] LDISKFS-fs error (device dm-0): ldiskfs_getblk:805: inode #132: block 16827: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 967, credits 6/0, errcode -28
[12947.275334] Aborting journal on device dm-0-8.
[12947.287404] LDISKFS-fs (dm-0): Remounting filesystem read-only
[12947.288619] LustreError: 18882:0:(osd_io.c:1645:osd_ldiskfs_write_record()) dm-0: error reading offset 84152 (block 20): rc = -28
[12947.290466] LDISKFS-fs error (device dm-0) in osd_trans_stop:1052: error 28
[12947.290556] LustreError: 18458:0:(osd_handler.c:829:osd_trans_commit_cb()) transaction @0xffff8800700b43c0 commit error: 2
[12947.309035] LustreError: 18882:0:(osd_handler.c:1055:osd_trans_stop()) lustre-MDT0000-osd: failed to stop transaction: rc = -28
[12947.310070] LustreError: 18882:0:(osp_dev.c:283:osp_init_last_objid()) lustre-OST2917-osc-MDT0000: can't initialize lov_objid: rc = -28
[12947.311186] LustreError: 18882:0:(osp_dev.c:369:osp_last_used_init()) lustre-OST2917-osc-MDT0000: Can not get ids -28 from old objid!
[12947.312480] LustreError: 18882:0:(obd_config.c:558:class_setup()) setup lustre-OST2917-osc-MDT0000 failed (-28)
[12947.313339] LustreError: 18882:0:(obd_config.c:1601:class_config_llog_handler()) MGC10.1.4.179@tcp: cfg command failed: rc = -28
[12947.314362] Lustre:    cmd=cf003 0:lustre-OST2917-osc-MDT0000  1:lustre-OST2917_UUID  2:10.1.4.178@tcp  

[12947.315420] LustreError: 18477:0:(mgc_request.c:527:do_requeue()) failed processing log: -28
[12948.576155] LustreError: 18880:0:(tgt_lastrcvd.c:591:tgt_client_new()) lustre-MDT0000: Failed to write client lcd at idx 3, rc -30
[12953.576890] LustreError: 18480:0:(tgt_lastrcvd.c:591:tgt_client_new()) lustre-MDT0000: Failed to write client lcd at idx 3, rc -30
Comment by nasf (Inactive) [ 26/Aug/15 ]

The latest MDS corruption is different. The transaction was aborted because the osp_write_local_file() did not declare enough credits for write, the original failure was caused by less credits when delete/destroy. I have open a new ticket LU-7045 for the new issue.

Comment by nasf (Inactive) [ 26/Aug/15 ]

The new issue will be resolved in LU-7045

Comment by Gerrit Updater [ 22/Apr/16 ]

Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/19732
Subject: LU-6722 ldiskfs: fix credits at ldiskfs_delete_inode
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 881c189401b7dd0d3bcc1f7846e77d7d95ebc9c7

Comment by Gerrit Updater [ 02/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19732/
Subject: LU-6722 ldiskfs: fix credits at ldiskfs_delete_inode
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d36f00b3b7c977acf1a092d188535d34d079c57a

Generated at Sat Feb 10 02:02:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.