[LU-6446] Warn-on in ldiskfs_orphan_add/del Created: 10/Apr/15  Updated: 15/Nov/19  Resolved: 28/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4252 Failure on test suite racer test_1 Resolved
is related to LU-3373 ldiskfs patches for FC19 Resolved
is related to LU-12977 fix i_mutex for ldiskfs_truncate() in... Resolved
Severity: 3

 Description   

While testing on servers running the RHEL7.1 kernel, I see these warnings in dmesg:

[ 2083.721470] ------------[ cut here ]------------
[ 2083.721664] WARNING: at /home/green/git/lustre-release/ldiskfs/namei.c:3167 ldiskfs_orphan_add+0x11e/0x280 [ldiskfs]()
[ 2083.721992] Modules linked in: lustre(OF) ofd(OF) osp(OF) lod(OF) ost(OF) mdt(OF) mdd(OF) mgs(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) lfsck(OF) obdecho(OF) mgc(OF) lov(OF) osc(OF) mdc(OF) lmv(OF) fid(OF) fld(OF) ptlrpc(OF) obdclass(OF) ksocklnd(OF) lnet(OF) libcfs(OF) ext4(F) loop(F) mbcache(F) jbd2(F) sha512_generic(F) rpcsec_gss_krb5(F) ata_generic(F) pata_acpi(F) ttm(F) drm_kms_helper(F) drm(F) ata_piix(F) i2c_piix4(F) virtio_blk(F) serio_raw(F) virtio_balloon(F) libata(F) pcspkr(F) floppy(F) i2c_core(F) virtio_console(F) [last unloaded: libcfs]
[ 2083.723513] CPU: 4 PID: 8690 Comm: ll_ost_io01_002 Tainted: GF       W  O--------------   3.10.0-debug #5
[ 2083.723827] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2083.723997]  0000000000000000 000000001b75d70c ffff88002193fa60 ffffffff816ccb68
[ 2083.724317]  ffff88002193fa98 ffffffff81070efb ffff8800707419f8 0000000000000000
[ 2083.724627]  ffff880046645800 ffff8800a72affa0 ffff880046644800 ffff88002193faa8
[ 2083.724934] Call Trace:
[ 2083.725080]  [<ffffffff816ccb68>] dump_stack+0x19/0x1b
[ 2083.725247]  [<ffffffff81070efb>] warn_slowpath_common+0x6b/0xb0
[ 2083.725421]  [<ffffffff8107104a>] warn_slowpath_null+0x1a/0x20
[ 2083.725599]  [<ffffffffa093d41e>] ldiskfs_orphan_add+0x11e/0x280 [ldiskfs]
[ 2083.725790]  [<ffffffffa096b732>] ? ldiskfs_block_zero_page_range+0x292/0x3e0 [ldiskfs]
[ 2083.726104]  [<ffffffffa096df08>] ldiskfs_truncate+0x1c8/0x430 [ldiskfs]
[ 2083.726295]  [<ffffffffa09fde7f>] osd_punch+0x12f/0x5e0 [osd_ldiskfs]
[ 2083.726481]  [<ffffffffa0c601cf>] ofd_object_punch+0x6df/0xc10 [ofd]
[ 2083.726702]  [<ffffffffa0c517b6>] ofd_punch_hdl+0x466/0x9e0 [ofd]
[ 2083.726913]  [<ffffffffa075a3f5>] tgt_request_handle+0x645/0xfe0 [ptlrpc]
[ 2083.727123]  [<ffffffffa070bc31>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
[ 2083.727434]  [<ffffffffa07097a8>] ? ptlrpc_wait_event+0xb8/0x360 [ptlrpc]
[ 2083.727635]  [<ffffffffa070fb80>] ptlrpc_main+0xae0/0x1ee0 [ptlrpc]
[ 2083.727814]  [<ffffffff816d5b97>] ? _raw_spin_unlock_irq+0x27/0x50
[ 2083.727989]  [<ffffffff816d38be>] ? __schedule+0x2fe/0x810
[ 2083.728182]  [<ffffffffa070f0a0>] ? ptlrpc_register_service+0xf20/0xf20 [ptlrpc]
[ 2083.728475]  [<ffffffff8109c00a>] kthread+0xea/0xf0
[ 2083.728639]  [<ffffffff8109bf20>] ? kthread_create_on_node+0x140/0x140
[ 2083.728819]  [<ffffffff816df2bc>] ret_from_fork+0x7c/0xb0
[ 2083.728986]  [<ffffffff8109bf20>] ? kthread_create_on_node+0x140/0x140
[ 2083.729175] ---[ end trace 44cb2ff8a147856d ]---
[ 2083.729370] ------------[ cut here ]------------
[ 2083.729539] WARNING: at /home/green/git/lustre-release/ldiskfs/namei.c:3249 ldiskfs_orphan_del+0x17d/0x2a0 [ldiskfs]()
[ 2083.729866] Modules linked in: lustre(OF) ofd(OF) osp(OF) lod(OF) ost(OF) mdt(OF) mdd(OF) mgs(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) lfsck(OF) obdecho(OF) mgc(OF) lov(OF) osc(OF) mdc(OF) lmv(OF) fid(OF) fld(OF) ptlrpc(OF) obdclass(OF) ksocklnd(OF) lnet(OF) libcfs(OF) ext4(F) loop(F) mbcache(F) jbd2(F) sha512_generic(F) rpcsec_gss_krb5(F) ata_generic(F) pata_acpi(F) ttm(F) drm_kms_helper(F) drm(F) ata_piix(F) i2c_piix4(F) virtio_blk(F) serio_raw(F) virtio_balloon(F) libata(F) pcspkr(F) floppy(F) i2c_core(F) virtio_console(F) [last unloaded: libcfs]
[ 2083.731374] CPU: 4 PID: 8690 Comm: ll_ost_io01_002 Tainted: GF       W  O--------------   3.10.0-debug #5
[ 2083.731693] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2083.731862]  0000000000000000 000000001b75d70c ffff88002193fa50 ffffffff816ccb68
[ 2083.732180]  ffff88002193fa88 ffffffff81070efb ffff8800707419f8 ffff8800707419f8
[ 2083.732488]  ffff8800a72affa0 ffff880046645800 0000000000000000 ffff88002193fa98
[ 2083.732798] Call Trace:
[ 2083.732932]  [<ffffffff816ccb68>] dump_stack+0x19/0x1b
[ 2083.733108]  [<ffffffff81070efb>] warn_slowpath_common+0x6b/0xb0
[ 2083.733282]  [<ffffffff8107104a>] warn_slowpath_null+0x1a/0x20
[ 2083.733456]  [<ffffffffa093e6bd>] ldiskfs_orphan_del+0x17d/0x2a0 [ldiskfs]
[ 2083.733647]  [<ffffffffa096e10b>] ldiskfs_truncate+0x3cb/0x430 [ldiskfs]
[ 2083.733834]  [<ffffffffa09fde7f>] osd_punch+0x12f/0x5e0 [osd_ldiskfs]
[ 2083.734027]  [<ffffffffa0c601cf>] ofd_object_punch+0x6df/0xc10 [ofd]
[ 2083.734208]  [<ffffffffa0c517b6>] ofd_punch_hdl+0x466/0x9e0 [ofd]
[ 2083.734401]  [<ffffffffa075a3f5>] tgt_request_handle+0x645/0xfe0 [ptlrpc]
[ 2083.734604]  [<ffffffffa070bc31>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
[ 2083.734913]  [<ffffffffa07097a8>] ? ptlrpc_wait_event+0xb8/0x360 [ptlrpc]
[ 2083.735123]  [<ffffffffa070fb80>] ptlrpc_main+0xae0/0x1ee0 [ptlrpc]
[ 2083.735302]  [<ffffffff816d5b97>] ? _raw_spin_unlock_irq+0x27/0x50
[ 2083.735478]  [<ffffffff816d38be>] ? __schedule+0x2fe/0x810
[ 2083.735661]  [<ffffffffa070f0a0>] ? ptlrpc_register_service+0xf20/0xf20 [ptlrpc]
[ 2083.735954]  [<ffffffff8109c00a>] kthread+0xea/0xf0
[ 2083.736127]  [<ffffffff8109bf20>] ? kthread_create_on_node+0x140/0x140
[ 2083.736307]  [<ffffffff816df2bc>] ret_from_fork+0x7c/0xb0
[ 2083.736474]  [<ffffffff8109bf20>] ? kthread_create_on_node+0x140/0x140
[ 2083.736657] ---[ end trace 44cb2ff8a147856e ]---

Both warnings are triggered by the same check:

        WARN_ON_ONCE(!(inode->i_state & (I_NEW | I_FREEING)) &&
                     !mutex_is_locked(&inode->i_mutex));

So I imagine we need some sort of extra semaphore?
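
To make the quoted condition easier to read: the check fires only when the inode is neither new nor being freed and its i_mutex is not locked. The following is a minimal userspace sketch of that predicate, not kernel code. The SKETCH_* flag values and the function name are hypothetical stand-ins; the real I_NEW and I_FREEING bits come from the kernel headers, and the kernel's mutex_is_locked() reports whether the mutex is locked at all, not who holds it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's inode state bits. The real
 * I_NEW and I_FREEING values are defined in the kernel headers. */
#define SKETCH_I_NEW     (1u << 0)
#define SKETCH_I_FREEING (1u << 1)

/* Returns true when the condition quoted above would evaluate to true,
 * i.e. the inode is neither new nor being freed AND i_mutex is unlocked.
 * i_mutex_locked models the result of mutex_is_locked(&inode->i_mutex). */
static bool orphan_check_would_warn(unsigned int i_state, bool i_mutex_locked)
{
        return !(i_state & (SKETCH_I_NEW | SKETCH_I_FREEING)) &&
               !i_mutex_locked;
}
```

In the traces above the inode is a live OST object being truncated through osd_punch(), so neither state bit would be expected to be set, and the warning then depends only on whether i_mutex is locked.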



 Comments   
Comment by Andreas Dilger [ 10/Apr/15 ]

It looks like there is a requirement to hold i_mutex when calling ext4_orphan_add() and ext4_orphan_del() because the internal locking has changed to reduce contention on s_orphan_lock (landed upstream in 3.15 but backported to RHEL7.1):

commit d745a8c20c1f864c10ca78d0f89219633861b7e9
Author: Jan Kara <jack@suse.cz>
Date:   Mon May 26 11:56:53 2014 -0400

    ext4: reduce contention on s_orphan_lock
    
    Shuffle code around in ext4_orphan_add() and ext4_orphan_del() so that
    we avoid taking global s_orphan_lock in some cases and hold it for
    shorter time in other cases.
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

At a minimum, we need to hold i_mutex in osd_punch() around calling ldiskfs_truncate(). There may be other callpaths that call ext4_orphan_add() or ext4_orphan_del(), but they may be hidden because this is a WARN_ON_ONCE(). They will show up once osd_punch() is fixed.
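
The suggestion above amounts to bracketing the truncate call with the inode mutex. A toy userspace model of that calling pattern might look like the sketch below. Everything in it is hypothetical: struct sketch_inode, its fields, and both functions are illustration only, and a plain boolean stands in for inode->i_mutex. It is not the osd_punch() code and it does not model contention or lock ordering.

```c
#include <stdbool.h>

/* Toy model of an inode: a boolean stands in for inode->i_mutex. */
struct sketch_inode {
        bool i_mutex_held;  /* hypothetical stand-in for the mutex state */
        long i_size;
        int  warnings;      /* how many times the check would have fired */
};

/* Models the truncate path: the orphan-list check fires if the
 * mutex is not held when the truncate runs. */
static void sketch_truncate(struct sketch_inode *inode, long new_size)
{
        if (!inode->i_mutex_held)
                inode->warnings++;
        inode->i_size = new_size;
}

/* Models the proposed caller: take the mutex, truncate, release. */
static void sketch_punch_locked(struct sketch_inode *inode, long new_size)
{
        inode->i_mutex_held = true;   /* mutex_lock(&inode->i_mutex) in the kernel */
        sketch_truncate(inode, new_size);
        inode->i_mutex_held = false;  /* mutex_unlock(&inode->i_mutex) in the kernel */
}
```

Calling sketch_truncate() directly on a fresh inode increments warnings, while sketch_punch_locked() does not. Note that the later comments in this ticket report that holding i_mutex around ldiskfs_truncate() caused deadlocks (LU-4252), which this toy model cannot show.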

Comment by Yang Sheng [ 13/Apr/15 ]

We did lock i_mutex around ldiskfs_truncate(), but it caused a deadlock (LU-4252), so we added an ldiskfs patch to remove the related warning messages instead. It looks like ext4_orphan_add() and ext4_orphan_del() are only called from ldiskfs_truncate() in Lustre.

Comment by Bob Glossman (Inactive) [ 15/Apr/15 ]

As Yang Sheng mentioned above, holding i_mutex led to deadlocks. Instead we made a patch for ldiskfs, ext4-remove-truncate-warning.patch, to delete the warning. A similar patch was later added to the patches for el6.6 when a similar new warning was seen there. I strongly suspect that el7.1 has similar warnings in more places than el7.0 does. We need to either extend the patch to take out more warnings or figure out a way to hold i_mutex without causing deadlocks.

Comment by Gerrit Updater [ 06/May/15 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/14690
Subject: LU-6446 ldiskfs: remove WARN_ON from ldiskfs_orphan_add

Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 34395e31c1a1da84f412f928eacc05a7001d56f3

Comment by Gerrit Updater [ 27/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14690/
Subject: LU-6446 ldiskfs: remove WARN_ON from ldiskfs_orphan_add

Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3daba43a191a0a2c9358d39b3f48b1b759a41819

Comment by Yang Sheng [ 28/Jul/15 ]

Patch landed. Close ticket.
