[LU-5771] OSS crashes when unmounting OST without properly cleaning up orphan inodes Created: 20/Oct/14  Updated: 14/Jun/15  Resolved: 31/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Major
Reporter: Wang Shilong (Inactive) Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
Severity: 3
Rank (Obsolete): 16196

 Description   

During tests, we hit something like the following:

<2>LDISKFS-fs error (device dm-2): __ldiskfs_ext_check_block: bad header/extent in inode #659: invalid magic - magic e000, entries 456, max 0(0), depth 51424(0)
<3>Aborting journal on device dm-2-8.
<2>LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
<2>LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
<2>LDISKFS-fs (dm-2): Remounting filesystem read-only
<2>LDISKFS-fs error (device dm-2) in ldiskfs_free_blocks: Journal has aborted
<2>LDISKFS-fs error (device dm-2) in ldiskfs_ext_remove_space: Journal has aborted
<2>LDISKFS-fs error (device dm-2) in ldiskfs_reserve_inode_write: Journal has aborted
<2>LDISKFS-fs error (device dm-2) in ldiskfs_ext_truncate: Journal has aborted
<4>LDISKFS-fs warning (device dm-2): ldiskfs_delete_inode: couldn't extend journal (err -5)
<3>LDISKFS-fs (dm-2): Inode 280 (ffff8803a9ecb6d8): orphan list check failed!

Something bad happened that forced the filesystem read-only, and an in-memory orphan inode was left uncleared, which caused the following problem:

<4>Pid: 45622, comm: umount Not tainted 2.6.32-431.17.1.el6_lustre.2.5.18.ddn2.x86_64 #1 Dell Inc. PowerEdge R620/01W23F
<4>RIP: 0010:[<ffffffffa169081a>] [<ffffffffa169081a>] ldiskfs_put_super+0x33a/0x380 [ldiskfs]
<4>RSP: 0018:ffff88027ecf39f8 EFLAGS: 00010296
<4>RAX: 003fffffffffffd4 RBX: ffff88102359c800 RCX: 00400000000000a4
<4>RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffffffffa16a55b8
<4>RBP: ffff88027ecf3a38 R08: 0000000000000000 R09: ffffffff81645da0
<4>R10: 0000000000000001 R11: 0000000000000000 R12: ffff88102359c000
<4>R13: ffff88102359c980 R14: ffff88102359c9f0 R15: 004000000000006c
<4>FS: 00007fe81d661740(0000) GS:ffff880061c00000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 00007fff13c76010 CR3: 0000001d707f0000 CR4: 00000000001407f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process umount (pid: 45622, threadinfo ffff88027ecf2000, task ffff8810258ac040)
<4>Stack:
<4> ffff880200000000 ffff88027ecf39f8 ffff88102359c000 ffff88102359c000
<4><d> ffffffffa169d5a0 ffffffff81c06500 ffff88102359c000 ffff880f0371e138
<4><d> ffff88027ecf3a58 ffffffff8118af0b ffff881fe54cb540 0000000000000003
<4>Call Trace:
<4> [<ffffffff8118af0b>] generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff8118afc1>] kill_block_super+0x31/0x50
<4> [<ffffffff8118b797>] deactivate_super+0x57/0x80
<4> [<ffffffff811aa79f>] mntput_no_expire+0xbf/0x110
<4> [<ffffffffa1719d99>] osd_umount+0x79/0x150 [osd_ldiskfs]
<4> [<ffffffffa171e9b7>] osd_device_fini+0x147/0x190 [osd_ldiskfs]
<4> [<ffffffffa103b973>] class_cleanup+0x573/0xd30 [obdclass]
<4> [<ffffffffa100e366>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa103d69a>] class_process_config+0x156a/0x1ad0 [obdclass]
<4> [<ffffffffa1035ff3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
<4> [<ffffffffa103dd79>] class_manual_cleanup+0x179/0x6f0 [obdclass]
<4> [<ffffffffa100c83b>] ? class_export_put+0x10b/0x2c0 [obdclass]
<4> [<ffffffffa1723c65>] osd_obd_disconnect+0x1c5/0x1d0 [osd_ldiskfs]
<4> [<ffffffffa104031b>] lustre_put_lsi+0x1ab/0x11a0 [obdclass]
<4> [<ffffffffa10488c8>] lustre_common_put_super+0x5d8/0xbf0 [obdclass]
<4> [<ffffffffa1070f6d>] server_put_super+0x1bd/0xf60 [obdclass]
<4> [<ffffffff8118af0b>] generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff8118aff6>] kill_anon_super+0x16/0x60
<4> [<ffffffffa103fc26>] lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff8118b797>] deactivate_super+0x57/0x80
<4> [<ffffffff811aa79f>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811ab2eb>] sys_umount+0x7b/0x3a0
<4> [<ffffffff8108a281>] ? sigprocmask+0x71/0x110
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 01 00 00 4d 39 fe 75 11 4c 3b b3 f0 01 00 00 0f 84 81 fe ff ff 0f 0b eb fe 49 8d 87 68 ff ff ff 49 8d 4f 38 48 c7 c7 b8 55 6a a1 <48> 8b b0 d8 01 00 00 44 8b 88 1c 01 00 00 44 0f b7 80 7e 01 00
<1>RIP [<ffffffffa169081a>] ldiskfs_put_super+0x33a/0x380 [ldiskfs]
<4> RSP <ffff88027ecf39f8>

This may be a use-after-free problem: judging from the code, the inode's memory is freed, but ext4_put_super() still accesses it, which may cause the crash (I am not sure about this part of the analysis).

But even if the above analysis is not correct, we can still run into:

J_ASSERT(list_empty(&sbi->s_orphan))

which will crash the kernel, so we need to fix this problem.
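
For context, that assertion is the orphan-list sanity check at the end of ext4_put_super() (ldiskfs_put_super() in ldiskfs), which is also where the oops above fires. A minimal sketch of the relevant logic, based on the 2.6.32-era ext4 code and simplified here for illustration:

/* Simplified sketch of the unmount-time orphan check in ext4_put_super()
 * (2.6.32-era ext4; ldiskfs_put_super() carries the same logic).
 * Any inode still linked on sbi->s_orphan at this point is either walked
 * after its memory has been freed (possible oops) or trips the assertion. */
static void ext4_put_super(struct super_block *sb)
{
        struct ext4_sb_info *sbi = EXT4_SB(sb);

        /* ... journal destroyed, allocator and xattr state released ... */

        if (!list_empty(&sbi->s_orphan))
                dump_orphan_list(sb, sbi);      /* prints the leftover (possibly freed) inodes */
        J_ASSERT(list_empty(&sbi->s_orphan));   /* BUG() if the list is not empty */

        /* ... remaining superblock teardown ... */
}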



 Comments   
Comment by Wang Shilong (Inactive) [ 20/Oct/14 ]

This is the patch that I proposed to fix this problem:

http://review.whamcloud.com/#/c/12349/

Comment by Peter Jones [ 20/Oct/14 ]

Yang Sheng

Could you please advise on this issue and proposed patch?

Thanks

Peter

Comment by Andreas Dilger [ 21/Oct/14 ]

It looks like this was fixed by this upstream kernel commit in 2.6.35:

commit 4538821993f4486c76090dfb377c60c0a0e71ba3
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jul 29 15:06:10 2010 -0400

    ext4: drop inode from orphan list if ext4_delete_inode() fails
    
    There were some error paths in ext4_delete_inode() which was not
    dropping the inode from the orphan list.  This could lead to a BUG_ON
    on umount when the orphan list is discovered to be non-empty.
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a52d5af..533b607 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -221,6 +221,7 @@ void ext4_delete_inode(struct inode *inode)
                                     "couldn't extend journal (err %d)", err);
                stop_handle:
                        ext4_journal_stop(handle);
+                       ext4_orphan_del(NULL, inode);
                        goto no_delete;
                }
        }
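
For reference, the "ldiskfs_delete_inode: couldn't extend journal (err -5)" warning in the log above is exactly this error path, reached because the journal had already aborted. A minimal sketch of how the one-line fix sits in that path, simplified from the 2.6.32-era ext4 code (ldiskfs_delete_inode() is the equivalent in ldiskfs):

        /* ext4_truncate() may have placed the inode on the orphan list.
         * If the journal handle then cannot be extended (e.g. because the
         * journal aborted on -EIO), the old code jumped to no_delete without
         * removing that in-memory entry, leaving a freed inode linked on
         * sbi->s_orphan until umount. */
        if (!ext4_handle_has_enough_credits(handle, 3)) {
                err = ext4_journal_extend(handle, 3);
                if (err > 0)
                        err = ext4_journal_restart(handle, 3);
                if (err != 0) {
                        ext4_warning(inode->i_sb, __func__,
                                     "couldn't extend journal (err %d)", err);
                stop_handle:
                        ext4_journal_stop(handle);
                        ext4_orphan_del(NULL, inode);   /* the fix: drop the in-memory orphan entry */
                        goto no_delete;                 /* inode is cleared and freed below */
                }
        }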
Comment by Wang Shilong (Inactive) [ 22/Oct/14 ]

Hello Andreas Dilger,

Thanks for confirming; I missed it. I will keep the original commit message and resend my patch.

Best Regards,
Wang Shilong

Comment by Andreas Dilger [ 25/Oct/14 ]

This patch will also be needed for RHEL6.6.

Wang Shilong, is it possible for you to submit a bug upstream to RH asking them to merge this patch into their RHEL6 kernel patches? Please include the reference to the upstream kernel patch.

If not, Yang Sheng, can you do this?

Comment by Wang Shilong (Inactive) [ 25/Oct/14 ]

Hello Andreas Dilger,

I am glad to do this!
https://bugzilla.redhat.com/show_bug.cgi?id=1156661

Best Regards,
Wang Shilong

Comment by Peter Jones [ 27/Oct/14 ]

Thanks Wang Shilong! I have passed along to Red Hat that we are also interested in seeing this fix land.

Comment by Wang Shilong (Inactive) [ 31/Oct/14 ]

Hello,

It seems this patch applies cleanly only to RHEL6.5; with earlier versions there are conflicts,
so my question is whether I need to provide a separate patch for each version.

Best Regards,
Wang Shilong

Comment by Wang Shilong (Inactive) [ 31/Oct/14 ]

One more question:

I noticed that the latest Lustre master does not seem to apply its patches cleanly on RHEL6.4; see the following messages:

[root@localhost linux-2.6.32-358.el6.x86_64]# quilt push -av
Applying patch patches/mpt-fusion-max-sge-rhel6.patch
patching file drivers/message/fusion/Kconfig
patching file drivers/message/fusion/mptbase.h
Hunk #1 succeeded at 166 (offset 1 line).

Applying patch patches/raid5-mmp-unplug-dev-rhel6.patch
patching file drivers/md/raid5.c
Hunk #1 FAILED at 2177.
Hunk #2 succeeded at 4198 (offset 66 lines).
1 out of 2 hunks FAILED -- rejects in file drivers/md/raid5.c
Restoring drivers/md/raid5.c
Patch patches/raid5-mmp-unplug-dev-rhel6.patch does not apply (enforce with -f)
Restoring drivers/md/raid5.c

So the latest Lustre did not apply its patches cleanly on RHEL6.4, even though I used the series file
lustre-release/lustre/kernel_patches/series/2.6-rhel6.series.

So my question is: does master not guarantee that the patches apply cleanly for all RHEL6 releases?
Best regards,
Wang Shilong

Comment by James A Simmons [ 31/Oct/14 ]

We should see if this fix is needed for SLES11SP3.

Comment by Andreas Dilger [ 02/Dec/14 ]

James, this shouldn't be needed for SLES11 since that is based on at least 3.0 kernels, and the bug was fixed in the upstream kernel in 2.6.35. Only the RHEL6 kernels are originally based on 2.6.32 (with a large number of other ext4 patches, but strangely not this one).

Comment by Gerrit Updater [ 17/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12349/
Subject: LU-5771 ldiskfs: cleanup orphan inode in error path
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2dc56b1132a1680d664e8093a33f5ce799865abb

Comment by Yang Sheng [ 31/Dec/14 ]

Patch landed. Closing this ticket.

Comment by Gerrit Updater [ 27/Jan/15 ]

Shilong Wang (wshilong@ddn.com) uploaded a new patch: http://review.whamcloud.com/13533
Subject: LU-5771 ldiskfs: cleanup orphan inode in error path
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: d4c07f1ef0ce637861a0f40de4dfacde11e392af
