[LU-11582] LBUG: ASSERTION( inode->i_data.nrpages == 0 ) failed Created: 29/Oct/18 Updated: 05/Nov/19 Resolved: 27/Nov/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.5 |
| Fix Version/s: | Lustre 2.12.0, Lustre 2.10.7 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Stephane Thiell | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Client: CentOS 7.5 Lustre 2.10.5 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hi,
[11497.465606] LustreError: 132407:0:(llite_lib.c:2047:ll_delete_inode()) ASSERTION( inode->i_data.nrpages == 0 ) failed: inode=[0x200018e83:0x1ba2c:0x0](ffff8aa85a298510) nrpages=1, see LU-118
[11497.487939] LustreError: 132407:0:(llite_lib.c:2047:ll_delete_inode()) LBUG
[11497.495730] Pid: 132407, comm: spades 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018
[11497.505939] Call Trace:
[11497.508685] [<ffffffffc09947cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[11497.516009] [<ffffffffc099487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[11497.522955] [<ffffffffc0f25c87>] ll_delete_inode+0x1b7/0x1c0 [lustre]
[11497.530291] [<ffffffff8d43c504>] evict+0xb4/0x180
[11497.535663] [<ffffffff8d43ce0c>] iput+0xfc/0x190
[11497.540940] [<ffffffff8d43126e>] do_unlinkat+0x1ae/0x2d0
[11497.546990] [<ffffffff8d432326>] SyS_unlink+0x16/0x20
[11497.552753] [<ffffffff8d92579b>] system_call_fastpath+0x22/0x27
[11497.559484] [<ffffffffffffffff>] 0xffffffffffffffff
[11497.565069] Kernel panic - not syncing: LBUG
[11497.569837] CPU: 7 PID: 132407 Comm: spades Kdump: loaded Tainted: G OE ------------ 3.10.0-862.14.4.el7.x86_64 #1
[11497.582928] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.8.0 005/17/2018
[11497.591379] Call Trace:
[11497.594105] [<ffffffff8d913754>] dump_stack+0x19/0x1b
[11497.599845] [<ffffffff8d90d29f>] panic+0xe8/0x21f
[11497.605211] [<ffffffffc09948cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[11497.612131] [<ffffffffc0f25c87>] ll_delete_inode+0x1b7/0x1c0 [lustre]
[11497.619421] [<ffffffff8d43c504>] evict+0xb4/0x180
[11497.624775] [<ffffffff8d43ce0c>] iput+0xfc/0x190
[11497.630033] [<ffffffff8d43126e>] do_unlinkat+0x1ae/0x2d0
[11497.636064] [<ffffffff8d42175e>] ? ____fput+0xe/0x10
[11497.641709] [<ffffffff8d2bab90>] ? task_work_run+0xc0/0xe0
[11497.647935] [<ffffffff8d432326>] SyS_unlink+0x16/0x20
[11497.653679] [<ffffffff8d92579b>] system_call_fastpath+0x22/0x27
Thanks, |
| Comments |
| Comment by Stephane Thiell [ 29/Oct/18 ] |
|
vmcore uploaded to your ftp server, the file is vmcore-sh-112-03-2018-10-26-22-17-26_
kernel used is CentOS 7 3.10.0-862.14.4.el7.x86_64 ( http://debuginfo.centos.org/7/x86_64/ )
Thanks! |
| Comment by Peter Jones [ 30/Oct/18 ] |
|
Bobijam Could you please advise? Thanks Peter |
| Comment by Zhenyu Xu [ 11/Nov/18 ] |
|
I think the assertion reads nrpages without holding mapping->tree_lock, which is supposed to protect it. truncate_inode_pages() also traverses the mapping's radix tree without tree_lock, so it can miss a page that __remove_mapping() is in the middle of removing from the radix tree. In truncate_inode_pages_final():
	nrpages = mapping->nrpages;
	smp_rmb();
	nrexceptional = mapping->nrexceptional;
	if (nrpages || nrexceptional) {
		/*
		 * As truncation uses a lockless tree lookup, cycle
		 * the tree lock to make sure any ongoing tree
		 * modification that does not see AS_EXITING is
		 * completed before starting the final truncate.
		 */
		spin_lock_irq(&mapping->tree_lock);
		spin_unlock_irq(&mapping->tree_lock);
		// race window: __remove_mapping() has removed the page from the
		// radix tree, but nrpages hasn't been decreased yet.
		truncate_inode_pages(mapping, 0);
	}
And I think our truncate_inode_pages_final() in lustre/include/lustre_compat.h makes the calls in the right sequence:
#ifndef HAVE_TRUNCATE_INODE_PAGES_FINAL
static inline void truncate_inode_pages_final(struct address_space *map)
{
	truncate_inode_pages(map, 0);
	/* Workaround for LU-118 */
	if (map->nrpages) {
		spin_lock_irq(&map->tree_lock); // after getting the tree_lock, we avoid the race
		spin_unlock_irq(&map->tree_lock);
	}
	/* Workaround end */
}
#endif
I think the fix could be to take tree_lock when checking nrpages in ll_delete_inode(), or just to delete this assertion. |
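To illustrate the window described in this comment, here is a minimal userspace sketch. It is not Lustre or kernel code and it is not the patch that landed. It assumes a simplified model in which the removal path drops the page from the tree and decrements the counter later, both under the lock. `fake_mapping`, `remover`, `nrpages_locked`, `run_demo` and `demo_result` are hypothetical names, and a pthread mutex stands in for tree_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's struct address_space. */
struct fake_mapping {
	pthread_mutex_t tree_lock;  /* models mapping->tree_lock */
	atomic_ulong nrpages;       /* models mapping->nrpages */
	atomic_int in_window;       /* set once the page has left the tree */
	atomic_int release;         /* set once the unlocked read is done */
};

struct demo_result {
	unsigned long unlocked;     /* nrpages read without tree_lock */
	unsigned long locked;       /* nrpages read with tree_lock held */
};

/* Models the removal path: the page leaves the tree first and the
 * counter is decremented afterwards, all while tree_lock is held. */
static void *remover(void *arg)
{
	struct fake_mapping *m = arg;

	pthread_mutex_lock(&m->tree_lock);
	atomic_store(&m->in_window, 1);      /* page removed from the tree */
	while (!atomic_load(&m->release))    /* keep the window open */
		;
	atomic_fetch_sub(&m->nrpages, 1);    /* counter catches up */
	pthread_mutex_unlock(&m->tree_lock);
	return NULL;
}

/* Models the suggested check: take tree_lock before reading nrpages. */
static unsigned long nrpages_locked(struct fake_mapping *m)
{
	unsigned long n;

	pthread_mutex_lock(&m->tree_lock);
	n = atomic_load(&m->nrpages);
	pthread_mutex_unlock(&m->tree_lock);
	return n;
}

/* Reads nrpages inside the window, first without and then with the lock. */
static struct demo_result run_demo(void)
{
	struct fake_mapping m;
	struct demo_result r;
	pthread_t t;
	int rc;

	rc = pthread_mutex_init(&m.tree_lock, NULL);
	assert(rc == 0);
	atomic_init(&m.nrpages, 1UL);
	atomic_init(&m.in_window, 0);
	atomic_init(&m.release, 0);

	rc = pthread_create(&t, NULL, remover, &m);
	assert(rc == 0);
	(void)rc;

	while (!atomic_load(&m.in_window))   /* wait for the window to open */
		;
	r.unlocked = atomic_load(&m.nrpages); /* stale value, still 1 */
	atomic_store(&m.release, 1);
	r.locked = nrpages_locked(&m);        /* blocks until remover unlocks */

	pthread_join(t, NULL);
	pthread_mutex_destroy(&m.tree_lock);
	return r;
}
```

In this model, calling run_demo() (built with -pthread) gives unlocked == 1 and locked == 0: the unlocked read sees the stale count that would trip the assertion, while the read that waits on the lock sees the settled value.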
| Comment by Gerrit Updater [ 11/Nov/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/33639 |
| Comment by Gerrit Updater [ 18/Nov/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/33681 |
| Comment by Gerrit Updater [ 27/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33639/ |
| Comment by Peter Jones [ 27/Nov/18 ] |
|
Landed for 2.12 |
| Comment by Gerrit Updater [ 05/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33681/ |