[LU-1414] ll_delete_inode() LBUG Created: 16/May/12 Updated: 29/May/17 Resolved: 29/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
SLES11SP1 - Client |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 10263 |
| Description |
|
We hit the following LBUG on the cient at our customer site. May 7 01:42:31 alps1-36 kernel: [5827658.348638] LustreError: 23918:0:(llite_lib.c:1382:ll_delete_inode()) ASSERTION(inode->i_data.nrpages == 0) failed: inode=139398948/2615229393(ffff8818d28e6d50) nrpages=1199638, see http://jira.whamcloud.com/browse/LU-118 May 7 01:42:31 alps1-36 kernel: [5827658.348688] LustreError: 23918:0:(llite_lib.c:1382:ll_delete_inode()) LBUG As far as I can see |
| Comments |
| Comment by Peter Jones [ 16/May/12 ] |
|
Jinshan Does this seem to be related to the issues previously handled under LU118? Peter |
| Comment by Jinshan Xiong (Inactive) [ 16/May/12 ] |
|
This definitely is not Can I have a backtrace and recent error messages in lustre log? |
| Comment by Shuichi Ihara (Inactive) [ 17/May/12 ] |
|
client's /var/log/messages. will get /tmp/lustre-log.1336326151.23918 |
| Comment by Alexander Boyko [ 15/Nov/12 ] |
|
We saw the same issue. My point of view and analyze below. kmem -g 200000000001058
FLAGS: 200000000001058
PAGE-FLAG BIT VALUE
PG_referenced 3 0000008
PG_uptodate 4 0000010
PG_lru 6 0000040
PG_private 12 0001000
PG_slob_free 12 0001000
struct ll_async_page {
llap_magic = 98764321,
llap_write_queued = 0,
llap_defer_uptodate = 0,
llap_origin = 3,
llap_ra_used = 0,
llap_ignore_quota = 0,
llap_reserved = 0,
llap_pglist_cpu = 0,
llap_cookie = 0xffff880106be60e8,
llap_page = 0xffffea0004f58908,
llap_pending_write = {
next = 0x0,
prev = 0x0
},
llap_pglist_item = {
next = 0xffff880106be7e28,
prev = 0xffff880106be6218
},
llap_checksum = 0,
llap_lockh_granted = {
cookie = 0
}
}
No dirty, no lock, no writeback. Root cause. The first thread with LASSERT is ldlm_bl_61 - trace LustreError: 30477:0:(llite_lib.c:1384:ll_delete_inode()) ASSERTION(inode->i_data.nrpages == 0) failed: inode=326383344/3842751679(ffff88011f44c210) nrpages=2583694, see http://jira.whamcloud.com/browse/LU-118 LustreError: 30477:0:(llite_lib.c:1384:ll_delete_inode()) LBUG Pid: 30477, comm: ldlm_bl_61 Call Trace: [<ffffffff810072e9>] try_stack_unwind+0x149/0x190 [<ffffffff81005ca0>] dump_trace+0x90/0x300 [<ffffffffa01349b2>] libcfs_debug_dumpstack+0x52/0x80 [libcfs] [<ffffffffa0134f21>] lbug_with_loc+0x71/0xe0 [libcfs] [<ffffffffa04c32ef>] ll_delete_inode+0x14f/0x150 [lustre] [<ffffffff8111ac23>] generic_delete_inode+0xb3/0x180 [<ffffffff8111ad4d>] generic_drop_inode+0x5d/0x80 [<ffffffff811197cd>] iput+0x5d/0x70 [<ffffffffa04e7d46>] ll_mdc_blocking_ast+0x1b6/0x5a0 [lustre] [<ffffffffa026758d>] ldlm_cancel_callback+0x5d/0xe0 [ptlrpc] [<ffffffffa027ef0c>] ldlm_cli_cancel_local+0x6c/0x340 [ptlrpc] [<ffffffffa0283b3e>] ldlm_cli_cancel+0x5e/0x3f0 [ptlrpc] [<ffffffffa04e7d87>] ll_mdc_blocking_ast+0x1f7/0x5a0 [lustre] [<ffffffffa028763c>] ldlm_handle_bl_callback+0xbc/0x270 [ptlrpc] [<ffffffffa0287981>] ldlm_bl_thread_main+0x191/0x3e0 [ptlrpc] [<ffffffff810035ba>] child_rip+0xa/0x20 the second is ldlm_bl_67 - trace PID: 31380 TASK: ffff88069c4c90c0 CPU: 28 COMMAND: "ldlm_bl_67" #0 [ffff8807f5055ad0] schedule at ffffffff812bb7f5 #1 [ffff8807f5055ba8] __cond_resched at ffffffff8103ea65 #2 [ffff8807f5055bc8] __cond_resched_lock at ffffffff8103eab7 #3 [ffff8807f5055be8] cache_remove_lock at ffffffffa03c9eae [osc] #4 [ffff8807f5055c48] osc_extent_blocking_cb at ffffffffa03b3606 [osc] #5 [ffff8807f5055cb8] ldlm_cancel_callback at ffffffffa026758d [ptlrpc] #6 [ffff8807f5055cd8] ldlm_cli_cancel_local at ffffffffa027ef0c [ptlrpc] #7 [ffff8807f5055d38] ldlm_cli_cancel at ffffffffa0283b3e [ptlrpc] #8 [ffff8807f5055db8] osc_extent_blocking_cb at ffffffffa03b34cf [osc] #9 [ffff8807f5055e28] ldlm_handle_bl_callback at ffffffffa028763c [ptlrpc] #10 [ffff8807f5055e78] ldlm_bl_thread_main at ffffffffa0287981 [ptlrpc] #11 [ffff8807f5055f48] kernel_thread at ffffffff810035ba From the stack of ldlm_bl_67 I got ldlm_lock pointer bt -f
....
#1 [ffff8807f5055ba8] __cond_resched at ffffffff8103ea65
ffff8807f5055bb0: ffffffffffffff10 ffff88010456b568 <-- this is spinlock_t l_extents_list_lock;
ffff8807f5055bc0: ffff8807f5055be0 ffffffff8103eab7
#2 [ffff8807f5055bc8] __cond_resched_lock at ffffffff8103eab7
ffff8807f5055bd0: ffff88061718e640 ffff88061718e640
ffff8807f5055be0: ffff8807f5055c40 ffffffffa03c9eae
#3 [ffff8807f5055be8] cache_remove_lock at ffffffffa03c9eae [osc]
ffff8807f5055bf0: ffff880815d647c0 ffff88010456b400
ffff8807f5055c00: ffff880815d647d8 ffff88010456b570
ffff8807f5055c10: ffff88010456b568 ffff88011f44c210
....
crash> p/x (0xffff88010456b568 -360)
$18 = 0xffff88010456b400
crash> ldlm_lock 0xffff88010456b400
struct ldlm_lock {
l_handle = {
h_link = {
next = 0xffffc90010719050,
prev = 0xffffc90010719050
},
h_cookie = 17652665684890613763,
h_addref = 0xffffffffa0262210 <lock_handle_addref>,
h_lock = {
raw_lock = {
slock = 1542
}
},
...
l_lvb_len = 40,
l_lvb_data = 0xffff88012d525e40,
l_lvb_swabber = 0xffffffffa02b1c20,
l_ast_data = 0xffff88011f44c210, <--- this is inode
l_extents_list_lock = {
raw_lock = {
slock = 6425
}
},
.....
So we can see inode from ldlm_lock is the same as LASSERT inode ffff88011f44c210. |
| Comment by Andreas Dilger [ 29/May/17 ] |
|
Close old ticket. |