[LU-7777] toss 3 client kernel panic in ll_get_dir_page() Created: 15/Feb/16  Updated: 23/Feb/16  Resolved: 23/Feb/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.5
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Ruth Klundt (Inactive) Assignee: Zhenyu Xu
Resolution: Duplicate Votes: 0
Labels: llnl
Environment:

toss-release-3.0-36alpha.ch6 lustre 2.5.5-3chaos-CHANGED-3.10.0-327.0.0.1chaos.ch6.x86_64


Issue Links:
Related
is related to LU-5162 sanity test_65ic oops on rhel7 Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We've started some client testing of TOSS 3 alpha against servers running toss-release-2.4-2.ch5.4.

This is more of a heads-up; perhaps there are already newer builds of the client in action somewhere? If not, and you cannot reproduce it, I'll start an Intel issue.

We can repeatably cause a kernel panic by doing active I/O to one file system (cp -ar <something large>) while simultaneously running ls -l in the other mounted file system, or occasionally in the same file system from a different session or node. This happens regardless of the backing fs of the file system.

Here's a representative trace:
12:27:45 [10713.441784] BUG: unable to handle kernel paging request at 000000011df5000a
12:27:45 [10713.442712] IP: [<ffffffffa0d20ed5>] ll_get_dir_page+0x3e5/0x1650 [lustre]
12:27:45 [10713.442712] PGD 40dacd067 PUD 0
12:27:45 [10713.442712] Oops: 0000 [#1] SMP
12:27:45 [10713.442712] Modules linked in: lmv(OE) fld(OE) mgc(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) fid(OE) ksocklnd(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa iTCO_wdt iTCO_vendor_support ipmi_devintf coretemp kvm radeon dcdbas ib_mthca ttm i5000_edac drm_kms_helper ib_mad lpc_ich edac_core mfd_core ib_core drm ipmi_si serio_raw pcspkr ib_addr ipmi_msghandler i5k_amb shpchp acpi_cpufreq binfmt_misc ip_tables nfsv4 dns_resolver nfsv3 nfs fscache lockd grace nfs_acl sunrpc broadcom tg3 bnx2 igb dca i2c_algo_bit i2c_core e1000e ptp pps_core e1000
12:27:45 [10713.442712] CPU: 2 PID: 15157 Comm: bash Tainted: G OE ------------ 3.10.0-327.0.0.1chaos.ch6.x86_64 #1
12:27:45 [10713.442712] Hardware name: Dell Inc. PowerEdge 1950/0NK937, BIOS 1.1.0 06/21/2006
12:27:45 [10713.442712] task: ffff8803fd255080 ti: ffff8801310e8000 task.ti: ffff8801310e8000
12:27:45 [10713.442712] RIP: 0010:[<ffffffffa0d20ed5>] [<ffffffffa0d20ed5>] ll_get_dir_page+0x3e5/0x1650 [lustre]
12:27:45 [10713.442712] RSP: 0018:ffff8801310ebcb0 EFLAGS: 00010002
12:27:45 [10713.442712] RAX: 0000000000000001 RBX: ffff880409560108 RCX: 0000000000000000
12:27:45 [10713.442712] RDX: 000000011df5000a RSI: ffff8801310ebc60 RDI: 0000000000000000
12:27:45 [10713.442712] RBP: ffff8801310ebdc8 R08: 0000000000000000 R09: ffffffffffffffff
12:27:45 [10713.442712] R10: 0000000000000000 R11: 0000000000000220 R12: 0000000000000000
12:27:45 [10713.442712] R13: fffffffffffffffe R14: 000000011df5000a R15: ffff880409560258
12:27:45 [10713.442712] FS: 00002aaaaab086c0(0000) GS:ffff88042fc80000(0000) knlGS:0000000000000000
12:27:45 [10713.442712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
12:27:45 [10713.442712] CR2: 000000011df5000a CR3: 0000000361820000 CR4: 00000000000007e0
12:27:45 [10713.442712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
12:27:45 [10713.442712] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
12:27:45 [10713.442712] Stack:
12:27:45 [10713.442712] ffff8801310ebce8 ffff8801310ebdf0 ffff8801310ebcd0 ffff880409560270
12:27:46 [10713.442712] ffff880409560258 ffff880409560390 ffff8801310ebce8 a707e6e818891384
12:27:46 [10713.442712] 0000000000000000 ffffffffa0d4191a ffff8801310ebd28 ffffffffa0d6b520
12:27:46 [10713.442712] Call Trace:
12:27:46 [10713.442712] [<ffffffffa0d4191a>] ? __ll_inode_revalidate_it+0x1ea/0xc00 [lustre]
12:27:46 [10713.442712] [<ffffffffa0d6b520>] ? ll_symlink+0x1d0/0x1d0 [lustre]
12:27:46 [10713.442712] [<ffffffff8120836e>] ? mntput_no_expire+0x3e/0x120
12:27:46 [10713.442712] [<ffffffff81208474>] ? mntput+0x24/0x40
12:27:46 [10713.442712] [<ffffffffa0d221e5>] ll_dir_read+0xa5/0x380 [lustre]
12:27:46 [10713.442712] [<ffffffff811fb990>] ? fillonedir+0xf0/0xf0
12:27:46 [10713.442712] [<ffffffff81647f9a>] ? __slab_free+0x10e/0x277
12:27:46 [10713.442712] [<ffffffffa0d2258b>] ll_readdir+0xcb/0x360 [lustre]
12:27:46 [10713.442712] [<ffffffff811fb990>] ? fillonedir+0xf0/0xf0
12:27:46 [10713.442712] [<ffffffff811fb990>] ? fillonedir+0xf0/0xf0
12:27:46 [10713.442712] [<ffffffff811fb870>] vfs_readdir+0xb0/0xe0
12:27:46 [10713.442712] [<ffffffff811fbce5>] SyS_getdents+0x95/0x130
12:27:46 [10713.442712] [<ffffffff8165c6c9>] system_call_fastpath+0x16/0x1b
12:27:46 [10713.442712] Code: 89 85 00 ff ff ff e8 6b 29 93 e0 49 8d 7f 08 b9 01 00 00 00 4c 89 ea 4c 89 f6 e8 87 53 5e e0 85 c0 0f 8e bf 05 00 00 4c 8b 75 88 <49> 8b 06 f6 c4 80 0f 85 27 11 00 00 f0 41 ff 46 1c 66 66 66 66
12:27:46 [10713.442712] RIP [<ffffffffa0d20ed5>] ll_get_dir_page+0x3e5/0x1650 [lustre]
12:27:46 [10713.442712] RSP <ffff8801310ebcb0>
12:27:46 [10713.442712] CR2: 000000011df5000a
12:27:46 [10713.442712] --[ end trace f520f054f8c4c1cb ]--
12:27:46 [10713.442712] Kernel panic - not syncing: Fatal exception



 Comments   
Comment by Ruth Klundt (Inactive) [ 15/Feb/16 ]

Ah, so I created this in the wrong Jira, but I was likely about to report this at Intel as well. Sorry for the confusion.

Comment by Peter Jones [ 16/Feb/16 ]

Bobijam

Could you please advise on this issue?

Thanks

Peter

Comment by Ruth Klundt (Inactive) [ 17/Feb/16 ]

Update and clarification: the panic is also repeatable when one process is doing rm -rf on some part of the lustre tree and another is doing a recursive ls -l. Also, in the description I misspoke: the processes are not on different nodes but on the same node. The processes can be on the same file system or 2 different ones. Thanks for taking a look.

Comment by Zhenyu Xu [ 18/Feb/16 ]

Can we find out which line of code ll_get_dir_page+0x3e5 points at?

Comment by Ruth Klundt (Inactive) [ 18/Feb/16 ]

I haven't got the source for this particular build, I've asked for it and will let you know. I believe it is built from the latest supported Intel 2.5.5 tag, but there may be other patches applied.

Comment by Peter Jones [ 18/Feb/16 ]

Ruth

We have access to the source, but Bobijam is hoping that you can run a gdb command to map the address referenced in the crash to the line of code affected. Are you familiar with this process?

Peter

Comment by Ruth Klundt (Inactive) [ 18/Feb/16 ]

I was missing the debuginfo rpm; I have that now. It's a bit busy here today. I'm familiar with disassembly various ways; I was going to use objdump, but whatever is needed. Probably tomorrow.

Comment by Oleg Drokin [ 19/Feb/16 ]

objdump is a bit overkill.

Just use gdb on the lustre.ko (with debug symbols still in, out of the debuginfo rpm) and issue a "l *(ll_get_dir_page+0x3e5)" command.

Comment by Ruth Klundt (Inactive) [ 19/Feb/16 ]

here you go, let me know if more is needed.

(gdb) l *(ll_get_dir_page+0x3e5)
0x6ed5 is in ll_get_dir_page (/usr/src/kernels/3.10.0-327.0.0.1chaos.ch6.x86_64/arch/x86/include/asm/bitops.h:319).
314 }
315
316 static __always_inline int constant_test_bit(unsigned int nr, const volatile unsigned long *addr)
317 {
318         return ((1UL << (nr % BITS_PER_LONG)) &
319                 (addr[nr / BITS_PER_LONG])) != 0;
320 }
321
322 static inline int variable_test_bit(int nr, volatile const unsigned long *addr)
323 {

Comment by Oleg Drokin [ 19/Feb/16 ]

Hm, that's a bit less useful than we hoped, I guess.

Can you subtract 4 from the 0x3e5 offset until the resulting l command output in gdb shows us something in ll_get_dir_page, please?

Comment by Ruth Klundt (Inactive) [ 19/Feb/16 ]

sure, this looks a bit better:
(gdb) l *(ll_get_dir_page+0x3e1)
0x6ed1 is in ll_get_dir_page (/usr/src/debug/lustre-2.5.5/lustre/llite/dir.c:282).
277 found = radix_tree_gang_lookup(&mapping->page_tree,
278 (void **)&page, offset, 1);
279 if (found > 0) {
280 struct lu_dirpage *dp;
281
282 page_cache_get(page);
283 spin_unlock_irq(&mapping->tree_lock);
284 /*
285 * In contrast to find_lock_page() we are sure that directory
286 * page cannot be truncated (while DLM lock is held) and,

Comment by Oleg Drokin [ 19/Feb/16 ]

Ok, that clears it up.

Your bug is a duplicate of LU-5162; there's a patch too, so can you please close this as the dup once confirmed?

Comment by Ruth Klundt (Inactive) [ 19/Feb/16 ]

thanks, will do

FYI, the patch in LU-5162 is for mdc_request.c; however, the b2_5 tree does not have a call to radix_tree_gang_lookup in that file, although one does appear in llite/dir.c. So I'm making the analogous change there for testing.

Comment by Christopher Morrone [ 22/Feb/16 ]

Intel, you will be providing a patch for b2_5_fe, I assume?

Comment by Peter Jones [ 22/Feb/16 ]

Yup. One is in flight already and will be flagged when it is ready for you to pick up.

Comment by Ruth Klundt (Inactive) [ 22/Feb/16 ]

I'm not able to trigger the oops with the following change built in:

diff --git a/lustre/llite/dir.c b/lustre/llite/dir.c
index 4f1b853..cf69cc4 100644
--- a/lustre/llite/dir.c
+++ b/lustre/llite/dir.c
@@ -276,7 +276,7 @@ static struct page *ll_dir_page_locate(struct inode *dir, __u64 *hash,
         spin_lock_irq(&mapping->tree_lock);
         found = radix_tree_gang_lookup(&mapping->page_tree,
                                        (void **)&page, offset, 1);
-        if (found > 0) {
+        if (found > 0 && !radix_tree_exceptional_entry(page)) {
                 struct lu_dirpage *dp;

                 page_cache_get(page);
Comment by Peter Jones [ 23/Feb/16 ]

2.5.x FE fix for LU-5162 ready to pick up
