[LU-9097] sanity test_253 test_255: ZFS list corruption Created: 09/Feb/17  Updated: 17/Apr/17  Resolved: 17/Apr/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Alex Zhuravlev
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-9110 sanity test_255a: test failed to resp... Open
Related
is related to LU-8582 Interop: master<->b2_8 - sanity test... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/937abbdc-e5e0-11e6-978f-5254006e85c2.

The sub-test test_253 timed out. Looking at the console log on the server, the failure appears to be list corruption in the ZFS/SPL kmem cache code during memory reclaim:

00:50:58:[ 4422.775971] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
00:50:58:[ 4422.775971] list_del corruption. prev->next should be ffffc900031a3010, but was           (null)
00:50:58:[ 4422.795082] CPU: 0 PID: 32 Comm: kswapd0 Tainted: P           OE  ------------   3.10.0-514.2.2.el7_lustre.x86_64 #1
00:50:58:[ 4422.815142] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
00:50:58:[ 4422.815142] Call Trace:
00:50:58:[ 4422.815142]  [<ffffffff81686318>] dump_stack+0x19/0x1b
00:50:58:[ 4422.815142]  [<ffffffff81085940>] warn_slowpath_common+0x70/0xb0
00:50:58:[ 4422.815142]  [<ffffffff810859dc>] warn_slowpath_fmt+0x5c/0x80
00:50:58:[ 4422.815142]  [<ffffffff81333301>] __list_del_entry+0xa1/0xd0
00:50:58:[ 4422.815142]  [<ffffffff8133333d>] list_del+0xd/0x30
00:50:58:[ 4422.815142]  [<ffffffffa065cf1d>] __spl_cache_flush+0xed/0x150 [spl]
00:50:58:[ 4422.815142]  [<ffffffffa065d046>] spl_cache_flush+0x36/0x50 [spl]
00:50:58:[ 4422.815142]  [<ffffffffa065d71f>] spl_kmem_cache_reap_now+0x10f/0x120 [spl]
00:50:58:[ 4422.815142]  [<ffffffffa070b3c9>] arc_kmem_reap_now+0x79/0xe0 [zfs]
00:50:58:[ 4422.815142]  [<ffffffffa0710bb7>] arc_shrinker_func+0x97/0x130 [zfs]
00:50:58:[ 4422.815142]  [<ffffffff81194213>] shrink_slab+0x163/0x330
00:50:58:[ 4422.815142]  [<ffffffff811f5361>] ? vmpressure+0x21/0x90
00:50:58:[ 4422.815142]  [<ffffffff81198001>] balance_pgdat+0x4b1/0x5e0
00:50:58:[ 4422.815142]  [<ffffffff811982a3>] kswapd+0x173/0x450
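
For context, the "list_del corruption" message comes from the kernel's CONFIG_DEBUG_LIST checks in lib/list_debug.c: before unlinking a node, __list_del_entry() verifies that the neighbours still point back at the entry being removed, and here prev->next had already been overwritten with NULL, which typically indicates concurrent modification of the list or reuse of freed memory. The following is only an illustrative userspace sketch of that invariant check, not the actual kernel or SPL code:

/*
 * Illustrative userspace analogue of the CONFIG_DEBUG_LIST check that
 * printed "list_del corruption. prev->next should be ..., but was (null)".
 * A sketch only; the real check lives in lib/list_debug.c in the kernel.
 */
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Mirrors the invariant __list_del_entry() enforces before unlinking. */
static int debug_list_del(struct list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return -1;	/* refuse to unlink a corrupted entry */
	}
	if (entry->next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return -1;
	}
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return 0;
}

int main(void)
{
	struct list_head head = { &head, &head };
	struct list_head node = { &head, &head };

	head.next = &node;	/* head <-> node, a two-element ring */
	head.prev = &node;

	node.prev->next = NULL;	/* simulate the corruption seen in the log */
	return debug_list_del(&node) ? 1 : 0;
}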

Please provide additional information about the failure here.

Info required for matching: sanity 253
Info required for matching: sanity 255



 Comments   
Comment by Joseph Gmitter (Inactive) [ 14/Feb/17 ]

Hi Alex,

Can you please look into this one?

Thanks.
Joe

Comment by Andreas Dilger [ 14/Feb/17 ]

Alex, can you please look into this? It needs to be traced back to when it started happening, and whether it is one of your recent patch landings, related to the update to a newer ZFS release, or some kind of random memory corruption.

Comment by Alex Zhuravlev [ 17/Apr/17 ]

This is a duplicate of LU-9110.
