[LU-7237] sanity test_244: IP: [<ffffffffa019fb5f>] __spl_cache_flush+0x7f/0x160 [spl] Created: 30/Sep/15  Updated: 28/Feb/20  Resolved: 28/Feb/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/61fc9b0e-6754-11e5-8f60-5254006e85c2.

The sub-test test_244 failed with the following error:

test failed to respond and timed out

The OSS panicked. It looks like it happened in SPL code, so it's almost certainly related to spl/zfs. A Maloo search shows only a couple of instances. From the console log of the OSS (a short sketch decoding the fault follows the trace below):

04:07:28:Lustre: DEBUG MARKER: == sanity test 244: sendfile with group lock tests == 04:07:01 (1443586021)
04:07:28:BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
04:07:28:IP: [<ffffffffa019fb5f>] __spl_cache_flush+0x7f/0x160 [spl]
04:07:28:PGD 0 
04:07:28:Oops: 0000 [#1] SMP 
04:07:28:last sysfs file: /sys/devices/system/cpu/online
04:07:28:CPU 0 
04:07:28:Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic libcfs(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) microcode serio_raw virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
04:07:28:
04:07:28:Pid: 3983, comm: txg_sync Tainted: P           -- ------------    2.6.32-573.3.1.el6_lustre.g00880a0.x86_64 #1 Red Hat KVM
04:07:28:RIP: 0010:[<ffffffffa019fb5f>]  [<ffffffffa019fb5f>] __spl_cache_flush+0x7f/0x160 [spl]
04:07:28:RSP: 0018:ffff88006fb2f440  EFLAGS: 00010086
04:07:28:RAX: 00000000fffe0000 RBX: 0000000000000000 RCX: 000000000003f7ff
04:07:28:RDX: 0000000000000008 RSI: ffff88003794d080 RDI: ffffc9000c15f018
04:07:28:RBP: ffff88006fb2f490 R08: 00000000000006c3 R09: 073ef094fe643515
04:07:28:R10: 0000000000000000 R11: 0000000000000000 R12: ffff880079580000
04:07:28:R13: 0000000000000000 R14: ffff88003794d080 R15: 0000000000000008
04:07:28:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
04:07:28:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
04:07:28:CR2: 0000000000000020 CR3: 000000007b72a000 CR4: 00000000000006f0
04:07:28:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
04:07:28:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
04:07:28:Process txg_sync (pid: 3983, threadinfo ffff88006fb2c000, task ffff88006ea82040)
04:07:28:Stack:
04:07:28: ffff88007d50d300 ffff880002215a68 ffff880079588098 ffff88003794d080
04:07:28:<d> ffff8800707caab0 ffff8800795880b0 ffff880079580000 ffff88003794d080
04:07:28:<d> 0000000000000008 0000000000000000 ffff88006fb2f4c0 ffffffffa019fc86
04:07:28:Call Trace:
04:07:28: [<ffffffffa019fc86>] spl_cache_flush+0x46/0x70 [spl]
04:07:28: [<ffffffffa01a1785>] spl_kmem_cache_free+0x1f5/0x210 [spl]
04:07:28: [<ffffffffa02a56a4>] zio_data_buf_free+0x24/0x30 [zfs]
04:07:28: [<ffffffffa01f7de2>] arc_buf_data_free+0x22/0x50 [zfs]
04:07:28: [<ffffffffa01f83a5>] arc_buf_destroy+0x145/0x1c0 [zfs]
04:07:28: [<ffffffffa01f8cd7>] arc_hdr_destroy+0x277/0x310 [zfs]
04:07:28: [<ffffffffa01f871c>] ? arc_change_state+0x22c/0x350 [zfs]
04:07:28: [<ffffffffa01f934b>] arc_buf_free+0x1bb/0x1f0 [zfs]
04:07:28: [<ffffffffa01f94e4>] arc_buf_remove_ref+0x164/0x170 [zfs]
04:07:28: [<ffffffffa01fa894>] arc_freed+0xe4/0xf0 [zfs]
04:07:28: [<ffffffffa02a8670>] zio_free_sync+0x50/0x150 [zfs]
04:07:28: [<ffffffffa02a8f43>] zio_free+0xd3/0x130 [zfs]
04:07:28: [<ffffffffa0239f01>] dsl_free+0x11/0x20 [zfs]
04:07:28: [<ffffffffa022acb1>] dsl_dataset_block_kill+0x241/0x440 [zfs]
04:07:28: [<ffffffffa0222a7c>] free_blocks+0xcc/0x190 [zfs]
04:07:28: [<ffffffffa02232a0>] free_children+0x270/0x370 [zfs]
04:07:28: [<ffffffffa0206030>] ? dbuf_hold_impl+0x90/0xb0 [zfs]
04:07:28: [<ffffffffa0223171>] free_children+0x141/0x370 [zfs]
04:07:28: [<ffffffffa02ab440>] ? zio_execute+0x0/0x180 [zfs]
04:07:28: [<ffffffffa02234eb>] dnode_sync_free_range+0x14b/0x340 [zfs]
04:07:28: [<ffffffffa02233a0>] ? dnode_sync_free_range+0x0/0x340 [zfs]
04:07:28: [<ffffffffa0242478>] range_tree_vacate+0x58/0xa0 [zfs]
04:07:28: [<ffffffffa022394a>] dnode_sync+0x26a/0x940 [zfs]
04:07:28: [<ffffffffa0207749>] ? dbuf_sync_list+0x59/0x80 [zfs]
04:07:28: [<ffffffffa0211fe9>] dmu_objset_sync_dnodes+0x89/0xb0 [zfs]
04:07:28: [<ffffffffa02121bb>] dmu_objset_sync+0x1ab/0x2e0 [zfs]
04:07:28: [<ffffffffa0210a40>] ? dmu_objset_write_ready+0x0/0x70 [zfs]
04:07:28: [<ffffffffa02122f0>] ? dmu_objset_write_done+0x0/0x70 [zfs]
04:07:28: [<ffffffffa022549c>] dsl_dataset_sync+0x4c/0x60 [zfs]
04:07:28: [<ffffffffa0232a6a>] dsl_pool_sync+0x9a/0x430 [zfs]
04:07:28: [<ffffffffa0248943>] spa_sync+0x443/0xb90 [zfs]
04:07:28: [<ffffffff81059939>] ? __wake_up_common+0x59/0x90
04:07:28: [<ffffffffa025e079>] txg_sync_thread+0x389/0x5f0 [zfs]
04:07:28: [<ffffffffa025dcf0>] ? txg_sync_thread+0x0/0x5f0 [zfs]
04:07:28: [<ffffffffa01a1fb8>] thread_generic_wrapper+0x68/0x80 [spl]
04:07:28: [<ffffffffa01a1f50>] ? thread_generic_wrapper+0x0/0x80 [spl]
04:07:28: [<ffffffff810a101e>] kthread+0x9e/0xc0
04:07:28: [<ffffffff8100c28a>] child_rip+0xa/0x20
04:07:28: [<ffffffff810a0f80>] ? kthread+0x0/0xc0
04:07:28: [<ffffffff8100c280>] ? child_rip+0x0/0x20
04:07:28:Code: 8e 96 00 00 00 41 8b bc 24 54 80 00 00 41 8b 84 24 50 80 00 00 f7 df f7 d8 21 c7 f7 df 89 ff 49 03 7e 28 48 8b 5f 10 48 83 c7 18 <48> 8b 53 20 48 8d 73 20 e8 74 45 10 e1 48 8b 05 4d fd a7 e1 83 
04:07:28:RIP  [<ffffffffa019fb5f>] __spl_cache_flush+0x7f/0x160 [spl]
04:07:28: RSP <ffff88006fb2f440>
04:07:28:CR2: 0000000000000020
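
For the record, the faulting bytes in the Code: line decode as mov 0x20(%rbx),%rdx, and the register dump shows RBX = 0, which matches CR2 = 0x20: the oops is a load of a struct member at offset 0x20 through a NULL object pointer. Below is a minimal userspace sketch of that failure pattern. The names (my_obj, my_magazine, flush_magazine) are hypothetical; this is not the actual __spl_cache_flush source, just the shape of the bug, assuming it is a magazine/object-array walk hitting a NULL slot:

/* Minimal sketch (hypothetical names, not the SPL source): a flush loop
 * walking an object array and reading a pointer field inside each object.
 * With a NULL slot, the read of o->next is a load at address 0x20,
 * the same signature as the oops above. */
#include <stddef.h>
#include <stdio.h>

struct my_obj {
	long pad[4];		/* bytes 0x00..0x1f on LP64 */
	struct my_obj *next;	/* at offset 0x20, matching CR2 */
};

struct my_magazine {
	int count;
	struct my_obj *objs[8];
};

static void flush_magazine(struct my_magazine *m)
{
	for (int i = 0; i < m->count; i++) {
		struct my_obj *o = m->objs[i];
		/* A corrupted/NULL slot here means o->next dereferences
		 * address 0x20; in the kernel that oopses exactly as above. */
		if (o)
			printf("obj %d: next=%p\n", i, (void *)o->next);
		else
			printf("obj %d: NULL slot (would oops at 0x20)\n", i);
	}
}

int main(void)
{
	printf("offsetof(next) = %#zx\n", offsetof(struct my_obj, next));

	struct my_obj a = { .next = NULL };
	struct my_magazine m = { .count = 2, .objs = { &a, NULL } };
	flush_magazine(&m);
	return 0;
}

The point is just that the fault address 0x20 identifies a member load through a NULL base pointer, i.e. a freed or corrupted entry in the cache's own bookkeeping, rather than a bad code pointer.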

Info required for matching: sanity 244



 Comments   
Comment by Andreas Dilger [ 01/Oct/15 ]

I saw another sanity test_244 failure on ZFS. Not exactly the same, but very similar:
https://testing.hpdd.intel.com/test_sets/a0d9dfb6-5ed1-11e5-bb07-5254006e85c2

08:53:26:kernel BUG at mm/vmalloc.c:1501!
08:53:26:invalid opcode: 0000 [#1] SMP 
08:53:26:Pid: 31862, comm: ll_ost00_002 Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.g1e202d9.x86_64 #1 Red Hat KVM
08:53:26:RIP: 0010:[<ffffffff8115cf9c>]  [<ffffffff8115cf9c>] __vunmap+0xec/0x120
08:53:26:Process ll_ost00_002 (pid: 31862, threadinfo ffff88006f05c000, task ffff88006de96ab0)
08:53:26:Call Trace:
08:53:26: [<ffffffff8115d056>] vfree+0x36/0x80
08:53:26: [<ffffffffa0197d15>] kv_free+0x65/0x70 [spl]
08:53:26: [<ffffffffa01993c1>] spl_slab_reclaim+0x1c1/0x200 [spl]
08:53:26: [<ffffffffa0199678>] spl_kmem_cache_free+0xe8/0x210 [spl]
08:53:26: [<ffffffffa029d6a4>] zio_data_buf_free+0x24/0x30 [zfs]
08:53:26: [<ffffffffa01efde2>] arc_buf_data_free+0x22/0x50 [zfs]
08:53:26: [<ffffffffa01f03a5>] arc_buf_destroy+0x145/0x1c0 [zfs]
08:53:26: [<ffffffffa01f0cd7>] arc_hdr_destroy+0x277/0x310 [zfs]
08:53:26: [<ffffffffa01f134b>] arc_buf_free+0x1bb/0x1f0 [zfs]
08:53:26: [<ffffffffa01f14e4>] arc_buf_remove_ref+0x164/0x170 [zfs]
08:53:26: [<ffffffffa01fcc58>] dbuf_free_range+0x3f8/0x5a0 [zfs]
08:53:26: [<ffffffffa021a78a>] dnode_free_range+0x45a/0x680 [zfs]
08:53:26: [<ffffffffa020596f>] dmu_free_long_range+0x1af/0x230 [zfs]
08:53:26: [<ffffffffa0f766e8>] osd_unlinked_object_free+0x38/0x2a0 [osd_zfs]
08:53:26: [<ffffffffa0f769ac>] osd_unlinked_list_emptify+0x5c/0xb0 [osd_zfs]
08:53:26: [<ffffffffa0f78ec9>] osd_trans_stop+0x3c9/0x610 [osd_zfs]
08:53:26: [<ffffffffa10cea1f>] ofd_trans_stop+0x1f/0x60 [ofd]
08:53:26: [<ffffffffa10d0df1>] ofd_object_destroy+0x2d1/0x8e0 [ofd]
08:53:26: [<ffffffffa10cad9d>] ofd_destroy_by_fid+0x35d/0x620 [ofd]
08:53:26: [<ffffffffa10c465a>] ofd_destroy_hdl+0x2fa/0xb60 [ofd]
08:53:26: [<ffffffffa0af2a8c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
08:53:26: [<ffffffffa0a9a7e1>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
08:53:26: [<ffffffff8109e78e>] kthread+0x9e/0xc0

This was with a PPC client, but that shouldn't be the cause of a server-side oops.
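
The second trace BUGs inside __vunmap() rather than dereferencing NULL, but both die while the SPL cache is returning memory, so both are consistent with corrupted slab/magazine state. For context, vfree() only accepts a live, page-aligned address that vmalloc() returned; __vunmap() BUGs when the backing vm area is inconsistent (presumably one of its consistency checks, e.g. a NULL entry in the area's page array, sits at mm/vmalloc.c:1501 in that RHEL6 kernel; I haven't verified the exact line). A kernel-style sketch of the contract a dual-backend free helper has to honor (my_kv_free() is hypothetical, not the actual SPL kv_free()):

#include <linux/slab.h>
#include <linux/types.h>
#include <linux/vmalloc.h>

/* Hypothetical sketch, not the SPL source: a dual-backend free must hand
 * vfree() exactly the address vmalloc() returned.  A base pointer that
 * was corrupted, already freed, or really came from kmalloc() lands in
 * __vunmap()'s sanity checks, which is where this second oops BUGs. */
static void my_kv_free(void *ptr, bool was_vmalloc)
{
	if (!ptr)
		return;
	if (was_vmalloc)
		vfree(ptr);	/* must be a live vmalloc() mapping */
	else
		kfree(ptr);	/* must be a live kmalloc() object */
}

Either way, both stacks point at the same underlying problem: by the time the free path runs, the cache's record of the slab being released already looks stale or corrupted.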

Comment by Andreas Dilger [ 28/Feb/20 ]

Close old bug that hasn't been seen in a long time.
