[LU-8644] Kernel Panic with obdfilter-survey: spl_kmem_cache_alloc+0x99/0x150 Created: 27/Sep/16 Updated: 14/Oct/16 Resolved: 14/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Adam Roe (Inactive) | Assignee: | Nathaniel Clark |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Master Build 3450: ZFS 0.6.5.7-1, kernel 3.10.0-327.36.1.el7_lustre.x86_64. All storage devices are NVMe. |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When running obdfilter-survey, the system kernel panics immediately with an SPL error; see the vmcore below: |
| Comments |
| Comment by Adam Roe (Inactive) [ 27/Sep/16 ] |
|
*zpool configuration & format:*
[root@zlfs2-oss1 ~]# zpool create OST0000 -o ashift=12 -O recordsize=1M -f /dev/nvme0n1
[root@zlfs2-oss1 ~]# mkfs.lustre --reformat --ost --backfstype=zfs --fsname=cam --mgsnode=192.168.5.21@o2ib0 --index=0 OST0000/OST0000
Permanent disk data:
Target: cam:OST0000
Index: 0
Lustre FS: cam
Mount type: zfs
Flags: 0x62
(OST first_time update )
Persistent mount opts:
Parameters: mgsnode=192.168.5.21@o2ib
mkfs_cmd = zfs create -o canmount=off -o xattr=sa OST0000/OST0000
Writing OST0000/OST0000 properties
lustre:version=1
lustre:flags=98
lustre:index=0
lustre:fsname=cam
lustre:svname=cam:OST0000
lustre:mgsnode=192.168.5.21@o2ib
[root@zlfs2-oss1 ~]# mount -vvv -t lustre OST0000/OST0000 /mnt/OST0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = OST0000/OST0000
arg[5] = /mnt/OST0000
source = OST0000/OST0000 (OST0000/OST0000), target = /mnt/OST0000
options = rw
checking for existing Lustre data: found
Writing OST0000/OST0000 properties
lustre:version=1
lustre:flags=34
lustre:index=0
lustre:fsname=cam
lustre:svname=cam:OST0000
lustre:mgsnode=192.168.5.21@o2ib
mounting device OST0000/OST0000 at /mnt/OST0000, flags=0x1000000 options=osd=osd-zfs,,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=cam-OST0000,device=OST0000/OST0000
lustre:svname=cam-OST0000
*vmcore:*
[ 662.214980] Modules linked in: obdecho(OE) osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs(OE) ib_umad rdma_cm ib_cm iw_cm ib_sa intel_powerclamp coretemp intel_rapl kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt iTCO_vendor_support hfi1(OE) mxm_wmi cryptd ioatdma pcspkr ib_mad ib_core ib_addr sb_edac edac_core i2c_i801 mei_me shpchp mei sg lpc_ich mfd_core ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c raid1 sd_mod
[ 662.215589] crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper nvme ttm ixgbe ahci drm mdio libahci ptp i2c_core libata pps_core dca dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate
[ 662.215842] CPU: 29 PID: 7653 Comm: lctl Tainted: P OE ------------ 3.10.0-327.36.1.el7_lustre.x86_64 #1
[ 662.215911] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.072020161249 07/20/2016
[ 662.215978] task: ffff882021f71700 ti: ffff881ff3c98000 task.ti: ffff881ff3c98000
[ 662.216028] RIP: 0010:[<ffffffff811c1545>] [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
[ 662.216092] RSP: 0018:ffff881ff3c9b9d0 EFLAGS: 00010286
[ 662.216129] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000003f8b
[ 662.216176] RDX: 0000000000003f8a RSI: 00000000000042f0 RDI: ffff88103e007a00
[ 662.216222] RBP: ffff881ff3c9ba00 R08: 0000000000019580 R09: ffffffffa000b939
[ 662.216268] R10: 00000000000008df R11: 0000000180000000 R12: 00000000047546c0
[ 662.216314] R13: 00000000000042f0 R14: ffff88103e007a00 R15: ffff88103e007a00
[ 662.216361] FS: 00007f3e446be740(0000) GS:ffff88103e9c0000(0000) knlGS:0000000000000000
[ 662.216414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 662.216452] CR2: 00000000047546c0 CR3: 0000001ff3c8d000 CR4: 00000000001407e0
[ 662.216498] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 662.216544] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 662.216590] Stack:
[ 662.216607] ffffffffa000b939 0000000000000000 0000000000000004 ffff88103e007a00
[ 662.216667] ffff8820219bba00 0000000000000000 ffff881ff3c9ba48 ffffffffa000b939
[ 662.216730] 00000004f3c9ba48 ffff882021f71700 ffff881003c80a40 ffff882026f6e000
[ 662.216795] Call Trace:
[ 662.216839] [<ffffffffa000b939>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[ 662.216895] [<ffffffffa000b939>] spl_kmem_cache_alloc+0x99/0x150 [spl]
[ 662.216999] [<ffffffffa00bbc70>] arc_buf_alloc+0x90/0x190 [zfs]
[ 662.217067] [<ffffffffa00bbd8b>] arc_loan_buf+0x1b/0x30 [zfs]
[ 662.217142] [<ffffffffa00cba09>] dmu_request_arcbuf+0x19/0x20 [zfs]
[ 662.217204] [<ffffffffa1160901>] osd_bufs_get+0x591/0xba0 [osd_zfs]
[ 662.217249] [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
[ 662.217313] [<ffffffffa125b78d>] ofd_preprw_write.isra.29+0x1bd/0xcd0 [ofd]
[ 662.217369] [<ffffffffa125ca8a>] ofd_preprw+0x7ea/0x10c0 [ofd]
[ 662.217421] [<ffffffffa1119894>] echo_client_prep_commit.isra.49+0x334/0xc30 [obdecho]
[ 662.217549] [<ffffffffa0ce131f>] ? lu_object_find_slice+0x1f/0x90 [obdclass]
[ 662.219311] [<ffffffffa11230af>] echo_client_iocontrol+0x9bf/0x1c40 [obdecho]
[ 662.221071] [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
[ 662.222855] [<ffffffffa0caa15e>] class_handle_ioctl+0x19de/0x2150 [obdclass]
[ 662.224622] [<ffffffff81197580>] ? handle_mm_fault+0x5e0/0xf80
[ 662.226363] [<ffffffff81285868>] ? security_capable+0x18/0x20
[ 662.228099] [<ffffffffa0c8e712>] obd_class_ioctl+0xd2/0x170 [obdclass]
[ 662.229809] [<ffffffff811f2665>] do_vfs_ioctl+0x2e5/0x4c0
[ 662.231518] [<ffffffff8164215d>] ? __do_page_fault+0x16d/0x450
[ 662.233191] [<ffffffff811f28e1>] SyS_ioctl+0xa1/0xc0
[ 662.234815] [<ffffffff81646c49>] system_call_fastpath+0x16/0x1b
[ 662.236410] Code: ce 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 1f 01 00 00 48 85 c0 0f 84 16 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
[ 662.239730] RIP [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
[ 662.241291] RSP <ffff881ff3c9b9d0>
[ 662.242792] CR2: 00000000047546c0

*zfs.conf:*
options zfs metaslab_debug_unload=1
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_arc_max=103079215104
options zfs zfs_dirty_data_max=4294967296
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
options zfs zfs_vdev_async_write_min_active=5
options zfs zfs_vdev_async_write_max_active=10
options zfs zfs_vdev_sync_read_min_active=16
options zfs zfs_vdev_sync_read_max_active=16 |
| Comment by Adam Roe (Inactive) [ 27/Sep/16 ] |
|
Normal workloads (IOR, etc.) run fine when the filesystem is mounted. I have only observed the issue when using obdfilter-survey. |
| Comment by Nathaniel Clark [ 06/Oct/16 ] |
|
Adam, |
| Comment by Adam Roe (Inactive) [ 07/Oct/16 ] |
|
I don't, I'm afraid; this system has since had its base OS reprovisioned. |
| Comment by Nathaniel Clark [ 14/Oct/16 ] |
|
If this recurs, please capture the full console log and this bug can be reopened. |