Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
Lustre 2.8.0
-
None
-
Lustre Master Build 3450: ZFS 0.6.5.7-1, kernel 3.10.0-327.36.1.el7_lustre.x86_64
All storage devices are NVMe
-
3
-
9223372036854775807
Description
Under obdfilter-survey, the system kernel panics immediately with an SPL error; see the vmcore below:
Activity
I don't, I'm afraid; this system has since had its base OS reprovisioned.
Adam,
Do you have the full console log or messages file from the machine?
Normal workloads (IOR, etc.) run fine when the filesystem is mounted. I have only observed the issue when using obdfilter-survey.
*zpool configuration & format:*
[root@zlfs2-oss1 ~]# zpool create OST0000 -o ashift=12 -O recordsize=1M -f /dev/nvme0n1
[root@zlfs2-oss1 ~]# mkfs.lustre --reformat --ost --backfstype=zfs --fsname=cam --mgsnode=192.168.5.21@o2ib0 --index=0 OST0000/OST0000
Permanent disk data:
Target: cam:OST0000
Index: 0
Lustre FS: cam
Mount type: zfs
Flags: 0x62 (OST first_time update )
Persistent mount opts:
Parameters: mgsnode=192.168.5.21@o2ib
mkfs_cmd = zfs create -o canmount=off -o xattr=sa OST0000/OST0000
Writing OST0000/OST0000 properties
lustre:version=1
lustre:flags=98
lustre:index=0
lustre:fsname=cam
lustre:svname=cam:OST0000
lustre:mgsnode=192.168.5.21@o2ib
[root@zlfs2-oss1 ~]# mount -vvv -t lustre OST0000/OST0000 /mnt/OST0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = OST0000/OST0000
arg[5] = /mnt/OST0000
source = OST0000/OST0000 (OST0000/OST0000), target = /mnt/OST0000
options = rw
checking for existing Lustre data: found
Writing OST0000/OST0000 properties
lustre:version=1
lustre:flags=34
lustre:index=0
lustre:fsname=cam
lustre:svname=cam:OST0000
lustre:mgsnode=192.168.5.21@o2ib
mounting device OST0000/OST0000 at /mnt/OST0000, flags=0x1000000 options=osd=osd-zfs,,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=cam-OST0000,device=OST0000/OST0000
lustre:svname=cam-OST0000
vmcore
[ 662.214980] Modules linked in: obdecho(OE) osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs(OE) ib_umad rdma_cm ib_cm iw_cm ib_sa intel_powerclamp coretemp intel_rapl kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt iTCO_vendor_support hfi1(OE) mxm_wmi cryptd ioatdma pcspkr ib_mad ib_core ib_addr sb_edac edac_core i2c_i801 mei_me shpchp mei sg lpc_ich mfd_core ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c raid1 sd_mod
[ 662.215589] crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper nvme ttm ixgbe ahci drm mdio libahci ptp i2c_core libata pps_core dca dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate
[ 662.215842] CPU: 29 PID: 7653 Comm: lctl Tainted: P OE ------------ 3.10.0-327.36.1.el7_lustre.x86_64 #1
[ 662.215911] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.072020161249 07/20/2016
[ 662.215978] task: ffff882021f71700 ti: ffff881ff3c98000 task.ti: ffff881ff3c98000
[ 662.216028] RIP: 0010:[<ffffffff811c1545>] [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
[ 662.216092] RSP: 0018:ffff881ff3c9b9d0 EFLAGS: 00010286
[ 662.216129] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000003f8b
[ 662.216176] RDX: 0000000000003f8a RSI: 00000000000042f0 RDI: ffff88103e007a00
[ 662.216222] RBP: ffff881ff3c9ba00 R08: 0000000000019580 R09: ffffffffa000b939
[ 662.216268] R10: 00000000000008df R11: 0000000180000000 R12: 00000000047546c0
[ 662.216314] R13: 00000000000042f0 R14: ffff88103e007a00 R15: ffff88103e007a00
[ 662.216361] FS: 00007f3e446be740(0000) GS:ffff88103e9c0000(0000) knlGS:0000000000000000
[ 662.216414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 662.216452] CR2: 00000000047546c0 CR3: 0000001ff3c8d000 CR4: 00000000001407e0
[ 662.216498] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 662.216544] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 662.216590] Stack:
[ 662.216607] ffffffffa000b939 0000000000000000 0000000000000004 ffff88103e007a00
[ 662.216667] ffff8820219bba00 0000000000000000 ffff881ff3c9ba48 ffffffffa000b939
[ 662.216730] 00000004f3c9ba48 ffff882021f71700 ffff881003c80a40 ffff882026f6e000
[ 662.216795] Call Trace:
[ 662.216839] [<ffffffffa000b939>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[ 662.216895] [<ffffffffa000b939>] spl_kmem_cache_alloc+0x99/0x150 [spl]
[ 662.216999] [<ffffffffa00bbc70>] arc_buf_alloc+0x90/0x190 [zfs]
[ 662.217067] [<ffffffffa00bbd8b>] arc_loan_buf+0x1b/0x30 [zfs]
[ 662.217142] [<ffffffffa00cba09>] dmu_request_arcbuf+0x19/0x20 [zfs]
[ 662.217204] [<ffffffffa1160901>] osd_bufs_get+0x591/0xba0 [osd_zfs]
[ 662.217249] [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
[ 662.217313] [<ffffffffa125b78d>] ofd_preprw_write.isra.29+0x1bd/0xcd0 [ofd]
[ 662.217369] [<ffffffffa125ca8a>] ofd_preprw+0x7ea/0x10c0 [ofd]
[ 662.217421] [<ffffffffa1119894>] echo_client_prep_commit.isra.49+0x334/0xc30 [obdecho]
[ 662.217549] [<ffffffffa0ce131f>] ? lu_object_find_slice+0x1f/0x90 [obdclass]
[ 662.219311] [<ffffffffa11230af>] echo_client_iocontrol+0x9bf/0x1c40 [obdecho]
[ 662.221071] [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
[ 662.222855] [<ffffffffa0caa15e>] class_handle_ioctl+0x19de/0x2150 [obdclass]
[ 662.224622] [<ffffffff81197580>] ? handle_mm_fault+0x5e0/0xf80
[ 662.226363] [<ffffffff81285868>] ? security_capable+0x18/0x20
[ 662.228099] [<ffffffffa0c8e712>] obd_class_ioctl+0xd2/0x170 [obdclass]
[ 662.229809] [<ffffffff811f2665>] do_vfs_ioctl+0x2e5/0x4c0
[ 662.231518] [<ffffffff8164215d>] ? __do_page_fault+0x16d/0x450
[ 662.233191] [<ffffffff811f28e1>] SyS_ioctl+0xa1/0xc0
[ 662.234815] [<ffffffff81646c49>] system_call_fastpath+0x16/0x1b
[ 662.236410] Code: ce 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 1f 01 00 00 48 85 c0 0f 84 16 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
[ 662.239730] RIP [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
[ 662.241291] RSP <ffff881ff3c9b9d0>
[ 662.242792] CR2: 00000000047546c0
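One detail can be read straight off the register dump: CR2 (the faulting address) has the same value as R12, and RAX is zero. The sketch below only checks that arithmetic. It assumes the faulting instruction at the `<49>` marker in the Code line loads from R12 + RAX; that reading of the bytes is an interpretation, not something stated in this ticket, and the variable names are illustrative.

```shell
#!/bin/sh
# Register values copied from the oops in this ticket.
r12=0x00000000047546c0
rax=0x0000000000000000
cr2=0x00000000047546c0

# Assumption (not confirmed in the ticket): the faulting load reads R12 + RAX.
fault=$(( $r12 + $rax ))

printf 'computed address: 0x%x\n' "$fault"
printf 'reported CR2:     0x%x\n' "$(( $cr2 ))"

if [ "$fault" -eq "$(( $cr2 ))" ]; then
    echo "CR2 matches R12 + RAX"
else
    echo "CR2 does not match R12 + RAX"
fi
```

If that assumption holds, the address being dereferenced inside kmem_cache_alloc is a small value rather than a kernel pointer of the ffff88... form seen in the other registers. That may be worth noting if the panic recurs, but it does not by itself identify a cause.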
zfs.conf
options zfs metaslab_debug_unload=1
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_arc_max=103079215104
options zfs zfs_dirty_data_max=4294967296
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
options zfs zfs_vdev_async_write_min_active=5
options zfs zfs_vdev_async_write_max_active=10
options zfs zfs_vdev_sync_read_min_active=16
options zfs zfs_vdev_sync_read_max_active=16
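The two size limits in zfs.conf are given in bytes, which makes them hard to read at a glance. A minimal shell sketch to express them in GiB follows; the helper name `to_gib` is made up for illustration and is not part of ZFS or Lustre.

```shell
#!/bin/sh
# Convert a byte count to whole GiB (integer division by 1024^3).
to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Values copied from the zfs.conf in this ticket.
echo "zfs_arc_max        = $(to_gib 103079215104) GiB"
echo "zfs_dirty_data_max = $(to_gib 4294967296) GiB"
```

With the values above this prints 96 GiB for zfs_arc_max and 4 GiB for zfs_dirty_data_max.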
If this recurs, please capture the full console log and this bug can be reopened.