[LU-8644] Kernel Panic with obdfilter-survey: spl_kmem_cache_alloc+0x99/0x150 Created: 27/Sep/16  Updated: 14/Oct/16  Resolved: 14/Oct/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: Adam Roe (Inactive) Assignee: Nathaniel Clark
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Lustre Master Build 3450: ZFS 0.6.5.7-1, kernel 3.10.0-327.36.1.el7_lustre.x86_64

All storage devices are NVMe


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When running obdfilter-survey, the system kernel panics immediately with an SPL error (in spl_kmem_cache_alloc); see the vmcore excerpt in the first comment below.



 Comments   
Comment by Adam Roe (Inactive) [ 27/Sep/16 ]

*zpool configuration & format:*

[root@zlfs2-oss1 ~]# zpool create OST0000 -o ashift=12 -O recordsize=1M -f /dev/nvme0n1
[root@zlfs2-oss1 ~]# mkfs.lustre --reformat --ost --backfstype=zfs --fsname=cam --mgsnode=192.168.5.21@o2ib0 --index=0 OST0000/OST0000

   Permanent disk data:
Target:     cam:OST0000
Index:      0
Lustre FS:  cam
Mount type: zfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts:
Parameters: mgsnode=192.168.5.21@o2ib

mkfs_cmd = zfs create -o canmount=off -o xattr=sa OST0000/OST0000
Writing OST0000/OST0000 properties
  lustre:version=1
  lustre:flags=98
  lustre:index=0
  lustre:fsname=cam
  lustre:svname=cam:OST0000
  lustre:mgsnode=192.168.5.21@o2ib
[root@zlfs2-oss1 ~]# mount -vvv -t lustre OST0000/OST0000 /mnt/OST0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = OST0000/OST0000
arg[5] = /mnt/OST0000
source = OST0000/OST0000 (OST0000/OST0000), target = /mnt/OST0000
options = rw
checking for existing Lustre data: found
Writing OST0000/OST0000 properties
  lustre:version=1
  lustre:flags=34
  lustre:index=0
  lustre:fsname=cam
  lustre:svname=cam:OST0000
  lustre:mgsnode=192.168.5.21@o2ib
mounting device OST0000/OST0000 at /mnt/OST0000, flags=0x1000000 options=osd=osd-zfs,,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=cam-OST0000,device=OST0000/OST0000
  lustre:svname=cam-OST0000
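
The `Flags` value printed by mkfs.lustre (0x62) and the `lustre:flags=98` property are the same number, once in hex and once in decimal. The first mount then writes `lustre:flags=34`. Below is a minimal sketch for decoding these values. It assumes the bit assignments OST = 0x02, first_time = 0x20 and update = 0x40. Those bits agree with the `(OST first_time update)` annotation printed for 0x62 above, but the table is our assumption and covers only these three flags. The `decode_flags` helper is hypothetical.

```shell
#!/bin/sh
# Decode a Lustre on-disk flags value into the names mkfs.lustre prints.
# ASSUMED bit values (consistent with "Flags: 0x62 (OST first_time update)"):
#   0x02 = OST, 0x20 = first_time, 0x40 = update
# Other flag bits are not decoded by this sketch.
decode_flags() {
    flags=$1
    names=""
    [ $((flags & 0x02)) -ne 0 ] && names="$names OST"
    [ $((flags & 0x20)) -ne 0 ] && names="$names first_time"
    [ $((flags & 0x40)) -ne 0 ] && names="$names update"
    printf '%d = 0x%x:%s\n' "$flags" "$flags" "$names"
}

decode_flags 98   # lustre:flags written by mkfs.lustre
decode_flags 34   # lustre:flags written at first mount
```

Run as-is, it prints `98 = 0x62: OST first_time update` and `34 = 0x22: OST first_time`. Under the assumed bit table, the property written at mount time no longer has the update bit set.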

vmcore

[  662.214980] Modules linked in: obdecho(OE) osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs(OE) ib_umad rdma_cm ib_cm iw_cm ib_sa intel_powerclamp coretemp intel_rapl kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt iTCO_vendor_support hfi1(OE) mxm_wmi cryptd ioatdma pcspkr ib_mad ib_core ib_addr sb_edac edac_core i2c_i801 mei_me shpchp mei sg lpc_ich mfd_core ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c raid1 sd_mod
[  662.215589]  crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper nvme ttm ixgbe ahci drm mdio libahci ptp i2c_core libata pps_core dca dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate
[  662.215842] CPU: 29 PID: 7653 Comm: lctl Tainted: P           OE  ------------   3.10.0-327.36.1.el7_lustre.x86_64 #1
[  662.215911] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.072020161249 07/20/2016
[  662.215978] task: ffff882021f71700 ti: ffff881ff3c98000 task.ti: ffff881ff3c98000
[  662.216028] RIP: 0010:[<ffffffff811c1545>]  [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
[  662.216092] RSP: 0018:ffff881ff3c9b9d0  EFLAGS: 00010286
[  662.216129] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000003f8b
[  662.216176] RDX: 0000000000003f8a RSI: 00000000000042f0 RDI: ffff88103e007a00
[  662.216222] RBP: ffff881ff3c9ba00 R08: 0000000000019580 R09: ffffffffa000b939
[  662.216268] R10: 00000000000008df R11: 0000000180000000 R12: 00000000047546c0
[  662.216314] R13: 00000000000042f0 R14: ffff88103e007a00 R15: ffff88103e007a00
[  662.216361] FS:  00007f3e446be740(0000) GS:ffff88103e9c0000(0000) knlGS:0000000000000000
[  662.216414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  662.216452] CR2: 00000000047546c0 CR3: 0000001ff3c8d000 CR4: 00000000001407e0
[  662.216498] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  662.216544] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  662.216590] Stack:
[  662.216607]  ffffffffa000b939 0000000000000000 0000000000000004 ffff88103e007a00
[  662.216667]  ffff8820219bba00 0000000000000000 ffff881ff3c9ba48 ffffffffa000b939
[  662.216730]  00000004f3c9ba48 ffff882021f71700 ffff881003c80a40 ffff882026f6e000
[  662.216795] Call Trace:
[  662.216839]  [<ffffffffa000b939>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[  662.216895]  [<ffffffffa000b939>] spl_kmem_cache_alloc+0x99/0x150 [spl]
[  662.216999]  [<ffffffffa00bbc70>] arc_buf_alloc+0x90/0x190 [zfs]
[  662.217067]  [<ffffffffa00bbd8b>] arc_loan_buf+0x1b/0x30 [zfs]
[  662.217142]  [<ffffffffa00cba09>] dmu_request_arcbuf+0x19/0x20 [zfs]
[  662.217204]  [<ffffffffa1160901>] osd_bufs_get+0x591/0xba0 [osd_zfs]
[  662.217249]  [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
[  662.217313]  [<ffffffffa125b78d>] ofd_preprw_write.isra.29+0x1bd/0xcd0 [ofd]
[  662.217369]  [<ffffffffa125ca8a>] ofd_preprw+0x7ea/0x10c0 [ofd]
[  662.217421]  [<ffffffffa1119894>] echo_client_prep_commit.isra.49+0x334/0xc30 [obdecho]
[  662.217549]  [<ffffffffa0ce131f>] ? lu_object_find_slice+0x1f/0x90 [obdclass]
[  662.219311]  [<ffffffffa11230af>] echo_client_iocontrol+0x9bf/0x1c40 [obdecho]
[  662.221071]  [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
[  662.222855]  [<ffffffffa0caa15e>] class_handle_ioctl+0x19de/0x2150 [obdclass]
[  662.224622]  [<ffffffff81197580>] ? handle_mm_fault+0x5e0/0xf80
[  662.226363]  [<ffffffff81285868>] ? security_capable+0x18/0x20
[  662.228099]  [<ffffffffa0c8e712>] obd_class_ioctl+0xd2/0x170 [obdclass]
[  662.229809]  [<ffffffff811f2665>] do_vfs_ioctl+0x2e5/0x4c0
[  662.231518]  [<ffffffff8164215d>] ? __do_page_fault+0x16d/0x450
[  662.233191]  [<ffffffff811f28e1>] SyS_ioctl+0xa1/0xc0
[  662.234815]  [<ffffffff81646c49>] system_call_fastpath+0x16/0x1b
[  662.236410] Code: ce 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 1f 01 00 00 48 85 c0 0f 84 16 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
[  662.239730] RIP  [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
[  662.241291]  RSP <ffff881ff3c9b9d0>
[  662.242792] CR2: 00000000047546c0
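
Reading the register dump: the faulting address in CR2 (0x47546c0) equals R12. The bytes marked `<49> 8b 1c 04` on the `Code:` line encode `mov (%r12,%rax,1),%rbx`, and RAX = 0. The oops is therefore a load through R12 inside kmem_cache_alloc(), and 0x47546c0 is a low, user-space-range address rather than a kernel pointer. If this kernel uses the SLUB allocator, R12 would hold the per-CPU freelist head, which would suggest a corrupted slab freelist. That reading is an assumption, not a conclusion reached in this ticket. The sketch below only redoes the address arithmetic from the pasted lines, and the `reg` helper is hypothetical.

```shell
#!/bin/sh
# Register-dump lines copied verbatim from the console output above.
rax_line='[  662.216129] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000003f8b'
r12_line='[  662.216268] R10: 00000000000008df R11: 0000000180000000 R12: 00000000047546c0'
cr2_line='[  662.216452] CR2: 00000000047546c0 CR3: 0000001ff3c8d000 CR4: 00000000001407e0'

# Print the 16-digit hex value that follows "<name>: " on a dump line.
reg() {
    printf '%s\n' "$1" | sed -n "s/.* $2: \([0-9a-f]\{16\}\).*/\1/p"
}

rax=$(reg "$rax_line" RAX)
r12=$(reg "$r12_line" R12)
cr2=$(reg "$cr2_line" CR2)

# The faulting instruction "mov (%r12,%rax,1),%rbx" reads from R12 + RAX.
ea=$(( 0x$r12 + 0x$rax ))
if [ "$ea" -eq "$(( 0x$cr2 ))" ]; then
    printf 'fault address 0x%x = R12 + RAX\n' "$ea"
else
    printf 'fault address does not match R12 + RAX\n'
fi
```

It prints `fault address 0x47546c0 = R12 + RAX`.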

zfs.conf

options zfs metaslab_debug_unload=1
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_arc_max=103079215104
options zfs zfs_dirty_data_max=4294967296
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
options zfs zfs_vdev_async_write_min_active=5
options zfs zfs_vdev_async_write_max_active=10
options zfs zfs_vdev_sync_read_min_active=16
options zfs zfs_vdev_sync_read_max_active=16
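
Two of these module options are byte counts. zfs_arc_max caps the ARC size, and zfs_dirty_data_max caps the amount of dirty data ZFS will buffer before throttling writes. Converted to GiB, they are 96 GiB and 4 GiB. The sketch below shows the conversion using plain shell integer arithmetic. The `to_gib` helper is ours.

```shell
#!/bin/sh
# Convert a byte-valued setting to GiB (1 GiB = 1073741824 bytes).
# Integer division: both values above are exact multiples of 1 GiB.
to_gib() {
    echo "$1 $(( $2 / 1073741824 )) GiB"
}

to_gib zfs_arc_max 103079215104       # prints: zfs_arc_max 96 GiB
to_gib zfs_dirty_data_max 4294967296  # prints: zfs_dirty_data_max 4 GiB
```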

Comment by Adam Roe (Inactive) [ 27/Sep/16 ]

Normal workloads (IOR, etc.) run fine on the mounted file system; I have only observed the issue when using obdfilter-survey.

Comment by Nathaniel Clark [ 06/Oct/16 ]

Adam,
Do you have the full console log or messages file from the machine?

Comment by Adam Roe (Inactive) [ 07/Oct/16 ]

I don't, I'm afraid; this system has since had its base OS reprovisioned.

Comment by Nathaniel Clark [ 14/Oct/16 ]

If this recurs, please capture the full console log, and this bug can be reopened.

Generated at Sat Feb 10 02:19:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.