Kernel Panic with obdfilter-survey: spl_kmem_cache_alloc+0x99/0x150

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • Lustre 2.9.0
    • Lustre 2.8.0
    • None
    • Lustre Master Build 3450: ZFS 0.6.5.7-1, kernel 3.10.0-327.36.1.el7_lustre.x86_64

      All storage devices are NVMe
    • 3

    Description

      Under obdfilter-survey, the system kernel panics immediately with an SPL error; see the vmcore output below:

      Attachments

        Activity

          [LU-8644] Kernel Panic with obdfilter-survey: spl_kmem_cache_alloc+0x99/0x150

          utopiabound Nathaniel Clark added a comment - If this recurs, please capture full console log and this bug can be reopened.

          adam.j.roe Adam Roe (Inactive) added a comment - I don't, I'm afraid; this system has since had its base OS reprovisioned.

          utopiabound Nathaniel Clark added a comment - Adam, Do you have the full console log or messages file from the machine?

          adam.j.roe Adam Roe (Inactive) added a comment - Normal workloads (IOR etc) when mounted are fine. I have only observed the issue when using obdfilter-survey.
          adam.j.roe Adam Roe (Inactive) added a comment (edited):

          *zpool configuration & format:*

          [root@zlfs2-oss1 ~]# zpool create OST0000 -o ashift=12 -O recordsize=1M -f /dev/nvme0n1
          [root@zlfs2-oss1 ~]# mkfs.lustre --reformat --ost --backfstype=zfs --fsname=cam --mgsnode=192.168.5.21@o2ib0 --index=0 OST0000/OST0000
          
             Permanent disk data:
          Target:     cam:OST0000
          Index:      0
          Lustre FS:  cam
          Mount type: zfs
          Flags:      0x62
                        (OST first_time update )
          Persistent mount opts:
          Parameters: mgsnode=192.168.5.21@o2ib
          
          mkfs_cmd = zfs create -o canmount=off -o xattr=sa OST0000/OST0000
          Writing OST0000/OST0000 properties
            lustre:version=1
            lustre:flags=98
            lustre:index=0
            lustre:fsname=cam
            lustre:svname=cam:OST0000
            lustre:mgsnode=192.168.5.21@o2ib
          [root@zlfs2-oss1 ~]# mount -vvv -t lustre OST0000/OST0000 /mnt/OST0000
          arg[0] = /sbin/mount.lustre
          arg[1] = -v
          arg[2] = -o
          arg[3] = rw
          arg[4] = OST0000/OST0000
          arg[5] = /mnt/OST0000
          source = OST0000/OST0000 (OST0000/OST0000), target = /mnt/OST0000
          options = rw
          checking for existing Lustre data: found
          Writing OST0000/OST0000 properties
            lustre:version=1
            lustre:flags=34
            lustre:index=0
            lustre:fsname=cam
            lustre:svname=cam:OST0000
            lustre:mgsnode=192.168.5.21@o2ib
          mounting device OST0000/OST0000 at /mnt/OST0000, flags=0x1000000 options=osd=osd-zfs,,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=cam-OST0000,device=OST0000/OST0000
            lustre:svname=cam-OST0000
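
          The flags value appears above in three forms: 0x62 in the mkfs.lustre summary, lustre:flags=98 (the same number in decimal), and lustre:flags=34 after the mount. A minimal Python sketch for comparing them follows. The mkfs.lustre output only states that 0x62 as a whole means "OST first_time update"; the per-bit assignments in ASSUMED_FLAGS are an assumption for illustration and should be checked against the Lustre headers for the build in use.

```python
# Assumed bit meanings (not confirmed by this ticket); only the combined
# value 0x62 == "OST first_time update" comes from the mkfs.lustre output.
ASSUMED_FLAGS = {
    0x0002: "OST",
    0x0020: "first_time",
    0x0040: "update",
}


def decode_flags(value):
    """Return (names of assumed bits set in value, leftover unknown bits)."""
    names = [name for bit, name in sorted(ASSUMED_FLAGS.items()) if value & bit]
    known_mask = sum(ASSUMED_FLAGS)  # OR of all assumed bits (they are disjoint)
    return names, value & ~known_mask


if __name__ == "__main__":
    for raw in (0x62, 98, 34):
        names, unknown = decode_flags(raw)
        print(f"{raw:#06x} ({raw}): {' '.join(names)} unknown={unknown:#x}")
```

          Under that assumed mapping, 98 and 0x62 decode identically, and 34 (0x22) differs from them only in the "update" bit.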
          

          vmcore

          [  662.214980] Modules linked in: obdecho(OE) osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs(OE) ib_umad rdma_cm ib_cm iw_cm ib_sa intel_powerclamp coretemp intel_rapl kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper iTCO_wdt iTCO_vendor_support hfi1(OE) mxm_wmi cryptd ioatdma pcspkr ib_mad ib_core ib_addr sb_edac edac_core i2c_i801 mei_me shpchp mei sg lpc_ich mfd_core ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c raid1 sd_mod
          [  662.215589]  crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper nvme ttm ixgbe ahci drm mdio libahci ptp i2c_core libata pps_core dca dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate
          [  662.215842] CPU: 29 PID: 7653 Comm: lctl Tainted: P           OE  ------------   3.10.0-327.36.1.el7_lustre.x86_64 #1
          [  662.215911] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.072020161249 07/20/2016
          [  662.215978] task: ffff882021f71700 ti: ffff881ff3c98000 task.ti: ffff881ff3c98000
          [  662.216028] RIP: 0010:[<ffffffff811c1545>]  [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
          [  662.216092] RSP: 0018:ffff881ff3c9b9d0  EFLAGS: 00010286
          [  662.216129] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000003f8b
          [  662.216176] RDX: 0000000000003f8a RSI: 00000000000042f0 RDI: ffff88103e007a00
          [  662.216222] RBP: ffff881ff3c9ba00 R08: 0000000000019580 R09: ffffffffa000b939
          [  662.216268] R10: 00000000000008df R11: 0000000180000000 R12: 00000000047546c0
          [  662.216314] R13: 00000000000042f0 R14: ffff88103e007a00 R15: ffff88103e007a00
          [  662.216361] FS:  00007f3e446be740(0000) GS:ffff88103e9c0000(0000) knlGS:0000000000000000
          [  662.216414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [  662.216452] CR2: 00000000047546c0 CR3: 0000001ff3c8d000 CR4: 00000000001407e0
          [  662.216498] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          [  662.216544] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          [  662.216590] Stack:
          [  662.216607]  ffffffffa000b939 0000000000000000 0000000000000004 ffff88103e007a00
          [  662.216667]  ffff8820219bba00 0000000000000000 ffff881ff3c9ba48 ffffffffa000b939
          [  662.216730]  00000004f3c9ba48 ffff882021f71700 ffff881003c80a40 ffff882026f6e000
          [  662.216795] Call Trace:
          [  662.216839]  [<ffffffffa000b939>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
          [  662.216895]  [<ffffffffa000b939>] spl_kmem_cache_alloc+0x99/0x150 [spl]
          [  662.216999]  [<ffffffffa00bbc70>] arc_buf_alloc+0x90/0x190 [zfs]
          [  662.217067]  [<ffffffffa00bbd8b>] arc_loan_buf+0x1b/0x30 [zfs]
          [  662.217142]  [<ffffffffa00cba09>] dmu_request_arcbuf+0x19/0x20 [zfs]
          [  662.217204]  [<ffffffffa1160901>] osd_bufs_get+0x591/0xba0 [osd_zfs]
          [  662.217249]  [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
          [  662.217313]  [<ffffffffa125b78d>] ofd_preprw_write.isra.29+0x1bd/0xcd0 [ofd]
          [  662.217369]  [<ffffffffa125ca8a>] ofd_preprw+0x7ea/0x10c0 [ofd]
          [  662.217421]  [<ffffffffa1119894>] echo_client_prep_commit.isra.49+0x334/0xc30 [obdecho]
          [  662.217549]  [<ffffffffa0ce131f>] ? lu_object_find_slice+0x1f/0x90 [obdclass]
          [  662.219311]  [<ffffffffa11230af>] echo_client_iocontrol+0x9bf/0x1c40 [obdecho]
          [  662.221071]  [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
          [  662.222855]  [<ffffffffa0caa15e>] class_handle_ioctl+0x19de/0x2150 [obdclass]
          [  662.224622]  [<ffffffff81197580>] ? handle_mm_fault+0x5e0/0xf80
          [  662.226363]  [<ffffffff81285868>] ? security_capable+0x18/0x20
          [  662.228099]  [<ffffffffa0c8e712>] obd_class_ioctl+0xd2/0x170 [obdclass]
          [  662.229809]  [<ffffffff811f2665>] do_vfs_ioctl+0x2e5/0x4c0
          [  662.231518]  [<ffffffff8164215d>] ? __do_page_fault+0x16d/0x450
          [  662.233191]  [<ffffffff811f28e1>] SyS_ioctl+0xa1/0xc0
          [  662.234815]  [<ffffffff81646c49>] system_call_fastpath+0x16/0x1b
          [  662.236410] Code: ce 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 1f 01 00 00 48 85 c0 0f 84 16 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
          [  662.239730] RIP  [<ffffffff811c1545>] kmem_cache_alloc+0x75/0x1d0
          [  662.241291]  RSP <ffff881ff3c9b9d0>
          [  662.242792] CR2: 00000000047546c0
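
          When comparing this trace against other reports, it can help to reduce it to the reliable frames only, since entries marked with "?" are stack leftovers that the unwinder could not confirm. The following Python sketch does that for text in the format shown above; the function name reliable_frames and the regular expression are illustrative, not part of any Lustre or kernel tool, and SAMPLE is an excerpt of the lines from this ticket.

```python
import re

# Matches call-trace lines such as:
# [  662.216999]  [<ffffffffa00bbc70>] arc_buf_alloc+0x90/0x190 [zfs]
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(oops_text):
    """Return (function, module) pairs, skipping frames marked with '?'."""
    frames = []
    for line in oops_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            # Frames without a [module] suffix belong to the core kernel image.
            frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


SAMPLE = """\
[  662.216839]  [<ffffffffa000b939>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[  662.216895]  [<ffffffffa000b939>] spl_kmem_cache_alloc+0x99/0x150 [spl]
[  662.216999]  [<ffffffffa00bbc70>] arc_buf_alloc+0x90/0x190 [zfs]
[  662.217204]  [<ffffffffa1160901>] osd_bufs_get+0x591/0xba0 [osd_zfs]
[  662.217249]  [<ffffffff811c25d3>] ? __kmalloc+0x1f3/0x230
[  662.217313]  [<ffffffffa125b78d>] ofd_preprw_write.isra.29+0x1bd/0xcd0 [ofd]
[  662.229809]  [<ffffffff811f2665>] do_vfs_ioctl+0x2e5/0x4c0
"""

if __name__ == "__main__":
    for func, module in reliable_frames(SAMPLE):
        print(f"{module:10s} {func}")
```

          Fed the full trace, this yields the path from SyS_ioctl through obdecho, ofd, osd_zfs and zfs down to spl_kmem_cache_alloc, which is the path obdfilter-survey exercises through lctl.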
          

          zfs.conf

          options zfs metaslab_debug_unload=1
          options zfs zfs_vdev_scheduler=deadline
          options zfs zfs_arc_max=103079215104
          options zfs zfs_dirty_data_max=4294967296
          options zfs zfs_vdev_async_write_active_min_dirty_percent=20
          options zfs zfs_vdev_async_write_min_active=5
          options zfs zfs_vdev_async_write_max_active=10
          options zfs zfs_vdev_sync_read_min_active=16
          options zfs zfs_vdev_sync_read_max_active=16
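
          The two size tunables above, zfs_arc_max and zfs_dirty_data_max, are given in bytes, which makes them hard to read at a glance. A small Python sketch that parses "options zfs key=value" lines and converts those two values to GiB is shown below; the helper names parse_zfs_options and to_gib are made up for this illustration.

```python
def parse_zfs_options(text):
    """Parse 'options zfs key=value' lines into a dict of strings."""
    opts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["options", "zfs"] and "=" in parts[2]:
            key, value = parts[2].split("=", 1)
            opts[key] = value
    return opts


def to_gib(value):
    """Convert a byte count (string or int) to GiB."""
    return int(value) / 2**30


ZFS_CONF = """\
options zfs zfs_arc_max=103079215104
options zfs zfs_dirty_data_max=4294967296
"""

if __name__ == "__main__":
    opts = parse_zfs_options(ZFS_CONF)
    for key in ("zfs_arc_max", "zfs_dirty_data_max"):
        print(f"{key} = {to_gib(opts[key]):g} GiB")
```

          For the values in this ticket, that works out to a 96 GiB ARC limit and a 4 GiB dirty data limit.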
          

          People

            utopiabound Nathaniel Clark
            adam.j.roe Adam Roe (Inactive)