Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.8.0
-
None
-
CentOS 7.2, NVMe devices, DNE2, LDISKFS MDTs, OPA with IFS 10.1.1.0.9 and Lustre-master Build #3419
-
3
-
9223372036854775807
Description
Using DNE2 with LDISKFS, I observe kernel panics. Below is the vmcore-dmsg.txt from the two systems on which I observed the issue. Although the behaviour seen by the end user (me) was the same, these look like two entirely different issues.
On both occasions the workload was MDTEST with 256 cores, striped (DNE2) across server systems. The occurrence on server6 was with 16 MDTs, 2 per system; the occurrence on server2 was with 1 MDT per server.
I can upload the vmcores if needed, but they are about 600 MB.
vmcore-dmsg.txt - server6
[132444.977566] Modules linked in: ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) xprtrdma ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs(OE) ib_umad rdma_cm ib_cm iw_cm ib_sa intel_powerclamp coretemp intel_rapl iTCO_wdt iTCO_vendor_support kvm ipmi_devintf crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mxm_wmi hfi1(OE) ipmi_si mei_me mei ipmi_msghandler pcspkr sg sb_edac edac_core lpc_ich mfd_core ib_mad ib_core ib_addr ioatdma shpchp i2c_i801 acpi_pad acpi_power_meter wmi nfsd auth_rpcgss
[132444.977916] nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul mgag200 crct10dif_common syscopyarea crc32c_intel sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm nvme drm ixgbe ahci libahci mdio libata i2c_core ptp pps_core dca dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate
[132444.978093] CPU: 5 PID: 5441 Comm: mdt00_003 Tainted: P OE ------------ 3.10.0-327.22.2.el7_lustre.x86_64 #1
[132444.978132] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.072020161249 07/20/2016
[132444.978169] task: ffff88102308b980 ti: ffff880fecadc000 task.ti: ffff880fecadc000
[132444.978197] RIP: 0010:[<ffffffffa12f43a8>] [<ffffffffa12f43a8>] osd_oxc_lookup+0x38/0x70 [osd_ldiskfs]
[132444.978248] RSP: 0018:ffff880fecadf938 EFLAGS: 00010297
[132444.978268] RAX: 00000000ffffffff RBX: dead000000100100 RCX: 0000000000000064
[132444.978295] RDX: 000000000000000a RSI: ffff880074305038 RDI: ffffffffa153b934
[132444.978321] RBP: ffff880fecadf958 R08: 000000000000006c R09: ffff880074305000
[132444.978347] R10: ffff88103ec07a00 R11: ffffffffa1334060 R12: 000000000000000b
[132444.978373] R13: ffff881027013ab8 R14: ffffffffa153b934 R15: ffff881fc4128000
[132444.978399] FS: 0000000000000000(0000) GS:ffff88103f2a0000(0000) knlGS:0000000000000000
[132444.978429] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[132444.978450] CR2: 000000000044bf46 CR3: 000000000194a000 CR4: 00000000001407e0
[132444.978476] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[132444.978503] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[132444.978529] Stack:
[132444.978539] ffff881027013a00 ffffffffa153b934 ffff880ff665b810 ffff880fd7c85a08
[132444.978577] ffff880fecadf998 ffffffffa12fe6df ffff880fecadf9a0 ffff880ff665b800
[132444.978615] ffff881027013a00 ffff88102499cd00 ffffffffa153b934 ffff880ff665b810
[132444.978654] Call Trace:
[132444.978680] [<ffffffffa12fe6df>] osd_xattr_get+0x18f/0x550 [osd_ldiskfs]
[132444.978720] [<ffffffffa1511a11>] lod_get_ea+0x111/0x410 [lod]
[132444.978750] [<ffffffffa151ef01>] lod_ah_init+0x681/0x9a0 [lod]
[132444.978790] [<ffffffffa158dd85>] mdd_object_make_hint+0xc5/0x190 [mdd]
[132444.978822] [<ffffffffa12f5c68>] ? osd_object_read_unlock+0x58/0x60 [osd_ldiskfs]
[132444.978856] [<ffffffffa1580ff8>] mdd_create+0x688/0x12b0 [mdd]
[132444.978933] [<ffffffffa0c76c0c>] ? lu_object_find_at+0xac/0xe0 [obdclass]
[132444.978985] [<ffffffffa14596b9>] mdt_md_create+0x849/0xba0 [mdt]
[132444.979081] [<ffffffffa0e52532>] ? ldlm_resource_putref+0x72/0x510 [ptlrpc]
[132444.980136] [<ffffffffa1459b7b>] mdt_reint_create+0x16b/0x350 [mdt]
[132444.981181] [<ffffffffa145b080>] mdt_reint_rec+0x80/0x210 [mdt]
[132444.982220] [<ffffffffa143dd62>] mdt_reint_internal+0x5b2/0x9b0 [mdt]
[132444.983256] [<ffffffffa1448f97>] mdt_reint+0x67/0x140 [mdt]
[132444.984315] [<ffffffffa0efab15>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[132444.985386] [<ffffffffa0ea6ccb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[132444.986592] [<ffffffffa0b5d568>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
[132444.987791] [<ffffffffa0ea4888>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[132444.988953] [<ffffffff810b88d2>] ? default_wake_function+0x12/0x20
[132444.990090] [<ffffffff810af038>] ? __wake_up_common+0x58/0x90
[132444.991057] [<ffffffffa0eaad80>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[132444.991978] [<ffffffffa0eaa2e0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
[132444.992850] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[132444.993693] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[132444.994523] [<ffffffff816469d8>] ret_from_fork+0x58/0x90
[132444.995326] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[132444.996109] Code: f6 41 55 4c 8d af b8 00 00 00 41 54 49 89 d4 53 48 8b 9f b8 00 00 00 4c 39 eb 75 0f eb 35 0f 1f 44 00 00 48 8b 1b 4c 39 eb 74 28 <4c> 39 63 18 75 f2 48 8d 73 38 4c 89 e2 4c 89 f7 e8 33 7a 00 e0
[132444.997737] RIP [<ffffffffa12f43a8>] osd_oxc_lookup+0x38/0x70 [osd_ldiskfs]
[132444.998514] RSP <ffff880fecadf938>
vmcore-dmsg.txt - server2
[ 529.528377] nvme3n1: unknown partition table
[ 529.539980] LDISKFS-fs (nvme3n1): file extents enabled, maximum tree depth=5
[ 529.548195] LDISKFS-fs (nvme3n1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[ 775.424045] LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[ 776.406448] LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
[ 926.651982] LustreError: 3844:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.5.21@o2ib: failed processing log, type 4: rc = -22
[ 926.677853] Lustre: srv-zlfs2-MDT0002: No data found on store. Initialize space
[ 926.701051] Lustre: zlfs2-MDT0002: new disk, initializing
[ 926.725877] LustreError: 3844:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
[ 926.725944] LustreError: 3844:0:(nodemap_storage.c:1313:nodemap_fs_init()) zlfs2-MDD0002: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
[ 926.726110] LustreError: 3844:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
[ 926.726157] LustreError: 3844:0:(lu_object.c:1243:lu_device_fini()) LBUG
[ 926.726183] Pid: 3844, comm: mount.lustre
[ 926.726184] Call Trace:
[ 926.726206] [<ffffffffa0b327d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[ 926.726212] [<ffffffffa0b32d75>] lbug_with_loc+0x45/0xc0 [libcfs]
[ 926.726264] [<ffffffffa0c6dbb8>] lu_device_fini+0xb8/0xc0 [obdclass]
[ 926.726282] [<ffffffffa0c52d22>] ls_device_put+0x82/0x2a0 [obdclass]
[ 926.726298] [<ffffffffa0c5301d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
[ 926.726304] [<ffffffffa13a0331>] mgc_set_info_async+0x951/0x1610 [mgc]
[ 926.726313] [<ffffffffa0b3d957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 926.726338] [<ffffffffa0c91954>] server_start_targets+0x794/0x2d20 [obdclass]
[ 926.726356] [<ffffffffa0c62f90>] ? class_config_llog_handler+0x0/0x1b40 [obdclass]
[ 926.726374] [<ffffffffa0c94f6d>] server_fill_super+0x108d/0x184c [obdclass]
[ 926.726392] [<ffffffffa0c6cf98>] lustre_fill_super+0x328/0x950 [obdclass]
[ 926.726408] [<ffffffffa0c6cc70>] ? lustre_fill_super+0x0/0x950 [obdclass]
[ 926.726426] [<ffffffff811e235d>] mount_nodev+0x4d/0xb0
[ 926.726445] [<ffffffffa0c64ec8>] lustre_mount+0x38/0x60 [obdclass]
[ 926.726448] [<ffffffff811e2d09>] mount_fs+0x39/0x1b0
[ 926.726454] [<ffffffff811fe5df>] vfs_kern_mount+0x5f/0xf0
[ 926.726457] [<ffffffff81200b2e>] do_mount+0x24e/0xa40
[ 926.726464] [<ffffffff8116e30e>] ? __get_free_pages+0xe/0x50
[ 926.726466] [<ffffffff812013b6>] SyS_mount+0x96/0xf0
[ 926.726473] [<ffffffff81646e89>] system_call_fastpath+0x16/0x1b
[ 926.726474]
[ 926.726585] Kernel panic - not syncing: LBUG
[ 926.726606] CPU: 20 PID: 3844 Comm: mount.lustre Tainted: P OE ------------ 3.10.0-327.28.2.el7_lustre.x86_64 #1
[ 926.726646] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.072020161249 07/20/2016
[ 926.726683] ffffffffa0b4fdef 0000000027dd9981 ffff881024c1f9e8 ffffffff8163677b
[ 926.726718] ffff881024c1fa68 ffffffff8163000a ffffffff00000008 ffff881024c1fa78
[ 926.726758] ffff881024c1fa18 0000000027dd9981 ffffffffa0c9e1d5 0000000000000000
[ 926.726798] Call Trace:
[ 926.726820] [<ffffffff8163677b>] dump_stack+0x19/0x1b
[ 926.726843] [<ffffffff8163000a>] panic+0xd8/0x1e7
[ 926.726869] [<ffffffffa0b32ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
[ 926.726915] [<ffffffffa0c6dbb8>] lu_device_fini+0xb8/0xc0 [obdclass]
[ 926.726961] [<ffffffffa0c52d22>] ls_device_put+0x82/0x2a0 [obdclass]
[ 926.727004] [<ffffffffa0c5301d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
[ 926.727035] [<ffffffffa13a0331>] mgc_set_info_async+0x951/0x1610 [mgc]
[ 926.727068] [<ffffffffa0b3d957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 926.727116] [<ffffffffa0c91954>] server_start_targets+0x794/0x2d20 [obdclass]
[ 926.727165] [<ffffffffa0c62f90>] ? class_config_dump_handler+0xb70/0xb70 [obdclass]
[ 926.727221] [<ffffffffa0c94f6d>] server_fill_super+0x108d/0x184c [obdclass]
[ 926.727276] [<ffffffffa0c6cf98>] lustre_fill_super+0x328/0x950 [obdclass]
[ 926.727329] [<ffffffffa0c6cc70>] ? lustre_common_put_super+0x270/0x270 [obdclass]
[ 926.727366] [<ffffffff811e235d>] mount_nodev+0x4d/0xb0
[ 926.727413] [<ffffffffa0c64ec8>] lustre_mount+0x38/0x60 [obdclass]
[ 926.727444] [<ffffffff811e2d09>] mount_fs+0x39/0x1b0
[ 926.727470] [<ffffffff811fe5df>] vfs_kern_mount+0x5f/0xf0
[ 926.727498] [<ffffffff81200b2e>] do_mount+0x24e/0xa40
[ 926.727524] [<ffffffff8116e30e>] ? __get_free_pages+0xe/0x50
[ 926.727552] [<ffffffff812013b6>] SyS_mount+0x96/0xf0
[ 926.727577] [<ffffffff81646e89>] system_call_fastpath+0x16/0x1b
Attachments
Issue Links
- is related to LU-8580 general protection fault: osd_xattr_get+0x32c/0x5b0 [osd_ldiskfs] (Closed)