Details
Bug
Resolution: Fixed
Major
Lustre 2.11.0, Lustre 2.12.0
None
Soak stress cluster - lustre-master-ib build 64 version=2.10.58_139_g630cd49
3
9223372036854775807
Description
After the upgrade, attempting to re-mount the filesystem causes a hard crash on MDT0001.
The crash is repeatable. I will leave the system in this state for examination, then re-format as non-DNE.
Crash dumps are available on soak.
[ 451.170602] LDISKFS-fs warning (device dm-2): ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected, please wait.
[ 493.737484] LDISKFS-fs (dm-2): recovery complete
[ 493.793102] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,user_xattr,no_mbcache,nodelalloc
[ 495.357987] LustreError: 2384:0:(tgt_lastrcvd.c:1533:tgt_clients_data_init()) soaked-MDT0001: duplicate export for client generation 11
[ 495.646489] LustreError: 2384:0:(obd_config.c:559:class_setup()) setup soaked-MDT0001 failed (-114)
[ 495.646493] LustreError: 2384:0:(obd_config.c:1822:class_config_llog_handler()) MGC192.168.1.108@o2ib: cfg command failed: rc = -114
[ 495.646497] Lustre: cmd=cf003 0:soaked-MDT0001 1:soaked-MDT0001_UUID 2:1 3:soaked-MDT0001-mdtlov 4:f
[ 495.646570] LustreError: 15c-8: MGC192.168.1.108@o2ib: The configuration from log 'soaked-MDT0001' failed (-114). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 495.646587] LustreError: 2303:0:(obd_mount_server.c:1383:server_start_targets()) failed to start server soaked-MDT0001: -114
[ 495.646728] LustreError: 2303:0:(obd_mount_server.c:1936:server_fill_super()) Unable to start targets: -114
[ 495.646760] LustreError: 2303:0:(obd_config.c:610:class_cleanup()) Device 4 not setup
[ 495.899986] BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
[ 495.899999] IP: [<ffffffff816b683c>] _raw_spin_lock+0xc/0x30
[ 495.900002] PGD 0
[ 495.900005] Oops: 0002 [#1] SMP
[ 495.900073] Modules linked in: mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_en(OE) sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd dm_round_robin iTCO_wdt iTCO_vendor_support ipmi_ssif sg joydev ipmi_si ipmi_devintf mei_me ioatdma ipmi_msghandler pcspkr wmi mei lpc_ich shpchp i2c_i801 dm_multipath
[ 495.900107] dm_mod nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops isci ahci igb mpt2sas libsas ttm libahci ptp crct10dif_pclmul pps_core crct10dif_common mlx4_core(OE) raid_class drm libata crc32c_intel dca mlx_compat(OE) scsi_transport_sas i2c_algo_bit devlink i2c_core
[ 495.900113] CPU: 10 PID: 2167 Comm: obd_zombid Tainted: P OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1
[ 495.900114] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 495.900117] task: ffff880036358fd0 ti: ffff8804176d0000 task.ti: ffff8804176d0000
[ 495.900122] RIP: 0010:[<ffffffff816b683c>] [<ffffffff816b683c>] _raw_spin_lock+0xc/0x30
[ 495.900124] RSP: 0018:ffff8804176d3da8 EFLAGS: 00010246
[ 495.900126] RAX: 0000000000000000 RBX: ffff88081503c800 RCX: 000000018040003f
[ 495.900128] RDX: 0000000000000001 RSI: ffffea0020556b00 RDI: 0000000000000378
[ 495.900129] RBP: ffff8804176d3de0 R08: ffff8808155acf00 R09: 000000018040003f
[ 495.900131] R10: 0000000000000001 R11: ffffea0020556b00 R12: 0000000000000000
[ 495.900133] R13: 0000000000000378 R14: ffff880817131068 R15: ffff88081503c800
[ 495.900135] FS: 0000000000000000(0000) GS:ffff88082d880000(0000) knlGS:0000000000000000
[ 495.900137] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 495.900139] CR2: 0000000000000378 CR3: 0000000001a02000 CR4: 00000000000607e0
[ 495.900141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 495.900143] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 495.900144] Call Trace:
[ 495.900246] [<ffffffffc0d6b635>] ? tgt_grant_discard+0x35/0x190 [ptlrpc]
[ 495.900317] [<ffffffffc0d3f74e>] ? tgt_client_free+0x17e/0x3b0 [ptlrpc]
[ 495.900354] [<ffffffffc177c097>] mdt_destroy_export+0x87/0x200 [mdt]
[ 495.900410] [<ffffffffc0a7b9be>] class_export_destroy+0xee/0x490 [obdclass]
[ 495.900448] [<ffffffffc0a8434a>] obd_zombie_impexp_cull+0x39a/0x550 [obdclass]
[ 495.900479] [<ffffffffc0a8456d>] obd_zombie_impexp_thread+0x6d/0x1c0 [obdclass]
[ 495.900489] [<ffffffff810c7c70>] ? wake_up_state+0x20/0x20
[ 495.900519] [<ffffffffc0a84500>] ? obd_zombie_impexp_cull+0x550/0x550 [obdclass]
[ 495.900526] [<ffffffff810b4031>] kthread+0xd1/0xe0
[ 495.900530] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 495.900537] [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
[ 495.900541] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 495.900576] Code: 5d c3 0f 1f 44 00 00 85 d2 74 e4 0f 1f 40 00 eb ed 66 0f 1f 44 00 00 b8 01 00 00 00 5d c3 90 66 66 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 99 27 ff ff 5d
[ 495.900580] RIP [<ffffffff816b683c>] _raw_spin_lock+0xc/0x30
[ 495.900581] RSP <ffff8804176d3da8>
[ 495.900582] CR2: 0000000000000378
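For anyone triaging the oops above, here is a minimal Python sketch of one way to read the register dump. It is not part of the original report. It assumes the usual x86-64 convention that the first function argument is passed in RDI, and a 4096-byte page size. The helper name `parse_registers` and the `OOPS` excerpt variable are made up for this illustration. The check compares the faulting address (CR2) with the pointer handed to `_raw_spin_lock` (RDI). A match at a small address would be consistent with a lock embedded at offset 0x378 inside a structure whose base pointer was NULL, though it does not by itself identify which structure that was.

```python
import re

# Excerpt copied from the oops in this ticket (timestamps removed).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
IP: [<ffffffff816b683c>] _raw_spin_lock+0xc/0x30
RAX: 0000000000000000 RBX: ffff88081503c800 RCX: 000000018040003f
RDX: 0000000000000001 RSI: ffffea0020556b00 RDI: 0000000000000378
CR2: 0000000000000378 CR3: 0000000001a02000 CR4: 00000000000607e0
"""

PAGE_SIZE = 4096  # assumption: standard x86-64 page size


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(OOPS)
fault_addr = regs["CR2"]  # address whose access faulted
lock_arg = regs["RDI"]    # first argument, i.e. the lock pointer (assumed ABI)

# A fault address equal to the lock pointer and below one page suggests
# "NULL base pointer + member offset" rather than a wild pointer.
looks_like_null_base = fault_addr == lock_arg and fault_addr < PAGE_SIZE

print(f"fault address: {fault_addr:#x}")
print(f"lock pointer:  {lock_arg:#x}")
print(f"NULL base + offset pattern: {looks_like_null_base}")
```

On this excerpt both values are 0x378, so the check reports the NULL-base pattern. The call trace points at the export teardown path (`mdt_destroy_export`) after the failed setup, but the frames marked `?` are not reliable, so the exact caller should be confirmed from the crash dump.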
Landed for 2.12