Details
Bug
Resolution: Fixed
Critical
Lustre 2.7.0
OpenSFS cluster running lustre-master tag 2.6.90 build #2745 with one MDS/MDT, three OSS with two OSTs each and three clients.
3
16581
Description
I was running sanity-hsm, and from the client running the test it looked like sanity-hsm was hung in test 52 for over an hour. When I checked the cluster, I found that the MDS had crashed during test 52. The test results are at https://testing.hpdd.intel.com/test_sessions/d22be0aa-7020-11e4-a1b8-5254006e85c2
The MDS crashed with:
<4>Lustre: DEBUG MARKER: == sanity-hsm test 52: Opened for write file on an evicted client should be set dirty == 08:09:13 (1416413353)
<4>Lustre: 5811:0:(genops.c:1517:obd_export_evict_by_uuid()) scratch-MDT0000: evicting e716fca4-33eb-2ac0-ac0c-baa0a73f449e at adminstrative request
<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
<1>IP: [<ffffffffa113dc8d>] mdd_changelog_data_store+0xbd/0x320 [mdd]
<4>PGD bc685e067 PUD 102cfe5067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 13
<4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 iTCO_wdt iTCO_vendor_support microcode serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_conntrack]
<4>
<4>Pid: 5811, comm: lctl Not tainted 2.6.32-431.29.2.el6_lustre.gbe60ead.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
<4>RIP: 0010:[<ffffffffa113dc8d>] [<ffffffffa113dc8d>] mdd_changelog_data_store+0xbd/0x320 [mdd]
<4>RSP: 0018:ffff8802ad83fb48 EFLAGS: 00010202
<4>RAX: 0000000000005043 RBX: ffff8802ad86fa00 RCX: 0000000000000000
<4>RDX: 000000000000000b RSI: ffff8802ad86fa00 RDI: ffff8802ad83fcf8
<4>RBP: ffff8802ad83fba8 R08: ffff88029e695738 R09: ffff880bbe388500
<4>R10: 0000000000000031 R11: 0000000000001000 R12: 000000000000000b
<4>R13: ffff8802ad83fcf8 R14: ffff88021abaf410 R15: 0000000000001043
<4>FS: 00007f5efaef0700(0000) GS:ffff88085c4a0000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000048 CR3: 0000000bc68ce000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process lctl (pid: 5811, threadinfo ffff8802ad83e000, task ffff88035dd8caa0)
<4>Stack:
<4> 0000000000000000 ffff880931053990 ffff8802ad83fb98 ffffffffa10b4bbf
<4><d> ffff8802ad83fb88 ffff880bbe388500 ffff8802ad83fcf8 ffff88021abaf410
<4><d> ffff8802ad83fcf8 0000000000000000 ffff880bbe388500 ffff880931053990
<4>Call Trace:
<4> [<ffffffffa10b4bbf>] ? lod_trans_start+0x9f/0x190 [lod]
<4> [<ffffffffa11427be>] mdd_close+0x34e/0xc50 [mdd]
<4> [<ffffffffa10067a9>] mdt_mfd_close+0x4a9/0x1ba0 [mdt]
<4> [<ffffffffa06d1ea4>] ? keys_fini+0xe4/0x130 [obdclass]
<4> [<ffffffffa06d1f1f>] ? lu_context_fini+0x2f/0xc0 [obdclass]
<4> [<ffffffffa0fd401e>] ? mdt_ctxt_add_dirty_flag+0x13e/0x190 [mdt]
<4> [<ffffffffa0fd4402>] mdt_obd_disconnect+0x392/0x510 [mdt]
<4> [<ffffffffa069e72d>] class_fail_export+0x23d/0x540 [obdclass]
<4> [<ffffffffa069eb72>] obd_export_evict_by_uuid+0x142/0x240 [obdclass]
<4> [<ffffffffa06f0933>] lprocfs_evict_client_seq_write+0x2f3/0x3b0 [obdclass]
<4> [<ffffffffa101588c>] mdt_mds_evict_client_write+0x2ac/0x390 [mdt]
<4> [<ffffffff811f423e>] proc_reg_write+0x7e/0xc0
<4> [<ffffffff81189278>] vfs_write+0xb8/0x1a0
<4> [<ffffffff81189c41>] sys_write+0x51/0x90
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 00 00 41 8d 44 24 ef 4d 8b 06 83 f8 02 0f 86 6b 01 00 00 44 89 f8 4c 89 ef 25 ff 0f 00 00 41 89 c7 80 cc 50 41 81 cf 00 10 00 00 <80> 79 48 00 48 89 4d b0 4c 89 45 b8 44 0f 45 f8 44 89 f8 44 89
<1>RIP [<ffffffffa113dc8d>] mdd_changelog_data_store+0xbd/0x320 [mdd]
<4> RSP <ffff8802ad83fb48>
<4>CR2: 0000000000000048
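The register dump and the "Code:" line are enough to pin down the faulting access. The bytes marked "<80> 79 48 00" encode cmpb $0x0,0x48(%rcx): opcode 0x80 is the x86 group-1 immediate form, ModRM 0x79 has reg field 7 (CMP) and r/m field 1 (RCX) with an 8-bit displacement of 0x48. RCX is 0 in the dump, so the compare read address 0x48, which matches both CR2 and the "NULL pointer dereference at 0000000000000048" line. In other words, mdd_changelog_data_store read a one-byte field at offset 0x48 through a NULL pointer; this log alone does not establish which structure that pointer refers to. The short Python sketch below reproduces the decoding. It handles only this one encoding form (no REX prefix, no SIB byte) and is not a general x86 disassembler; the function and table names are illustrative.

```python
from typing import Tuple

# 64-bit register names indexed by the ModRM r/m field (assumes no REX.B prefix,
# which holds here because the byte before <80> belongs to the previous instruction).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# Operation selected by the ModRM reg field for opcode 0x80 (x86 "group 1").
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_group1_disp8(code: bytes) -> Tuple[str, str, int, int]:
    """Return (mnemonic, base register, disp8, imm8) for '80 /r disp8 imm8'."""
    opcode, modrm, disp8, imm8 = code[0], code[1], code[2], code[3]
    if opcode != 0x80:
        raise ValueError("only opcode 0x80 is handled in this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 4:  # rm == 4 would introduce a SIB byte
        raise ValueError("only the [reg + disp8] form without SIB is handled")
    if disp8 >= 0x80:  # the 8-bit displacement is sign-extended
        disp8 -= 0x100
    return GROUP1[reg], REG64[rm], disp8, imm8


# Faulting bytes from the Code: line and RCX from the register dump.
faulting = bytes.fromhex("80794800")
rcx = 0x0000000000000000

op, base, disp, imm = decode_group1_disp8(faulting)
# Effective address of the memory operand, wrapped to 64 bits.
address = (rcx + disp) & 0xFFFFFFFFFFFFFFFF
print(f"{op}b ${imm:#x},{disp:#x}(%{base}) -> address {address:#018x}")
```

Running it prints "cmpb $0x0,0x48(%rcx) -> address 0x0000000000000048", the same value as CR2.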
I will upload the vmcore from the MDS shortly.