LU-5938: sanity-hsm test_52 MDS OOPS: mdd_changelog_data_store


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Lustre 2.8.0
    • Lustre 2.7.0
    • Environment: OpenSFS cluster running lustre-master tag 2.6.90 build #2745 with one MDS/MDT, three OSS with two OSTs each, and three clients.
    • Severity: 3
    • 16581

    Description

      I was running sanity-hsm and, from the client running the test, it looks like sanity-hsm hangs for over an hour in test 52. After checking the cluster, the MDS crashed during test 52. The test results are at https://testing.hpdd.intel.com/test_sessions/d22be0aa-7020-11e4-a1b8-5254006e85c2
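
      The eviction that triggers the crash is driven through the MDT's evict_client proc interface; the obd_export_evict_by_uuid and mdt_mds_evict_client_write frames in the trace below are that path, entered from an lctl write (pid 5811). A minimal user-space sketch of the trigger, assuming the usual /proc/fs/lustre/mdt/<target>/evict_client file and reusing the UUID from the console log; this is only an approximation of what the test does, not its exact commands:

      /* Hypothetical reproducer sketch: evict a client by UUID through the
       * MDT's evict_client proc file, the same path lctl takes in the trace
       * below (proc_reg_write -> mdt_mds_evict_client_write ->
       * obd_export_evict_by_uuid).  The proc path is an assumption based on
       * the target name scratch-MDT0000 from this run. */
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
              const char *path = "/proc/fs/lustre/mdt/scratch-MDT0000/evict_client";
              /* UUID of the evicted client as seen in the console log. */
              const char *uuid = "e716fca4-33eb-2ac0-ac0c-baa0a73f449e";
              int fd = open(path, O_WRONLY);

              if (fd < 0) {
                      perror("open evict_client");
                      return 1;
              }
              if (write(fd, uuid, strlen(uuid)) < 0)
                      perror("write uuid");
              close(fd);
              return 0;
      }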

      The MDS crashed with:

      <4>Lustre: DEBUG MARKER: == sanity-hsm test 52: Opened for write file on an evicted client should be set dirty == 08:09:13 (1416413353)
      <4>Lustre: 5811:0:(genops.c:1517:obd_export_evict_by_uuid()) scratch-MDT0000: evicting e716fca4-33eb-2ac0-ac0c-baa0a73f449e at adminstrative request
      <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      <1>IP: [<ffffffffa113dc8d>] mdd_changelog_data_store+0xbd/0x320 [mdd]
      <4>PGD bc685e067 PUD 102cfe5067 PMD 0 
      <4>Oops: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/system/cpu/online
      <4>CPU 13 
      <4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 iTCO_wdt iTCO_vendor_support microcode serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_conntrack]
      <4>
      <4>Pid: 5811, comm: lctl Not tainted 2.6.32-431.29.2.el6_lustre.gbe60ead.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
      <4>RIP: 0010:[<ffffffffa113dc8d>]  [<ffffffffa113dc8d>] mdd_changelog_data_store+0xbd/0x320 [mdd]
      <4>RSP: 0018:ffff8802ad83fb48  EFLAGS: 00010202
      <4>RAX: 0000000000005043 RBX: ffff8802ad86fa00 RCX: 0000000000000000
      <4>RDX: 000000000000000b RSI: ffff8802ad86fa00 RDI: ffff8802ad83fcf8
      <4>RBP: ffff8802ad83fba8 R08: ffff88029e695738 R09: ffff880bbe388500
      <4>R10: 0000000000000031 R11: 0000000000001000 R12: 000000000000000b
      <4>R13: ffff8802ad83fcf8 R14: ffff88021abaf410 R15: 0000000000001043
      <4>FS:  00007f5efaef0700(0000) GS:ffff88085c4a0000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      <4>CR2: 0000000000000048 CR3: 0000000bc68ce000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process lctl (pid: 5811, threadinfo ffff8802ad83e000, task ffff88035dd8caa0)
      <4>Stack:
      <4> 0000000000000000 ffff880931053990 ffff8802ad83fb98 ffffffffa10b4bbf
      <4><d> ffff8802ad83fb88 ffff880bbe388500 ffff8802ad83fcf8 ffff88021abaf410
      <4><d> ffff8802ad83fcf8 0000000000000000 ffff880bbe388500 ffff880931053990
      <4>Call Trace:
      <4> [<ffffffffa10b4bbf>] ? lod_trans_start+0x9f/0x190 [lod]
      <4> [<ffffffffa11427be>] mdd_close+0x34e/0xc50 [mdd]
      <4> [<ffffffffa10067a9>] mdt_mfd_close+0x4a9/0x1ba0 [mdt]
      <4> [<ffffffffa06d1ea4>] ? keys_fini+0xe4/0x130 [obdclass]
      <4> [<ffffffffa06d1f1f>] ? lu_context_fini+0x2f/0xc0 [obdclass]
      <4> [<ffffffffa0fd401e>] ? mdt_ctxt_add_dirty_flag+0x13e/0x190 [mdt]
      <4> [<ffffffffa0fd4402>] mdt_obd_disconnect+0x392/0x510 [mdt]
      <4> [<ffffffffa069e72d>] class_fail_export+0x23d/0x540 [obdclass]
      <4> [<ffffffffa069eb72>] obd_export_evict_by_uuid+0x142/0x240 [obdclass]
      <4> [<ffffffffa06f0933>] lprocfs_evict_client_seq_write+0x2f3/0x3b0 [obdclass]
      <4> [<ffffffffa101588c>] mdt_mds_evict_client_write+0x2ac/0x390 [mdt]
      <4> [<ffffffff811f423e>] proc_reg_write+0x7e/0xc0
      <4> [<ffffffff81189278>] vfs_write+0xb8/0x1a0
      <4> [<ffffffff81189c41>] sys_write+0x51/0x90
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      <4>Code: 00 00 41 8d 44 24 ef 4d 8b 06 83 f8 02 0f 86 6b 01 00 00 44 89 f8 4c 89 ef 25 ff 0f 00 00 41 89 c7 80 cc 50 41 81 cf 00 10 00 00 <80> 79 48 00 48 89 4d b0 4c 89 45 b8 44 0f 45 f8 44 89 f8 44 89
      <1>RIP  [<ffffffffa113dc8d>] mdd_changelog_data_store+0xbd/0x320 [mdd]
      <4> RSP <ffff8802ad83fb48>
      <4>CR2: 0000000000000048
      

      I will upload the vmcore from the MDS in a little while.
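
      Until the vmcore is available, the oops already narrows things down: the faulting bytes at the RIP are <80> 79 48 00, i.e. cmpb $0x0,0x48(%rcx), and RCX is 0, which matches CR2 = 0x0000000000000048. So mdd_changelog_data_store, about 0xbd bytes into the function, reads a byte-sized field at offset 0x48 through a NULL struct pointer on this eviction-driven close path. A purely illustrative C sketch of that shape (the struct and field names are made up, not the real mdd types, and the "safe" variant only shows the pattern, not the actual fix):

      /* Illustration of the crash shape seen above, not actual mdd code:
       * a byte field at offset 0x48 is read through a pointer that is NULL
       * on the disconnect/close path, matching cmpb $0x0,0x48(%rcx) with
       * %rcx == 0 and CR2 == 0x48.  All names here are hypothetical. */
      struct example_ctx {
              char          pad[0x48];  /* stand-in for earlier members    */
              unsigned char flag;       /* byte-sized field at offset 0x48 */
      };

      /* Crashing shape: ctx arrives as NULL from the eviction path. */
      int store_changelog_rec(struct example_ctx *ctx)
      {
              return ctx->flag ? 1 : 0;         /* faults at address 0x48 */
      }

      /* Defensive shape one would expect instead (illustrative only). */
      int store_changelog_rec_safe(struct example_ctx *ctx)
      {
              if (ctx == NULL)
                      return 0;         /* skip the record rather than oops */
              return ctx->flag ? 1 : 0;
      }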

    People

      Assignee: WC Triage
      Reporter: James Nunez (Inactive)
