Lustre / LU-8484

MDS server crash during sanity test 63a run

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: Lustre 2.9.0
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
      RHEL 6.7 running the latest Lustre 2.8.55 on both server and client nodes.
    • Severity:
      3

      Description

      After updating to the latest Lustre 2.8.55, I ran the sanity test suite. When it reached test 63a, the MDS server hit the following oops.

      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.525423] BUG: unable to handle kernel NULL pointer dereference at (null)
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.532579] IP: [<ffffffff8153c4a3>] down_write+0x23/0x40
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.538135] PGD 0
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.540295] Oops: 0002 1 SMP DEBUG_PAGEALLOC
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.545035] last sysfs file: /sys/devices/system/cpu/online
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.550741] CPU 4
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.552593] Modules linked in: ofd(U) ost(U) osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 mbcache dm_flakey autofs4 ipmi_devintf 8021q garp stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) dm_mirror dm_region_hash dm_log dm_multipath dm_mod sg microcode sd_mod crc_t10dif iTCO_wdt iTCO_vendor_support ipmi_si ipmi_msghandler mpt2sas raid_class acpi_pad joydev isci libsas scsi_transport_sas sb_edac edac_core i2c_i801 ahci lpc_ich mfd_core ioatdma shpchp ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc mlx4_en mlx4_core igb dc
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: a i2c_algo_bit i2c_core ptp pps_core [last unloaded: scsi_wait_scan]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.650689]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.652312] Pid: 12343, comm: mdt01_001 Tainted: P -- ------------ 2.6.32-573.26.1.el6.head.x86_64 #1 Supermicro X9DRT/X9DRT
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.664983] RIP: 0010:[<ffffffff8153c4a3>] [<ffffffff8153c4a3>] down_write+0x23/0x40
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.673099] RSP: 0018:ffff880790c97800 EFLAGS: 00010246
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.681568] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.688839] RDX: ffffffff00000001 RSI: 0000000000000000 RDI: 00000000000000
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.696105] RBP: ffff880790c97810 R08: 0000000000000000 R09: ffff8800000be040
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.703378] R10: 0000000000000000 R11: 0000000000000198 R12: ffff88104e5bd0c0
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.710642] R13: 0000000000000000 R14: ffff881074682cf0 R15: ffff88103794f680
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.717915] FS: 0000000000000000(0000) GS:ffff88089c400000(0000) knlGS:0000000000000000
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.726254] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.732140] CR2: 0000000000000000 CR3: 0000000001a8d000 CR4: 00000000000407e0
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.739407] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.746669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.753942] Process mdt01_001 (pid: 12343, threadinfo ffff880790c94000, task ffff88083ed7e040)
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.762802] Stack:
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.764943] ffff88103794f680 0000000000000000 ffff880790c97870 ffffffffa092e453
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.772381] <d> 0000000000000000 ffff881074682cf0 ffff881074682cb0 ffff88105932ee40
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.780417] <d> 0000000000000000 ffff88104e5bd0c0 ffff88105932ee40 ffff881074682cb0
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.788757] Call Trace:
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.791375] [<ffffffffa092e453>] llog_cat_add_rec+0x403/0x7b0 [obdclass]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.798314] [<ffffffffa0924319>] llog_add+0x89/0x1c0 [obdclass]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.804462] [<ffffffffa13dd270>] osp_sync_add_rec+0x270/0xa30 [osp]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.810956] [<ffffffffa13ddad7>] osp_sync_add+0x77/0x80 [osp]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.816938] [<ffffffffa130bcbe>] ? lod_sub_get_thandle+0x24e/0x3c0 [lod]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.823871] [<ffffffffa13ce433>] osp_object_destroy+0x173/0x230 [osp]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.830541] [<ffffffffa130f23d>] lod_sub_object_destroy+0x1fd/0x440 [lod]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.837555] [<ffffffffa07b824f>] ? ldiskfs_dirty_inode+0x4f/0x60 [ldiskfs]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.844653] [<ffffffffa1302dcb>] lod_object_destroy+0x36b/0x770 [lod]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.851324] [<ffffffffa1368f3b>] mdd_finish_unlink+0x28b/0x3d0 [mdd]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.857904] [<ffffffffa136d095>] mdd_unlink+0xab5/0xf70 [mdd]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.863892] [<ffffffffa1231b58>] mdo_unlink+0x18/0x50 [mdt]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.869702] [<ffffffffa123ac5f>] mdt_reint_unlink+0xcaf/0x10c0 [mdt]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.876289] [<ffffffffa1231bed>] mdt_reint_rec+0x5d/0x200 [mdt]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.882443] [<ffffffffa121d5db>] mdt_reint_internal+0x62b/0xa50 [mdt]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.889107] [<ffffffffa121deab>] mdt_reint+0x6b/0x120 [mdt]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.894974] [<ffffffffa0bd273c>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.902035] [<ffffffffa0b7f2e1>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.908484] [<ffffffffa0b7e5b0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.914881] [<ffffffff810a148e>] kthread+0x9e/0xc0
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.919891] [<ffffffff8100c28a>] child_rip+0xa/0x20
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.924996] [<ffffffff810a13f0>] ? kthread+0x0/0xc0
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.930090] [<ffffffff8100c280>] ? child_rip+0x0/0x20
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.935368] Code: c3 e8 f2 b3 b3 ff 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 9a e2 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 48 85 d2 74 05 e8 ae 26 d6 ff 48 83 c4 08 5b c9
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.956253] RIP [<ffffffff8153c4a3>] down_write+0x23/0x40
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.961888] RSP <ffff880790c97800>
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.965502] CR2: 0000000000000000
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.969351] --[ end trace 30c27fbd94bdd40c ]--
      Aug 7 20:46:29 ninja11.ccs.ornl.gov kernel: [10262.974170] Kernel panic - not syncing: Fatal exception
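
      For anyone triaging the trace above, the value after "Oops:" (0002 here) is the x86 page-fault error code. The sketch below decodes it, assuming the usual bit layout (bit 0: protection violation vs. page not present, bit 1: write, bit 2: user mode, bit 3: reserved bit, bit 4: instruction fetch). The function name and the output wording are illustrative only and are not part of any kernel or Lustre tool.

```python
# Decode the x86 page-fault error code that follows "Oops:" in a kernel log.
# Hypothetical helper for reading the trace; not part of any kernel tooling.

def decode_oops_error_code(code_hex):
    """Return a list of human-readable flags for a hex error code string."""
    code = int(code_hex, 16)
    flags = [
        "protection violation" if code & 0x01 else "page not present",
        "write access" if code & 0x02 else "read access",
        "user mode" if code & 0x04 else "kernel mode",
    ]
    # The two remaining bits are only reported when they are set.
    if code & 0x08:
        flags.append("reserved bit set")
    if code & 0x10:
        flags.append("instruction fetch")
    return flags


if __name__ == "__main__":
    # Error code taken from the "Oops: 0002" line in this report.
    print(decode_oops_error_code("0002"))
```

      For 0002 this yields a kernel-mode write to a page that is not present. That appears consistent with CR2 and RAX both being 0 and with the faulting instruction in down_write being a locked read-modify-write through RAX, i.e. the semaphore pointer handed down from llog_cat_add_rec was probably NULL.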

              People

              • Assignee:
                hongchao.zhang Hongchao Zhang
              • Reporter:
                simmonsja James A Simmons
              • Votes:
                0
              • Watchers:
                5