Lustre / LU-7638

general protection fault: 0000 after mounting MDTs


Details


    Description

      An error occurred during soak testing of build '20160104' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20160104). DNE is enabled. The MDTs were formatted using ldiskfs, the OSTs using ZFS. The MDS nodes are configured in an active-active HA configuration.

      (mds_restart means a hard reset of the MDS node and a remount of its MDTs (primary resources).)
      Event sequence:

      • 2016-01-06 06:36:33,402:fsmgmt.fsmgmt:INFO triggering fault mds_restart for lola-9
      • 2016-01-06 06:46:35,601:fsmgmt.fsmgmt:INFO oss_restart just completed for lola-9
      • lola-9 crashed before 06:46:40, as the last update of the collectl counters
        happened at 06:46:20 (sampling interval 20 s). No memory (slab) exhaustion occurred.
      • The error message reads:
      <4>general protection fault: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/system/cpu/online
      <4>CPU 2 
      <4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas wmi mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 5372, comm: lod0002_rec0004 Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.g3f4572c.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
      <4>RIP: 0010:[<ffffffffa0b8ee8b>]  [<ffffffffa0b8ee8b>] insert_update_records_to_replay_list+0xf6b/0x1b70 [ptlrpc]
      <4>RSP: 0018:ffff880821d05a50  EFLAGS: 00010296
      <4>RAX: 0000000000005a5a RBX: ffff880804003d78 RCX: ffff880434faa2e0
      <4>RDX: 5a5a5a5a5a5a5a5a RSI: 0000000000000000 RDI: 0000000000000004
      <4>RBP: ffff880821d05ac0 R08: 0000000000000000 R09: 0000000000000000
      <4>R10: 000000000000004d R11: 0000000000000000 R12: ffff8803ec7afe40
      <4>R13: 5a5a5a5a5a5a5a42 R14: ffff880804003d88 R15: ffff8803ec7afe58
      <4>FS:  0000000000000000(0000) GS:ffff880038240000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 00007f1cacb4f000 CR3: 0000000001a85000 CR4: 00000000000407e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process lod0002_rec0004 (pid: 5372, threadinfo ffff880821d04000, task ffff880821f2c040)
      <4>Stack:
      <4> ffff8807fa7c40c0 ffff880804cc5078 ffff880821d05ac0 ffff880804cc50a8
      <4><d> ffff8803ef8a72d8 0000000421d05ad0 ffff880804cc5088 ffff880804cc50a8
      <4><d> 0000000000007fff ffff880804cc5078 ffff8803ef8a7000 ffff88041b9b2360
      <4>Call Trace:
      <4> [<ffffffffa1303b79>] lod_process_recovery_updates+0x1e9/0x420 [lod]
      <4> [<ffffffffa089048a>] llog_process_thread+0x94a/0x1040 [obdclass]
      <4> [<ffffffffa0890c3d>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
      <4> [<ffffffffa1303990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
      <4> [<ffffffffa0893e38>] llog_cat_process_cb+0x458/0x600 [obdclass]
      <4> [<ffffffffa089048a>] llog_process_thread+0x94a/0x1040 [obdclass]
      <4> [<ffffffffa08e02e4>] ? dt_read+0x14/0x50 [obdclass]
      <4> [<ffffffffa0890c3d>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
      <4> [<ffffffffa08939e0>] ? llog_cat_process_cb+0x0/0x600 [obdclass]
      <4> [<ffffffffa089269d>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
      <4> [<ffffffffa13301b9>] ? lod_sub_prep_llog+0x4f9/0x7a0 [lod]
      <4> [<ffffffffa1303990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
      <4> [<ffffffffa0892809>] llog_cat_process+0x19/0x20 [obdclass]
      <4> [<ffffffffa13096f3>] lod_sub_recovery_thread+0x4e3/0xcf0 [lod]
      <4> [<ffffffffa1309210>] ? lod_sub_recovery_thread+0x0/0xcf0 [lod]
      <4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      <4>Code: 4d 89 7c 24 20 49 89 44 24 08 49 89 44 24 10 8b 55 bc 41 89 14 24 e8 b5 e9 99 e0 49 8b 55 38 48 39 d3 4c 8d 6a e8 74 1f 8b 7d bc <3b> 7a e8 74 6f 8b 4d bc eb 05 3b 48 e8 74 65 49 8b 45 18 48 39 
      <1>RIP  [<ffffffffa0b8ee8b>] insert_update_records_to_replay_list+0xf6b/0x1b70 [ptlrpc]
      <4> RSP <ffff880821d05a50>
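      One clue in the register dump above: RDX (5a5a5a5a5a5a5a5a) and R13 (5a5a5a5a5a5a5a42) carry the 0x5a poison byte that debug allocators (including Lustre's libcfs/OBD allocation wrappers) write over freed memory. This makes the crash look like a use-after-free: a stale list link was read back as poison and then dereferenced. A minimal user-space sketch of the mechanism (hypothetical code, not Lustre's; `poison_free` stands in for the kernel's debug free path):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      #define POISON_BYTE 0x5a  /* pattern debug allocators write over freed memory */

      /* Hypothetical stand-in for an update record linked into a replay list. */
      struct update_rec {
              struct update_rec *next;  /* stale link after the record is freed */
              uint64_t           transno;
      };

      /* "Free" that poisons the block instead of releasing it, mimicking
       * what a debug free path does before handing memory back. */
      static void poison_free(void *p, size_t len)
      {
              memset(p, POISON_BYTE, len);
      }

      int main(void)
      {
              struct update_rec *rec = malloc(sizeof(*rec));

              rec->next    = NULL;
              rec->transno = 42;

              poison_free(rec, sizeof(*rec));

              /* A use-after-free now reads the poison pattern: every
               * pointer-sized field comes back as 0x5a5a5a5a5a5a5a5a,
               * the value seen in RDX in the register dump. */
              assert(rec->transno == 0x5a5a5a5a5a5a5a5aULL);
              assert((uintptr_t)rec->next == 0x5a5a5a5a5a5a5a5aULL);

              free(rec);
              return 0;
      }
      ```

      On x86-64 a 0x5a5a… pointer is a non-canonical address, so dereferencing it is reported as `general protection fault: 0000` (as in `insert_update_records_to_replay_list` here) rather than a page fault.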
      

      Attached are the messages, console, and vmcore-dmesg log files of lola-9.
      The crash file was saved to the crashdump directory of cluster Lola and can be uploaded to a desired location on demand. I'll list the exact path of the crash dump in the next comment.

      Attachments

        1. console-lola-8.log.bz2
          64 kB
        2. console-lola-9.log.bz2
          177 kB
        3. messages-lola-8.log.bz2
          209 kB
        4. messages-lola-9.log.bz2
          99 kB
        5. vmcore-dmesg.txt.bz2
          24 kB
        6. vmcore-dmesg.txt.bz2
          29 kB


            People

              di.wang Di Wang
              heckes Frank Heckes (Inactive)
