LU-8580: general protection fault: osd_xattr_get+0x32c/0x5b0 [osd_ldiskfs]

    Description

      The error occurred during soak testing of build '20160902' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160902).
      Configuration:
      4 MDS with 1 MDT per MDS, backend FS ldiskfs, nodes configured pairwise in active-active HA configuration
      6 OSS with 4 OSTs per OSS, backend FS zfs, nodes configured pairwise in active-active HA configuration

      The MDS crashed twice with the following message:

      <4>general protection fault: 0000 [#1] SMP 
      <4>last sysfs file: /sys/devices/system/cpu/online
      <4>CPU 26 
      <4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core joydev lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ext3 jbd mbcache sd_mod crc_t10dif ahci wmi isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod scsi_dh_rdac [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 6399, comm: mdt02_008 Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
      <4>RIP: 0010:[<ffffffffa1083c9c>]  [<ffffffffa1083c9c>] osd_xattr_get+0x32c/0x5b0 [osd_ldiskfs]
      <4>RSP: 0018:ffff8803fb9bf960  EFLAGS: 00010206
      <4>RAX: 00000000ffffffff RBX: ffff8803fc661cc0 RCX: dead000000100100
      <4>RDX: 0000000000000003 RSI: ffff88080efacdf8 RDI: ffffffffa12f59e4
      <4>RBP: ffff8803fb9bf9b0 R08: 000000000000000b R09: ffff8803fc661d78
      <4>R10: ffff88082a2e509c R11: 0000000000000000 R12: ffff88081392f010
      <4>R13: ffffffffa12f59e4 R14: ffff8803de29bb70 R15: ffff880813931000
      <4>FS:  0000000000000000(0000) GS:ffff88044e540000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000003d4feacd90 CR3: 0000000001a8d000 CR4: 00000000000407e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process mdt02_008 (pid: 6399, threadinfo ffff8803fb9bc000, task ffff8803fb946040)
      <4>Stack:
      <4> ffff8803fc661d78 ffff88080efacdc0 000000000000000b ffff88080efacdf8
      <4><d> ffff8803fb9bf9c0 ffff88081392f000 ffff8803fc661cc0 ffff88082cca66c0
      <4><d> ffffffffa12f59e4 ffff88081392f010 ffff8803fb9bf9f0 ffffffffa12ccdc3
      <4>Call Trace:
      <4> [<ffffffffa12ccdc3>] lod_get_ea+0xc3/0x530 [lod]
      <4> [<ffffffffa12de6bc>] lod_ah_init+0x6cc/0x980 [lod]
      <4> [<ffffffffa1359e49>] mdd_object_make_hint+0x139/0x180 [mdd]
      <4> [<ffffffffa1084f08>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
      <4> [<ffffffffa1356251>] mdd_create+0x6f1/0x1770 [mdd]
      <4> [<ffffffffa090ca41>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      <4> [<ffffffffa1212b94>] ? mdt_version_save+0x84/0x1a0 [mdt]
      <4> [<ffffffffa121cc4c>] mdt_reint_create+0xbdc/0xfe0 [mdt]
      <4> [<ffffffffa120e30c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
      <4> [<ffffffffa0b058db>] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
      <4> [<ffffffff81299b7a>] ? strlcpy+0x4a/0x60
      <4> [<ffffffffa120f97a>] ? old_init_ucred_common+0xda/0x2b0 [mdt]
      <4> [<ffffffffa1211ead>] mdt_reint_rec+0x5d/0x200 [mdt]
      <4> [<ffffffffa11fd5db>] mdt_reint_internal+0x62b/0xa50 [mdt]
      <4> [<ffffffffa11fdeab>] mdt_reint+0x6b/0x120 [mdt]
      <4> [<ffffffffa0b69ccc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
      <4> [<ffffffffa0b16501>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
      <4> [<ffffffff8106ee50>] ? pick_next_task_fair+0xd0/0x130
      <4> [<ffffffff81539896>] ? schedule+0x176/0x3a0
      <4> [<ffffffffa0b157d0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
      <4> [<ffffffff810a138e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffff810a12f0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      <4>Code: e0 48 8b 8b b8 00 00 00 4c 8d 8b b8 00 00 00 49 89 c0 4c 39 c9 75 14 e9 8a 01 00 00 0f 1f 00 48 8b 09 49 39 c9 0f 84 7b 01 00 00 <4c> 3b 41 18 75 ee 48 8d 41 38 4c 89 c2 4c 89 ef 48 89 4d b8 4c 
      <1>RIP  [<ffffffffa1083c9c>] osd_xattr_get+0x32c/0x5b0 [osd_ldiskfs]
      <4> RSP <ffff8803fb9bf960>
      

      Sequence of events:

      • 2016-09-02-12:58:40 First crash; happened during 'normal' operations while no fault had been injected
      • 2016-09-02-16:11:03 Second crash; happened after MDS restart
      • Both incidents occurred on the same node (lola-11)
      • No errors on other nodes can be correlated with these events
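
      For reference, the interval between the two crash timestamps listed above can be computed as follows. This is a minimal sketch: it assumes both timestamps are in the same time zone, and the format string is inferred from how the timestamps are written in this report.

```python
from datetime import datetime

# Timestamp layout as written in the event list (assumed: YYYY-MM-DD-HH:MM:SS).
FMT = "%Y-%m-%d-%H:%M:%S"

first_crash = datetime.strptime("2016-09-02-12:58:40", FMT)
second_crash = datetime.strptime("2016-09-02-16:11:03", FMT)

# Time between the first and the second crash on lola-11.
gap = second_crash - first_crash
print(gap)  # 3:12:23
```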

      Attached files: messages, console, and vmcore files from node lola-11 for both crashes. (Note: the console logs have timestamps printed at 5-minute intervals.)
      Crash dump files are available.
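
      As a first triage step before opening the vmcore, the register dump in the oops above can be checked for well-known kernel poison values. The sketch below is an illustration only, not a confirmed root-cause analysis. It assumes the usual x86_64 list poison constants from include/linux/poison.h and 48-bit virtual addressing; the helper names (parse_registers, is_canonical) are hypothetical and exist only for this example.

```python
import re

# Register lines copied from the oops above, syslog "<4>" prefixes removed.
REGISTER_DUMP = """\
RAX: 00000000ffffffff RBX: ffff8803fc661cc0 RCX: dead000000100100
RDX: 0000000000000003 RSI: ffff88080efacdf8 RDI: ffffffffa12f59e4
RBP: ffff8803fb9bf9b0 R08: 000000000000000b R09: ffff8803fc661d78
"""

# Assumption: stock x86_64 values from include/linux/poison.h
# (0x00100100 / 0x00200200 plus POISON_POINTER_DELTA 0xdead000000000000).
LIST_POISON = {
    0xDEAD000000100100: "LIST_POISON1",
    0xDEAD000000200200: "LIST_POISON2",
}


def parse_registers(text):
    """Map register name -> integer value for 'NAME: <16 hex digits>' pairs."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit x86_64 virtual addressing)."""
    return (addr >> 47) in (0, 0x1FFFF)


regs = parse_registers(REGISTER_DUMP)

# Registers whose value matches a known list poison constant.
poisoned = {name: LIST_POISON[value]
            for name, value in regs.items() if value in LIST_POISON}

print(poisoned)                        # {'RCX': 'LIST_POISON1'}
print(is_canonical(regs["RCX"]))       # False
print(hex(regs["R09"] - regs["RBX"]))  # 0xb8
```

      RCX holds a non-canonical address, and dereferencing such an address on x86_64 raises a general protection fault rather than a page fault. R09 equals RBX + 0xb8, which would fit a list head embedded in the structure that RBX points to. Taken together, this is consistent with a list walk reaching an entry that had already been removed with list_del(), but which list and which structure are involved can only be confirmed from the vmcore.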

            People

              bzzz Alex Zhuravlev
              heckes Frank Heckes (Inactive)