Lustre / LU-7430

General protection fault: 0000 upon mounting MDT

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.8.0
    • Lustre 2.8.0
    • lola
      build: tip of master(df6cf859bbb29392064e6ddb701f3357e01b3a13) + patches
    • 3

    Description

      The error occurred during soak testing of build '20151113' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151113). DNE is enabled. OSTs are formatted with ZFS, and MDTs use ldiskfs as the backend. The MDSes are configured in an active-active HA failover configuration.

      While mounting mdt-2 (soaked-MDT0002) on node lola-9, the following error messages were printed:

      Nov 13 16:27:52 lola-9 kernel: LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on. Opts: 
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(tgt_lastrcvd.c:1458:tgt_clients_data_init()) soaked-MDT0002: duplicate export for client generation 1
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:575:class_setup()) setup soaked-MDT0002 failed (-114)
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:1663:class_config_llog_handler()) MGC192.168.1.108@o2ib10: cfg command failed: rc = -114
      Nov 13 16:27:53 lola-9 kernel: Lustre:    cmd=cf003 0:soaked-MDT0002  1:soaked-MDT0002_UUID  2:2  3:soaked-MDT0002-mdtlov  4:f  
      Nov 13 16:27:53 lola-9 kernel: 
      Nov 13 16:27:53 lola-9 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib10: The configuration from log 'soaked-MDT0002' failed (-114). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_mount_server.c:1306:server_start_targets()) failed to start server soaked-MDT0002: -114
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_mount_server.c:1794:server_fill_super()) Unable to start targets: -114
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_config.c:622:class_cleanup()) Device 4 not setup
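
      The value -114 that recurs in the messages above is a negated Linux errno. On Linux, errno 114 is EALREADY ("Operation already in progress"), which is consistent with the "duplicate export" message from tgt_clients_data_init(). The following is only a triage sketch, not part of Lustre: the helper names return_codes and describe are hypothetical, and the errno names printed depend on the platform running the script (the numbering shown here applies to Linux).

```python
import errno
import os
import re

# Log lines copied verbatim from the messages above.
SAMPLE_LOG = """\
Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:575:class_setup()) setup soaked-MDT0002 failed (-114)
Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:1663:class_config_llog_handler()) MGC192.168.1.108@o2ib10: cfg command failed: rc = -114
Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_mount_server.c:1306:server_start_targets()) failed to start server soaked-MDT0002: -114
"""

# A negative integer that is not glued to a preceding word character,
# so host names such as "lola-9" or message IDs such as "15c-8" are skipped.
RC_RE = re.compile(r"(?<![\w.])-(\d+)\b")


def return_codes(log_text):
    """Return the distinct errno values (as positive ints) found in log_text."""
    return sorted({int(digits) for digits in RC_RE.findall(log_text)})


def describe(rc):
    """Map a positive errno value to 'NAME (message)' for the local platform."""
    name = errno.errorcode.get(rc, "unknown errno")
    return f"{name} ({os.strerror(rc)})"


if __name__ == "__main__":
    for rc in return_codes(SAMPLE_LOG):
        print(f"-{rc}: {describe(rc)}")
```

      Run on a Linux host, this reports a single distinct return code for the three lines, which suggests that every failure in the mount path is the same error propagating upward from the target setup.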
      

      The node then crashed with the following general protection fault:

      <4>general protection fault: 0000 [#1] SMP
      <4>last sysfs file: /sys/module/lfsck/initstate
      <4>CPU 25
      <4>Modules linked in: mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas wmi mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 6329, comm: obd_zombid Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gb64632c.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
      <4>RIP: 0010:[<ffffffffa0c4a6ed>]  [<ffffffffa0c4a6ed>] tgt_client_free+0x25d/0x610 [ptlrpc]
      <4>RSP: 0018:ffff8808337fddd0  EFLAGS: 00010206
      <4>RAX: 5a5a5a5a5a5a5a5a RBX: ffff8803b80c2400 RCX: ffff8803b80c6ec0
      <4>RDX: 0000000000000007 RSI: 5a5a5a5a5a5a5a5a RDI: 0000000000000282
      <4>RBP: ffff8808337fde00 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
      <4>R10: 5a5a5a5a5a5a5a5a R11: 0000000000000000 R12: ffff8803b630d0b0
      <4>R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
      <4>FS:  0000000000000000(0000) GS:ffff88044e520000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000003232070df0 CR3: 0000000001a85000 CR4: 00000000000407e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process obd_zombid (pid: 6329, threadinfo ffff8808337fc000, task ffff880834c75520)
      <4>Stack:
      <4> ffff8803b6308038 ffff8803b80c2400 0000370000000000 ffff8803b80c2400
      <4><d> ffff8803b6308038 ffff880834c75520 ffff8808337fde20 ffffffffa126ff81
      <4><d> ffff8803b6308078 0000000000000000 ffff8808337fde60 ffffffffa099a350
      <4>Call Trace:
      <4> [<ffffffffa126ff81>] mdt_destroy_export+0x71/0x220 [mdt]
      <4> [<ffffffffa099a350>] obd_zombie_impexp_cull+0x5e0/0xac0 [obdclass]
      <4> [<ffffffffa099a895>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
      <4> [<ffffffff81064c00>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa099a830>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
      <4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      <4>Code: 00 00 48 c7 83 c8 02 00 00 00 00 00 00 85 d2 78 4a 4d 85 e4 0f 84 4e 02 00 00 49 8b 84 24 18 03 00 00 48 85 c0 0f 84 3d 02 00 00 <f0> 0f b3 10 19 d2 85 d2 0f 84 23 03 00 00 f6 83 6f 01 00 00 02 
      <1>RIP  [<ffffffffa0c4a6ed>] tgt_client_free+0x25d/0x610 [ptlrpc]
      <4> RSP <ffff8808337fddd0>
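
      Several registers in the dump above hold 0x5a5a5a5a5a5a5a5a. A repeated 0x5a byte is commonly a memory-poison fill, so this hints at (but does not prove) a pointer read from freed or uninitialized memory in tgt_client_free(). On x86-64 that value is also a non-canonical address, and dereferencing a non-canonical address raises a general protection fault rather than a page fault. The sketch below checks both properties for the register lines of the oops; the function names are hypothetical, and treating 0x5a as the poison byte is an assumption.

```python
import re

# Register lines copied verbatim from the oops above.
SAMPLE_OOPS = """\
<4>RAX: 5a5a5a5a5a5a5a5a RBX: ffff8803b80c2400 RCX: ffff8803b80c6ec0
<4>RDX: 0000000000000007 RSI: 5a5a5a5a5a5a5a5a RDI: 0000000000000282
<4>RBP: ffff8808337fde00 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
<4>R10: 5a5a5a5a5a5a5a5a R11: 0000000000000000 R12: ffff8803b630d0b0
<4>R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
"""

# General-purpose registers printed as "NAME: <16 hex digits>".
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Return {register name: integer value} for the general-purpose registers."""
    return {name: int(value, 16) for name, value in REG_RE.findall(oops_text)}


def is_poison(value, byte=0x5A):
    """True if all eight bytes of value equal the assumed poison byte."""
    return value == int.from_bytes(bytes([byte]) * 8, "big")


def is_canonical(value):
    """True if bits 63..47 are all equal (x86-64 with 48-bit virtual addresses)."""
    top = value >> 47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    for name, value in sorted(parse_registers(SAMPLE_OOPS).items()):
        flags = []
        if is_poison(value):
            flags.append("poison pattern")
        if not is_canonical(value):
            flags.append("non-canonical")
        print(f"{name} = {value:#018x} {', '.join(flags)}")
```

      RAX is among the flagged registers, and the byte sequence marked as the faulting instruction in the Code: line (f0 0f b3 10) uses RAX as its memory operand, so the poisoned pointer in RAX is a plausible direct cause of the fault.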
      

      Attached files: console log and messages log of node lola-9.

      Attachments

        1. console-lola-9.log.gz (880 kB)
        2. dump_today.out (48 kB)
        3. messages-lola-9.log.bz2 (659 kB)

            People

              pichong Gregoire Pichon
              heckes Frank Heckes (Inactive)