Lustre / LU-10806

Hard crash when mounting DNE MDT

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
    • Labels: None
    • Environment: Soak stress cluster - lustre-master-ib build 64 version=2.10.58_139_g630cd49
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      When attempting to re-mount the filesystem after the upgrade, a hard crash occurs on MDT0001.

      The crash is repeatable. I will leave the system in this state for examination, then reformat it as a non-DNE filesystem.

      Crash dumps are available on soak.

      [  451.170602] LDISKFS-fs warning (device dm-2): ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected, please wait.
      [  493.737484] LDISKFS-fs (dm-2): recovery complete
      [  493.793102] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,user_xattr,no_mbcache,nodelalloc
      [  495.357987] LustreError: 2384:0:(tgt_lastrcvd.c:1533:tgt_clients_data_init()) soaked-MDT0001: duplicate export for client generation 11
      [  495.646489] LustreError: 2384:0:(obd_config.c:559:class_setup()) setup soaked-MDT0001 failed (-114)
      [  495.646493] LustreError: 2384:0:(obd_config.c:1822:class_config_llog_handler()) MGC192.168.1.108@o2ib: cfg command failed: rc = -114
      [  495.646497] Lustre:    cmd=cf003 0:soaked-MDT0001  1:soaked-MDT0001_UUID  2:1  3:soaked-MDT0001-mdtlov  4:f
      [  495.646570] LustreError: 15c-8: MGC192.168.1.108@o2ib: The configuration from log 'soaked-MDT0001' failed (-114). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [  495.646587] LustreError: 2303:0:(obd_mount_server.c:1383:server_start_targets()) failed to start server soaked-MDT0001: -114
      [  495.646728] LustreError: 2303:0:(obd_mount_server.c:1936:server_fill_super()) Unable to start targets: -114
      [  495.646760] LustreError: 2303:0:(obd_config.c:610:class_cleanup()) Device 4 not setup
      [  495.899986] BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
      [  495.899999] IP: [<ffffffff816b683c>] _raw_spin_lock+0xc/0x30
      [  495.900002] PGD 0
      [  495.900005] Oops: 0002 [#1] SMP
      [  495.900073] Modules linked in: mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_en(OE) sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd dm_round_robin iTCO_wdt iTCO_vendor_support ipmi_ssif sg joydev ipmi_si ipmi_devintf mei_me ioatdma ipmi_msghandler pcspkr wmi mei lpc_ich shpchp i2c_i801 dm_multipath
      [  495.900107]  dm_mod nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_core(OE) mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops isci ahci igb mpt2sas libsas ttm libahci ptp crct10dif_pclmul pps_core crct10dif_common mlx4_core(OE) raid_class drm libata crc32c_intel dca mlx_compat(OE) scsi_transport_sas i2c_algo_bit devlink i2c_core
      
      [  495.900113] CPU: 10 PID: 2167 Comm: obd_zombid Tainted: P           OE  ------------   3.10.0-693.21.1.el7_lustre.x86_64 #1
      [  495.900114] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  495.900117] task: ffff880036358fd0 ti: ffff8804176d0000 task.ti: ffff8804176d0000
      [  495.900122] RIP: 0010:[<ffffffff816b683c>]  [<ffffffff816b683c>] _raw_spin_lock+0xc/0x30
      [  495.900124] RSP: 0018:ffff8804176d3da8  EFLAGS: 00010246
      [  495.900126] RAX: 0000000000000000 RBX: ffff88081503c800 RCX: 000000018040003f
      [  495.900128] RDX: 0000000000000001 RSI: ffffea0020556b00 RDI: 0000000000000378
      [  495.900129] RBP: ffff8804176d3de0 R08: ffff8808155acf00 R09: 000000018040003f
      [  495.900131] R10: 0000000000000001 R11: ffffea0020556b00 R12: 0000000000000000
      [  495.900133] R13: 0000000000000378 R14: ffff880817131068 R15: ffff88081503c800
      [  495.900135] FS:  0000000000000000(0000) GS:ffff88082d880000(0000) knlGS:0000000000000000
      [  495.900137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  495.900139] CR2: 0000000000000378 CR3: 0000000001a02000 CR4: 00000000000607e0
      [  495.900141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  495.900143] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  495.900144] Call Trace:
      [  495.900246]  [<ffffffffc0d6b635>] ? tgt_grant_discard+0x35/0x190 [ptlrpc]
      [  495.900317]  [<ffffffffc0d3f74e>] ? tgt_client_free+0x17e/0x3b0 [ptlrpc]
      [  495.900354]  [<ffffffffc177c097>] mdt_destroy_export+0x87/0x200 [mdt]
      [  495.900410]  [<ffffffffc0a7b9be>] class_export_destroy+0xee/0x490 [obdclass]
      [  495.900448]  [<ffffffffc0a8434a>] obd_zombie_impexp_cull+0x39a/0x550 [obdclass]
      [  495.900479]  [<ffffffffc0a8456d>] obd_zombie_impexp_thread+0x6d/0x1c0 [obdclass]
      [  495.900489]  [<ffffffff810c7c70>] ? wake_up_state+0x20/0x20
      [  495.900519]  [<ffffffffc0a84500>] ? obd_zombie_impexp_cull+0x550/0x550 [obdclass]
      [  495.900526]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      [  495.900530]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      [  495.900537]  [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
      [  495.900541]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      [  495.900576] Code: 5d c3 0f 1f 44 00 00 85 d2 74 e4 0f 1f 40 00 eb ed 66 0f 1f 44 00 00 b8 01 00 00 00 5d c3 90 66 66 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 99 27 ff ff 5d
      [  495.900580] RIP  [<ffffffff816b683c>] _raw_spin_lock+0xc/0x30
      [  495.900581]  RSP <ffff8804176d3da8>
      [  495.900582] CR2: 0000000000000378
      
      
      

            People

              Assignee: laisiyao Lai Siyao
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 8
