Lustre / LU-7430

General protection fault: 0000 upon mounting MDT

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version: Lustre 2.8.0
    • Fix Version: Lustre 2.8.0
    • Environment: lola
      build: tip of master (df6cf859bbb29392064e6ddb701f3357e01b3a13) + patches
    • Severity: 3

    Description

      The error occurred during soak testing of build '20151113' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151113). DNE is enabled. The OSTs have been formatted with zfs and the MDTs with ldiskfs as backend. The MDSes are configured in an active-active HA failover configuration.

      During mount of mdt-2 the following error messages were printed:

      Nov 13 16:27:52 lola-9 kernel: LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on. Opts: 
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(tgt_lastrcvd.c:1458:tgt_clients_data_init()) soaked-MDT0002: duplicate export for client generation 1
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:575:class_setup()) setup soaked-MDT0002 failed (-114)
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:1663:class_config_llog_handler()) MGC192.168.1.108@o2ib10: cfg command failed: rc = -114
      Nov 13 16:27:53 lola-9 kernel: Lustre:    cmd=cf003 0:soaked-MDT0002  1:soaked-MDT0002_UUID  2:2  3:soaked-MDT0002-mdtlov  4:f  
      Nov 13 16:27:53 lola-9 kernel: 
      Nov 13 16:27:53 lola-9 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib10: The configuration from log 'soaked-MDT0002' failed (-114). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_mount_server.c:1306:server_start_targets()) failed to start server soaked-MDT0002: -114
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_mount_server.c:1794:server_fill_super()) Unable to start targets: -114
      Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_config.c:622:class_cleanup()) Device 4 not setup
      

      before crashing with

      <4>general protection fault: 0000 [#1] SMP
      <4>last sysfs file: /sys/module/lfsck/initstate
      <4>CPU 25
      <4>Modules linked in: mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas wmi mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 6329, comm: obd_zombid Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gb64632c.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
      <4>RIP: 0010:[<ffffffffa0c4a6ed>]  [<ffffffffa0c4a6ed>] tgt_client_free+0x25d/0x610 [ptlrpc]
      <4>RSP: 0018:ffff8808337fddd0  EFLAGS: 00010206
      <4>RAX: 5a5a5a5a5a5a5a5a RBX: ffff8803b80c2400 RCX: ffff8803b80c6ec0
      <4>RDX: 0000000000000007 RSI: 5a5a5a5a5a5a5a5a RDI: 0000000000000282
      <4>RBP: ffff8808337fde00 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
      <4>R10: 5a5a5a5a5a5a5a5a R11: 0000000000000000 R12: ffff8803b630d0b0
      <4>R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
      <4>FS:  0000000000000000(0000) GS:ffff88044e520000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000003232070df0 CR3: 0000000001a85000 CR4: 00000000000407e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process obd_zombid (pid: 6329, threadinfo ffff8808337fc000, task ffff880834c75520)
      <4>Stack:
      <4> ffff8803b6308038 ffff8803b80c2400 0000370000000000 ffff8803b80c2400
      <4><d> ffff8803b6308038 ffff880834c75520 ffff8808337fde20 ffffffffa126ff81
      <4><d> ffff8803b6308078 0000000000000000 ffff8808337fde60 ffffffffa099a350
      <4>Call Trace:
      <4> [<ffffffffa126ff81>] mdt_destroy_export+0x71/0x220 [mdt]
      <4> [<ffffffffa099a350>] obd_zombie_impexp_cull+0x5e0/0xac0 [obdclass]
      <4> [<ffffffffa099a895>] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
      <4> [<ffffffff81064c00>] ? default_wake_function+0x0/0x20
      <4> [<ffffffffa099a830>] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
      <4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      <4>Code: 00 00 48 c7 83 c8 02 00 00 00 00 00 00 85 d2 78 4a 4d 85 e4 0f 84 4e 02 00 00 49 8b 84 24 18 03 00 00 48 85 c0 0f 84 3d 02 00 00 <f0> 0f b3 10 19 d2 85 d2 0f 84 23 03 00 00 f6 83 6f 01 00 00 02 
      <1>RIP  [<ffffffffa0c4a6ed>] tgt_client_free+0x25d/0x610 [ptlrpc]
      <4> RSP <ffff8808337fddd0>
      

      Attached files: console and messages logs of node lola-9

      Attachments

        1. console-lola-9.log.gz
          880 kB
        2. dump_today.out
          48 kB
        3. messages-lola-9.log.bz2
          659 kB


          Activity

            [LU-7430] General protection fault: 0000 upon mounting MDT

            bzzz Alex Zhuravlev added a comment -

            Well, it should be unique, as it is used to bind replies to the clients. Essentially it is a unique client id that grows monotonically; it is recalculated from the last_rcvd file on every boot. Given that generation 1 was duplicated, it looks like the whole last_rcvd scanning process was skipped for some reason.
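
            To make the mechanism concrete, below is a minimal, self-contained sketch of the idea, not the actual tgt_clients_data_init() code: while scanning the per-client slots of last_rcvd at mount time, the server tracks the highest generation seen, and a repeated non-zero generation is a fatal inconsistency. -EALREADY is errno 114, which matches the "failed (-114)" lines in the description above. The struct and the fixed-size duplicate tracker are simplifications for illustration only.

            #include <errno.h>
            #include <stdint.h>
            #include <stdio.h>

            struct client_slot {            /* trimmed stand-in for lsd_client_data */
                const char *uuid;
                uint32_t generation;        /* 0 means "no generation recorded" */
            };

            static int scan_client_slots(const struct client_slot *slots, int nr,
                                         uint32_t *highest_gen)
            {
                uint8_t seen[256] = { 0 };  /* toy duplicate tracker; the real code
                                             * detects the clash via its export hash */
                int i;

                *highest_gen = 0;
                for (i = 0; i < nr; i++) {
                    uint32_t gen = slots[i].generation;

                    if (gen == 0)
                        continue;           /* e.g. the MDT-MDT slots carry gen 0 */
                    if (gen < 256 && seen[gen]) {
                        fprintf(stderr, "duplicate export for client generation %u\n", gen);
                        return -EALREADY;   /* -114, the error seen at mount time */
                    }
                    if (gen < 256)
                        seen[gen] = 1;
                    if (gen > *highest_gen)
                        *highest_gen = gen; /* the next new client gets highest + 1 */
                }
                return 0;
            }

            int main(void)
            {
                /* the two offending slots from the RCVRNG dump further below:
                 * idx 10 and idx 13 both carry generation 1 */
                struct client_slot slots[] = {
                    { "03f180c9-ba1c-7578-bc08-987c14296d5a", 1 },
                    { "b81b7277-aea2-e147-05e4-b16f1a92332a", 1 },
                };
                uint32_t highest;

                return scan_client_slots(slots, 2, &highest) ? 1 : 0;
            }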

            adilger Andreas Dilger added a comment -

            Grégoire, Alex, is there a reason that the generation needs to be unique between different clients? If yes, then we need to examine the code that assigns the generation, as well as the recovery code, to ensure that there aren't old slots on disk that are not cleared after recovery is completed.
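
            For reference, this is roughly what "clearing an old slot" would mean on disk; a minimal sketch under assumptions, not the actual recovery code. It assumes the last_rcvd layout constants LR_CLIENT_START and LR_CLIENT_SIZE from lustre/include/lustre_disk.h, and write_record() is a hypothetical stand-in for the dt_record_write() path the server really uses.

            #include <stdint.h>
            #include <string.h>

            #define LR_CLIENT_START 8192    /* first per-client slot in last_rcvd */
            #define LR_CLIENT_SIZE   128    /* size of one lsd_client_data record */

            /* hypothetical I/O helper: write len bytes at offset off of last_rcvd */
            static int write_record(void *last_rcvd, uint64_t off,
                                    const void *buf, size_t len)
            {
                (void)last_rcvd; (void)off; (void)buf; (void)len;
                return 0;                   /* stubbed out for the sketch */
            }

            /* zero the record so a later tgt_clients_data_init() scan does not
             * pick up a stale uuid/generation from a client that is gone */
            static int clear_client_slot(void *last_rcvd, unsigned int idx)
            {
                char zero[LR_CLIENT_SIZE];
                uint64_t off = LR_CLIENT_START + (uint64_t)idx * LR_CLIENT_SIZE;

                memset(zero, 0, sizeof(zero));
                return write_record(last_rcvd, off, zero, sizeof(zero));
            }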

            heckes Frank Heckes (Inactive) added a comment -

            Alex: No, it isn't. For some reason multipathd queries the EMC² (or is it Dell now?)
            controller devices although they're blacklisted in the multipath configuration file. If you refer to this:

            end_request: critical target error, dev sdh, sector 0
            sd 0:0:1:7: [sdh]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
            sd 0:0:1:7: [sdh]  Sense Key : Illegal Request [current] 
            sd 0:0:1:7: [sdh]  Add. Sense: Logical block address out of range
            sd 0:0:1:7: [sdh] CDB: Read(10): 28 00 00 00 00 00 00 02 00 00
            end_request: critical target error, dev sdh, sector 0
            __ratelimit: 310 callbacks suppressed
            Buffer I/O error on device sdh, logical block 0
            Buffer I/O error on device sdh, logical block 1
            Buffer I/O error on device sdh, logical block 2
            Buffer I/O error on device sdh, logical block 3
            Buffer I/O error on device sdh, logical block 4
            Buffer I/O error on device sdh, logical block 5
            Buffer I/O error on device sdh, logical block 6
            Buffer I/O error on device sdh, logical block 7
            Buffer I/O error on device sdh, logical block 8
            Buffer I/O error on device sdh, logical block 9
            

            the messages can be ignored.

            di.wang Di Wang added a comment -

            Here you are.


            pichong Gregoire Pichon added a comment -

            How can I have a look at the Lustre logs of the failing MDS?
            di.wang Di Wang added a comment -

            Yes, but this has happened several times (more than 5 times), especially at this place in the soak tests, so I doubt this is because of the I/O errors.

            bzzz Alex Zhuravlev added a comment -

            The console output contains a lot of I/O errors. Could this be the root cause?
            di.wang Di Wang added a comment -

            Just investigated this problem a bit. So there are two problems here.

            1. It seems the MDT assigned duplicate generations in the last_rcvd file.

            00000001:00080000:8.0:1448270282.553342:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0006-mdtlov_UUID idx: 0 lr: 661425870098 srv lr: 661424965096 lx: 1518615346990020 gen 0
            00000001:00080000:8.0:1448270282.553360:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0004-mdtlov_UUID idx: 1 lr: 661425866544 srv lr: 661424965096 lx: 1518608412495276 gen 0
            00000001:00080000:8.0:1448270282.553368:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0003-mdtlov_UUID idx: 2 lr: 661425864081 srv lr: 661424965096 lx: 1518609357754768 gen 0
            00000001:00080000:8.0:1448270282.553383:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0007-mdtlov_UUID idx: 3 lr: 661425870368 srv lr: 661424965096 lx: 1518615346992984 gen 0
            00000001:00080000:8.0:1448270282.553390:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0005-mdtlov_UUID idx: 4 lr: 661425870380 srv lr: 661424965096 lx: 1518608412535008 gen 0
            00000001:00080000:8.0:1448270282.553398:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0002-mdtlov_UUID idx: 5 lr: 661425870358 srv lr: 661424965096 lx: 1518609357956232 gen 0
            00000001:00080000:8.0:1448270282.553404:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0000-mdtlov_UUID idx: 6 lr: 661425870382 srv lr: 661424965096 lx: 1518609357957028 gen 0
            00000001:00080000:8.0:1448270282.553413:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 7de62113-8c1f-79c4-ca4f-2848938e1c9e idx: 7 lr: 0 srv lr: 661424965096 lx: 0 gen 19
            00000001:00080000:8.0:1448270282.553420:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 477a48c7-fa0c-b67e-c777-17f9099c0649 idx: 8 lr: 618478612079 srv lr: 661424965096 lx: 0 gen 36
            00000001:00080000:8.0:1448270282.553427:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 2c75515f-eba9-e6e7-c604-c055fb7778c9 idx: 9 lr: 657130421084 srv lr: 661424965096 lx: 0 gen 37
            00000001:00080000:8.0:1448270282.553434:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 03f180c9-ba1c-7578-bc08-987c14296d5a idx: 10 lr: 0 srv lr: 661424965096 lx: 0 gen 1
            00000001:00080000:8.0:1448270282.553441:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: c5426649-6edd-3483-f076-e3c10338edce idx: 11 lr: 0 srv lr: 661424965096 lx: 0 gen 39
            00000001:00080000:8.0:1448270282.553447:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 937e0860-a0d8-59ba-f989-c403afaa31e2 idx: 12 lr: 635656643323 srv lr: 661424965096 lx: 0 gen 40
            00000001:00080000:8.0:1448270282.553454:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: b81b7277-aea2-e147-05e4-b16f1a92332a idx: 13 lr: 0 srv lr: 661424965096 lx: 0 gen 1
            00000001:00020000:8.0:1448270282.553459:0:7875:0:(tgt_lastrcvd.c:1458:tgt_clients_data_init()) soaked-MDT0001: duplicate export for client generation 1
            

            This causes the mount failure. Hmm, this looks like a problem with the multiple-slots RPC patch.

            2. The error handling for this mount failure does not seem to be done correctly, which caused the panic here.

            general protection fault: 0000 [#1] SMP 
            last sysfs file: /sys/module/lfsck/initstate
            CPU 25 
            Modules linked in: mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci wmi isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
            
            Pid: 7713, comm: obd_zombid Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gf1f8275.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
            RIP: 0010:[<ffffffffa0b9972d>]  [<ffffffffa0b9972d>] tgt_client_free+0x25d/0x610 [ptlrpc]
            RSP: 0018:ffff88041b5f5dd0  EFLAGS: 00010206
            RAX: 5a5a5a5a5a5a5a5a RBX: ffff880820d80400 RCX: 5a5a5a5a5a5a5a5a
            RDX: 000000000000000d RSI: 5a5a5a5a5a5a5a5a RDI: 0000000000000286
            RBP: ffff88041b5f5e00 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
            R10: 5a5a5a5a5a5a5a5a R11: 0000000000000000 R12: ffff88083366f0b0
            R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
            FS:  0000000000000000(0000) GS:ffff88044e520000(0000) knlGS:0000000000000000
            CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
            CR2: 00007f0d4f3bd000 CR3: 0000000833499000 CR4: 00000000000407e0
            DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            Process obd_zombid (pid: 7713, threadinfo ffff88041b5f4000, task ffff880428628040)
            

            heckes Frank Heckes (Inactive) added a comment -

            The crash file has been stored at lhn:/scratch/crashdumps/lu-7430/127.0.0.1-2015-11-13-16:28:09 and can be uploaded on demand to the desired storage location.

            People

              Assignee: pichong Gregoire Pichon
              Reporter: heckes Frank Heckes (Inactive)