[LU-7638] general protection fault: 0000 after mounting MDTs Created: 07/Jan/16 Updated: 28/Jan/16 Resolved: 28/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak |
| Environment: |
lola |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The error occurred during soak testing of build '20160104' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20160104). DNE is enabled. The MDTs were formatted with ldiskfs and the OSTs with zfs. The MDS nodes are configured as an active-active HA pair (mds_restart means a hard reset of an MDS node followed by a remount of its MDTs, i.e. its primary resources).
<4>general protection fault: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 2
<4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas wmi mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 5372, comm: lod0002_rec0004 Tainted: P --------------- 2.6.32-504.30.3.el6_lustre.g3f4572c.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
<4>RIP: 0010:[<ffffffffa0b8ee8b>] [<ffffffffa0b8ee8b>] insert_update_records_to_replay_list+0xf6b/0x1b70 [ptlrpc]
<4>RSP: 0018:ffff880821d05a50 EFLAGS: 00010296
<4>RAX: 0000000000005a5a RBX: ffff880804003d78 RCX: ffff880434faa2e0
<4>RDX: 5a5a5a5a5a5a5a5a RSI: 0000000000000000 RDI: 0000000000000004
<4>RBP: ffff880821d05ac0 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 000000000000004d R11: 0000000000000000 R12: ffff8803ec7afe40
<4>R13: 5a5a5a5a5a5a5a42 R14: ffff880804003d88 R15: ffff8803ec7afe58
<4>FS: 0000000000000000(0000) GS:ffff880038240000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007f1cacb4f000 CR3: 0000000001a85000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process lod0002_rec0004 (pid: 5372, threadinfo ffff880821d04000, task ffff880821f2c040)
<4>Stack:
<4> ffff8807fa7c40c0 ffff880804cc5078 ffff880821d05ac0 ffff880804cc50a8
<4><d> ffff8803ef8a72d8 0000000421d05ad0 ffff880804cc5088 ffff880804cc50a8
<4><d> 0000000000007fff ffff880804cc5078 ffff8803ef8a7000 ffff88041b9b2360
<4>Call Trace:
<4> [<ffffffffa1303b79>] lod_process_recovery_updates+0x1e9/0x420 [lod]
<4> [<ffffffffa089048a>] llog_process_thread+0x94a/0x1040 [obdclass]
<4> [<ffffffffa0890c3d>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
<4> [<ffffffffa1303990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
<4> [<ffffffffa0893e38>] llog_cat_process_cb+0x458/0x600 [obdclass]
<4> [<ffffffffa089048a>] llog_process_thread+0x94a/0x1040 [obdclass]
<4> [<ffffffffa08e02e4>] ? dt_read+0x14/0x50 [obdclass]
<4> [<ffffffffa0890c3d>] llog_process_or_fork+0xbd/0x5d0 [obdclass]
<4> [<ffffffffa08939e0>] ? llog_cat_process_cb+0x0/0x600 [obdclass]
<4> [<ffffffffa089269d>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
<4> [<ffffffffa13301b9>] ? lod_sub_prep_llog+0x4f9/0x7a0 [lod]
<4> [<ffffffffa1303990>] ? lod_process_recovery_updates+0x0/0x420 [lod]
<4> [<ffffffffa0892809>] llog_cat_process+0x19/0x20 [obdclass]
<4> [<ffffffffa13096f3>] lod_sub_recovery_thread+0x4e3/0xcf0 [lod]
<4> [<ffffffffa1309210>] ? lod_sub_recovery_thread+0x0/0xcf0 [lod]
<4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>Code: 4d 89 7c 24 20 49 89 44 24 08 49 89 44 24 10 8b 55 bc 41 89 14 24 e8 b5 e9 99 e0 49 8b 55 38 48 39 d3 4c 8d 6a e8 74 1f 8b 7d bc <3b> 7a e8 74 6f 8b 4d bc eb 05 3b 48 e8 74 65 49 8b 45 18 48 39
<1>RIP [<ffffffffa0b8ee8b>] insert_update_records_to_replay_list+0xf6b/0x1b70 [ptlrpc]
<4> RSP <ffff880821d05a50>
Attached messages, console and vmcore-dmesg log file of lola-9. |
| Comments |
| Comment by Frank Heckes (Inactive) [ 07/Jan/16 ] |
|
The crash dump is saved at lola-1:/scratch/crashdumps/lu-7638/lola-9-127.0.0.1-2016-01-06-06:47:10 |
| Comment by Gerrit Updater [ 08/Jan/16 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/17885 |
| Comment by Frank Heckes (Inactive) [ 11/Jan/16 ] |
|
The same error also occurred with build '20160108' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160108).
<4>general protection fault: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/infiniband_mad/umad0/port
<4>CPU 14
<4>Modules linked in: mgs(U) osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci wmi isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 4393, comm: mdt03_000 Tainted: P --------------- 2.6.32-504.30.3.el6_lustre.g990ef68.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
<4>RIP: 0010:[<ffffffffa083bacc>] [<ffffffffa083bacc>] llog_exist+0x3c/0x170 [obdclass]
<4>RSP: 0000:ffff880826eb1990 EFLAGS: 00010206
<4>RAX: 5a5a5a5a5a5a5a5a RBX: ffff88081cdf61c0 RCX: ffff8808336eb8c0
<4>RDX: ffff88040666d8c0 RSI: ffff88081cdf61c0 RDI: ffff88081cdf61c0
<4>RBP: ffff880826eb19a0 R08: ffff8808100262c0 R09: 0000000000010000
<4>R10: 0000000000000010 R11: 0000000000004000 R12: 0000000000000000
<4>R13: ffff88082d540c80 R14: ffff88082dbbcc00 R15: ffff8808100262c0
<4>FS: 0000000000000000(0000) GS:ffff88044e4c0000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 000000000168a950 CR3: 0000000001a85000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process mdt03_000 (pid: 4393, threadinfo ffff880826eb0000, task ffff880826ea8ab0)
<4>Stack:
<4> ffff88082dbbcc00 ffff880421df67c0 ffff880826eb1a00 ffffffffa0845e5c
<4><d> ffff88081cdf61c0 ffff88081cf84000 ffff880826eb19d0 0000000033d75218
<4><d> ffff8808336e41e0 ffff880421df67c0 ffff88082d540c80 ffff88081cf84000
<4>Call Trace:
<4> [<ffffffffa0845e5c>] llog_cat_declare_add_rec+0x35c/0x610 [obdclass]
<4> [<ffffffffa083c06f>] llog_declare_add+0x7f/0x1b0 [obdclass]
<4> [<ffffffffa0b380cc>] top_trans_start+0x17c/0x920 [ptlrpc]
<4> [<ffffffffa12a5e31>] lod_trans_start+0x61/0x70 [lod]
<4> [<ffffffffa1350e84>] mdd_trans_start+0x14/0x20 [mdd]
<4> [<ffffffffa133a67a>] mdd_create+0x9aa/0x1600 [mdd]
<4> [<ffffffffa11ecb92>] ? mdt_version_check+0x132/0x440 [mdt]
<4> [<ffffffffa11f1536>] mdt_reint_create+0xbb6/0xcc0 [mdt]
<4> [<ffffffffa0ab769b>] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
<4> [<ffffffff81294a3a>] ? strlcpy+0x4a/0x60
<4> [<ffffffffa11eba9d>] mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa11d787b>] mdt_reint_internal+0x62b/0xb80 [mdt]
<4> [<ffffffffa11d826b>] mdt_reint+0x6b/0x120 [mdt]
<4> [<ffffffffa0b21bbc>] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
<4> [<ffffffffa0ac9231>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
<4> [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
<4> [<ffffffffa0ac83f0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
<4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>Code: d6 f5 ff 01 48 89 fb 74 09 f6 05 6f d6 f5 ff 40 75 5d 48 85 db 0f 84 b4 00 00 00 48 8b 83 d8 00 00 00 48 85 c0 0f 84 a4 00 00 00 <48> 8b 40 58 48 85 c0 0f 84 e7 00 00 00 48 89 df ff d0 f6 05 3f
<1>RIP [<ffffffffa083bacc>] llog_exist+0x3c/0x170 [obdclass]
<4> RSP <ffff880826eb1990>
Attached messages, console and vmcore-dmesg files to the ticket. |
| Comment by Gerrit Updater [ 28/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17885/ |
| Comment by Joseph Gmitter (Inactive) [ 28/Jan/16 ] |
|
Landed for 2.8 |