[LU-8580] general protection fault: osd_xattr_get+0x32c/0x5b0 [osd_ldiskfs] Created: 05/Sep/16 Updated: 07/Oct/16 Resolved: 07/Oct/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
lola |
||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
The error occurred during soak testing of build '20160902' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160902). The MDS crashed twice with the following message:
<4>general protection fault: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 26
<4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core joydev lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ext3 jbd mbcache sd_mod crc_t10dif ahci wmi isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_en ptp pps_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod scsi_dh_rdac [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6399, comm: mdt02_008 Tainted: P -- ------------ 2.6.32-573.26.1.el6_lustre.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
<4>RIP: 0010:[<ffffffffa1083c9c>] [<ffffffffa1083c9c>] osd_xattr_get+0x32c/0x5b0 [osd_ldiskfs]
<4>RSP: 0018:ffff8803fb9bf960 EFLAGS: 00010206
<4>RAX: 00000000ffffffff RBX: ffff8803fc661cc0 RCX: dead000000100100
<4>RDX: 0000000000000003 RSI: ffff88080efacdf8 RDI: ffffffffa12f59e4
<4>RBP: ffff8803fb9bf9b0 R08: 000000000000000b R09: ffff8803fc661d78
<4>R10: ffff88082a2e509c R11: 0000000000000000 R12: ffff88081392f010
<4>R13: ffffffffa12f59e4 R14: ffff8803de29bb70 R15: ffff880813931000
<4>FS: 0000000000000000(0000) GS:ffff88044e540000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000003d4feacd90 CR3: 0000000001a8d000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process mdt02_008 (pid: 6399, threadinfo ffff8803fb9bc000, task ffff8803fb946040)
<4>Stack:
<4> ffff8803fc661d78 ffff88080efacdc0 000000000000000b ffff88080efacdf8
<4><d> ffff8803fb9bf9c0 ffff88081392f000 ffff8803fc661cc0 ffff88082cca66c0
<4><d> ffffffffa12f59e4 ffff88081392f010 ffff8803fb9bf9f0 ffffffffa12ccdc3
<4>Call Trace:
<4> [<ffffffffa12ccdc3>] lod_get_ea+0xc3/0x530 [lod]
<4> [<ffffffffa12de6bc>] lod_ah_init+0x6cc/0x980 [lod]
<4> [<ffffffffa1359e49>] mdd_object_make_hint+0x139/0x180 [mdd]
<4> [<ffffffffa1084f08>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs]
<4> [<ffffffffa1356251>] mdd_create+0x6f1/0x1770 [mdd]
<4> [<ffffffffa090ca41>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
<4> [<ffffffffa1212b94>] ? mdt_version_save+0x84/0x1a0 [mdt]
<4> [<ffffffffa121cc4c>] mdt_reint_create+0xbdc/0xfe0 [mdt]
<4> [<ffffffffa120e30c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
<4> [<ffffffffa0b058db>] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
<4> [<ffffffff81299b7a>] ? strlcpy+0x4a/0x60
<4> [<ffffffffa120f97a>] ? old_init_ucred_common+0xda/0x2b0 [mdt]
<4> [<ffffffffa1211ead>] mdt_reint_rec+0x5d/0x200 [mdt]
<4> [<ffffffffa11fd5db>] mdt_reint_internal+0x62b/0xa50 [mdt]
<4> [<ffffffffa11fdeab>] mdt_reint+0x6b/0x120 [mdt]
<4> [<ffffffffa0b69ccc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
<4> [<ffffffffa0b16501>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
<4> [<ffffffff8106ee50>] ? pick_next_task_fair+0xd0/0x130
<4> [<ffffffff81539896>] ? schedule+0x176/0x3a0
<4> [<ffffffffa0b157d0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
<4> [<ffffffff810a138e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a12f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>Code: e0 48 8b 8b b8 00 00 00 4c 8d 8b b8 00 00 00 49 89 c0 4c 39 c9 75 14 e9 8a 01 00 00 0f 1f 00 48 8b 09 49 39 c9 0f 84 7b 01 00 00 <4c> 3b 41 18 75 ee 48 8d 41 38 4c 89 c2 4c 89 ef 48 89 4d b8 4c
<1>RIP [<ffffffffa1083c9c>] osd_xattr_get+0x32c/0x5b0 [osd_ldiskfs]
<4> RSP <ffff8803fb9bf960>
Sequence of events:
Attached files: messages, console, and vmcore files from node lola-11 for both crashes. (Note: the console logs have timestamps printed at 5-minute intervals.) |
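When comparing the repeated crashes, it can help to reduce each oops to its call chain. The following is a minimal sketch, not part of the original report: the names FRAME_RE, parse_frames and SAMPLE are hypothetical, and the sample text is an excerpt of the trace above. It relies on the usual kernel convention that a frame prefixed with "?" is an address found on the stack that the unwinder could not confirm as part of the call chain.

```python
import re

# One frame looks like: [<ffffffffa12ccdc3>] ? lod_get_ea+0xc3/0x530 [lod]
# The "? " marker and the trailing "[module]" are both optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(text):
    """Return the stack frames found in an oops text, in order of appearance."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "symbol": m.group("sym"),
            "offset": int(m.group("off"), 16),
            # Frames without a "[module]" suffix belong to the core kernel.
            "module": m.group("mod") or "kernel",
            # "?" marks a frame the unwinder could not verify.
            "reliable": m.group("guess") is None,
        })
    return frames


# Excerpt of the call trace from this ticket, kept on one line as in the logs.
SAMPLE = (
    "<4>Call Trace: "
    "<4> [<ffffffffa12ccdc3>] lod_get_ea+0xc3/0x530 [lod] "
    "<4> [<ffffffffa12de6bc>] lod_ah_init+0x6cc/0x980 [lod] "
    "<4> [<ffffffffa1359e49>] mdd_object_make_hint+0x139/0x180 [mdd] "
    "<4> [<ffffffffa1084f08>] ? osd_object_read_unlock+0x88/0xd0 [osd_ldiskfs] "
    "<4> [<ffffffffa1356251>] mdd_create+0x6f1/0x1770 [mdd] "
    "<4> [<ffffffff810a138e>] kthread+0x9e/0xc0"
)

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        if frame["reliable"]:
            print(f'{frame["symbol"]}+{frame["offset"]:#x} [{frame["module"]}]')
```

Filtering on the "reliable" flag leaves the chain lod_get_ea, lod_ah_init, mdd_object_make_hint, mdd_create, kthread for this excerpt, which is easier to compare across crash dumps than the raw text.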
| Comments |
| Comment by Frank Heckes (Inactive) [ 05/Sep/16 ] |
|
Crash dump files have been saved to the subdirectories 127.0.0.1-2016-09-02-12:58:40, 127.0.0.1-2016-09-02-16:11:03 of lhn.hpdd.intel.com:/scratch/crashdumps/lu-8580/lola-11. |
| Comment by Frank Heckes (Inactive) [ 06/Sep/16 ] |
|
The error happens quite often. There have been 5 more incidents, on all MDS nodes. |
| Comment by Frank Heckes (Inactive) [ 08/Sep/16 ] |
|
One more crash during last night's session. |
| Comment by Alex Zhuravlev [ 09/Sep/16 ] |
|
Lai, can this be similar to |
| Comment by Lai Siyao [ 12/Sep/16 ] |
|
Yes, the backtrace at "osd_xattr_get+0x32c" falls exactly in the osd_oxc_get() list traversal. |
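The register dump in the description is consistent with a list traversal hitting a deleted entry: RCX holds dead000000100100, which matches the value LIST_POISON1 that list_del() stores in the next pointer of a removed entry on x86_64 kernels built with an illegal pointer value of 0xdead000000000000. That address is non-canonical, so dereferencing it raises a general protection fault rather than a page fault. The sketch below is an illustration under those assumptions, not code from the Lustre tree; the function names and the SAMPLE excerpt are hypothetical helpers, and the poison constants differ on other architectures and on newer kernels.

```python
import re

# Assumed poison values for x86_64 kernels of this era
# (0x00100100 and 0x00200200 plus a 0xdead000000000000 delta).
LIST_POISON1 = 0xDEAD000000100100  # written to entry->next by list_del()
LIST_POISON2 = 0xDEAD000000200200  # written to entry->prev by list_del()

# Matches general-purpose registers such as "RCX: dead000000100100".
# RIP and RSP are skipped because their values carry a segment prefix.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s+([0-9a-f]{16})\b")


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit x86_64 canonical form)."""
    top = addr >> 47
    return top in (0, 0x1FFFF)


def classify(value):
    """Label a register value that looks like a poisoned or invalid pointer."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    if not is_canonical(value):
        return "non-canonical"
    return None


def suspicious_registers(oops_text):
    """Map register name to label for every register that classify() flags."""
    found = {}
    for name, hexval in REG_RE.findall(oops_text):
        label = classify(int(hexval, 16))
        if label:
            found[name] = label
    return found


# Excerpt of the register dump from this ticket.
SAMPLE = (
    "<4>RAX: 00000000ffffffff RBX: ffff8803fc661cc0 RCX: dead000000100100 "
    "<4>RDX: 0000000000000003 RSI: ffff88080efacdf8 RDI: ffffffffa12f59e4 "
    "<4>RBP: ffff8803fb9bf9b0 R08: 000000000000000b R09: ffff8803fc661d78"
)

if __name__ == "__main__":
    print(suspicious_registers(SAMPLE))
```

Run against the excerpt, only RCX is flagged. A poisoned next pointer in the cursor register of a list walk usually suggests that an entry was removed while another thread was still traversing the list, though the vmcore would be needed to confirm that here.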
| Comment by Peter Jones [ 12/Sep/16 ] |
|
ok then let's track the fix under |
| Comment by Peter Jones [ 12/Sep/16 ] |
|
Ooops. My misunderstanding - it's a similar issue but not a duplicate |
| Comment by Lai Siyao [ 13/Sep/16 ] |
|
IMO this is a duplicate of |
| Comment by Alex Zhuravlev [ 13/Sep/16 ] |
|
well, we can't fix this issue with the patch for |
| Comment by Frank Heckes (Inactive) [ 19/Sep/16 ] |
|
The error occurs every 1 to 2 hours with build https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160916 |
| Comment by Alex Zhuravlev [ 07/Oct/16 ] |
|
a duplicate of |