Details
-
3
-
9223372036854775807
Description
[ 1890.299266] LustreError: 19229:0:(mdd_device.c:263:llog_changelog_cancel()) lustre-MDD0000: cancel idx 23 of catalog [0x5:0xa:0x0]: rc = -2 [ 1890.291625] Modules linked in: loop lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) mbcache lquota(OE) lfsck(OE) jbd2 obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vboxsf(OE) snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm iosf_mbi [ 1890.308621] crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper snd_timer ppdev pcspkr snd ablk_helper soundcore cryptd virtio_net sg vboxvideo(OE) vboxguest(OE) ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops video drm i2c_piix4 parport_pc parport i2c_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic pata_acpi ahci libahci e1000 ata_piix crct10dif_pclmul crct10dif_common crc32c_intel libata serio_raw virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [ 1890.317566] CPU: 3 PID: 19257 Comm: lctl Tainted: G OE ------------ 3.10.0-693.2.2.el7_lustre.x86_64 #1 [ 1890.320750] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 1890.322062] task: ffff88007beddee0 ti: ffff880075950000 task.ti: ffff880075950000 [ 1890.323310] RIP: 0010:[<ffffffffc07ba9b6>] [<ffffffffc07ba9b6>] llog_osd_declare_write_rec+0xb6/0x3d0 [obdclass] [ 1890.325834] RSP: 0018:ffff8800759537a8 EFLAGS: 00010206 [ 1890.327262] RAX: 5a5a5a5a5a5a5a5a RBX: ffff880067977000 RCX: 0000000000000000 [ 1890.328622] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007a9b2580 [ 1890.329976] RBP: ffff8800759537d8 R08: 0000000000000001 R09: 000000000000ffef [ 1890.331375] R10: 0000000000002000 R11: 000000000000000f R12: ffff880075953d88 [ 1890.332884] R13: ffff880067d70840 R14: ffff880067977078 R15: ffff8800671e1c00 [ 1890.334374] FS: 00007f173df09740(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 [ 1890.336313] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1890.338548] CR2: 00007fbcc671ae40 CR3: 000000006c198000 CR4: 00000000000406e0 [ 1890.339733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1890.340838] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1890.341975] Stack: [ 1890.343053] ffff8800000029b9 ffff8800671e1200 ffffffffc1057100 ffff880075953d88 [ 1890.344121] ffff88006a054000 00000000000029b9 ffff880075953818 ffffffffc07af174 [ 1890.345200] ffff880067d70840 ffff8800671e1200 ffff880067d70840 ffff880075953d88 [ 1890.346339] Call Trace: [ 1890.347426] [<ffffffffc07af174>] llog_declare_write_rec+0x84/0x200 [obdclass] [ 1890.348452] [<ffffffffc07af8f2>] llog_cancel_rec+0xe2/0x870 [obdclass] [ 1890.349732] [<ffffffffc07b66fa>] llog_cat_cancel_records+0x13a/0x2e0 [obdclass] [ 1890.350742] [<ffffffffc0662c47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [ 1890.351750] [<ffffffffc10143d4>] llog_changelog_cancel_cb+0xe4/0x1d0 [mdd] [ 1890.353074] [<ffffffffc07b08f5>] llog_process_thread+0x875/0x1460 [obdclass] [ 1890.354407] [<ffffffffc10142f0>] ? mdd_device_free+0x2a0/0x2a0 [mdd] [ 1890.355508] [<ffffffffc07b159c>] llog_process_or_fork+0xbc/0x450 [obdclass] [ 1890.356457] [<ffffffffc07b6ad9>] llog_cat_process_cb+0x239/0x250 [obdclass] [ 1890.357413] [<ffffffffc07b08f5>] llog_process_thread+0x875/0x1460 [obdclass] [ 1890.358301] [<ffffffffc07b68a0>] ? llog_cat_cancel_records+0x2e0/0x2e0 [obdclass] [ 1890.359194] [<ffffffffc07b159c>] llog_process_or_fork+0xbc/0x450 [obdclass] [ 1890.360036] [<ffffffffc07b68a0>] ? llog_cat_cancel_records+0x2e0/0x2e0 [obdclass] [ 1890.360885] [<ffffffffc07b5e69>] llog_cat_process_or_fork+0x199/0x2a0 [obdclass] [ 1890.361731] [<ffffffffc10142f0>] ? mdd_device_free+0x2a0/0x2a0 [mdd] [ 1890.362529] [<ffffffffc10142f0>] ? mdd_device_free+0x2a0/0x2a0 [mdd] [ 1890.363344] [<ffffffffc07b5f9e>] llog_cat_process+0x2e/0x30 [obdclass] ....
I can reproduce this using this test https://review.whamcloud.com/#/c/29035/ in loop mode.
10986: 4d 8b 7e 70 mov 0x70(%r14),%r15 1098a: 4d 85 ff test %r15,%r15 1098d: 0f 84 9c 02 00 00 je 10c2f <llog_osd_declare_write_rec+0x32f> 10993: 44 89 d0 mov %r10d,%eax 10996: 48 c7 43 78 00 00 00 movq $0x0,0x78(%rbx) 1099d: 00 1099e: 4c 8d 73 78 lea 0x78(%rbx),%r14 109a2: 48 89 83 80 00 00 00 mov %rax,0x80(%rbx) 109a9: 49 8b 47 30 mov 0x30(%r15),%rax 109ad: 48 85 c0 test %rax,%rax 109b0: 0f 84 7a 01 00 00 je 10b30 <llog_osd_declare_write_rec+0x230> 109b6: 48 8b 40 08 mov 0x8(%rax),%rax
I have done quick analyze, fault happened when accessing
llog_handle->dt_object->do_body_ops
it contains poison (rax) 0x5a5a5a5a5a..
It is great that dt_declare_record_write check for NULL pointer
LASSERT(dt->do_body_ops);
maybe poison check is required here also? It will simplify analyze of future fails.
Attachments
Issue Links
- is related to
-
LU-13018 NULL pointer dereference in osd_attr_get while consuming changelogs and reading changelog_size
-
- Resolved
-