[LU-11966] Crash in sanity-lfsck test 13: LFSCK can repair crashed lmm_oi Created: 12/Feb/19  Updated: 17/May/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

While extending testing to a setup different from maloo, with debug-enabled kernels, I found this crash in sanity-lfsck test 13 on ZFS. This is on master-next, which has no lfsck-related changes compared to master.

[  621.443088] BUG: unable to handle kernel paging request at ffff8800cf3b9000
[  621.443753] IP: [<ffffffff813f05f7>] memmove+0x37/0x1a0
[  621.444388] PGD 241b067 PUD 11e709067 PMD 11e68f067 PTE 80000000cf3b9060
[  621.445331] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  621.445952] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic crct10dif_common zfs(PO) squashfs zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) i2c_piix4 pcspkr i2c_core binfmt_misc ip_tables rpcsec_gss_krb5 ata_generic pata_acpi ata_piix serio_raw virtio_blk libata floppy [last unloaded: libcfs]
[  621.451506] CPU: 6 PID: 26490 Comm: lfsck Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.6-debug #2
[  621.452435] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  621.452931] task: ffff880087c10e40 ti: ffff8800b2df4000 task.ti: ffff8800b2df4000
[  621.453571] RIP: 0010:[<ffffffff813f05f7>]  [<ffffffff813f05f7>] memmove+0x37/0x1a0
[  621.454239] RSP: 0018:ffff8800b2df7a20  EFLAGS: 00010206
[  621.454662] RAX: ffff880111b84e30 RBX: 0000000000000000 RCX: 00000000000000f0
[  621.455286] RDX: 0000000000000030 RSI: ffff8800cf3b9000 RDI: ffff880111b84eb0
[  621.455897] RBP: ffff8800b2df7a88 R08: 0000000000000000 R09: 0000000000000000
[  621.456486] R10: ffffffff00000000 R11: 0000000000000000 R12: ffff88008d4af880
[  621.457129] R13: ffff880111b84e00 R14: 000000000000000a R15: ffffffffa0dc0494
[  621.457738] FS:  0000000000000000(0000) GS:ffff88011e380000(0000) knlGS:0000000000000000
[  621.458672] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  621.459268] CR2: ffff8800cf3b9000 CR3: 00000000d2bd4000 CR4: 00000000000006e0
[  621.460020] Call Trace:
[  621.460254]  [<ffffffffa01355fc>] ? nvlist_add_common.part.51+0x2cc/0x430 [znvpair]
[  621.461165]  [<ffffffffa0135d06>] nvlist_add_byte_array+0x26/0x30 [znvpair]
[  621.462042]  [<ffffffffa0e9e94b>] __osd_sa_xattr_set+0xbb/0x370 [osd_zfs]
[  621.462861]  [<ffffffffa0e9f6ea>] osd_xattr_set+0x50a/0x880 [osd_zfs]
[  621.463645]  [<ffffffffa01364b6>] ? nvlist_lookup_byte_array+0x26/0x30 [znvpair]
[  621.464426]  [<ffffffffa0e9d1a9>] ? osd_xattr_get_internal+0xa9/0x210 [osd_zfs]
[  621.465037]  [<ffffffffa0e9d4e5>] ? osd_xattr_get+0x1d5/0x5e0 [osd_zfs]
[  621.465574]  [<ffffffffa0d91556>] dt_xattr_set+0xa6/0x120 [lfsck]
[  621.466113]  [<ffffffffa0d9c1d8>] ? lfsck_layout_get_lovea+0xa8/0x240 [lfsck]
[  621.466684]  [<ffffffffa0da0065>] lfsck_layout_master_exec_oit+0x995/0xef0 [lfsck]
[  621.467375]  [<ffffffffa0d6e00f>] lfsck_master_oit_engine+0x7ff/0x14d0 [lfsck]
[  621.467954]  [<ffffffff8102a59d>] ? __switch_to+0xcd/0x4e0
[  621.468395]  [<ffffffffa0d6f666>] lfsck_master_engine+0x986/0x1390 [lfsck]
[  621.469265]  [<ffffffff810caae0>] ? wake_up_state+0x20/0x20
[  621.470015]  [<ffffffffa0d6ece0>] ? lfsck_master_oit_engine+0x14d0/0x14d0 [lfsck]
[  621.470902]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[  621.471558]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[  621.472409]  [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21
[  621.472976]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[  621.473537] Code: 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 81 fa a8 02 00 00 72 05 40 38 fe 74 41 48 83 ea 20 48 83 ea 20 <4c> 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d 76 20 4c 89 
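
A note on reading the oops above, offered as a minimal sketch rather than a diagnosis: the Python below decodes the page-fault error code (`Oops: 0000`) and compares register values copied from the trace. The helper name `decode_fault`, the 4 KiB page size, and the reading of RAX as memmove's original destination pointer are assumptions made for illustration; none of them is stated in this ticket.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86_64


def decode_fault(error_code):
    """Describe the three low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(error_code & 0x1),  # bit 0 clear: page not present
        "write": bool(error_code & 0x2),         # bit 1 clear: read access
        "user_mode": bool(error_code & 0x4),     # bit 2 clear: kernel mode
    }


# Values copied from the register dump in the trace.
cr2 = 0xFFFF8800CF3B9000  # faulting address
rsi = 0xFFFF8800CF3B9000  # memmove source pointer at the fault
rdi = 0xFFFF880111B84EB0  # memmove destination pointer at the fault
rax = 0xFFFF880111B84E30  # assumption: original destination (memmove return value)

fault = decode_fault(0x0000)  # "Oops: 0000"
print("fault bits:", fault)
print("fault address equals source pointer:", cr2 == rsi)
print("fault address is page-aligned:", cr2 % PAGE_SIZE == 0)
print("bytes copied before the fault:", rdi - rax)
```

If those assumptions hold, the fault is a kernel-mode read of a not-present page at the memmove source pointer, exactly on a page boundary, after 128 bytes had been copied. That pattern would be consistent with a source buffer shorter than the length handed to `nvlist_add_byte_array`, which DEBUG_PAGEALLOC tends to expose, but the crashdump is needed to confirm it.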

The full report, including the crashdump and debug kernel, is here: http://testing.linuxhacker.ru:3333/lustre-reports/65/testresults/sanity-lfsck-zfs-centos7_x86_64-centos7_x86_64-retry2/



 Comments   
Comment by Alex Zhuravlev [ 17/May/19 ]

I think this is a duplicate of LU-12013; see that ticket for details.
