Description
After the issues reported in LU-10762, winding down the sanity test leads to the following crash:
[13407.392152] Lustre: DEBUG MARKER: == sanity test complete, duration 13357 sec ========================================================== 18:41:43 (1520206903)
[13415.646714] LustreError: 1942:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x280002b10:0x5:0x0] without object type: valid = 0x100000001
[13415.648323] LustreError: 1942:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
[13416.174418] LustreError: 1966:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x280002b10:0x10:0x0] without object type: valid = 0x100000001
[13416.182023] LustreError: 1966:0:(namei.c:87:ll_set_inode()) Skipped 2 previous similar messages
[13416.183013] LustreError: 1966:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
[13416.184665] LustreError: 1966:0:(llite_lib.c:2355:ll_prep_inode()) Skipped 2 previous similar messages
[13417.245051] LustreError: 2011:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x280002b10:0x26:0x0] without object type: valid = 0x100000001
[13417.257155] LustreError: 2011:0:(namei.c:87:ll_set_inode()) Skipped 6 previous similar messages
[13417.258625] LustreError: 2011:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
[13417.259615] LustreError: 2011:0:(llite_lib.c:2355:ll_prep_inode()) Skipped 6 previous similar messages
[13430.306910] LustreError: 2312:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x200000403:0x25:0x0] without object type: valid = 0x100000001
[13430.317961] LustreError: 2312:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
[13430.954786] BUG: unable to handle kernel paging request at ffff8800b4f19fe0
[13430.963030] IP: [<ffffffffa02d5fee>] lmv_striped_read_page.isra.30+0x33b/0x5f9 [lmv]
[13430.964636] PGD 2e75067 PUD 33fa01067 PMD 33f859067 PTE 80000000b4f19060
[13430.965335] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[13430.966026] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper drm i2c_piix4 ata_piix virtio_balloon pcspkr serio_raw virtio_blk i2c_core virtio_console libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[13430.971838] CPU: 11 PID: 2316 Comm: ll_sa_1811 Tainted: P W OE ------------ 3.10.0-debug #2
[13430.973531] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13430.974112] task: ffff8802de2947c0 ti: ffff880284680000 task.ti: ffff880284680000
[13430.975001] RIP: 0010:[<ffffffffa02d5fee>] [<ffffffffa02d5fee>] lmv_striped_read_page.isra.30+0x33b/0x5f9 [lmv]
[13430.975946] RSP: 0018:ffff880284683bf0 EFLAGS: 00010282
[13430.976415] RAX: ffff88027197c018 RBX: ffff8800899f2fd0 RCX: 0000000000000073
[13430.977028] RDX: 0000000000000003 RSI: ffffffffa02d7c65 RDI: 0000000000000001
[13430.978421] RBP: ffff880284683c60 R08: 000000000000002e R09: 0000000280000403
[13430.979076] R10: 0000000000000000 R11: 0000000000000025 R12: 0000000000000001
[13430.979630] R13: ffff8802cf4adfc8 R14: ffff8800b4f19fd0 R15: ffff8802cf4adf80
[13430.980111] FS: 0000000000000000(0000) GS:ffff88033e560000(0000) knlGS:0000000000000000
[13430.981017] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[13430.982254] CR2: ffff8800b4f19fe0 CR3: 0000000297ddd000 CR4: 00000000000006e0
[13430.982933] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13430.983621] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[13430.984306] Stack:
[13430.992434] ffff880284683cd0 ffff8800899f2018 ffffea0002267c80 ffff8800899f2000
[13430.993756] 0000000000000068 ffff8801bba89e00 00088802a5468000 ffff8800899f2fa0
[13430.995084] 0000000000000030 ffff8801bba89e00 ffff880284683cc8 2f9bc3b7c9eb6d8e
[13430.996367] Call Trace:
[13430.996967] [<ffffffffa02bfbbb>] lmv_read_page+0x32b/0x3a0 [lmv]
[13430.997958] [<ffffffffa16543d8>] ll_get_dir_page+0xc8/0x2d0 [lustre]
[13430.998730] [<ffffffffa1690cf0>] ? ll_dom_lock_cancel+0x390/0x390 [lustre]
[13430.999463] [<ffffffffa16a7cb3>] ll_statahead_thread+0x293/0x11d0 [lustre]
[13431.000167] [<ffffffff810af8e4>] ? finish_task_switch+0x44/0x180
[13431.000858] [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
[13431.001621] [<ffffffffa16a7a20>] ? ll_agl_thread+0x4d0/0x4d0 [lustre]
[13431.002317] [<ffffffff810a2eba>] kthread+0xea/0xf0
[13431.002951] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[13431.003637] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[13431.004443] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[13431.005131] Code: ff ff ff e9 8b 01 00 00 49 63 c4 44 89 e2 4c 89 ff 48 ff c0 48 c1 e0 05 49 8d 74 07 08 4c 8b 76 10 e8 55 f5 ff ff 4d 85 f6 74 cd <49> 8b 46 10 49 89 47 18 45 8b 6e 18 66 45 85 ed 75 24 41 0f b7
[13431.007816] RIP [<ffffffffa02d5fee>] lmv_striped_read_page.isra.30+0x33b/0x5f9 [lmv]
I have several samples of this.
(gdb) l *(lmv_striped_read_page+0x33b)
0xb64 is in lmv_striped_read_page (/home/green/git/lustre-release/lustre/lmv/lmv_obd.c:2360).
2355            /* end of directory */
2356            if (!next) {
2357                    ctxt->ldc_hash = MDS_DIR_END_OFF;
2358                    break;
2359            }
2360            ctxt->ldc_hash = le64_to_cpu(next->lde_hash);
2361
2362            ent_size = le16_to_cpu(next->lde_reclen);
2363
2364            /* the last entry lde_reclen is 0, but it might not be the last
So it appears that we are getting a bad 'next' pointer from the list here.