Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.6.0, Lustre 2.7.0
-
3
-
14940
Description
While removing a striped directory, if out_create_update_req() cannot allocate an update request or its 8K buffer, we hit the following GPF (likely a use after free).
export MDSCOUNT=4
llmount.sh
cd /mnt/lustre
lfs mkdir -c4 d0
echo /root/lustre-release/lustre/ptlrpc/../../lustre/target/out_lib.c:70 > /proc/fs/lustre/alloc_fail
# fail to allocate dt_update in out_create_update_req()
rmdir d0

[ 85.080526] LustreError: 4395:0:(class_obd.c:198:obd_alloc_fail()) force kmalloc of dt_update (72 bytes) failed at /root/lustre-release/lustre/ptlrpc/../../lustre/target/out_lib.c:70
[ 85.085387] LustreError: 4395:0:(class_obd.c:205:obd_alloc_fail()) 63673246 total bytes and 1048576 total pages (256 bytes) allocated by Lustre, 407071412 total bytes by LNET
[ 94.798945] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000300000400-0x0000000340000400):2:mdt
[ 94.801799] Lustre: Skipped 1 previous similar message
[ 94.803637] Lustre: cli-ctl-lustre-MDT0002: Allocated super-sequence [0x0000000300000400-0x0000000340000400):2:mdt]
[ 103.921859] LustreError: 4804:0:(class_obd.c:198:obd_alloc_fail()) force kmalloc of dt_update (72 bytes) failed at /root/lustre-release/lustre/ptlrpc/../../lustre/target/out_lib.c:70
[ 103.927270] LustreError: 4804:0:(class_obd.c:205:obd_alloc_fail()) 63959198 total bytes and 1048576 total pages (256 bytes) allocated by Lustre, 407352740 total bytes by LNET
[ 103.932656] LustreError: 4804:0:(osp_md_object.c:237:osp_md_declare_attr_set()) lustre-MDT0001-osp-MDT0000: Get OSP update buf failed: -12
[ 103.936893] LustreError: 4804:0:(lod_object.c:1081:lod_declare_attr_set()) failed declaration: -12
[ 103.939281] general protection fault: 0000 [#1] SMP
[ 103.940247] last sysfs file: /sys/devices/system/cpu/possible
[ 103.940247] CPU 0
[ 103.940247] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) nodemap(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
[ 103.940247]
[ 103.940247] Pid: 4804, comm: mdt00_004 Not tainted 2.6.32-431.5.1.el6.lustre.x86_64 #1 Bochs Bochs
[ 103.940247] RIP: 0010:[<ffffffffa0d3d890>] [<ffffffffa0d3d890>] lod_trans_stop+0x110/0x210 [lod]
[ 103.940247] RSP: 0018:ffff8801e244bab0 EFLAGS: 00010292
[ 103.940247] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801e85d6080 RCX: 0000000000000000
[ 103.940247] RDX: 0000000000000000 RSI: ffff8801e244cf58 RDI: ffff880219e58000
[ 103.940247] RBP: ffff8801e244bae0 R08: 0000000000000000 R09: 0000000000000000
[ 103.940247] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 103.940247] R13: ffff8802199fcab0 R14: ffff8801fbac8ae0 R15: ffff8801fe3fdba8
[ 103.940247] FS: 0000000000000000(0000) GS:ffff88002f800000(0000) knlGS:0000000000000000
[ 103.940247] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 103.940247] CR2: 000000377fedc920 CR3: 00000002162cd000 CR4: 00000000000006f0
[ 103.940247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 103.940247] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 103.940247] Process mdt00_004 (pid: 4804, threadinfo ffff8801e244a000, task ffff8801e97a8580)
[ 103.940247] Stack:
[ 103.940247] ffff8801e244bad0 00000000fffffff4 ffff8801fbac8ae0 ffff8801eda3c820
[ 103.940247] <d> ffff8801fe3b2c30 ffff8801fe3fdba8 ffff8801e244baf0 ffffffffa081f08d
[ 103.940247] <d> ffff8801e244bbb0 ffffffffa08097e9 ffffffffa0cb47ba ffff8801eda3a7b0
[ 103.940247] Call Trace:
[ 103.940247] [<ffffffffa081f08d>] mdd_trans_stop+0x1d/0x20 [mdd]
[ 103.940247] [<ffffffffa08097e9>] mdd_unlink+0x4b9/0xcc0 [mdd]
[ 103.940247] [<ffffffffa0cb47ba>] ? mdt_reint_unlink+0x9ca/0x10b0 [mdt]
[ 103.940247] [<ffffffffa0cab968>] mdo_unlink+0x18/0x50 [mdt]
[ 103.940247] [<ffffffffa0cb47f4>] mdt_reint_unlink+0xa04/0x10b0 [mdt]
[ 103.940247] [<ffffffffa0c8ee45>] ? mdt_ucred+0x15/0x20 [mdt]
[ 103.940247] [<ffffffffa0cab701>] mdt_reint_rec+0x41/0xe0 [mdt]
[ 103.940247] [<ffffffffa0c96a63>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
[ 103.940247] [<ffffffffa0c972eb>] mdt_reint+0x6b/0x120 [mdt]
[ 103.940247] [<ffffffffa06e9675>] tgt_request_handle+0x245/0xad0 [ptlrpc]
[ 103.940247] [<ffffffffa069c921>] ptlrpc_main+0xcf1/0x1870 [ptlrpc]
[ 103.940247] [<ffffffffa069bc30>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[ 103.940247] [<ffffffff8109eab6>] kthread+0x96/0xa0
[ 103.940247] [<ffffffff8100c30a>] child_rip+0xa/0x20
[ 103.940247] [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40
[ 103.940247] [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[ 103.940247] [<ffffffff8109ea20>] ? kthread+0x0/0xa0
[ 103.940247] [<ffffffff8100c300>] ? child_rip+0x0/0x20
[ 103.940247] Code: 00 bb 01 00 00 48 c7 05 eb b4 03 00 00 00 00 00 c7 05 d9 b4 03 00 01 00 00 00 e8 3c 17 59 ff e9 34 ff ff ff 49 8b 45 00 49 39 c5 <4c> 8b 38 74 4d 48 8b 70 f8 48 8b 46 40 48 8b 40 18 48 85 c0 0f
[ 103.940247] RIP [<ffffffffa0d3d890>] lod_trans_stop+0x110/0x210 [lod]
[ 103.940247] RSP <ffff8801e244bab0>
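The fault is in the list_for_each_entry_safe() loop of lod_trans_stop(). RAX holds 6b6b6b6b6b6b6b6b, which matches the slab poison pattern written to freed memory, so it seems likely that there were no updates in the list and that the first dt_trans_stop() freed the thandle before the loop read the list head.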
static int lod_trans_stop(const struct lu_env *env, struct dt_device *dt,
			  struct thandle *th)
{
	struct thandle_update *tu = th->th_update;
	struct dt_update_request *update;
	struct dt_update_request *tmp;
	int rc2 = 0;
	int rc;
	ENTRY;

	CERROR("dt = %p, th = %p\n", dt, th);

	rc = dt_trans_stop(env, th->th_dev, th);
	if (likely(tu == NULL))
		RETURN(rc);

	list_for_each_entry_safe(update, tmp, &tu->tu_remote_update_list,
				 dur_list) {
		/* update will be freed inside dt_trans_stop */
		rc2 = dt_trans_stop(env, update->dur_dt, th);
		if (unlikely(rc2 != 0 && rc == 0))
			rc = rc2;
	}

	RETURN(rc);
}
This was found via memory allocation fault injection.
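The lifetime hazard suspected in lod_trans_stop() can be modelled in user space. The following is only a minimal sketch under stated assumptions, not the fix applied to Lustre: every name in it (struct handle, struct remote_update, local_stop(), remote_stop(), trans_stop_ordered(), live_objects) is hypothetical, and it assumes that stopping the local transaction frees the handle together with its embedded list head. Under that assumption, the list has to be walked while the handle is still alive, and the handle released last. Whether the real code may stop remote updates before the local transaction is a separate design question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are Lustre types. */
struct list_node {
	struct list_node *next, *prev;
};

struct remote_update {
	struct list_node link;	/* first member, so a node pointer casts back */
	int rc;			/* result that stopping this update reports */
};

struct handle {
	struct list_node updates;	/* list head embedded in the handle */
};

static int live_objects;	/* allocations not yet freed */

static struct handle *handle_new(void)
{
	struct handle *th = malloc(sizeof(*th));

	assert(th != NULL);
	th->updates.next = th->updates.prev = &th->updates;
	live_objects++;
	return th;
}

/* Append an update to the tail of the handle's list. */
static void handle_add_update(struct handle *th, int rc)
{
	struct remote_update *u = malloc(sizeof(*u));

	assert(u != NULL);
	u->rc = rc;
	u->link.prev = th->updates.prev;
	u->link.next = &th->updates;
	th->updates.prev->next = &u->link;
	th->updates.prev = &u->link;
	live_objects++;
}

/* Stopping a remote update unlinks it from the list and frees it. */
static int remote_stop(struct remote_update *u)
{
	int rc = u->rc;

	u->link.prev->next = u->link.next;
	u->link.next->prev = u->link.prev;
	free(u);
	live_objects--;
	return rc;
}

/* Assumption: stopping the local transaction frees the handle itself. */
static int local_stop(struct handle *th)
{
	free(th);
	live_objects--;
	return 0;
}

/* Walk the list while the handle is alive, then release the handle last. */
static int trans_stop_ordered(struct handle *th)
{
	struct list_node *pos = th->updates.next;
	int rc_remote = 0;
	int rc_local;

	while (pos != &th->updates) {
		struct list_node *next = pos->next;	/* saved before pos is freed */
		int rc2 = remote_stop((struct remote_update *)pos);

		if (rc2 != 0 && rc_remote == 0)
			rc_remote = rc2;
		pos = next;
	}

	rc_local = local_stop(th);	/* th must not be touched after this */
	return rc_local != 0 ? rc_local : rc_remote;
}
```

Calling trans_stop_ordered() on a handle with an empty list, the case suspected in this report, never reads the list head after the handle has been freed. As in the original listing, the local result takes precedence and the first non-zero remote result is reported otherwise.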