Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
Description
Apparently, when the llog processing thread cannot allocate memory, the error path is incorrect.
I frequently hit this in sanity test 103b:
[16131.247316] Lustre: Evicted from MGS (at 192.168.10.224@tcp) after server handle changed from 0xa0f8f476b253a968 to 0xa0f8f476b253ac39
[16131.386144] llog_process_th: page allocation failure: order:5, mode:0x200050
[16131.386810] CPU: 0 PID: 10584 Comm: llog_process_th Tainted: G W OE ------------ 3.10.0-debug #1
[16131.387991] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[16131.388601] 0000000000200050 0000000004b8075e ffff88005350b4b8 ffffffff816fe7c0
[16131.389400] ffff88005350b548 ffffffff8117910a 0000000000000000 ffff8800bd0d8100
[16131.390432] 0000000000000005 0000000000200050 ffff88005350b548 0000000004b8075e
[16131.391447] Call Trace:
[16131.391934] [<ffffffff816fe7c0>] dump_stack+0x19/0x1b
[16131.392452] [<ffffffff8117910a>] warn_alloc_failed+0x11a/0x190
[16131.393006] [<ffffffff816fb9c4>] __alloc_pages_slowpath+0x6dd/0x74b
[16131.393572] [<ffffffff8117d76f>] __alloc_pages_nodemask+0x49f/0x4c0
[16131.394112] [<ffffffff811c9503>] kmem_getpages+0x63/0x1d0
[16131.394475] [<ffffffff811ccfe0>] fallback_alloc+0x1b0/0x2a0
[16131.394864] [<ffffffff811ccdfb>] ____cache_alloc_node+0x18b/0x1c0
[16131.395233] [<ffffffff811cd5a9>] kmem_cache_alloc_trace+0x4d9/0x640
[16131.395648] [<ffffffffa0641d2d>] ? tgt_bitmap_chunk_alloc+0x2c/0x18e [ptlrpc]
[16131.396604] [<ffffffffa0641d2d>] tgt_bitmap_chunk_alloc+0x2c/0x18e [ptlrpc]
[16131.397136] [<ffffffffa06158f8>] tgt_reply_data_init+0x1538/0x1590 [ptlrpc]
[16131.397759] [<ffffffff811cb668>] ? cache_alloc_debugcheck_after.isra.41+0x68/0x2a0
[16131.398984] [<ffffffffa060c3ec>] tgt_init+0x7fc/0xa90 [ptlrpc]
[16131.399590] [<ffffffffa0b8d8f6>] mdt_init0+0xbd6/0x10f0 [mdt]
[16131.400684] [<ffffffffa0b8de89>] mdt_device_alloc+0x79/0x110 [mdt]
[16131.401317] [<ffffffffa03588d4>] obd_setup+0x114/0x2a0 [obdclass]
[16131.401988] [<ffffffffa035b644>] class_setup+0x2f4/0x8f0 [obdclass]
[16131.402437] [<ffffffffa03602b0>] class_process_config+0x1e30/0x3130 [obdclass]
[16131.403132] [<ffffffff811ce399>] ? __kmalloc+0x649/0x660
[16131.403493] [<ffffffff811c9682>] ? kfree_debugcheck+0x12/0x30
[16131.403893] [<ffffffffa0364412>] class_config_llog_handler+0x1102/0x1fc0 [obdclass]
[16131.404816] [<ffffffff817039ce>] ? mutex_unlock+0xe/0x10
[16131.405354] [<ffffffffa032696d>] llog_process_thread+0x5fd/0x1090 [obdclass]
[16131.414386] [<ffffffffa0327d30>] ? llog_backup+0x510/0x510 [obdclass]
[16131.414807] [<ffffffffa0327d7c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass]
[16131.415489] [<ffffffff810a404a>] kthread+0xea/0xf0
[16131.415874] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[16131.416252] [<ffffffff81711758>] ret_from_fork+0x58/0x90
[16131.416609] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[16131.417071] Mem-Info:
[16131.417428] Node 0 DMA per-cpu:
[16131.417841] CPU 0: hi: 0, btch: 1 usd: 0
[16131.418233] CPU 1: hi: 0, btch: 1 usd: 0
[16131.418579] CPU 2: hi: 0, btch: 1 usd: 0
[16131.418951] CPU 3: hi: 0, btch: 1 usd: 0
[16131.419301] CPU 4: hi: 0, btch: 1 usd: 0
[16131.419657] CPU 5: hi: 0, btch: 1 usd: 0
[16131.420018] CPU 6: hi: 0, btch: 1 usd: 0
[16131.420365] CPU 7: hi: 0, btch: 1 usd: 0
[16131.420744] Node 0 DMA32 per-cpu:
[16131.421081] CPU 0: hi: 186, btch: 31 usd: 0
[16131.421430] CPU 1: hi: 186, btch: 31 usd: 0
[16131.421792] CPU 2: hi: 186, btch: 31 usd: 0
[16131.422147] CPU 3: hi: 186, btch: 31 usd: 0
[16131.422496] CPU 4: hi: 186, btch: 31 usd: 0
[16131.422952] CPU 5: hi: 186, btch: 31 usd: 0
[16131.423318] CPU 6: hi: 186, btch: 31 usd: 26
[16131.423683] CPU 7: hi: 186, btch: 31 usd: 0
[16131.424040] active_anon:119172 inactive_anon:119211 isolated_anon:0 active_file:2963 inactive_file:75348 isolated_file:0 unevictable:0 dirty:67 writeback:0 unstable:0 free:168109 slab_reclaimable:34266 slab_unreclaimable:89908 mapped:1915 shmem:233852 pagetables:364 bounce:0 free_cma:0
[16131.426367] Node 0 DMA free:11404kB min:248kB low:308kB high:372kB active_anon:76kB inactive_anon:8kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:8kB slab_reclaimable:288kB slab_unreclaimable:3576kB kernel_stack:16kB pagetables:8kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[16131.428732] lowmem_reserve[]: 0 2775 2775 2775
[16131.429105] Node 0 DMA32 free:661032kB min:44804kB low:56004kB high:67204kB active_anon:476612kB inactive_anon:476836kB active_file:11848kB inactive_file:301392kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3081204kB managed:2841844kB mlocked:0kB dirty:268kB writeback:0kB mapped:7660kB shmem:935400kB slab_reclaimable:136776kB slab_unreclaimable:356056kB kernel_stack:4624kB pagetables:1448kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[16131.432899] lowmem_reserve[]: 0 0 0 0
[16131.433432] Node 0 DMA: 5*4kB (UEM) 28*8kB (UEM) 27*16kB (UEM) 9*32kB (UEM) 2*64kB (EM) 3*128kB (M) 3*256kB (M) 2*512kB (EM) 2*1024kB (UE) 3*2048kB (EMR) 0*4096kB = 11460kB
[16131.435182] Node 0 DMA32: 1030*4kB (UEM) 4996*8kB (UEM) 11054*16kB (UEM) 10416*32kB (UEM) 1659*64kB (UM) 4*128kB (U) 3*256kB (MR) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 661720kB
[16131.436693] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[16131.437417] 313527 total pagecache pages
[16131.437776] 1292 pages in swap cache
[16131.438114] Swap cache stats: add 531279, delete 529987, find 399862/418757
[16131.438525] Free swap = 2057976kB
[16131.438876] Total swap = 2097148kB
[16131.439234] 774299 pages RAM
[16131.439798] 0 pages HighMem/MovableOnly
[16131.440413] 59861 pages reserved
[16131.440795] SLAB: Unable to allocate memory on node 0 (gfp=0x50)
[16131.441159] cache: kmalloc-131072, object size: 131072, order: 5
[16131.441541] node 0: slabs: 184/184, objs: 184/184, free: 0
[16131.442181] llog_process_th: page allocation failure: order:5, mode:0x200050
[16131.442626] CPU: 0 PID: 10584 Comm: llog_process_th Tainted: G W OE ------------ 3.10.0-debug #1
[16131.443725] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[16131.444244] 0000000000200050 0000000004b8075e ffff88005350b4b8 ffffffff816fe7c0
[16131.445247] ffff88005350b548 ffffffff8117910a 0000000000000000 ffff8800bd0d8100
[16131.446249] 0000000000000005 0000000000200050 ffff88005350b548 0000000004b8075e
[16131.447888] Call Trace:
[16131.448421] [<ffffffff816fe7c0>] dump_stack+0x19/0x1b
[16131.449025] [<ffffffff8117910a>] warn_alloc_failed+0x11a/0x190
[16131.449622] [<ffffffff816fb9c4>] __alloc_pages_slowpath+0x6dd/0x74b
[16131.450183] [<ffffffff8117d76f>] __alloc_pages_nodemask+0x49f/0x4c0
[16131.450554] [<ffffffff811c9503>] kmem_getpages+0x63/0x1d0
[16131.451063] [<ffffffff811ccfe0>] fallback_alloc+0x1b0/0x2a0
[16131.451586] [<ffffffff811ccdfb>] ____cache_alloc_node+0x18b/0x1c0
[16131.452143] [<ffffffff811cd5a9>] kmem_cache_alloc_trace+0x4d9/0x640
[16131.452756] [<ffffffffa0641d2d>] ? tgt_bitmap_chunk_alloc+0x2c/0x18e [ptlrpc]
[16131.453820] [<ffffffffa0641d2d>] tgt_bitmap_chunk_alloc+0x2c/0x18e [ptlrpc]
[16131.454611] [<ffffffffa06158f8>] tgt_reply_data_init+0x1538/0x1590 [ptlrpc]
[16131.455312] [<ffffffffa060c3ec>] tgt_init+0x7fc/0xa90 [ptlrpc]
[16131.456103] [<ffffffffa0b8d8f6>] mdt_init0+0xbd6/0x10f0 [mdt]
[16131.456737] [<ffffffffa0b8de89>] mdt_device_alloc+0x79/0x110 [mdt]
[16131.457417] [<ffffffffa03588d4>] obd_setup+0x114/0x2a0 [obdclass]
[16131.458072] [<ffffffffa035b644>] class_setup+0x2f4/0x8f0 [obdclass]
[16131.458682] [<ffffffffa03602b0>] class_process_config+0x1e30/0x3130 [obdclass]
[16131.459661] [<ffffffff811ce399>] ? __kmalloc+0x649/0x660
[16131.460701] [<ffffffff811c9682>] ? kfree_debugcheck+0x12/0x30
[16131.461330] [<ffffffffa0364412>] class_config_llog_handler+0x1102/0x1fc0 [obdclass]
[16131.462222] [<ffffffff817039ce>] ? mutex_unlock+0xe/0x10
[16131.462595] [<ffffffffa032696d>] llog_process_thread+0x5fd/0x1090 [obdclass]
[16131.463002] [<ffffffffa0327d30>] ? llog_backup+0x510/0x510 [obdclass]
[16131.463387] [<ffffffffa0327d7c>] llog_process_thread_daemonize+0x4c/0x80 [obdclass]
[16131.464393] [<ffffffff810a404a>] kthread+0xea/0xf0
[16131.465035] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[16131.465655] [<ffffffff81711758>] ret_from_fork+0x58/0x90
[16131.466272] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[16131.466846] Mem-Info:
[16131.467303] Node 0 DMA per-cpu:
[16131.467799] CPU 0: hi: 0, btch: 1 usd: 0
[16131.468298] CPU 1: hi: 0, btch: 1 usd: 0
[16131.468813] CPU 2: hi: 0, btch: 1 usd: 0
[16131.469354] CPU 3: hi: 0, btch: 1 usd: 0
[16131.469881] CPU 4: hi: 0, btch: 1 usd: 0
[16131.470380] CPU 5: hi: 0, btch: 1 usd: 0
[16131.470905] CPU 6: hi: 0, btch: 1 usd: 0
[16131.471404] CPU 7: hi: 0, btch: 1 usd: 0
[16131.471930] Node 0 DMA32 per-cpu:
[16131.472410] CPU 0: hi: 186, btch: 31 usd: 0
[16131.472945] CPU 1: hi: 186, btch: 31 usd: 0
[16131.473506] CPU 2: hi: 186, btch: 31 usd: 0
[16131.474125] CPU 3: hi: 186, btch: 31 usd: 0
[16131.474744] CPU 4: hi: 186, btch: 31 usd: 0
[16131.475397] CPU 5: hi: 186, btch: 31 usd: 0
[16131.476017] CPU 6: hi: 186, btch: 31 usd: 15
[16131.476617] CPU 7: hi: 186, btch: 31 usd: 0
[16131.479285] active_anon:119172 inactive_anon:119211 isolated_anon:0 active_file:2963 inactive_file:75348 isolated_file:0 unevictable:0 dirty:67 writeback:0 unstable:0 free:168109 slab_reclaimable:34266 slab_unreclaimable:89908 mapped:1988 shmem:233852 pagetables:364 bounce:0 free_cma:0
[16131.483241] Node 0 DMA free:11404kB min:248kB low:308kB high:372kB active_anon:76kB inactive_anon:8kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:8kB slab_reclaimable:288kB slab_unreclaimable:3576kB kernel_stack:16kB pagetables:8kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[16131.487465] lowmem_reserve[]: 0 2775 2775 2775
[16131.488159] Node 0 DMA32 free:661032kB min:44804kB low:56004kB high:67204kB active_anon:476612kB inactive_anon:476836kB active_file:11848kB inactive_file:301392kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3081204kB managed:2841844kB mlocked:0kB dirty:268kB writeback:0kB mapped:7952kB shmem:935400kB slab_reclaimable:136776kB slab_unreclaimable:356056kB kernel_stack:4624kB pagetables:1448kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[16131.493277] lowmem_reserve[]: 0 0 0 0
[16131.493702] Node 0 DMA: 5*4kB (UEM) 28*8kB (UEM) 27*16kB (UEM) 9*32kB (UEM) 2*64kB (EM) 3*128kB (M) 3*256kB (M) 2*512kB (EM) 2*1024kB (UE) 3*2048kB (EMR) 0*4096kB = 11460kB
[16131.495541] Node 0 DMA32: 1021*4kB (UEM) 4995*8kB (UEM) 11055*16kB (UEM) 10416*32kB (UEM) 1659*64kB (UM) 4*128kB (U) 3*256kB (MR) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 661692kB
[16131.497204] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[16131.498108] 313527 total pagecache pages
[16131.498484] 1300 pages in swap cache
[16131.498870] Swap cache stats: add 531299, delete 529999, find 399870/418769
[16131.499281] Free swap = 2058024kB
[16131.499655] Total swap = 2097148kB
[16131.500005] 774299 pages RAM
[16131.500402] 0 pages HighMem/MovableOnly
[16131.500836] 59861 pages reserved
[16131.501393] SLAB: Unable to allocate memory on node 0 (gfp=0x50)
[16131.502039] cache: kmalloc-131072, object size: 131072, order: 5
[16131.502701] node 0: slabs: 184/184, objs: 184/184, free: 0
[16131.817424] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
[16131.819517] BUG: unable to handle kernel NULL pointer dereference at (null)
[16131.820220] IP: [<ffffffffa060cc87>] tgt_free_reply_data+0x97/0x300 [ptlrpc]
[16131.821326] PGD 0
[16131.821773] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[16131.822384] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop mbcache jbd2 sha512_generic crypto_null rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix i2c_core virtio_console libata serio_raw virtio_balloon pcspkr virtio_blk floppy nfsd ip_tables [last unloaded: libcfs]
[16131.837249] CPU: 4 PID: 10563 Comm: mdt01_002 Tainted: G W OE ------------ 3.10.0-debug #1
[16131.839421] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[16131.840001] task: ffff8800464e6100 ti: ffff88003dd3c000 task.ti: ffff88003dd3c000
[16131.841050] RIP: 0010:[<ffffffffa060cc87>] [<ffffffffa060cc87>] tgt_free_reply_data+0x97/0x300 [ptlrpc]
[16131.842138] RSP: 0018:ffff88003dd3fb90 EFLAGS: 00010293
[16131.842650] RAX: 0000000000000000 RBX: ffff880080a7b380 RCX: 0000000000000000
[16131.843220] RDX: 0000000000000000 RSI: ffff880046197ba0 RDI: ffff880080a7b380
[16131.848997] RBP: ffff88003dd3fbd8 R08: ffff880080a7b380 R09: 0000000000000000
[16131.849530] R10: 0000000000000000 R11: ffff88003dd3fa2e R12: 0000000000000000
[16131.849919] R13: ffff8800824350b0 R14: ffff880046197c58 R15: ffff880046197ba0
[16131.850290] FS: 0000000000000000(0000) GS:ffff8800bc700000(0000) knlGS:0000000000000000
[16131.852183] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[16131.852754] CR2: 0000000000000000 CR3: 0000000001c0e000 CR4: 00000000000006e0
[16131.853354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16131.853903] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[16131.854278] Stack:
[16131.854617] 0000000000000000 ffff88003dd3ffd8 ffff8800464e6100 ffff88003dd3fc30
[16131.855672] ffff880080a7b380 ffff880046197ba0 ffff880046197800 ffff880046197c58
[16131.856489] ffff8800824350b0 ffff88003dd3fc30 ffffffffa060cf4e 0000000000000246
[16131.857179] Call Trace:
[16131.857546] [<ffffffffa060cf4e>] tgt_release_reply_data+0x5e/0x180 [ptlrpc]
[16131.857971] [<ffffffffa06159e8>] tgt_handle_received_xid+0x98/0xe0 [ptlrpc]
[16131.858377] [<ffffffffa061b558>] tgt_request_handle+0xb88/0x1330 [ptlrpc]
[16131.858778] [<ffffffffa05c8fd1>] ptlrpc_server_handle_request+0x231/0xab0 [ptlrpc]
[16131.859593] [<ffffffffa05c7858>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
[16131.860260] [<ffffffffa05ccdd0>] ptlrpc_main+0xa50/0x1db0 [ptlrpc]
[16131.860639] [<ffffffff81707f27>] ? _raw_spin_unlock_irq+0x27/0x50
[16131.861148] [<ffffffffa05cc380>] ? ptlrpc_register_service+0xe70/0xe70 [ptlrpc]
[16131.862266] [<ffffffff810a404a>] kthread+0xea/0xf0
[16131.862836] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[16131.863419] [<ffffffff81711758>] ret_from_fork+0x58/0x90
[16131.863884] [<ffffffff810a3f60>] ? kthread_create_on_node+0x140/0x140
[16131.864266] Code: c1 fa 1f c1 ea 0c c1 f9 14 41 8d 04 14 25 ff ff 0f 00 29 d0 83 f9 0f 0f 8f 3a 02 00 00 49 8b 95 28 04 00 00 48 63 c9 48 8b 14 ca <f0> 0f b3 02 19 c0 85 c0 0f 84 53 01 00 00 48 85 db 0f 84 e3 01
[16131.866389] RIP [<ffffffffa060cc87>] tgt_free_reply_data+0x97/0x300 [ptlrpc]
[16131.867055] RSP <ffff88003dd3fb90>
[16131.867598] CR2: 0000000000000000
(gdb) l *(tgt_free_reply_data+0x95)
0xa3c85 is in tgt_free_reply_data (/home/green/git/lustre-release/lustre/ptlrpc/../../lustre/target/tgt_lastrcvd.c:158).
153		b = idx % LUT_REPLY_SLOTS_PER_CHUNK;
154
155		LASSERT(chunk < LUT_REPLY_SLOTS_MAX_CHUNKS);
156		LASSERT(b < LUT_REPLY_SLOTS_PER_CHUNK);
157
158		if (test_and_clear_bit(b, lut->lut_reply_bitmap[chunk]) == 0) {
159			CERROR("%s: slot %d already clear in bitmap\n",
160			       tgt_name(lut), idx);
161			return
static int tgt_bitmap_chunk_alloc(struct lu_target *lut, int chunk)
{
	unsigned long *bm;

	OBD_ALLOC(bm, BITS_TO_LONGS(LUT_REPLY_SLOTS_PER_CHUNK) * sizeof(long));
	if (bm == NULL)
		return -ENOMEM;
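The failing allocation is an order-5 request from the kmalloc-131072 cache, so I guess tgt_bitmap_chunk_alloc() is at least one primary target to convert to OBD_ALLOC_LARGE too. After the failure, lut_reply_bitmap[chunk] apparently stays NULL, and tgt_free_reply_data() later passes it to test_and_clear_bit(), which matches the NULL pointer dereference above.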
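To illustrate the kind of defensive handling the error path seems to need, here is a minimal user-space sketch. It is not the merged patch and not Lustre code: the names `reply_bitmap`, `chunk_alloc`, `slot_set`, `slot_clear`, the `simulate_failure` flag and the sizes are all hypothetical, `calloc` stands in for the kernel allocator, and the bit operations are not atomic, unlike the kernel's test_and_clear_bit(). It only shows a chunk pointer being checked for NULL before any bit in it is touched.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical sizes for illustration only; not the Lustre values. */
#define SLOTS_PER_CHUNK 1024
#define MAX_CHUNKS 16
#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define LONGS_FOR_BITS(n) (((n) + LONG_BITS - 1) / LONG_BITS)

/* Hypothetical stand-in for the per-target reply bitmap. */
struct reply_bitmap {
	unsigned long *chunks[MAX_CHUNKS];	/* NULL until allocated */
};

/* Allocate one chunk; simulate_failure forces the -ENOMEM path. */
static int chunk_alloc(struct reply_bitmap *rb, int chunk, int simulate_failure)
{
	unsigned long *bm;

	assert(chunk >= 0 && chunk < MAX_CHUNKS);
	if (rb->chunks[chunk] != NULL)
		return 0;	/* already allocated */

	bm = simulate_failure ? NULL :
		calloc(LONGS_FOR_BITS(SLOTS_PER_CHUNK), sizeof(*bm));
	if (bm == NULL)
		return -ENOMEM;	/* chunk pointer stays NULL */

	rb->chunks[chunk] = bm;
	return 0;
}

/* Locate the word holding slot idx, or return an error if unusable. */
static int slot_word(struct reply_bitmap *rb, int idx,
		     unsigned long **word, unsigned long *mask)
{
	int chunk, b;

	if (idx < 0 || idx / SLOTS_PER_CHUNK >= MAX_CHUNKS)
		return -EINVAL;
	chunk = idx / SLOTS_PER_CHUNK;
	b = idx % SLOTS_PER_CHUNK;

	/* The check the crash suggests is missing: never touch a NULL chunk. */
	if (rb->chunks[chunk] == NULL)
		return -ENOENT;

	*word = &rb->chunks[chunk][b / LONG_BITS];
	*mask = 1UL << (b % LONG_BITS);
	return 0;
}

/* Mark slot idx as used. */
static int slot_set(struct reply_bitmap *rb, int idx)
{
	unsigned long *word, mask;
	int rc = slot_word(rb, idx, &word, &mask);

	if (rc == 0)
		*word |= mask;
	return rc;
}

/* Clear slot idx; -EALREADY if it was already clear. */
static int slot_clear(struct reply_bitmap *rb, int idx)
{
	unsigned long *word, mask;
	int rc = slot_word(rb, idx, &word, &mask);

	if (rc != 0)
		return rc;
	if ((*word & mask) == 0)
		return -EALREADY;
	*word &= ~mask;
	return 0;
}
```

With this shape, a failed chunk allocation turns a later clear into an error return (-ENOENT in this sketch) instead of a NULL dereference. Whether the real fix rejects the setup outright, allocates with OBD_ALLOC_LARGE, or both is for the actual patch to decide.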
Issue Links
- is related to LU-8316 BUG: unable to handle kernel NULL pointer dereference at tgt_free_reply_data+0x97/0x330 (Resolved)
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20450/
Subject: LU-8199 ptlrpc: better lut_reply_bitmap handling
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5f08d032972256928a7a9ede3b526963c884778e