Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 2.1.1
-
3
-
9741
Description
During I/O testing we hit a stack overrun on a Lustre 2.1.1 client. We have the LU-969 stack reduction patch in our local branch. The stack trace looks very similar to the one reported for LU-969.
2012-03-18 06:05:23 BUG: unable to handle kernel paging request at 000000032cbc7960
2012-03-18 06:05:23 IP: [<ffffffff81052814>] update_curr+0x144/0x1f0
2012-03-18 06:05:23 PGD 417be5067 PUD 0
2012-03-18 06:05:23 Thread overran stack, or stack corrupted
2012-03-18 06:05:23 Oops: 0000 [#1] SMP
2012-03-18 06:05:23 last sysfs file: /sys/devices/system/cpu/cpu31/online
2012-03-18 06:05:23 CPU 6
2012-03-18 06:05:23 Modules linked in: xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack lmv(U) mgc(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) acpi_cpufreq freq_table mperf ko2iblnd(U) lnet(U) libcfs(U) ipt_LOG xt_multiport iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm uinput sg sd_mod crc_t10dif isci libsas scsi_transport_sas ahci microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ib_qib(U) ib_mad ib_core wmi ioatdma ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc igb dca [last unloaded: cpufreq_ondemand]
2012-03-18 06:05:23
2012-03-18 06:05:23 Pid: 68872, comm: lmp Not tainted 2.6.32-220.7.1.3chaos.ch5.x86_64 #1 appro appro-512x/S2600JF
2012-03-18 06:05:23 RIP: 0010:[<ffffffff81052814>] [<ffffffff81052814>] update_curr+0x144/0x1f0
2012-03-18 06:05:23 RSP: 0018:ffff8800366c3db8 EFLAGS: 00010086
2012-03-18 06:05:23 RAX: ffff8808334c9580 RBX: 00000000755fa0c8 RCX: ffff880437f111c0
2012-03-18 06:05:23 RDX: 0000000000018b48 RSI: 0000000000000000 RDI: ffff8808334c95b8
2012-03-18 06:05:23 RBP: ffff8800366c3de8 R08: ffffffff8160b665 R09: 0000000000000000
2012-03-18 06:05:23 R10: 0000000000000010 R11: 0000000000000000 R12: ffff8800366d5fe8
2012-03-18 06:05:23 R13: 00000000000f3c4d R14: 000081fe1182f6c5 R15: ffff8808334c9580
2012-03-18 06:05:23 FS: 00002aaaacfd0800(0000) GS:ffff8800366c0000(0000) knlGS:0000000000000000
2012-03-18 06:05:23 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2012-03-18 06:05:23 CR2: 000000032cbc7960 CR3: 0000000326499000 CR4: 00000000000406e0
2012-03-18 06:05:23 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2012-03-18 06:05:23 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2012-03-18 06:05:23 Process lmp (pid: 68872, threadinfo ffff8807755fa000, task ffff8808334c9580)
2012-03-18 06:05:23 Stack:
2012-03-18 06:05:23 ffff8800366c3dc8 ffffffff81013783 ffff8808334c95b8 ffff8800366d5fe8
2012-03-18 06:05:23 <0> 0000000000000000 0000000000000000 ffff8800366c3e18 ffffffff81052e2b
2012-03-18 06:05:23 <0> ffff8800366d5f80 0000000000000006 0000000000015f80 0000000000000006
2012-03-18 06:05:23 Call Trace:
2012-03-18 06:05:23 <IRQ>
2012-03-18 06:05:23 [<ffffffff81013783>] ? native_sched_clock+0x13/0x80
2012-03-18 06:05:23 [<ffffffff81052e2b>] task_tick_fair+0xdb/0x160
2012-03-18 06:05:23 [<ffffffff81056891>] scheduler_tick+0xc1/0x260
2012-03-18 06:05:23 [<ffffffff810a0e70>] ? tick_sched_timer+0x0/0xc0
2012-03-18 06:05:23 [<ffffffff8107c512>] update_process_times+0x52/0x70
2012-03-18 06:05:23 [<ffffffff810a0ed6>] tick_sched_timer+0x66/0xc0
2012-03-18 06:05:23 [<ffffffff8109555e>] __run_hrtimer+0x8e/0x1a0
2012-03-18 06:05:23 [<ffffffff81012b59>] ? read_tsc+0x9/0x20
2012-03-18 06:05:23 [<ffffffff81095906>] hrtimer_interrupt+0xe6/0x250
2012-03-18 06:05:23 [<ffffffff814f6f0b>] smp_apic_timer_interrupt+0x6b/0x9b
2012-03-18 06:05:23 [<ffffffff8100bc13>] apic_timer_interrupt+0x13/0x20
2012-03-18 06:05:23 <EOI>
2012-03-18 06:05:23 [<ffffffff8112adbe>] ? shrink_inactive_list+0x2de/0x740
2012-03-18 06:05:23 [<ffffffff8112ad72>] ? shrink_inactive_list+0x292/0x740
2012-03-18 06:05:23 [<ffffffff8112baef>] shrink_zone+0x38f/0x520
2012-03-18 06:05:23 [<ffffffff8112c894>] zone_reclaim+0x354/0x410
2012-03-18 06:05:23 [<ffffffff8112d4e0>] ? isolate_pages_global+0x0/0x350
2012-03-18 06:05:23 [<ffffffff81122d94>] get_page_from_freelist+0x694/0x820
2012-03-18 06:05:23 [<ffffffffa0475b6f>] ? cfs_hash_bd_from_key+0x3f/0xc0 [libcfs]
2012-03-18 06:05:23 [<ffffffffa05d4232>] ? lu_object_put+0x92/0x200 [obdclass]
2012-03-18 06:05:23 [<ffffffff811337c9>] ? zone_statistics+0x99/0xc0
2012-03-18 06:05:23 [<ffffffff81124011>] __alloc_pages_nodemask+0x111/0x940
2012-03-18 06:05:23 [<ffffffff8115e592>] kmem_getpages+0x62/0x170
2012-03-18 06:05:23 [<ffffffff8115ebff>] cache_grow+0x2cf/0x320
2012-03-18 06:05:23 [<ffffffff8115ee52>] cache_alloc_refill+0x202/0x240
2012-03-18 06:05:23 [<ffffffffa0469863>] ? cfs_alloc+0x63/0x90 [libcfs]
2012-03-18 06:05:23 [<ffffffff8115fb79>] __kmalloc+0x1a9/0x220
2012-03-18 06:05:23 [<ffffffffa0469863>] cfs_alloc+0x63/0x90 [libcfs]
2012-03-18 06:05:23 [<ffffffffa0702237>] ptlrpc_request_alloc_internal+0x167/0x360 [ptlrpc]
2012-03-18 06:05:23 [<ffffffffa070243e>] ptlrpc_request_alloc_pool+0xe/0x10 [ptlrpc]
2012-03-18 06:05:23 [<ffffffffa08d684b>] osc_brw_prep_request+0x1ab/0xcc0 [osc]
2012-03-18 06:05:23 [<ffffffffa08e7fbb>] ? osc_req_attr_set+0xfb/0x250 [osc]
2012-03-18 06:05:23 [<ffffffffa09f7638>] ? ccc_req_attr_set+0x78/0x150 [lustre]
2012-03-18 06:05:23 [<ffffffffa05e4e81>] ? cl_req_attr_set+0xd1/0x1a0 [obdclass]
2012-03-18 06:05:23 [<ffffffffa05e476c>] ? cl_req_prep+0x8c/0x130 [obdclass]
2012-03-18 06:05:23 [<ffffffffa08d821d>] osc_send_oap_rpc+0xebd/0x1690 [osc]
2012-03-18 06:05:23 [<ffffffffa08d17a6>] ? loi_list_maint+0xa6/0x130 [osc]
2012-03-18 06:05:23 [<ffffffffa08d8c9e>] osc_check_rpcs+0x2ae/0x3b0 [osc]
2012-03-18 06:05:23 [<ffffffffa08e86df>] osc_io_submit+0x1df/0x4a0 [osc]
2012-03-18 06:05:23 [<ffffffffa05e4938>] cl_io_submit_rw+0x78/0x130 [obdclass]
2012-03-18 06:05:23 [<ffffffffa094160f>] lov_io_submit+0x2bf/0x9d0 [lov]
2012-03-18 06:05:23 [<ffffffffa05e4938>] cl_io_submit_rw+0x78/0x130 [obdclass]
2012-03-18 06:05:23 [<ffffffffa09ddf15>] ll_writepage+0x205/0x3b0 [lustre]
2012-03-18 06:05:23 [<ffffffff8112a34b>] pageout.clone.1+0x12b/0x300
2012-03-18 06:05:23 [<ffffffff81129a01>] ? __remove_mapping+0x61/0x160
2012-03-18 06:05:23 [<ffffffff8112a8d2>] shrink_page_list.clone.0+0x3b2/0x5c0
2012-03-18 06:05:23 [<ffffffff811280bc>] ? release_pages+0x6c/0x250
2012-03-18 06:05:23 [<ffffffff8116804d>] ? mem_cgroup_del_lru_list+0x2d/0xc0
2012-03-18 06:05:23 [<ffffffff81168169>] ? mem_cgroup_del_lru+0x39/0x40
2012-03-18 06:05:23 [<ffffffff811689ae>] ? mem_cgroup_isolate_pages+0xee/0x1d0
2012-03-18 06:05:23 [<ffffffff8112addb>] shrink_inactive_list+0x2fb/0x740
2012-03-18 06:05:23 [<ffffffff8112baef>] shrink_zone+0x38f/0x520
2012-03-18 06:05:23 [<ffffffff811a0e4d>] ? bdi_queue_work+0x7d/0x110
2012-03-18 06:05:23 [<ffffffff8112bd7e>] do_try_to_free_pages+0xfe/0x520
2012-03-18 06:05:23 [<ffffffff8112c2ec>] try_to_free_mem_cgroup_pages+0x8c/0x90
2012-03-18 06:05:23 [<ffffffff811688c0>] ? mem_cgroup_isolate_pages+0x0/0x1d0
2012-03-18 06:05:23 [<ffffffff8116a450>] mem_cgroup_hierarchical_reclaim+0x2f0/0x460
2012-03-18 06:05:23 [<ffffffff8116b992>] __mem_cgroup_try_charge+0x202/0x460
2012-03-18 06:05:23 [<ffffffff811241e7>] ? __alloc_pages_nodemask+0x2e7/0x940
2012-03-18 06:05:23 [<ffffffff8116c4e7>] mem_cgroup_charge_common+0x87/0xd0
2012-03-18 06:05:23 [<ffffffff8116c658>] mem_cgroup_cache_charge+0x128/0x140
2012-03-18 06:05:23 [<ffffffff8111161a>] add_to_page_cache_locked+0x4a/0x140
2012-03-18 06:05:23 [<ffffffff8111173c>] add_to_page_cache_lru+0x2c/0x80
2012-03-18 06:05:23 [<ffffffff811122f9>] grab_cache_page_write_begin+0x99/0xc0
2012-03-18 06:05:23 [<ffffffffa09efa98>] ll_write_begin+0x58/0x170 [lustre]
2012-03-18 06:05:23 [<ffffffff81111b7e>] generic_file_buffered_write+0x10e/0x2a0
2012-03-18 06:05:23 [<ffffffff81070637>] ? current_fs_time+0x27/0x30
2012-03-18 06:05:23 [<ffffffff811134d0>] __generic_file_aio_write+0x250/0x480
2012-03-18 06:05:23 [<ffffffffa05d8d95>] ? cl_env_info+0x15/0x20 [obdclass]
2012-03-18 06:05:23 [<ffffffff8111376f>] generic_file_aio_write+0x6f/0xe0
2012-03-18 06:05:23 [<ffffffffa09fdf9e>] vvp_io_write_start+0x9e/0x1e0 [lustre]
2012-03-18 06:05:23 [<ffffffffa05e4a5a>] cl_io_start+0x6a/0x100 [obdclass]
2012-03-18 06:05:23 [<ffffffffa05e8094>] cl_io_loop+0xd4/0x160 [obdclass]
2012-03-18 06:05:23 [<ffffffffa09ba43b>] ll_file_io_generic+0x3bb/0x4b0 [lustre]
2012-03-18 06:05:23 [<ffffffffa09bf4ec>] ll_file_aio_write+0x13c/0x280 [lustre]
2012-03-18 06:05:23 [<ffffffffa09bf79c>] ll_file_write+0x16c/0x290 [lustre]
2012-03-18 06:05:23 [<ffffffff81177ae8>] vfs_write+0xb8/0x1a0
2012-03-18 06:05:23 [<ffffffff811784f1>] sys_write+0x51/0x90
2012-03-18 06:05:23 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
2012-03-18 06:05:23 Code: 00 8b 15 cc 36 a4 00 85 d2 74 34 48 8b 50 08 8b 5a 18 48 8b 90 10 09 00 00 48 8b 4a 50 48 85 c9 74 1d 48 63 db 66 90 48 8b 51 20 <48> 03 14 dd 20 73 bf 81 4c 01 2a 48 8b 49 78 48 85 c9 75 e8 48
2012-03-18 06:05:23 RIP [<ffffffff81052814>] update_curr+0x144/0x1f0
2012-03-18 06:05:23 RSP <ffff8800366c3db8>
2012-03-18 06:05:23 CR2: 000000032cbc7960
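
For anyone comparing this trace against the one in LU-969, the following is a minimal Python sketch that pulls the frames out of the "Call Trace:" section of a console log and counts them per module. It assumes the oops text has been saved to a string; `parse_frames` and `frames_per_module` are hypothetical helper names chosen for illustration, not part of Lustre or any kernel tooling. Frames prefixed with "?" are entries the kernel could not confirm as part of the active call chain, so they are flagged and excluded from the count by default.

```python
import re
from collections import Counter

# One frame looks like:
#   [<ffffffffa0469863>] ? cfs_alloc+0x63/0x90 [libcfs]
# The optional "?" marks an unverified frame; the optional trailing
# "[name]" is the kernel module that owns the symbol.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module, reliable) tuples from the Call Trace section.

    Only the text between "Call Trace:" and "Code:" is scanned, so the
    IP/RIP lines outside the trace are not counted as frames.
    """
    start = log_text.find("Call Trace:")
    if start == -1:
        return []
    end = log_text.find("Code:", start)
    section = log_text[start:end] if end != -1 else log_text[start:]

    frames = []
    for match in FRAME_RE.finditer(section):
        unverified, symbol, module = match.groups()
        # Symbols with no module tag belong to the core kernel image.
        frames.append((symbol, module or "kernel", unverified is None))
    return frames


def frames_per_module(frames, reliable_only=True):
    """Count frames per module, optionally skipping "?" frames."""
    return Counter(
        module
        for _symbol, module, reliable in frames
        if reliable or not reliable_only
    )
```

Calling `frames_per_module(parse_frames(text))` on the log above gives a rough picture of how much of the call chain sits in each module. It counts frames only and says nothing about how many bytes of stack each frame uses.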