Details
Bug
Resolution: Duplicate
Blocker
None
Lustre 2.4.0
Hyperion/LLNL Chaos5/RHEL6
3
7567
Description
While running IOR in file-per-process mode, the client crashes:
2013-04-05 01:55:43 Lustre: DEBUG MARKER: == parallel-scale test iorfpp: iorfpp == 01:55:43 (1365152143)
<ConMan> Console [iwc109] log at 2013-04-05 02:00:00 PDT.
2013-04-05 02:25:51 LustreError: 19233:0:(osc_dev.c:149:osc_session_init()) ASSERTION( (!(CFS_ALLOC_IO != CFS_ALLOC_ATOMIC) || (!(((current_thread_info()->preempt_count) & ((((1UL << (10))-1) << ((0 + 8) + 8)) | (((1UL << (8))-1) << (0 + 8)) | (((1UL << (1))-1) << (((0 + 8) + 8) + 10))))))) ) failed:
2013-04-05 02:25:51 LustreError: 19233:0:(osc_dev.c:149:osc_session_init()) LBUG
2013-04-05 02:25:51 BUG: unable to handle kernel paging request at 00000002db28d9e0
2013-04-05 02:25:51 IP:Apr 5 02:25:51 [<ffffffff81053264>] update_curr+0x144/0x1f0
2013-04-05 02:25:51 iwc109 kernel: LPGD 4e4433067 PUD 0 ustreError: 1923
2013-04-05 02:25:51 3:0:(osc_dev.c:1Thread overran stack, or stack corrupted
2013-04-05 02:25:51 49:osc_session_init()) ASSERTIONOops: 0000 [#1] SMP ( (!(CFS_ALLOC_I
2013-04-05 02:25:51 O != CFS_ALLOC_Alast sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
2013-04-05 02:25:51 TOMIC) || (!(((cCPU 20
2013-04-05 02:25:51 urrent_thread_inModules linked in:fo()->preempt_co lmv(U)unt) & ((((1UL < mgc(U) lustre< (10))-1) << (((U) lov0 + 8) + 8)) | ((U) osc((1UL << (8))-1)(U) mdc << (0 + 8)) | ((U) fid((1UL << (1))-1)(U) fld << (((0 + 8) + (U) ptlrpc8) + 10))))))) )(U) obdclass(U) failed:
2013-04-05 02:25:51 Apr 5 lvfs(U) 02:25:51 iwc109 ko2iblnd(U) kernel: LustreE lnet(U) sha512_genericrror: 19233:0:(o sha256_generic libcfssc_dev.c:149:osc(U) ipmi_devintf_session_init( ))ipmi_si ipmi_msghandler LBUG
2013-04-05 02:25:51 acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad iw_cxgb4 iw_cxgb3 ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm sg sd_mod crc_t10dif wmi dcdbas sb_edac edac_core i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support shpchp ioatdma nfs lockd fscache nfs_acl auth_rpcgss sunrpc mlx4_en mlx4_core igb dca be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: cpufreq_ondemand]
2013-04-05 02:25:51
2013-04-05 02:25:51
2013-04-05 02:25:51 Pid: 19233, comm: ior Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Dell Inc. PowerEdge C6220/0HYFFG
2013-04-05 02:25:51 RIP: 0010:[<ffffffff81053264>] [<ffffffff81053264>] update_curr+0x144/0x1f0
2013-04-05 02:25:51 RSP: 0018:ffff880044783db8 EFLAGS: 00010086
2013-04-05 02:25:51 RAX: ffff88067102c080 RBX: 000000006b2d2b18 RCX: ffff88086ff111c0
2013-04-05 02:25:51 RDX: 0000000000019250 RSI: 0000000000000000 RDI: ffff88067102c0b8
2013-04-05 02:25:51 RBP: ffff880044783de8 R08: ffffffff8160b7e5 R09: 0000000000000012
2013-04-05 02:25:51 R10: 0000000000000010 R11: 0000000000000012 R12: ffff8800447966e8
2013-04-05 02:25:51 R13: 00000000012182dc R14: 00001469ad639df8 R15: ffff88067102c080
2013-04-05 02:25:51 FS: 00002aaaafebf8c0(0000) GS:ffff880044780000(0000) knlGS:0000000000000000
2013-04-05 02:25:51 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2013-04-05 02:25:51 CR2: 00000002db28d9e0 CR3: 0000000174eb1000 CR4: 00000000000406e0
2013-04-05 02:25:51 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2013-04-05 02:25:52 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2013-04-05 02:25:52 Process ior (pid: 19233, threadinfo ffff8805dba84000, task ffff88067102c080)
2013-04-05 02:25:52 Stack:
2013-04-05 02:25:52 ffff880044783dc8 ffffffff81013683 ffff88067102c0b8 ffff8800447966e8
2013-04-05 02:25:52 <d> 0000000000000000 0000000000000000 ffff880044783e18 ffffffff8105381b
2013-04-05 02:25:52 <d> ffff880044796680 0000000000000014 0000000000016680 0000000000000014
2013-04-05 02:25:52 Call Trace:
2013-04-05 02:25:52 <IRQ>
2013-04-05 02:25:52 [<ffffffff81013683>] ? native_sched_clock+0x13/0x80
2013-04-05 02:25:52 [<ffffffff8105381b>] task_tick_fair+0xdb/0x160
2013-04-05 02:25:52 [<ffffffff810570e1>] scheduler_tick+0xc1/0x260
2013-04-05 02:25:52 [<ffffffff810a0910>] ? tick_sched_timer+0x0/0xc0
2013-04-05 02:25:52 [<ffffffff8107cc8e>] update_process_times+0x6e/0x90
2013-04-05 02:25:52 [<ffffffff810a0976>] tick_sched_timer+0x66/0xc0
2013-04-05 02:25:52 [<ffffffff8109510e>] __run_hrtimer+0x8e/0x1a0
2013-04-05 02:25:52 [<ffffffff81012a69>] ? read_tsc+0x9/0x20
2013-04-05 02:25:52 [<ffffffff810954b6>] hrtimer_interrupt+0xe6/0x250
2013-04-05 02:25:52 [<ffffffff814f1f9b>] smp_apic_timer_interrupt+0x6b/0x9b
2013-04-05 02:25:52 [<ffffffff8100bb93>] apic_timer_interrupt+0x13/0x20
2013-04-05 02:25:52 <EOI>
2013-04-05 02:25:52 [<ffffffff814ec587>] ? _spin_unlock_irqrestore+0x17/0x20
2013-04-05 02:25:52 [<ffffffffa04d736c>] cfs_trace_unlock_tcd+0x5c/0xa0 [libcfs]
2013-04-05 02:25:52 [<ffffffffa04e7ce0>] libcfs_debug_vmsg2+0x610/0xbb0 [libcfs]
2013-04-05 02:25:52 [<ffffffffa04e7ce0>] ? libcfs_debug_vmsg2+0x610/0xbb0 [libcfs]
2013-04-05 02:25:52 [<ffffffff8115de27>] ? fallback_alloc+0x227/0x270
2013-04-05 02:25:52 [<ffffffffa04e82c1>] libcfs_debug_msg+0x41/0x50 [libcfs]
2013-04-05 02:25:52 [<ffffffffa04e82c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2013-04-05 02:25:52 [<ffffffffa04d7e79>] lbug_with_loc+0x29/0xb0 [libcfs]
2013-04-05 02:25:52 [<ffffffffa09aaf73>] osc_session_init+0x1f3/0x200 [osc]
2013-04-05 02:25:52 [<ffffffffa0a4597d>] ? lov_sub_get+0x30d/0x690 [lov]
2013-04-05 02:25:52 [<ffffffffa0685ecf>] keys_fill+0x6f/0x190 [obdclass]
2013-04-05 02:25:52 [<ffffffffa0a4597d>] ? lov_sub_get+0x30d/0x690 [lov]
2013-04-05 02:25:52 [<ffffffffa0a4597d>] ? lov_sub_get+0x30d/0x690 [lov]
2013-04-05 02:25:52 [<ffffffffa068a1fb>] lu_context_init+0xab/0x260 [obdclass]
2013-04-05 02:25:52 [<ffffffffa068a3ce>] ? lu_env_init+0x1e/0x30 [obdclass]
2013-04-05 02:25:52 [<ffffffffa0a4597d>] ? lov_sub_get+0x30d/0x690 [lov]
2013-04-05 02:25:52 [<ffffffffa0690fa6>] cl_env_new+0x156/0x370 [obdclass]
2013-04-05 02:25:52 [<ffffffffa06918a5>] cl_env_get+0x55/0x1a0 [obdclass]
2013-04-05 02:25:52 [<ffffffffa0a4597d>] lov_sub_get+0x30d/0x690 [lov]
2013-04-05 02:25:52 [<ffffffffa0a45e1d>] lov_page_subio+0x11d/0x200 [lov]
2013-04-05 02:25:52 [<ffffffffa0a3facf>] lov_page_own+0xaf/0x170 [lov]
2013-04-05 02:25:52 [<ffffffffa06951bb>] cl_page_own0+0x11b/0x350 [obdclass]
2013-04-05 02:25:52 [<ffffffffa0695403>] cl_page_own_try+0x13/0x20 [obdclass]
2013-04-05 02:25:52 [<ffffffffa09acc79>] discard_pagevec+0x69/0x110 [osc]
2013-04-05 02:25:52 [<ffffffffa09ad1ae>] osc_lru_shrink+0x48e/0xe40 [osc]
2013-04-05 02:25:52 [<ffffffffa09ae7b6>] osc_lru_del+0x3c6/0x560 [osc]
2013-04-05 02:25:52 [<ffffffffa04ecd84>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
2013-04-05 02:25:52 [<ffffffffa09aee34>] osc_page_delete+0xe4/0x320 [osc]
2013-04-05 02:25:52 [<ffffffffa0696635>] cl_page_delete0+0xc5/0x4e0 [obdclass]
2013-04-05 02:25:52 [<ffffffffa0696a92>] cl_page_delete+0x42/0x120 [obdclass]
2013-04-05 02:25:52 [<ffffffffa0b0750b>] ll_releasepage+0x12b/0x1a0 [lustre]
2013-04-05 02:25:52 [<ffffffff8110f0f0>] try_to_release_page+0x30/0x60
2013-04-05 02:25:52 [<ffffffff81129597>] shrink_page_list.clone.0+0x517/0x650
2013-04-05 02:25:52 [<ffffffff811299ec>] shrink_inactive_list+0x31c/0x7d0
2013-04-05 02:25:52 [<ffffffff81270cac>] ? put_dec+0x10c/0x110
2013-04-05 02:25:52 [<ffffffff81270f9e>] ? number+0x2ee/0x320
2013-04-05 02:25:52 [<ffffffff81123820>] ? __free_pages+0x60/0xa0
2013-04-05 02:25:52 [<ffffffff8112a76f>] shrink_zone+0x38f/0x520
2013-04-05 02:25:52 [<ffffffff8112b514>] zone_reclaim+0x354/0x410
2013-04-05 02:25:52 [<ffffffff8112c160>] ? isolate_pages_global+0x0/0x350
2013-04-05 02:25:52 [<ffffffff81121914>] get_page_from_freelist+0x694/0x820
2013-04-05 02:25:52 [<ffffffff811213dc>] ? get_page_from_freelist+0x15c/0x820
2013-04-05 02:25:53 [<ffffffff81132ad9>] ? zone_statistics+0x99/0xc0
2013-04-05 02:25:53 [<ffffffff81122b91>] __alloc_pages_nodemask+0x111/0x940
2013-04-05 02:25:53 [<ffffffff8115dba8>] ? ____cache_alloc_node+0x108/0x160
2013-04-05 02:25:53 [<ffffffff8115d1a2>] kmem_getpages+0x62/0x170
2013-04-05 02:25:53 [<ffffffff8115d80f>] cache_grow+0x2cf/0x320
2013-04-05 02:25:53 [<ffffffff8115da62>] cache_alloc_refill+0x202/0x240
2013-04-05 02:25:53 [<ffffffffa04d8b60>] ? cfs_alloc+0x30/0x60 [libcfs]
2013-04-05 02:25:53 [<ffffffff8115e929>] __kmalloc+0x1a9/0x220
2013-04-05 02:25:53 [<ffffffffa04d8b60>] cfs_alloc+0x30/0x60 [libcfs]
2013-04-05 02:25:53 [<ffffffffa082a400>] null_alloc_reqbuf+0x190/0x420 [ptlrpc]
2013-04-05 02:25:53 [<ffffffffa0819fc9>] sptlrpc_cli_alloc_reqbuf+0x69/0x220 [ptlrpc]
2013-04-05 02:25:53 [<ffffffffa07edfd1>] lustre_pack_request+0x81/0x180 [ptlrpc]
2013-04-05 02:25:53 [<ffffffffa07dbf95>] __ptlrpc_request_bufs_pack+0xe5/0x3b0 [ptlrpc]
2013-04-05 02:25:53 [<ffffffffa07dc2bc>] ptlrpc_request_bufs_pack+0x5c/0x80 [ptlrpc]
2013-04-05 02:25:53 [<ffffffffa07dc304>] ptlrpc_request_pack+0x24/0x70 [ptlrpc]
2013-04-05 02:25:53 [<ffffffffa099f0c5>] osc_brw_prep_request+0x155/0x1140 [osc]
2013-04-05 02:25:53 [<ffffffffa09b3051>] ? osc_req_attr_set+0x131/0x320 [osc]
2013-04-05 02:25:53 [<ffffffffa069f221>] ? cl_req_attr_set+0xd1/0x230 [obdclass]
2013-04-05 02:25:53 [<ffffffffa09a4e7b>] osc_build_rpc+0x86b/0x1730 [osc]
2013-04-05 02:25:53 [<ffffffff81132ad9>] ? zone_statistics+0x99/0xc0
2013-04-05 02:25:53 [<ffffffffa09bc1d3>] osc_send_read_rpc+0x6a3/0x880 [osc]
2013-04-05 02:25:53 [<ffffffff81122b91>] ? __alloc_pages_nodemask+0x111/0x940
2013-04-05 02:25:53 [<ffffffffa04ed522>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
2013-04-05 02:25:53 [<ffffffffa09bfd56>] osc_io_unplug0+0xb46/0x12b0 [osc]
2013-04-05 02:25:53 [<ffffffffa0692785>] ? cl_page_slice_add+0x55/0x140 [obdclass]
2013-04-05 02:25:53 [<ffffffffa09c21e1>] osc_io_unplug+0x11/0x20 [osc]
2013-04-05 02:25:53 [<ffffffffa09c8dd0>] osc_queue_sync_pages+0x1d0/0x360 [osc]
2013-04-05 02:25:53 [<ffffffffa09b378f>] osc_io_submit+0x1cf/0x4a0 [osc]
2013-04-05 02:25:53 [<ffffffffa069eb8c>] cl_io_submit_rw+0x6c/0x160 [obdclass]
2013-04-05 02:25:53 [<ffffffffa0a48701>] lov_io_submit+0x351/0xbc0 [lov]
2013-04-05 02:25:53 [<ffffffffa069eb8c>] cl_io_submit_rw+0x6c/0x160 [obdclass]
2013-04-05 02:25:53 [<ffffffffa06a11ae>] cl_io_read_page+0xae/0x170 [obdclass]
2013-04-05 02:25:53 [<ffffffffa0694f77>] ? cl_page_assume+0xf7/0x220 [obdclass]
2013-04-05 02:25:53 [<ffffffffa0aeef96>] ll_readpage+0x96/0x1f0 [lustre]
2013-04-05 02:25:53 [<ffffffff811117ec>] generic_file_aio_read+0x1fc/0x700
2013-04-05 02:25:53 [<ffffffffa0b1bff7>] vvp_io_read_start+0x257/0x470 [lustre]
2013-04-05 02:25:53 [<ffffffffa069ecea>] cl_io_start+0x6a/0x140 [obdclass]
2013-04-05 02:25:53 [<ffffffffa06a3424>] cl_io_loop+0xb4/0x1b0 [obdclass]
2013-04-05 02:25:53 [<ffffffffa0ac367f>] ll_file_io_generic+0x33f/0x600 [lustre]
2013-04-05 02:25:53 [<ffffffffa0ac4baf>] ll_file_aio_read+0x13f/0x2c0 [lustre]
2013-04-05 02:25:53 [<ffffffffa0ac4e9c>] ll_file_read+0x16c/0x2a0 [lustre]
2013-04-05 02:25:53 [<ffffffff81176cb5>] vfs_read+0xb5/0x1a0
2013-04-05 02:25:53 [<ffffffff81176df1>] sys_read+0x51/0x90
2013-04-05 02:25:53 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
2013-04-05 02:25:53 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
2013-04-05 02:25:54 Code: 00 8b 15 bc 2c a4 00 85 d2 74 34 48 8b 50 08 8b 5a 18 48 8b 90 10 09 00 00 48 8b 4a 50 48 85 c9 74 1d 48 63 db 66 90 48 8b 51 20 <48> 03 14 dd 20 81 bf 81 4c 01 2a 48 8b 49 78 48 85 c9 75 e8 48
2013-04-05 02:25:54 RIP [<ffffffff81053264>] update_curr+0x144/0x1f0
2013-04-05 02:25:54 RSP <ffff880044783db8>
2013-04-05 02:25:54 CR2: 00000002db28d9e0
Cliff was running the 2.3.63 tag, which did not contain the fix for LU-2909. I will mark this ticket as a duplicate, and we can reopen it if this issue reappears in the next tag.
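
For anyone reading the assertion text in osc_session_init() above: the long expression is a macro-expanded bit test on preempt_count. The sketch below only redoes that arithmetic using the shifts and widths copied from the log line. The constant names (HARDIRQ_MASK, SOFTIRQ_MASK, NMI_MASK) and the helper in_interrupt_context() are labels chosen here for illustration, on the assumption that the three fields are the hardirq, softirq and NMI counters; it is not Lustre or kernel code. Because CFS_ALLOC_IO != CFS_ALLOC_ATOMIC appears to be a constant-true comparison, the assertion effectively reduces to "none of these bits are set". The same log also reports "Thread overran stack, or stack corrupted", so the value that was tested may itself have been damaged.

```python
# Shift amounts and widths copied verbatim from the expanded assertion text.
# The names are illustrative labels (assumption), not taken from the log.
HARDIRQ_MASK = ((1 << 10) - 1) << ((0 + 8) + 8)      # 10-bit field at bit 16
SOFTIRQ_MASK = ((1 << 8) - 1) << (0 + 8)             # 8-bit field at bit 8
NMI_MASK = ((1 << 1) - 1) << (((0 + 8) + 8) + 10)    # 1-bit field at bit 26

# The assertion ORs the three fields together before masking preempt_count.
IRQ_CONTEXT_MASK = HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK


def in_interrupt_context(preempt_count: int) -> bool:
    """Return True if any of the masked bits are set in preempt_count."""
    return (preempt_count & IRQ_CONTEXT_MASK) != 0


if __name__ == "__main__":
    print(f"HARDIRQ_MASK     = {HARDIRQ_MASK:#010x}")
    print(f"SOFTIRQ_MASK     = {SOFTIRQ_MASK:#010x}")
    print(f"NMI_MASK         = {NMI_MASK:#010x}")
    print(f"IRQ_CONTEXT_MASK = {IRQ_CONTEXT_MASK:#010x}")
```

The low 8 bits of preempt_count are not part of the mask, so a value that only has those bits set would still pass the check.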