Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.0, Lustre 2.8.0
    • Affects Version/s: Lustre 2.4.0
    • Environment: hyperion/LLNL
    • 3
    • 8119

    Description

      Testing the fix for LU-3188 (http://review.whamcloud.com/#change,6191):
      the client panics immediately when running IOR.

      2013-05-05 12:06:15 Lustre: DEBUG MARKER: == test iorssf: iorssf == 12:06:15
      2013-05-05 12:30:42 BUG: scheduling while atomic: ior/5692/0x10000002
      2013-05-05 12:30:42 BUG: unable to handle kernel paging request at 0000000315c2e000
      2013-05-05 12:30:42 IP: [<ffffffff810568e4>] update_curr+0x144/0x1f0
      2013-05-05 12:30:42 PGD 106a964067 PUD 0
      2013-05-05 12:30:42 Thread overran stack, or stack corrupted
      2013-05-05 12:30:42 Oops: 0000 [#1] SMP
      2013-05-05 12:30:42 last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
      2013-05-05 12:30:42 CPU 9
      2013-05-05 12:30:42 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ipmi_devintf acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad iw_cxgb4 iw_cxgb3 ib_core ext4 mbcache jbd2 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm sg sd_mod crc_t10dif wmi dcdbas sb_edac edac_core ahci i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma nfs lockd fscache auth_rpcgss nfs_acl sunrpc mlx4_en mlx4_core igb dca ptp pps_core be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: cpufreq_ondemand]
      2013-05-05 12:30:43
      
      2013-05-05 12:30:43 Pid: 5692, comm: ior Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Dell Inc. PowerEdge C6220/0HYFFG
      2013-05-05 12:30:43 RIP: 0010:[<ffffffff810568e4>]  [<ffffffff810568e4>] update_curr+0x144/0x1f0
      2013-05-05 12:30:43 RSP: 0018:ffff88089c423db8  EFLAGS: 00010086
      2013-05-05 12:30:43 RAX: ffff880840d79540 RBX: 0000000072806048 RCX: ffff880877f101c0
      2013-05-05 12:30:43 RDX: 00000000000192d8 RSI: 0000000000000000 RDI: ffff880840d79578
      2013-05-05 12:30:43 RBP: ffff88089c423de8 R08: ffffffff8160bb65 R09: 0000000000000007
      2013-05-05 12:30:43 R10: 0000000000000010 R11: 0000000000000007 R12: ffff88089c436768
      2013-05-05 12:30:43 R13: 00000000007c9fa8 R14: 0000082565f22284 R15: ffff880840d79540
      2013-05-05 12:30:43 FS:  00002aaaafebf8c0(0000) GS:ffff88089c420000(0000) knlGS:0000000000000000
      2013-05-05 12:30:43 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      2013-05-05 12:30:43 CR2: 0000000315c2e000 CR3: 000000106aea6000 CR4: 00000000000407e0
      2013-05-05 12:30:43 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2013-05-05 12:30:43 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      2013-05-05 12:30:43 Process ior (pid: 5692, threadinfo ffff880872806000, task ffff880840d79540)
      2013-05-05 12:30:43 Stack:
      2013-05-05 12:30:43  ffff88089c423dc8 ffffffff81013783 ffff880840d79578 ffff88089c436768
      2013-05-05 12:30:43 <d> 0000000000000000 0000000000000000 ffff88089c423e18 ffffffff81056e9b
      2013-05-05 12:30:43 <d> ffff88089c436700 0000000000000009 0000000000016700 0000000000000009
      2013-05-05 12:30:43 Call Trace:
      2013-05-05 12:30:43  <IRQ>
      2013-05-05 12:30:43  [<ffffffff81013783>] ? native_sched_clock+0x13/0x80
      2013-05-05 12:30:43 BUG: unable to handle kernel paging request at 000000000001182f
      2013-05-05 12:30:43 IP: [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
      2013-05-05 12:30:43 PGD 106a964067 PUD 106a825067 PMD 0
      2013-05-05 12:30:43 Thread overran stack, or stack corrupted
      2013-05-05 12:30:43 Oops: 0000 [#2] SMP
      2013-05-05 12:30:43 last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0/infiniband/mlx4_0/ports/1/pkeys/127
      2013-05-05 12:30:43 CPU 9
      

      Same as LU-3188 - continuous stack dumps until node crash

      Attachments

        1. console.iwc113
          103 kB
        2. console.iwc21
          79 kB
        3. console.iwc4
          93 kB

        Issue Links

          Activity

            [LU-3281] IO Fails - client stack overrun
            jgmitter Joseph Gmitter (Inactive) made changes -
            Fix Version/s New: Lustre 2.8.0 [ 11113 ]

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16692/
            Subject: LU-3281 obdclass: remove structure holes to reduce memory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 790ca7b3c47d27639390bd7a2b9693182d3f24d6

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/16692
            Subject: LU-3281 obdclass: remove structure holes to reduce memory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d4bee9e4e76e3e7e9701595f6f45b7e27cff3a78

            gerrit Gerrit Updater added a comment -
            jlevi Jodi Levi (Inactive) made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            Closing ticket as patch landed to master. Please let me know if more work is needed and I will reopen.

            jlevi Jodi Levi (Inactive) added a comment -
            jlevi Jodi Levi (Inactive) made changes -
            Fix Version/s New: Lustre 2.4.0 [ 10154 ]

            OK, I ran v3 of the patch through checkstack:

            Overall it is a good reduction. Note: I believe gcc handles some of the static symbols differently, which explains why osc_send_read_rpc has dropped from the list.

            376 osc:osc_io_unplug0
            200 osc:osc_build_rpc
            184 osc:osc_lru_shrink
            168 osc:osc_brw_prep_request
            144 obdclass:cl_page_find0
            104 osc:osc_io_submit
            96 lustre:vvp_io_read_start
            96 lustre:ll_file_read
            88 lov:lov_io_submit
            80 ptlrpc:__ptlrpc_request_bufs_pack
            80 obdclass:cl_page_put
            72 osc:osc_lru_del
            72 lustre:ll_file_io_generic
            64 ptlrpc:ptlrpc_request_bufs_pack
            64 lustre:ll_releasepage
            56 osc:osc_queue_sync_pages
            56 obdclass:lu_object_put
            48 osc:osc_req_attr_set
            48 lustre:ll_readpage
            40 ptlrpc:lustre_pack_request
            40 obdclass:lprocfs_alloc_stats
            40 obdclass:cl_req_attr_set
            40 lustre:ll_file_aio_read
            32 ptlrpc:sptlrpc_cli_alloc_reqbuf
            32 ptlrpc:ptlrpc_request_pack
            32 ptlrpc:null_alloc_reqbuf
            32 osc:osc_page_delete
            32 obdclass:lprocfs_alloc_obd_stats
            32 obdclass:lprocfs_alloc_md_stats
            32 libcfs:cfs_hash_dual_bd_unlock
            32 libcfs:cfs_hash_bd_from_key
            32 libcfs:cfs_alloc
            24 osc:discard_pagevec
            24 obdclass:cl_page_assume
            24 obdclass:cl_io_submit_rw
            24 obdclass:cl_io_read_page
            16 obdclass:cl_page_delete
            8 obdclass:cl_page_free
            8 obdclass:cl_page_delete0
            8 obdclass:cl_io_start
            8 obdclass:cl_env_hops_keycmp
            
            keith Keith Mannthey (Inactive) added a comment -
            bobijam Zhenyu Xu added a comment - edited

            Is iwc21 still using the old client image (without the http://review.whamcloud.com/#change,6191 patch)? Its stack trace seems to indicate that.

            And iwc4 shows another long call path (memory pressure leading to a long call chain).

            extract from iwc4 log
            2013-05-05 13:05:45 WARNING: at kernel/sched_fair.c:1846 hrtick_start_fair+0x18b/0x190() (Not tainted)
            2013-05-05 13:05:45 Hardware name: PowerEdge C6220
            2013-05-05 13:05:45 Modules linked in: lmv(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) ipmi_devintf acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad iw_cxgb4 iw_cxgb3 ib_core ext4 mbcache jbd2 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm sg sd_mod crc_t10dif wmi dcdbas sb_edac edac_core i2c_i801 i2c_core ahci iTCO_wdt iTCO_vendor_support shpchp ioatdma nfs lockd fscache auth_rpcgss nfs_acl sunrpc mlx4_en mlx4_core igb dca ptp pps_core be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: cpufreq_ondemand]
            2013-05-05 13:05:45 Pid: 7442, comm: ior Not tainted 2.6.32-358.2.1.el6.x86_64 #1
            2013-05-05 13:05:45 Call Trace:
            2013-05-05 13:05:45  [<ffffffff8106e2e7>] ? warn_slowpath_common+0x87/0xc0
            2013-05-05 13:05:45  [<ffffffff8106e33a>] ? warn_slowpath_null+0x1a/0x20
            2013-05-05 13:05:45  [<ffffffff810575eb>] ? hrtick_start_fair+0x18b/0x190
            2013-05-05 13:05:45  [<ffffffff81064a00>] ? pick_next_task_fair+0xd0/0x130
            2013-05-05 13:05:45  [<ffffffff8150d6d8>] ? schedule+0x178/0x3b2
            2013-05-05 13:05:45  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
            2013-05-05 13:05:45  [<ffffffff81064d6a>] ? __cond_resched+0x2a/0x40
            2013-05-05 13:05:45  [<ffffffff8150e320>] ? _cond_resched+0x30/0x40
            2013-05-05 13:05:45  [<ffffffffa0712df5>] ? cl_page_free+0xd5/0x5b0 [obdclass]
            2013-05-05 13:05:45  [<ffffffffa07134bf>] ? cl_page_put+0x1ef/0x4c0 [obdclass]
            2013-05-05 13:05:45  [<ffffffff8112c74f>] ? free_hot_page+0x2f/0x60
            2013-05-05 13:05:45  [<ffffffff8112f36e>] ? __put_single_page+0x1e/0x30
            2013-05-05 13:05:45  [<ffffffffa0a9beca>] ? lov_page_fini+0x6a/0x170 [lov]
            2013-05-05 13:05:45  [<ffffffffa0712e2b>] ? cl_page_free+0x10b/0x5b0 [obdclass]
            2013-05-05 13:05:45  [<ffffffffa07134bf>] ? cl_page_put+0x1ef/0x4c0 [obdclass]
            2013-05-05 13:05:45  [<ffffffffa0711628>] ? cl_page_disown0+0x88/0x180 [obdclass]
            2013-05-05 13:05:45  [<ffffffffa0712502>] ? cl_page_disown+0xa2/0x120 [obdclass]
            2013-05-05 13:05:45  [<ffffffffa0a087e7>] ? discard_pagevec+0x47/0x110 [osc]
            2013-05-05 13:05:45  [<ffffffffa0a092ac>] ? osc_lru_shrink+0x9fc/0xe40 [osc]
            2013-05-05 13:05:45  [<ffffffffa0a0a346>] ? osc_lru_del+0x3c6/0x560 [osc]
            2013-05-05 13:05:45  [<ffffffffa0588d84>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
            2013-05-05 13:05:45  [<ffffffffa0a0a9c4>] ? osc_page_delete+0xe4/0x320 [osc]
            2013-05-05 13:05:45  [<ffffffffa0713eb5>] ? cl_page_delete0+0xc5/0x4e0 [obdclass]
            2013-05-05 13:05:45  [<ffffffffa0714312>] ? cl_page_delete+0x42/0x120 [obdclass]
            2013-05-05 13:05:45  [<ffffffffa0b6274b>] ? ll_releasepage+0x12b/0x1a0 [lustre]
            2013-05-05 13:05:45  [<ffffffff811190a0>] ? try_to_release_page+0x30/0x60
            2013-05-05 13:05:45  [<ffffffff81132827>] ? shrink_page_list.clone.3+0x517/0x650
            2013-05-05 13:05:45  [<ffffffff8117086b>] ? mem_cgroup_lru_del_list+0x2b/0xb0
            2013-05-05 13:05:45  [<ffffffff81132bf7>] ? isolate_lru_pages.clone.0+0xd7/0x170
            2013-05-05 13:05:45  [<ffffffff81133343>] ? shrink_inactive_list+0x343/0x830
            2013-05-05 13:05:45  [<ffffffff8112d04a>] ? determine_dirtyable_memory+0x1a/0x30
            2013-05-05 13:05:45  [<ffffffff8112d0f7>] ? get_dirty_limits+0x27/0x2f0
            2013-05-05 13:05:45  [<ffffffff81133bde>] ? shrink_mem_cgroup_zone+0x3ae/0x610
            2013-05-05 13:05:45  [<ffffffff8117257d>] ? mem_cgroup_iter+0xfd/0x280
            2013-05-05 13:05:45  [<ffffffff81133ea3>] ? shrink_zone+0x63/0xb0
            2013-05-05 13:05:46  [<ffffffff81134aa9>] ? zone_reclaim+0x349/0x400
            2013-05-05 13:05:46  [<ffffffff8112a62c>] ? get_page_from_freelist+0x69c/0x830
            2013-05-05 13:05:46  [<ffffffffa0710005>] ? cl_page_slice_add+0x55/0x140 [obdclass]
            2013-05-05 13:05:46  [<ffffffff8112bbc3>] ? __alloc_pages_nodemask+0x113/0x8d0
            2013-05-05 13:05:46  [<ffffffff8116095a>] ? alloc_pages_current+0xaa/0x110
            2013-05-05 13:05:46  [<ffffffff8111a1d7>] ? __page_cache_alloc+0x87/0x90
            2013-05-05 13:05:46  [<ffffffff81119bbe>] ? find_get_page+0x1e/0xa0
            2013-05-05 13:05:46  [<ffffffff8111b560>] ? grab_cache_page_nowait+0x70/0xc0
            2013-05-05 13:05:46  [<ffffffffa0b4afeb>] ? ll_readahead+0x8bb/0x10f0 [lustre]
            2013-05-05 13:05:46  [<ffffffffa0b75af5>] ? vvp_io_read_page+0x305/0x340 [lustre]
            2013-05-05 13:05:46  [<ffffffffa071ea0d>] ? cl_io_read_page+0x8d/0x170 [obdclass]
            2013-05-05 13:05:46  [<ffffffffa07127f7>] ? cl_page_assume+0xf7/0x220 [obdclass]
            2013-05-05 13:05:46  [<ffffffffa0b4a196>] ? ll_readpage+0x96/0x1a0 [lustre]
            2013-05-05 13:05:46  [<ffffffff81096ce0>] ? wake_bit_function+0x0/0x50
            2013-05-05 13:05:46  [<ffffffff8111b7ac>] ? generic_file_aio_read+0x1fc/0x700
            2013-05-05 13:05:46  [<ffffffffa0b77297>] ? vvp_io_read_start+0x257/0x470 [lustre]
            2013-05-05 13:05:46  [<ffffffffa071c56a>] ? cl_io_start+0x6a/0x140 [obdclass]
            2013-05-05 13:05:46  [<ffffffffa0720ca4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
            2013-05-05 13:05:46  [<ffffffffa0b1efcf>] ? ll_file_io_generic+0x33f/0x600 [lustre]
            2013-05-05 13:05:46  [<ffffffffa0b1f3cf>] ? ll_file_aio_read+0x13f/0x2c0 [lustre]
            2013-05-05 13:05:46  [<ffffffffa0b1fc6c>] ? ll_file_read+0x16c/0x2a0 [lustre]
            2013-05-05 13:05:46  [<ffffffff811817a5>] ? vfs_read+0xb5/0x1a0
            2013-05-05 13:05:46  [<ffffffff811818e1>] ? sys_read+0x51/0x90
            2013-05-05 13:05:46  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
            2013-05-05 13:05:46 ---[ end trace a96e3bc7fcd2c0a4 ]---
            

            I just took a quick run at pulling out the stack info.

            checkstack and my v1 script don't seem to resolve all the stack calls, but this is what I got as a first pass.

            There are a few heavy functions in this code path.

            360 osc:osc_send_read_rpc
            296 osc:osc_build_rpc
            216 osc:osc_io_unplug0
            184 osc:osc_lru_shrink
            168 osc:osc_brw_prep_request
            144 obdclass:cl_page_find0
            104 osc:osc_io_submit
            96 lustre:vvp_io_read_start
            96 lustre:ll_file_read
            88 lov:lov_io_submit
            80 ptlrpc:__ptlrpc_request_bufs_pack
            80 obdclass:cl_page_put
            72 osc:osc_lru_del
            72 lustre:ll_file_io_generic
            64 ptlrpc:ptlrpc_request_bufs_pack
            64 lustre:ll_releasepage
            56 osc:osc_queue_sync_pages
            56 obdclass:lu_object_put
            48 osc:osc_req_attr_set
            48 lustre:ll_readpage
            40 ptlrpc:lustre_pack_request
            40 obdclass:lprocfs_alloc_stats
            40 obdclass:cl_req_attr_set
            40 lustre:ll_file_aio_read
            32 ptlrpc:sptlrpc_cli_alloc_reqbuf
            32 ptlrpc:ptlrpc_request_pack
            32 ptlrpc:null_alloc_reqbuf
            32 osc:osc_page_delete
            32 obdclass:lprocfs_alloc_obd_stats
            32 obdclass:lprocfs_alloc_md_stats
            32 libcfs:cfs_hash_dual_bd_unlock
            32 libcfs:cfs_hash_bd_from_key
            32 libcfs:cfs_alloc
            24 osc:discard_pagevec
            24 obdclass:cl_page_assume
            24 obdclass:cl_io_submit_rw
            24 obdclass:cl_io_read_page
            16 obdclass:cl_page_delete
            8 obdclass:cl_page_free
            8 obdclass:cl_page_delete0
            8 obdclass:cl_io_start
            8 obdclass:cl_env_hops_keycmp
            
            keith Keith Mannthey (Inactive) added a comment -
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Zhenyu Xu [ bobijam ]

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 7
