[LU-1243] Lustre 2.1.1 client stack overrun Created: 20/Mar/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Ned Bass Assignee: Hongchao Zhang
Resolution: Cannot Reproduce Votes: 0
Labels: llnl
Environment:

https://github.com/chaos/lustre/commits/2.1.1-llnl


Severity: 3
Rank (Obsolete): 9741

 Description   

During I/O testing we hit a client stack overrun on a Lustre 2.1.1 client. We have the LU-969 stack reduction patch in our local branch. The stacktrace looks very similar to the one reported for LU-969.

2012-03-18 06:05:23 BUG: unable to handle kernel paging request at 000000032cbc7960
2012-03-18 06:05:23 IP: [<ffffffff81052814>] update_curr+0x144/0x1f0
2012-03-18 06:05:23 PGD 417be5067 PUD 0
2012-03-18 06:05:23 Thread overran stack, or stack corrupted
2012-03-18 06:05:23 Oops: 0000 [#1] SMP
2012-03-18 06:05:23 last sysfs file: /sys/devices/system/cpu/cpu31/online
2012-03-18 06:05:23 CPU 6
2012-03-18 06:05:23 Modules linked in: xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack lmv(U) mgc(U)
lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) acpi_cpufreq freq_table mperf
ko2iblnd(U) lnet(U) libcfs(U) ipt_LOG xt_multiport iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
rdma_cm ib_cm iw_cm ib_addr ib_sa dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm uinput sg
sd_mod crc_t10dif isci libsas scsi_transport_sas ahci microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support i2c_i801
i2c_core ib_qib(U) ib_mad ib_core wmi ioatdma ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc igb dca [last unloaded:
cpufreq_ondemand]
2012-03-18 06:05:23
2012-03-18 06:05:23 Pid: 68872, comm: lmp Not tainted 2.6.32-220.7.1.3chaos.ch5.x86_64 #1 appro appro-512x/S2600JF
2012-03-18 06:05:23 RIP: 0010:[<ffffffff81052814>]  [<ffffffff81052814>] update_curr+0x144/0x1f0
2012-03-18 06:05:23 RSP: 0018:ffff8800366c3db8  EFLAGS: 00010086
2012-03-18 06:05:23 RAX: ffff8808334c9580 RBX: 00000000755fa0c8 RCX: ffff880437f111c0
2012-03-18 06:05:23 RDX: 0000000000018b48 RSI: 0000000000000000 RDI: ffff8808334c95b8
2012-03-18 06:05:23 RBP: ffff8800366c3de8 R08: ffffffff8160b665 R09: 0000000000000000
2012-03-18 06:05:23 R10: 0000000000000010 R11: 0000000000000000 R12: ffff8800366d5fe8
2012-03-18 06:05:23 R13: 00000000000f3c4d R14: 000081fe1182f6c5 R15: ffff8808334c9580
2012-03-18 06:05:23 FS:  00002aaaacfd0800(0000) GS:ffff8800366c0000(0000) knlGS:0000000000000000
2012-03-18 06:05:23 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2012-03-18 06:05:23 CR2: 000000032cbc7960 CR3: 0000000326499000 CR4: 00000000000406e0
2012-03-18 06:05:23 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2012-03-18 06:05:23 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2012-03-18 06:05:23 Process lmp (pid: 68872, threadinfo ffff8807755fa000, task ffff8808334c9580)
2012-03-18 06:05:23 Stack:
2012-03-18 06:05:23  ffff8800366c3dc8 ffffffff81013783 ffff8808334c95b8 ffff8800366d5fe8
2012-03-18 06:05:23 <0> 0000000000000000 0000000000000000 ffff8800366c3e18 ffffffff81052e2b
2012-03-18 06:05:23 <0> ffff8800366d5f80 0000000000000006 0000000000015f80 0000000000000006
2012-03-18 06:05:23 Call Trace:
2012-03-18 06:05:23  <IRQ>
2012-03-18 06:05:23  [<ffffffff81013783>] ? native_sched_clock+0x13/0x80
2012-03-18 06:05:23  [<ffffffff81052e2b>] task_tick_fair+0xdb/0x160
2012-03-18 06:05:23  [<ffffffff81056891>] scheduler_tick+0xc1/0x260
2012-03-18 06:05:23  [<ffffffff810a0e70>] ? tick_sched_timer+0x0/0xc0
2012-03-18 06:05:23  [<ffffffff8107c512>] update_process_times+0x52/0x70
2012-03-18 06:05:23  [<ffffffff810a0ed6>] tick_sched_timer+0x66/0xc0
2012-03-18 06:05:23  [<ffffffff8109555e>] __run_hrtimer+0x8e/0x1a0
2012-03-18 06:05:23  [<ffffffff81012b59>] ? read_tsc+0x9/0x20
2012-03-18 06:05:23  [<ffffffff81095906>] hrtimer_interrupt+0xe6/0x250
2012-03-18 06:05:23  [<ffffffff814f6f0b>] smp_apic_timer_interrupt+0x6b/0x9b
2012-03-18 06:05:23  [<ffffffff8100bc13>] apic_timer_interrupt+0x13/0x20
2012-03-18 06:05:23  <EOI>
2012-03-18 06:05:23  [<ffffffff8112adbe>] ? shrink_inactive_list+0x2de/0x740
2012-03-18 06:05:23  [<ffffffff8112ad72>] ? shrink_inactive_list+0x292/0x740
2012-03-18 06:05:23  [<ffffffff8112baef>] shrink_zone+0x38f/0x520
2012-03-18 06:05:23  [<ffffffff8112c894>] zone_reclaim+0x354/0x410
2012-03-18 06:05:23  [<ffffffff8112d4e0>] ? isolate_pages_global+0x0/0x350
2012-03-18 06:05:23  [<ffffffff81122d94>] get_page_from_freelist+0x694/0x820
2012-03-18 06:05:23  [<ffffffffa0475b6f>] ? cfs_hash_bd_from_key+0x3f/0xc0 [libcfs]
2012-03-18 06:05:23  [<ffffffffa05d4232>] ? lu_object_put+0x92/0x200 [obdclass]
2012-03-18 06:05:23  [<ffffffff811337c9>] ? zone_statistics+0x99/0xc0
2012-03-18 06:05:23  [<ffffffff81124011>] __alloc_pages_nodemask+0x111/0x940
2012-03-18 06:05:23  [<ffffffff8115e592>] kmem_getpages+0x62/0x170
2012-03-18 06:05:23  [<ffffffff8115ebff>] cache_grow+0x2cf/0x320
2012-03-18 06:05:23  [<ffffffff8115ee52>] cache_alloc_refill+0x202/0x240
2012-03-18 06:05:23  [<ffffffffa0469863>] ? cfs_alloc+0x63/0x90 [libcfs]
2012-03-18 06:05:23  [<ffffffff8115fb79>] __kmalloc+0x1a9/0x220
2012-03-18 06:05:23  [<ffffffffa0469863>] cfs_alloc+0x63/0x90 [libcfs]
2012-03-18 06:05:23  [<ffffffffa0702237>] ptlrpc_request_alloc_internal+0x167/0x360 [ptlrpc]
2012-03-18 06:05:23  [<ffffffffa070243e>] ptlrpc_request_alloc_pool+0xe/0x10 [ptlrpc]
2012-03-18 06:05:23  [<ffffffffa08d684b>] osc_brw_prep_request+0x1ab/0xcc0 [osc]
2012-03-18 06:05:23  [<ffffffffa08e7fbb>] ? osc_req_attr_set+0xfb/0x250 [osc]
2012-03-18 06:05:23  [<ffffffffa09f7638>] ? ccc_req_attr_set+0x78/0x150 [lustre]
2012-03-18 06:05:23  [<ffffffffa05e4e81>] ? cl_req_attr_set+0xd1/0x1a0 [obdclass]
2012-03-18 06:05:23  [<ffffffffa05e476c>] ? cl_req_prep+0x8c/0x130 [obdclass]
2012-03-18 06:05:23  [<ffffffffa08d821d>] osc_send_oap_rpc+0xebd/0x1690 [osc]
2012-03-18 06:05:23  [<ffffffffa08d17a6>] ? loi_list_maint+0xa6/0x130 [osc]
2012-03-18 06:05:23  [<ffffffffa08d8c9e>] osc_check_rpcs+0x2ae/0x3b0 [osc]
2012-03-18 06:05:23  [<ffffffffa08e86df>] osc_io_submit+0x1df/0x4a0 [osc]
2012-03-18 06:05:23  [<ffffffffa05e4938>] cl_io_submit_rw+0x78/0x130 [obdclass]
2012-03-18 06:05:23  [<ffffffffa094160f>] lov_io_submit+0x2bf/0x9d0 [lov]
2012-03-18 06:05:23  [<ffffffffa05e4938>] cl_io_submit_rw+0x78/0x130 [obdclass]
2012-03-18 06:05:23  [<ffffffffa09ddf15>] ll_writepage+0x205/0x3b0 [lustre]
2012-03-18 06:05:23  [<ffffffff8112a34b>] pageout.clone.1+0x12b/0x300
2012-03-18 06:05:23  [<ffffffff81129a01>] ? __remove_mapping+0x61/0x160
2012-03-18 06:05:23  [<ffffffff8112a8d2>] shrink_page_list.clone.0+0x3b2/0x5c0
2012-03-18 06:05:23  [<ffffffff811280bc>] ? release_pages+0x6c/0x250
2012-03-18 06:05:23  [<ffffffff8116804d>] ? mem_cgroup_del_lru_list+0x2d/0xc0
2012-03-18 06:05:23  [<ffffffff81168169>] ? mem_cgroup_del_lru+0x39/0x40
2012-03-18 06:05:23  [<ffffffff811689ae>] ? mem_cgroup_isolate_pages+0xee/0x1d0
2012-03-18 06:05:23  [<ffffffff8112addb>] shrink_inactive_list+0x2fb/0x740
2012-03-18 06:05:23  [<ffffffff8112baef>] shrink_zone+0x38f/0x520
2012-03-18 06:05:23  [<ffffffff811a0e4d>] ? bdi_queue_work+0x7d/0x110
2012-03-18 06:05:23  [<ffffffff8112bd7e>] do_try_to_free_pages+0xfe/0x520
2012-03-18 06:05:23  [<ffffffff8112c2ec>] try_to_free_mem_cgroup_pages+0x8c/0x90
2012-03-18 06:05:23  [<ffffffff811688c0>] ? mem_cgroup_isolate_pages+0x0/0x1d0
2012-03-18 06:05:23  [<ffffffff8116a450>] mem_cgroup_hierarchical_reclaim+0x2f0/0x460
2012-03-18 06:05:23  [<ffffffff8116b992>] __mem_cgroup_try_charge+0x202/0x460
2012-03-18 06:05:23  [<ffffffff811241e7>] ? __alloc_pages_nodemask+0x2e7/0x940
2012-03-18 06:05:23  [<ffffffff8116c4e7>] mem_cgroup_charge_common+0x87/0xd0
2012-03-18 06:05:23  [<ffffffff8116c658>] mem_cgroup_cache_charge+0x128/0x140
2012-03-18 06:05:23  [<ffffffff8111161a>] add_to_page_cache_locked+0x4a/0x140
2012-03-18 06:05:23  [<ffffffff8111173c>] add_to_page_cache_lru+0x2c/0x80
2012-03-18 06:05:23  [<ffffffff811122f9>] grab_cache_page_write_begin+0x99/0xc0
2012-03-18 06:05:23  [<ffffffffa09efa98>] ll_write_begin+0x58/0x170 [lustre]
2012-03-18 06:05:23  [<ffffffff81111b7e>] generic_file_buffered_write+0x10e/0x2a0
2012-03-18 06:05:23  [<ffffffff81070637>] ? current_fs_time+0x27/0x30
2012-03-18 06:05:23  [<ffffffff811134d0>] __generic_file_aio_write+0x250/0x480
2012-03-18 06:05:23  [<ffffffffa05d8d95>] ? cl_env_info+0x15/0x20 [obdclass]
2012-03-18 06:05:23  [<ffffffff8111376f>] generic_file_aio_write+0x6f/0xe0
2012-03-18 06:05:23  [<ffffffffa09fdf9e>] vvp_io_write_start+0x9e/0x1e0 [lustre]
2012-03-18 06:05:23  [<ffffffffa05e4a5a>] cl_io_start+0x6a/0x100 [obdclass]
2012-03-18 06:05:23  [<ffffffffa05e8094>] cl_io_loop+0xd4/0x160 [obdclass]
2012-03-18 06:05:23  [<ffffffffa09ba43b>] ll_file_io_generic+0x3bb/0x4b0 [lustre]
2012-03-18 06:05:23  [<ffffffffa09bf4ec>] ll_file_aio_write+0x13c/0x280 [lustre]
2012-03-18 06:05:23  [<ffffffffa09bf79c>] ll_file_write+0x16c/0x290 [lustre]
2012-03-18 06:05:23  [<ffffffff81177ae8>] vfs_write+0xb8/0x1a0
2012-03-18 06:05:23  [<ffffffff811784f1>] sys_write+0x51/0x90
2012-03-18 06:05:23  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
2012-03-18 06:05:23 Code: 00 8b 15 cc 36 a4 00 85 d2 74 34 48 8b 50 08 8b 5a 18 48 8b 90 10 09 00 00 48 8b 4a 50 48 85
c9 74 1d 48 63 db 66 90 48 8b 51 20 <48> 03 14 dd 20 73 bf 81 4c 01 2a 48 8b 49 78 48 85 c9 75 e8 48
2012-03-18 06:05:23 RIP  [<ffffffff81052814>] update_curr+0x144/0x1f0
2012-03-18 06:05:23  RSP <ffff8800366c3db8>
2012-03-18 06:05:23 CR2: 000000032cbc7960


 Comments   
Comment by Peter Jones [ 20/Mar/12 ]

Hongchao

Could you please comment on this one?

Thanks

Peter

Comment by Hongchao Zhang [ 20/Mar/12 ]

It's similar to LU-969, except this one goes a little further; we should use less stack space in our client IO layers.
There are a few places where the stack can be reduced by ~100 bytes, and I will attach a patch soon, but the better solution would be
to move data in the OSC layer into lu_env, e.g.

static int osc_brw_prep_request(int cmd, struct client_obd *cli, struct obdo *oa,
                                struct lov_stripe_md *lsm, obd_count page_count,
                                struct brw_page **pga,
                                struct ptlrpc_request **reqp,
                                struct obd_capa *ocapa, int reserve,
                                int resend)
{
        struct ptlrpc_request     *req;
        struct ptlrpc_bulk_desc   *desc;
        struct ost_body           *body;
        struct obd_ioobj          *ioobj;
        struct niobuf_remote      *niobuf;
        int                        niocount, i, requested_nob, opc, rc;
        struct osc_brw_async_args *aa;
        struct req_capsule        *pill;
        struct brw_page           *pg_prev;

        ENTRY;
        ...
}
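
For context, the point of the lu_env suggestion is that locals like the ones above would live in per-thread storage reached through the environment rather than on the kernel stack. Below is a minimal, self-contained, user-space illustration of that pattern only; it is not Lustre code, and the names io_scratch and get_io_scratch are invented for the example.

/*
 * Illustration only, not Lustre code: instead of declaring large scratch
 * structures as stack locals in each deeply nested call, keep one
 * per-thread scratch object (analogous to storage hung off lu_env) and
 * reuse it.  Names (io_scratch, get_io_scratch) are hypothetical.
 */
#include <stdio.h>
#include <string.h>

struct io_scratch {
        char niobuf[256];       /* stand-in for niobuf/ioobj/body scratch */
        int  counters[32];
};

/* one scratch area per thread, set up once, reused by every call */
static __thread struct io_scratch thread_scratch;

static struct io_scratch *get_io_scratch(void)
{
        return &thread_scratch;
}

static int prep_request(int page_count)
{
        /* before: 'struct io_scratch s;' here would cost ~384 bytes of stack */
        struct io_scratch *s = get_io_scratch();

        memset(s, 0, sizeof(*s));
        s->counters[0] = page_count;
        return 0;
}

int main(void)
{
        printf("rc=%d\n", prep_request(16));
        return 0;
}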

Comment by Hongchao Zhang [ 22/Mar/12 ]

The patch to move some data in the OSC layer into lu_env is under development and test; I will attach it ASAP.

Comment by Hongchao Zhang [ 28/Mar/12 ]

Status update:
moving variables into lu_env is of little help in reducing the stack. A new patch that splits out the IO in the
OSC layer (mainly the osc_check_rpcs call) is under test; it has run into several issues during testing, so it will
need a little more time to debug before I can attach it.

Comment by Hongchao Zhang [ 30/Mar/12 ]

the patch is tracked at http://review.whamcloud.com/#change,2418

Moving the variables into lu_env does not give an obvious stack reduction. The attached patch instead uses a separate thread
to call "osc_check_rpcs" to flush the pending IO of the OSCs, and it falls back to this dedicated thread only if the stack usage
of the current thread is too high (more than 3 * THREAD_SIZE / 4), so it should not affect overall IO performance compared to the
previous case with many threads doing IO.
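
To make the policy concrete, the following is a minimal user-space analogue of the handoff described above, not the actual patch: work runs inline unless estimated stack usage exceeds 3/4 of a budget, in which case it is handed to a helper thread and the caller waits. The names (THREAD_SIZE_BUDGET, flush_pending_io, maybe_offload_flush) are invented, and a real implementation would keep one long-lived helper thread rather than creating one per call.

/*
 * Sketch of the handoff policy, as a user-space analogue (not the patch).
 * Assumes a downward-growing stack.
 */
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define THREAD_SIZE_BUDGET (8 * 1024)   /* stand-in for THREAD_SIZE */

static char *stack_base;                /* recorded near the top of the stack */

static size_t stack_used(void)
{
        char marker;
        return (size_t)(stack_base - &marker);
}

static void flush_pending_io(void)
{
        /* stand-in for osc_check_rpcs() flushing queued pages */
        printf("flushing pending IO\n");
}

static void *flush_worker(void *arg)
{
        (void)arg;
        flush_pending_io();             /* runs on a fresh, nearly empty stack */
        return NULL;
}

static void maybe_offload_flush(void)
{
        if (stack_used() > 3 * THREAD_SIZE_BUDGET / 4) {
                pthread_t tid;

                /* caller blocks until the worker finishes, as in the patch;
                 * simplified: a real version would reuse one helper thread */
                pthread_create(&tid, NULL, flush_worker, NULL);
                pthread_join(tid, NULL);
        } else {
                flush_pending_io();
        }
}

int main(void)
{
        char top;

        stack_base = &top;
        maybe_offload_flush();
        return 0;
}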

Comment by Andreas Dilger [ 30/Mar/12 ]

We already have threads that send RPCs - that is ptlrpcd. Adding more threads in the very common IO path just to reduce stack usage will increase complexity and hurt performance due to extra context switches. We already have to do work to improve the NUMA CPU page locality for multiple ptlrpcds, and we would need to repeat this work for this new OSC thread to keep the page handling local to the CPU on which the pages were previously handled.

It looks like the main problem is that the write path is going into kernel memory allocation/reclamation in ptlrpc_request_alloc_pool(), but this thread is already in the memory reclaim path from the top-level write:

ll_file_write
  ->generic_file_buffered_write
    ->ll_write_begin
      ->grab_cache_page_write_begin
        ->do_try_to_free_pages
          ->shrink_zone
            ->shrink_page_list
              ->ll_writepage
                ->osc_send_oap_rpc
                  ->ptlrpc_request_alloc_pool
                    ->cfs_alloc
                      ->get_page_from_freelist
                        ->zone_reclaim
                          ->shrink_zone

so adding another thread will not help very much because there will still be memory pressure, but it will slow down EVERY IO operation. A much better approach is to figure out why the ptlrpc request pool is not being used for this write request. It is entirely possible that the 2.x CLIO changes broke the initialization or usage of the ptlrpc request pool, which was added specifically to avoid memory allocations in the write path under memory pressure.
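
To make the role of the pool concrete, here is a self-contained, user-space sketch of the preallocation idea described above; it is not the ptlrpc implementation, and every name in it (req_pool, pool_get, POOL_SIZE) is invented for illustration. The essential property is that the writeout path only pops a buffer that was allocated earlier, so it never re-enters the page allocator while already inside reclaim.

/*
 * Illustration of a preallocated request pool (not ptlrpc code).
 */
#include <stdio.h>
#include <stdlib.h>

#define POOL_SIZE 8

struct req {
        struct req *next;
        char        payload[512];
};

struct req_pool {
        struct req *free_list;
};

/* allocation happens early, before any memory pressure exists */
static int pool_init(struct req_pool *pool)
{
        int i;

        pool->free_list = NULL;
        for (i = 0; i < POOL_SIZE; i++) {
                struct req *r = malloc(sizeof(*r));

                if (r == NULL)
                        return -1;
                r->next = pool->free_list;
                pool->free_list = r;
        }
        return 0;
}

/* in the write path: never call the allocator, just pop a preallocated request */
static struct req *pool_get(struct req_pool *pool)
{
        struct req *r = pool->free_list;

        if (r != NULL)
                pool->free_list = r->next;
        return r;
}

static void pool_put(struct req_pool *pool, struct req *r)
{
        r->next = pool->free_list;
        pool->free_list = r;
}

int main(void)
{
        struct req_pool pool;
        struct req *r;

        if (pool_init(&pool) != 0)
                return 1;
        r = pool_get(&pool);
        printf("got preallocated request: %p\n", (void *)r);
        pool_put(&pool, r);
        return 0;
}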

Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » x86_64,client,el6,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/obdclass/lprocfs_status.c
  • lustre/osc/osc_request.c
  • lustre/lov/lov_obd.c
  • lustre/include/obd.h
  • lustre/include/obd_class.h
Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » i686,client,el6,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/lov/lov_obd.c
  • lustre/include/obd.h
  • lustre/osc/osc_request.c
  • lustre/include/obd_class.h
  • lustre/obdclass/lprocfs_status.c
Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » x86_64,client,el5,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/osc/osc_request.c
  • lustre/include/obd_class.h
  • lustre/obdclass/lprocfs_status.c
  • lustre/include/obd.h
  • lustre/lov/lov_obd.c
Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » i686,server,el5,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/obdclass/lprocfs_status.c
  • lustre/include/obd_class.h
  • lustre/lov/lov_obd.c
  • lustre/include/obd.h
  • lustre/osc/osc_request.c
Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » x86_64,server,el6,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/obdclass/lprocfs_status.c
  • lustre/osc/osc_request.c
  • lustre/include/obd.h
  • lustre/lov/lov_obd.c
  • lustre/include/obd_class.h
Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » i686,server,el6,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/osc/osc_request.c
  • lustre/lov/lov_obd.c
  • lustre/include/obd_class.h
  • lustre/include/obd.h
  • lustre/obdclass/lprocfs_status.c
Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » x86_64,server,el5,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/osc/osc_request.c
  • lustre/include/obd_class.h
  • lustre/obdclass/lprocfs_status.c
  • lustre/include/obd.h
  • lustre/lov/lov_obd.c
Comment by Build Master (Inactive) [ 30/Mar/12 ]

Integrated in lustre-reviews » i686,client,el5,inkernel #4589
LU-1243 osc: use separated thread for IO to reduce stack (Revision 4ce80fb0405bc88a69456a3e2a33d21b6fed3977)

Result = SUCCESS
hongchao.zhang : 4ce80fb0405bc88a69456a3e2a33d21b6fed3977
Files :

  • lustre/osc/osc_request.c
  • lustre/lov/lov_obd.c
  • lustre/include/obd_class.h
  • lustre/obdclass/lprocfs_status.c
  • lustre/include/obd.h
Comment by Hongchao Zhang [ 31/Mar/12 ]

1. The stack used by osc_check_rpcs0/osc_send_oap_rpc/osc_build_req/osc_brw_prep_request is ~350 bytes
(counting only local variables, not arguments).

2. The ptlrpc request allocation is reached from osc_check_rpcs, which cannot guarantee that the request is a write;
for a read request the pool is not used:

cfs_alloc+0x63/0x90 [libcfs]
ptlrpc_request_alloc_internal+0x167/0x360 [ptlrpc]
ptlrpc_request_alloc_pool+0xe/0x10 [ptlrpc]
osc_brw_prep_request+0x1ab/0xcc0 [osc]
? osc_req_attr_set+0xfb/0x250 [osc]
? ccc_req_attr_set+0x78/0x150 [lustre]
? cl_req_attr_set+0xd1/0x1a0 [obdclass]
? cl_req_prep+0x8c/0x130 [obdclass]
osc_send_oap_rpc+0xebd/0x1690 [osc]
? loi_list_maint+0xa6/0x130 [osc]
osc_check_rpcs+0x2ae/0x3b0 [osc]
osc_io_submit+0x1df/0x4a0 [osc]
cl_io_submit_rw+0x78/0x130 [obdclass]
lov_io_submit+0x2bf/0x9d0 [lov]
cl_io_submit_rw+0x78/0x130 [obdclass]
ll_writepage+0x205/0x3b0 [lustre]

static int osc_brw_prep_request(int cmd, struct client_obd *cli, struct obdo *oa, ...)
{
        ...
        if ((cmd & OBD_BRW_WRITE) != 0) {
                opc = OST_WRITE;
                req = ptlrpc_request_alloc_pool(cli->cl_import,
                                                cli->cl_import->imp_rq_pool,
                                                &RQF_OST_BRW_WRITE);
        } else {
                opc = OST_READ;
                req = ptlrpc_request_alloc(cli->cl_import, &RQF_OST_BRW_READ);
        }
        ...
}

struct ptlrpc_request *ptlrpc_request_alloc(struct obd_import *imp,
                                            const struct req_format *format)
{
        return ptlrpc_request_alloc_internal(imp, NULL, format);
}

3. ptlrpcd is for async IO, while the newly introduced thread will block the calling thread until it finishes; which is better?

4. The separate thread will only be used if the calling thread is low on stack space, so will it really affect IO much?

Comment by Andreas Dilger [ 03/Apr/12 ]

Even in the case of read operations at the top level, if the free memory is low and the VM is trying to flush dirty data from cache, it will be a write operation at the bottom of the stack that will use the RPC request pool.

Getting the RPC request pool working will not only (hopefully) fix this stack usage issue, but will also avoid memory allocation deadlocks when the client and server are running on the same machine, and avoid making large RPC request allocations frequently (which is why it was added originally). So there is a benefit to getting this code working properly for several reasons, not just the stack usage. Also, it is best to avoid making the code more complex (i.e. adding another thread) if this is not necessary.

Comment by Jinshan Xiong (Inactive) [ 16/Apr/12 ]

I'd like to suggest issuing the BRW RPCs asynchronously so that we can reduce stack usage dramatically.

I have added the functionality to issue async RPCs in commit 2064c2a7e616b172f72462884b23d899bfc040ff, so I think we can port it to b2_1 and fire the writeback request in osc_io_submit() too.

To be honest, the obd interfaces are almost obsolete in client IO, so any attempt to add new obd ops is not good.
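
As a rough, user-space analogue of this asynchronous approach (not the referenced commit), the writeback path could merely enqueue a descriptor and return, leaving a long-lived sender thread to build and send the RPC on its own shallow stack. All names below are invented for the illustration.

/*
 * Illustration only: asynchronous submission via a long-lived sender thread.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static int             queued;          /* stand-in for a list of RPC descriptors */
static int             done;

static void *sender_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&q_lock);
        for (;;) {
                while (queued == 0 && !done)
                        pthread_cond_wait(&q_cond, &q_lock);
                if (queued == 0)        /* shutting down and queue drained */
                        break;
                queued--;
                pthread_mutex_unlock(&q_lock);
                /* build and send the RPC here, on a shallow stack */
                printf("sending one queued BRW RPC\n");
                pthread_mutex_lock(&q_lock);
        }
        pthread_mutex_unlock(&q_lock);
        return NULL;
}

/* called from deep in the writeback path: tiny stack cost, returns immediately */
static void submit_brw_async(void)
{
        pthread_mutex_lock(&q_lock);
        queued++;
        pthread_cond_signal(&q_cond);
        pthread_mutex_unlock(&q_lock);
}

int main(void)
{
        pthread_t tid;

        pthread_create(&tid, NULL, sender_thread, NULL);
        submit_brw_async();
        submit_brw_async();

        pthread_mutex_lock(&q_lock);
        done = 1;
        pthread_cond_broadcast(&q_cond);
        pthread_mutex_unlock(&q_lock);
        pthread_join(tid, NULL);
        return 0;
}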

Comment by Hongchao Zhang [ 17/Apr/12 ]

The updated patch, based on the above async PTLRPC approach, is under development and testing.

Comment by Hongchao Zhang [ 19/Apr/12 ]

The updated patch is under testing; I will attach it soon.

Comment by Peter Jones [ 20/Apr/12 ]

http://review.whamcloud.com/#change,2418

Comment by Hongchao Zhang [ 07/Jun/12 ]

After rebasing the patch, two new problems were found: one is related to IO request advancing, where IO performance is very
slow while running the sanity test; the other is a deadlock in sanity sub-test 42e.

The updated patch is under testing; I hope to attach it tomorrow.

Comment by Hongchao Zhang [ 14/Aug/12 ]

There have been many changes in the OSC module recently, and the patch needs to be updated against them.

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.
