Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Affects Version: Lustre 2.12.8
- Environment: CentOS 7.9 - Lustre 2.12.7 servers and 2.12.8 clients with kernel 3.10.0-1160.53.1.el7.x86_64
Description
We're re-exporting a Lustre filesystem over NFSv4+krb5 (using nfsd) and I believe we're hitting the problem of poor write performance described in LU-11070 (Partial page write by NFS).
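For context, the re-export is done with knfsd running on a regular Lustre client. A minimal sketch of that kind of /etc/exports entry (path, hostname pattern and fsid below are placeholders, not our actual configuration):

/mnt/oak  *.example.com(rw,sync,sec=krb5:krb5i:krb5p,no_subtree_check,fsid=101)   # Lustre mount point re-exported over NFSv4+krb5
exportfs -ra   # reload the export table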
When I write to a file over NFS with nice 1 MB blocks, I can see many 1-page read rpcs to the OSTs (from the client that re-exports Lustre over NFSv4).
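Roughly what I do to see this (the file name and mount points are just examples):

# on an NFS client: sequential 1 MB writes
dd if=/dev/zero of=/mnt/oak-nfs/testfile bs=1M count=1024 conv=fsync

# meanwhile, on the NFS server / Lustre client: watch the per-OSC RPC stats
watch -n 5 "lctl get_param osc.oak-OST015c*.rpc_stats | head -n 30"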
The writes done by nfsd must be small or misaligned, leading to ll_io_read_page() being called from ll_write_begin():
[1840548.682492] nfsd            S ffff900c6a12a100     0  2408      2 0x00000080
[1840548.683626] Call Trace:
[1840548.684002]  [<ffffffffb05891b9>] schedule+0x29/0x70
[1840548.684810]  [<ffffffffc0ed0815>] cl_sync_io_wait+0x2b5/0x3d0 [obdclass]
[1840548.685843]  [<ffffffffafedadf0>] ? wake_up_state+0x20/0x20
[1840548.686703]  [<ffffffffc130c076>] ll_io_read_page+0x306/0x4c0 [lustre]
[1840548.687715]  [<ffffffffc11c44b2>] ? lov_attr_get+0x22/0x60 [lov]
[1840548.688629]  [<ffffffffc13200b5>] ll_write_begin+0x505/0xc00 [lustre]
[1840548.689611]  [<ffffffffaffbdc7f>] generic_file_buffered_write+0x10f/0x270
[1840548.690628]  [<ffffffffaffc08e2>] __generic_file_aio_write+0x1e2/0x400
[1840548.691639]  [<ffffffffc132dfab>] __generic_file_write_iter+0xcb/0x340 [lustre]
[1840548.692763]  [<ffffffffc1332304>] vvp_io_write_start+0x504/0xa70 [lustre]
[1840548.693844]  [<ffffffffc0ecd485>] ? cl_lock_enqueue+0x65/0x120 [obdclass]
[1840548.694885]  [<ffffffffc0ecf9e8>] cl_io_start+0x68/0x130 [obdclass]
[1840548.695886]  [<ffffffffc0ed1c5c>] cl_io_loop+0xcc/0x1c0 [obdclass]
[1840548.696879]  [<ffffffffc12e73bb>] ll_file_io_generic+0x63b/0xc90 [lustre]
[1840548.697935]  [<ffffffffafe3800f>] ? __kernel_fpu_end+0x1f/0x40
[1840548.698845]  [<ffffffffc12e7ea9>] ll_file_aio_write+0x289/0x660 [lustre]
[1840548.699852]  [<ffffffffb004f5be>] ? rw_copy_check_uvector+0x5e/0x140
[1840548.700826]  [<ffffffffb004db6b>] do_sync_readv_writev+0x7b/0xd0
[1840548.701759]  [<ffffffffb004f76e>] do_readv_writev+0xce/0x260
[1840548.702652]  [<ffffffffc12e7c20>] ? ll_file_splice_read+0x210/0x210 [lustre]
[1840548.703722]  [<ffffffffc12e8280>] ? ll_file_aio_write+0x660/0x660 [lustre]
[1840548.704780]  [<ffffffffafed31f1>] ? __wake_up_common_lock+0x91/0xc0
[1840548.705748]  [<ffffffffafed2c60>] ? task_rq_unlock+0x20/0x20
[1840548.706688]  [<ffffffffb004f995>] vfs_writev+0x35/0x60
[1840548.707528]  [<ffffffffc0887e85>] nfsd_vfs_write+0xb5/0x390 [nfsd]
[1840548.708498]  [<ffffffffc0895b7f>] nfsd4_write+0x1cf/0x240 [nfsd]
[1840548.709456]  [<ffffffffc089774d>] nfsd4_proc_compound+0x3dd/0x780 [nfsd]
[1840548.710538]  [<ffffffffc0882810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[1840548.711463]  [<ffffffffc083d850>] svc_process_common+0x400/0x7d0 [sunrpc]
[1840548.712526]  [<ffffffffc083dd13>] svc_process+0xf3/0x110 [sunrpc]
[1840548.713463]  [<ffffffffc088216f>] nfsd+0xdf/0x150 [nfsd]
[1840548.714300]  [<ffffffffc0882090>] ? nfsd_destroy+0x80/0x80 [nfsd]
[1840548.715244]  [<ffffffffafec5e61>] kthread+0xd1/0xe0
[1840548.715987]  [<ffffffffafec5d90>] ? insert_kthread_work+0x40/0x40
[1840548.716963]  [<ffffffffb0595de4>] ret_from_fork_nospec_begin+0xe/0x21
[1840548.717976]  [<ffffffffafec5d90>] ? insert_kthread_work+0x40/0x40
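(For reference, similar stacks can be pulled from the running nfsd threads at any time on this kernel, e.g.:)

for p in $(pgrep nfsd); do echo "=== nfsd pid $p"; cat /proc/$p/stack; done   # dump the kernel stack of every nfsd thread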
This dramatically affects write performance to Lustre over NFS. The NFS server / Lustre client is busy with many read RPCs in flight to the OST serving the file, and 98% of those read RPCs are single-page:
[root@oak-h03v09 .testsrcc]# lctl get_param osc.oak-OST015c*.rpc_stats
osc.oak-OST015c-osc-ffff900c98c2a000.rpc_stats=
snapshot_time:         1646445606.359753793 (secs.nsecs)
read RPCs in flight:   9
write RPCs in flight:  0
pending write pages:   1270
pending read pages:    1

                        read                    write
pages per rpc       rpcs   %  cum %  |    rpcs   %  cum %
1:                940581  98     98  |     206   4      4
2:                  8652   0     98  |      76   1      5
4:                   810   0     98  |      53   1      6
8:                    80   0     98  |      26   0      7
16:                  100   0     99  |      26   0      7
32:                  618   0     99  |      31   0      8
64:                  656   0     99  |     315   6     14
128:                  67   0     99  |     341   6     21
256:                 231   0     99  |     696  13     35
512:                  55   0     99  |     535  10     45
1024:               7886   0    100  |    2727  54    100

                        read                    write
rpcs in flight      rpcs   %  cum %  |    rpcs   %  cum %
0:                     0   0      0  |       0   0      0
1:                  9849   1      1  |    3854  76     76
2:                 51413   5      6  |     635  12     89
3:                  2920   0      6  |     214   4     93
4:                  1945   0      6  |     106   2     95
5:                  1992   0      7  |      65   1     96
6:                  5397   0      7  |      49   0     97
7:                  9105   0      8  |      28   0     98
8:                 30909   3     11  |      21   0     98
9:                845544  88     99  |      58   1     99
10:                  662   0    100  |       2   0    100

                        read                    write
offset              rpcs   %  cum %  |    rpcs   %  cum %
0:                  1520   0      0  |    1073  21     21
1:                    51   0      0  |       5   0     21
2:                   128   0      0  |       1   0     21
4:                   171   0      0  |       1   0     21
8:                   100   0      0  |       2   0     21
16:                  188   0      0  |       0   0     21
32:                  397   0      0  |       0   0     21
64:                  797   0      0  |       0   0     21
128:                1545   0      0  |       0   0     21
256:                3816   0      0  |       4   0     21
512:                5070   0      1  |       8   0     21
1024:               8483   0      2  |      59   1     22
2048:              10840   1      3  |      78   1     24
4096:              17936   1      5  |     131   2     27
8192:              40789   4      9  |     203   4     31
16384:             32536   3     12  |     253   5     36
32768:             35348   3     16  |     296   5     42
65536:             43505   4     21  |     429   8     50
131072:           209057  21     42  |     568  11     61
262144:           358818  37     80  |     909  18     79
524288:           188641  19    100  |    1012  20    100
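The counters above are cumulative; one way to look at a single test run is to reset them first (as far as I understand, writing to rpc_stats clears the counters):

lctl set_param osc.oak-OST015c*.rpc_stats=0   # reset the counters
# ... run the NFS write test ...
lctl get_param osc.oak-OST015c*.rpc_stats     # stats for this run only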
If I take some Lustre debug logs, I see many small writes:
[root@oak-h03v09 .testsrcc]# grep ll_write_end /tmp/dk5 | grep pos | head
00000080:00200000:11.0:1646438240.275220:0:2401:0:(rw26.c:830:ll_write_end()) pos 84250624, len 3848, copied 3848
00000080:00200000:11.0:1646438240.275335:0:2401:0:(rw26.c:830:ll_write_end()) pos 84254472, len 248, copied 248
00000080:00200000:13.0:1646438240.290720:0:2405:0:(rw26.c:830:ll_write_end()) pos 86269952, len 3848, copied 3848
00000080:00200000:13.0:1646438240.290810:0:2405:0:(rw26.c:830:ll_write_end()) pos 86273800, len 248, copied 248
00000080:00200000:13.0:1646438240.295724:0:2403:0:(rw26.c:830:ll_write_end()) pos 89305088, len 3848, copied 3848
00000080:00200000:13.0:1646438240.295808:0:2403:0:(rw26.c:830:ll_write_end()) pos 89308936, len 248, copied 248
00000080:00200000:11.0:1646438240.325755:0:2401:0:(rw26.c:830:ll_write_end()) pos 84254720, len 3848, copied 3848
00000080:00200000:11.0:1646438240.325846:0:2401:0:(rw26.c:830:ll_write_end()) pos 84258568, len 248, copied 248
00000080:00200000:13.0:1646438240.340791:0:2405:0:(rw26.c:830:ll_write_end()) pos 86274048, len 3848, copied 3848
00000080:00200000:13.0:1646438240.340971:0:2405:0:(rw26.c:830:ll_write_end()) pos 86277896, len 248, copied 248
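Note that 3848 + 248 = 4096, so it looks like each 4096-byte page is written in two pieces, a 3848-byte chunk followed by a 248-byte chunk. A quick way to get the distribution of write sizes out of such a debug log (the sed expression just pulls out the "len" field):

grep 'll_write_end' /tmp/dk5 | sed -n 's/.*len \([0-9]*\),.*/\1/p' | sort -n | uniq -c | sort -rn | head   # histogram of write sizes seen by ll_write_end()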
Also many of these:
/tmp/dk6:00000080:00000001:10.0:1646441821.230362:0:2402:0:(rw26.c:634:ll_prepare_partial_page()) Process leaving
/tmp/dk6:00000080:00000001:10.0:1646441821.230589:0:2402:0:(rw26.c:591:ll_prepare_partial_page()) Process entered
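They can be counted the same way, e.g.:

grep -c 'll_prepare_partial_page.*Process entered' /tmp/dk6   # number of partial-page preparations in the log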
I'm attaching a sample of logs with +ALL and +rpc as client-nfs-server_oak-h03v09.log.gz![]()
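Roughly how such a log is captured (buffer size and output path below are just examples):

lctl set_param debug=-1        # enable all debug flags ("+ALL"); individual flags can be added with e.g. debug=+rpc
lctl set_param debug_mb=1024   # enlarge the kernel debug buffer (example size)
lctl clear                     # empty the debug buffer before the test
# ... run the NFS write workload ...
lctl dk /tmp/dk5               # dump the debug buffer to a file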
On the Lustre side, I disabled readahead and statahead and played with max_dirty_mb and max_rpcs_in_flight, without luck.
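The knobs in question, with example values (not necessarily the exact ones I tried):

lctl set_param llite.*.max_read_ahead_mb=0   # disable readahead
lctl set_param llite.*.statahead_max=0       # disable statahead
lctl set_param osc.*.max_dirty_mb=512        # example value
lctl set_param osc.*.max_rpcs_in_flight=32   # example value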
On the NFS side, I tried sec=krb5/krb5i/krb5p, without luck. We run kernel 3.10.0-1160.53.1.el7.x86_64 on the Lustre client / NFS server.
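The NFS clients mount roughly like this (server name, export path and transfer sizes are placeholders):

mount -t nfs4 -o sec=krb5,rsize=1048576,wsize=1048576 oak-nfs.example.com:/oak /mnt/oak-nfs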
I'm wondering if anyone has some ideas on how to fix this problem. Happy to provide more logs if needed. Thanks!