[LU-2001] read operation is slow when mmap is enabled Created: 21/Sep/12 Updated: 08/Feb/18 Resolved: 08/Feb/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
lustre-1.8.8 CentOS5.8 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 8891 |
| Description |
|
There is an application running on Lustre. When we run this application through the Lustre client, it is really slow and the application seems to stall.
The application is doing the following operations. Any ideas why the application is fast if it goes through the NFS layer to Lustre? |
| Comments |
| Comment by Peter Jones [ 21/Sep/12 ] |
|
Ihara, is this different behaviour compared to earlier 1.8.x releases? If not, could this be a known issue? Jinshan, you did the performance improvements for mmap that went into 2.2 (under ...). Thanks, Peter |
| Comment by Jinshan Xiong (Inactive) [ 21/Sep/12 ] |
|
This seems like a readahead issue. Ihara, does the application use mmap to read the file and write it via the VFS interface? If that is the case, we can try to reproduce it by issuing mmap reads only.
It smells unrelated; if my above guess is wrong, I will verify this.
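For reference, a minimal reproducer sketch (assuming fio is available on the client; the file name, size and runtime below are arbitrary examples, not the customer's workload) could issue mmap-based 4k random reads against a Lustre file:
# hypothetical fio job approximating mmap random reads on a Lustre file
fio --name=mmap-randread --ioengine=mmap --rw=randread --bs=4k \
    --size=2g --filename=/lustre/mmap_test.dat --runtime=300 --time_based
If the readahead stats degrade the same way under this pattern, the mmap read path alone should be enough to reproduce the slowdown. |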
| Comment by Shuichi Ihara (Inactive) [ 21/Sep/12 ] |
|
Jinshan, |
| Comment by Jinshan Xiong (Inactive) [ 22/Sep/12 ] |
|
This could be an NFS or a Lustre problem. Please show me the stats of: |
| Comment by Shuichi Ihara (Inactive) [ 23/Sep/12 ] |
|
Jinshan, the job needs a long time to finish, so I started 8 jobs on a single client and collected statistics only for the first 1800 sec. Even in the first 1800 sec we can see the performance difference between NFS on Lustre and native Lustre. There are two test cases (1x OSS, 1x OST and 1x client): 1. Native Lustre client
2. NFS on Lustre (the Lustre client mounts Lustre and exports it over NFS; the same client then mounts it via NFS as a loopback mount; see the sketch after the df output below)
192.168.100.126@tcp:/lustre
15283944352 68657584 14450657920 1% /lustre
localhost:/lustre 15283944448 68657152 14450658304 1% /lustre-nfs
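For reproduction purposes, the loopback setup behind those two mount points can be sketched roughly as follows (export options and paths are assumptions, not taken from the actual configuration):
# mount the filesystem as a native Lustre client
mount -t lustre 192.168.100.126@tcp:/lustre /lustre
# export it over NFS from the same node (example /etc/exports entry)
echo '/lustre localhost(rw,no_root_squash)' >> /etc/exports
exportfs -ra
# loopback-mount the NFS export on the same client
mount -t nfs localhost:/lustre /lustre-nfs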
Here are the output files during the 1800 sec. 2.3GB files per job are created when the application runs through the native Lustre client, but 7.4GB files per job are created when the application goes through NFS to Lustre.
Native Lustre
# cat lustre.8/output.log
total 18161696
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18855.out
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18858.out
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18861.out
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18864.out
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18867.out
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18870.out
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18873.out
-rw-r--r-- 1 root root 2325741568 Sep 24 03:01 18876.out
NFS on Lustre
# cat nfs_on_lustre.8/output.log
total 58710112
-rw-r--r-- 1 root root 7584350208 Sep 24 02:28 17205.out
-rw-r--r-- 1 root root 7462715392 Sep 24 02:28 17208.out
-rw-r--r-- 1 root root 7462715392 Sep 24 02:28 17211.out
-rw-r--r-- 1 root root 7462715392 Sep 24 02:28 17214.out
-rw-r--r-- 1 root root 7523532800 Sep 24 02:28 17217.out
-rw-r--r-- 1 root root 7523532800 Sep 24 02:28 17220.out
-rw-r--r-- 1 root root 7643070464 Sep 24 02:28 17223.out
-rw-r--r-- 1 root root 7523532800 Sep 24 02:28 17226.out
You can see the Lustre statistics as well. With native Lustre, it is mostly 1 page per RPC.
Native Lustre
                        read                    write
pages per rpc     rpcs    %  cum %   |   rpcs   %  cum %
1:             4973077   99  99      |      0   0   0
2:                9224    0  99      |      0   0   0
4:                1216    0  99      |      1   0   0
8:                 949    0  99      |      0   0   0
16:               1081    0  99      |      0   0   0
32:               1255    0  99      |      0   0   0
64:               1099    0  99      |      0   0   0
128:               600    0  99      |      3   0   0
256:             16625    0  100     |  17736  99  100
NFS on Lustre
                        read                    write
pages per rpc     rpcs    %  cum %   |   rpcs   %  cum %
1:                2394    7   7      |      0   0   0
2:                 812    2   9      |      0   0   0
4:                 935    2  12      |      1   0   0
8:                1111    3  16      |      1   0   0
16:               1624    5  21      |      1   0   0
32:               2283    7  28      |      1   0   0
64:               3083    9  37      |      0   0   0
128:              4324   13  51      |      0   0   0
256:             15755   48  100     |  57389  99  100 |
| Comment by Jinshan Xiong (Inactive) [ 24/Sep/12 ] |
|
From the readahead stats in the log file, we can interpret what is going on. Native Lustre:
llite.lustre-ffff810744e39800.read_ahead_stats=
snapshot_time             1348423289.615683 secs.usecs
hits                      4345662 samples [pages]
misses                    5015426 samples [pages]
readpage not consecutive  5469845 samples [pages]
miss inside window        1812 samples [pages]
failed grab_cache_page    3187455 samples [pages]
read but discarded        15712 samples [pages]
zero size window          6238737 samples [pages]
and NFS over Lustre:
llite.lustre-ffff81054d3c6c00.read_ahead_stats=
snapshot_time             1348421290.346850 secs.usecs
hits                      4263548 samples [pages]
misses                    17453 samples [pages]
readpage not consecutive  335391 samples [pages]
failed grab_cache_page    5578837 samples [pages]
read but discarded        24 samples [pages]
zero size window          113 samples [pages]
hit max r-a issue         3 samples [pages]
Look at the miss rate: it is much higher for native Lustre, and that is what caused the one-page RPCs to be issued. The application is accessing the file in a random pattern; see the zero-size-window stat in the native Lustre case. (I have not checked the NFS code.) I think there are two levels of readahead in the nfs-over-lustre case: 1) mmap readahead on the NFS client, and 2) vfs_read readahead in Lustre. This makes the readahead a little more aggressive; the side effect is the high rate of 'failed grab_cache_page', which I guess means the client was short of memory at that time. In general, this kind of issue is hard to fix, as what should be blamed is the application itself.
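For a rough sense of scale (using the numbers above, with miss rate taken as misses / (hits + misses)):
native Lustre:    5015426 / (4345662 + 5015426)  is roughly 54% of readpages missing readahead
NFS over Lustre:  17453   / (4263548 + 17453)    is roughly 0.4%
which matches the rpc_stats above, where native Lustre issues almost all reads as 1-page RPCs. |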
| Comment by Shuichi Ihara (Inactive) [ 25/Sep/12 ] |
|
Jinshan, I just tested it on lustre-2.1.3 and it worked well; the performance matches the numbers we are seeing over NFS on lustre-1.8.x. |
| Comment by Jinshan Xiong (Inactive) [ 25/Sep/12 ] |
|
Interesting... both the RPC and readahead stats are much better in 2.1.3. I remember there was a readahead bug fixed by wangdi. I'll update this ticket after I find it. |
| Comment by Jinshan Xiong (Inactive) [ 25/Sep/12 ] |
|
Recently, two readahead patches were committed:
Can you please check if they are in your branch? |
| Comment by Shuichi Ihara (Inactive) [ 25/Sep/12 ] |
|
Hmm, those already landed in 1.8.8 (or before 1.8.8). We are using lustre-1.8.8 here, so these patches should be included. |
| Comment by Di Wang [ 25/Sep/12 ] |
|
Ihara, could you please gather some "+reada" debug logs for me while the job is running (maybe only reada: lctl set_param debug=reada)? Thanks.
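For reference, a minimal capture sequence on the client might look like the following (a sketch only; the buffer size and dump path are arbitrary examples):
# restrict debugging to readahead events and enlarge the debug buffer a bit
lctl set_param debug=reada
lctl set_param debug_mb=256
# clear the existing debug buffer, run the job, then dump the log
lctl clear
# ... run the application ...
lctl dk > /tmp/reada-debug.log |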
| Comment by Shuichi Ihara (Inactive) [ 25/Sep/12 ] |
|
Wangdi, |
| Comment by Di Wang [ 25/Sep/12 ] |
|
Ihara, I just posted a patch here: http://review.whamcloud.com/4097 Could you please try it? According to the debug log you posted, it seems readahead does not work well with mmap. The reason why exporting over NFS works better is that NFS opens the file for every read, which refreshes the readahead state, and that happens to help "correct" the behaviour. Also, I found a lot of random 4k reads in the debug log; is this expected for this job? And I also do not understand why 2.1.3 works better here. So could you please:
1. set the debug mask to "warning error emerg vfstrace reada console"
2. run the test over 1.8.8 and 2.1.3
and post the debug_log, llite.*.read_ahead_stats and osc.*.rpc_stats here. Thanks.
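As a concrete sketch of that collection procedure (clearing the stats by writing to them is the usual convention, but treat the exact commands as an assumption to adapt):
# set the requested debug mask and reset the client-side counters before each run
lctl set_param debug="warning error emerg vfstrace reada console"
lctl set_param llite.*.read_ahead_stats=clear
lctl set_param osc.*.rpc_stats=clear
# ... run the test (once on 1.8.8, once on 2.1.3) ...
lctl get_param llite.*.read_ahead_stats osc.*.rpc_stats
# then dump the debug log as in the earlier comment (lctl dk > debug.log) |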
| Comment by Shuichi Ihara (Inactive) [ 26/Sep/12 ] |
|
wangdi, yes, there are a few input files for a single job. The application reads these input files at some points, but it is very random read I/O, I believe. Anyway, I tested the patches, but they did not help very much. I collected debug logs and statistics for the following version and uploaded the files under /uploads/: centos5.8-lustre-1.8.8. That Linux distro is the client side; the server side is running lustre-2.1.3 on CentOS 6.3. I also tested lustre-1.8.8 on CentOS 5.8 for the server, but there was not much difference. For the 2.1.3 vs 1.8.x testing, I used 2.1.3 on CentOS 6.3 for the server. Please check and analyze them. Thank you! |
| Comment by Di Wang [ 27/Sep/12 ] |
|
Ihara, I did not see a 2.1.3 debug log there. Could you please upload the 2.1.3 log for me, if you have it? Thanks. |
| Comment by Shuichi Ihara (Inactive) [ 27/Sep/12 ] |
|
Di, sorry, I just uploaded centos6.3-lustre-2.1.3-IB.tar.gz. By the way, the customer environment is 1.8.x; we want to fix and speed up 1.8.x to match the numbers we are seeing with NFS on lustre-1.8.x. Thank you very much! |
| Comment by Di Wang [ 27/Sep/12 ] |
|
Ihara, I just updated the patch: http://review.whamcloud.com/#change,4097 Could you please try this one and gather that information for me? Thanks. |
| Comment by Shuichi Ihara (Inactive) [ 29/Sep/12 ] |
|
Di, I tested the new patch (patch set 4), but still no luck; the original behavior was better. I've uploaded the debug information and results to the same place. The application is open source, so I uploaded it as well in case it helps with the analysis. Thanks! |
| Comment by Di Wang [ 05/Oct/12 ] |
|
Ihara, unfortunately I am busy with another urgent project these days, so I have not taken the time to look at your log yet. I will be on vacation next week; my colleague will probably continue working on this problem. Could you please tell me how you ran LAST (exact command line, please)? Thanks. |
| Comment by Shuichi Ihara (Inactive) [ 05/Oct/12 ] |
|
Hi WangDi, |
| Comment by Shuichi Ihara (Inactive) [ 13/Oct/12 ] |
|
Hi, I uploaded all components (including the input files) under uploads/. Please let me know if you have any questions. |
| Comment by Shuichi Ihara (Inactive) [ 01/Nov/12 ] |
|
WangDi, did you have a chance to test LAST in your environment? |
| Comment by Di Wang [ 03/Nov/12 ] |
|
Hi, Ihara. Sorry, not yet. I have been quite busy with DNE since I got back from vacation; I will try to set aside some time next week. Btw, did you try this test in a local environment? Does that mean I have to set up a 30GB Lustre environment to repeat what you have done? |
| Comment by Di Wang [ 05/Nov/12 ] |
|
Hi, Ihara. Hmm, I just checked the patch-4 (centos5.8-lustre-1.8.8-IB-patch4) result. The RPC quality seems better:
osc.lustre-OST0000-osc-ffff88106a6ab000.rpc_stats=
                        read                    write
Here is the original result:
osc.lustre-OST0000-osc-ffff81107910cc00.rpc_stats=
                        read                    write
So why did you say "the original behaviour was better"? Please correct me if I missed something. |
| Comment by Shuichi Ihara (Inactive) [ 05/Nov/12 ] |
|
WangDi, thanks for your testing. Yes, I also saw better behavior in rpc_stats with patch 4, but read performance dropped compared to running without the patch. Also, how long after the job starts did you collect these statistics? My understanding is that once the job starts, it reads files with mmap(), then computes something (and read()s), and writes out files with write(). My statistics were taken after the mmap() phase. |
| Comment by Di Wang [ 05/Nov/12 ] |
|
Oh, I did not run the test; I just looked at the results you posted here. I do not have a RHEL5 environment locally. I will see what I can do here. Thanks. |
| Comment by Di Wang [ 06/Nov/12 ] |
|
Hi, Ihara. Sorry, I cannot find a node to run the test locally; all my nodes are RHEL6 here. I investigated those uploaded logs again: is centos6.3-lustre-2.1.3-IB for lustre 2.1 or for lustre 2.1 + NFS? I saw a lot of read() instead of mmap in this log; could you please check? Do you have a debug log for lustre 1.8 + NFS? |
| Comment by Shuichi Ihara (Inactive) [ 29/Jun/13 ] |
|
Comparing the miss rates of lustre-2.4, lustre-1.8 and NFS over lustre-1.8. |
| Comment by Shuichi Ihara (Inactive) [ 29/Jun/13 ] |
|
lustre-1.8's peak miss rates were much higher than those of lustre-2.x and NFS over lustre-1.8 while the application was running. |
| Comment by Li Xi (Inactive) [ 29/Jun/13 ] |
|
Here is the patch that makes the application much faster. |
| Comment by Li Xi (Inactive) [ 22/Jul/13 ] |
|
Again, the following patch makes the application run much faster. |
| Comment by Jinshan Xiong (Inactive) [ 08/Feb/18 ] |
|
close old tickets |