Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 1.8.6
-
None
-
Tested on DDN SFA 10K with InfiniBand and patches for LU-15 applied to 1.8.4. This testing started before 1.8.6 was tagged.
-
2
-
8548
Description
While running obdfilter-survey with the LU-15 patches on a DDN SFA 10K, the I/O from Lustre to disk appeared to be unaligned: the observed request sizes were 1020K and 4K. Once the file size exceeded the cache, the performance problem became very apparent. Setting vm.min_free_kbytes did not help at all. For example, using obdfilter-survey to write an 8GB file to each OST would show approximately 30% unaligned I/O. The alignment issue was seen by observing cache statistics on the DDN SFA 10K controller.
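To make the "unaligned" classification concrete, here is a minimal Python sketch of how a request trace could be checked. It assumes a 1 MiB alignment unit (the report does not state one; 1020K + 4K = 1024K only suggests it), and the trace, `is_aligned`, and `unaligned_fraction` are hypothetical illustrations, not the output or tooling of the SFA 10K controller.

```python
KIB = 1024
ALIGN_BYTES = 1024 * KIB  # assumed 1 MiB alignment unit; not stated in the report


def is_aligned(offset, length, align=ALIGN_BYTES):
    """Return True if the request starts and ends on an alignment boundary."""
    return offset % align == 0 and (offset + length) % align == 0


def unaligned_fraction(requests, align=ALIGN_BYTES):
    """Fraction of (offset, length) requests that are not aligned."""
    if not requests:
        return 0.0
    bad = sum(1 for off, length in requests if not is_aligned(off, length, align))
    return bad / len(requests)


# Hypothetical trace: two full 1 MiB writes, then a third 1 MiB region
# that reaches the disk split into a 1020K piece and a 4K piece.
trace = [
    (0, 1024 * KIB),
    (1024 * KIB, 1024 * KIB),
    (2048 * KIB, 1020 * KIB),
    (2048 * KIB + 1020 * KIB, 4 * KIB),
]

if __name__ == "__main__":
    print(f"unaligned fraction: {unaligned_fraction(trace):.2f}")
```

In this made-up trace both pieces of the split request count as unaligned, because the 1020K piece ends off a boundary and the 4K piece starts off one.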
Once we remove the "shrink file_max_cache" patch [define FILTER_MAX_CACHE_SIZE (8 * 1024 * 1024)], the alignment issue goes away. The many unaligned I/Os seem to be caused by the change in this patch: once I changed cache_file_size to 18446744073709551615 (the 1.8.4 and 1.8.5 default), all I/O arrived at the SFA 10K aligned.
Disabling the read cache (lctl set_param obdfilter.*.read_cache_enable=0) doesn't help, which is still very strange to me.
The only workaround we have found for this issue in 1.8.6WC is to set cache_file_size to a large value. This could have other performance implications as well.
We hope to post some numbers and statistics, but we need additional runs to gather that information.
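For scale, the two limits quoted above can be compared with a short Python sketch. It assumes the limit acts as a simple "file size greater than limit" threshold, which is an interpretation and not something this report confirms; `exceeds_limit` and `SURVEY_FILE_SIZE` are hypothetical names used only for illustration.

```python
# Values quoted in the report.
FILTER_MAX_CACHE_SIZE = 8 * 1024 * 1024   # limit introduced by the patch (8 MiB)
OLD_DEFAULT = 18446744073709551615        # 1.8.4 / 1.8.5 default

# The old default is the largest unsigned 64-bit value, i.e. effectively "no limit".
assert OLD_DEFAULT == 2**64 - 1

# Hypothetical test file size: 8 GB per OST, taken here as 8 GiB.
SURVEY_FILE_SIZE = 8 * 1024**3


def exceeds_limit(file_size, limit):
    """Assumed semantics: a file is over the limit when its size is greater."""
    return file_size > limit


if __name__ == "__main__":
    print("patched limit (bytes):", FILTER_MAX_CACHE_SIZE)
    print("over patched limit:", exceeds_limit(SURVEY_FILE_SIZE, FILTER_MAX_CACHE_SIZE))
    print("over old default:", exceeds_limit(SURVEY_FILE_SIZE, OLD_DEFAULT))
```

Under that assumption, the 8 GB survey file is about a thousand times larger than the patched limit, while no real file can exceed the old default.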
Issue Links
- is related to
-
LU-918 ensure that BRW requests prevent lock timeout
- Closed
-
LU-12071 bypass pagecache for large files
- Resolved
- Trackbacks
-
Lustre 1.8.x known issues tracker While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA