Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.12.1
-
None
-
RHEL 7.6
-
3
Description
See below for the stripe details of the file "mirror10". If OST idx 1 is unmounted and made unavailable, read performance drops to about 1/10th of the expected rate. The client has to time out on OST idx 1 before it tries to read from OST idx 7. This happens for every 1 MB block, since that is the block size in use, which results in very poor performance.
$ lfs getstripe mirror10
mirror10
  lcm_layout_gen:    5
  lcm_mirror_count:  2
  lcm_entry_count:   2
    lcme_id:             65537
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_pool:          01
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x280a8:0x0] }

    lcme_id:             131074
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 7
      lmm_pool:          02
      lmm_objects:
      - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x28066:0x0] }
Alex, ideally there would be a two-stage approach for FLR. For reads, the client would try whichever OST is preferred. If that OSC is offline, it could be skipped initially and the read would go to one of the other mirror copies whose OSCs are online. If none are online, the client should wait on the preferred OSC. For writes, the MDS selects which replica should be used, so the client will have to wait until the OSC is connected again.
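
To make the proposed two-stage read policy concrete, here is a minimal Python sketch of the selection logic only. It is not Lustre client code: the `Mirror` class, its `online` and `preferred` fields, and the `pick_read_mirror` function are hypothetical names introduced for illustration, and the sketch assumes the client already knows the connection state of each OSC.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mirror:
    """Hypothetical model of one mirror copy of a file."""
    mirror_id: int
    ost_idx: int
    online: bool             # assumed: whether the OSC for this OST is connected
    preferred: bool = False  # assumed: marks the mirror the client would try first


def pick_read_mirror(mirrors: List[Mirror]) -> Tuple[Mirror, bool]:
    """Return (mirror, must_wait) for a read.

    Stage 1: use the preferred mirror if its OSC is online, otherwise
             the first other mirror whose OSC is online.
    Stage 2: if no OSC is online, fall back to the preferred mirror and
             report that the caller has to wait for it to reconnect.
    """
    if not mirrors:
        raise ValueError("file has no mirrors")

    # Treat the first mirror as preferred when none is flagged explicitly.
    preferred = next((m for m in mirrors if m.preferred), mirrors[0])

    # Stage 1: skip offline OSCs instead of timing out on them.
    if preferred.online:
        return preferred, False
    for m in mirrors:
        if m.online:
            return m, False

    # Stage 2: nothing is reachable, so wait on the preferred OSC.
    return preferred, True
```

Applied to the layout in this report, with the mirror on OST idx 1 preferred but offline and the mirror on OST idx 7 online, the function returns the second mirror with `must_wait` set to `False`, so the read would not have to time out on OST idx 1 first. Writes are deliberately left out, since the MDS chooses the replica in that case.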