Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.12.1
-
None
-
RHEL 7.6
-
3
-
9223372036854775807
Description
See below for stripe details on the file "mirror10". If OST idx 1 is unmounted and made unavailable, performance drops down to 1/10th of expected performance. The client has to timeout on OST idx1 before it tries to read from OST idx 7. This happens for each 1MB block as that is the block size being used resulting in very poor performance.
$ lfs getstripe mirror10 mirror10 lcm_layout_gen: 5 lcm_mirror_count: 2 lcm_entry_count: 2 lcme_id: 65537 lcme_mirror_id: 1 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 1 lmm_pool: 01 lmm_objects: - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x280a8:0x0] } lcme_id: 131074 lcme_mirror_id: 2 lcme_flags: init lcme_extent.e_start: 0 lcme_extent.e_end: EOF lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: raid0 lmm_layout_gen: 0 lmm_stripe_offset: 7 lmm_pool: 02 lmm_objects: - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x28066:0x0] }
Yes, I found this piece of code from my local branch. That version of patch was an earlier version of implementation. Notice that the name was 'replica' at that time. I believe it should be in one of abandoned patch.
If the connection is in IDLE state, probably it should return immediately if the RPC has `rq_no_delay` set, and I tend to think it also should kick off the reconnection asynchronously.
In the current implementation of FLR, it iterates all mirrors until it finds one available to read. If none is available, it will wait for 10ms and restart trying. Hopefully, it would find that one mirror that becomes available to read if it kicks off reconnect earlier.