Affects Version/s: Lustre 2.6.0
Fix Version/s: None
1. OUT RPC service threads on MDT and OST using different reply portals confused the OUT RPC user.
On MDT-side, it is:
On OST-side, it is:
For the case that both MDT and OST runs on the same physical server node (especially for VM environment testing), when OSP wants to talk with OST via OUT_PORTAL, the OUT RPC maybe handled by MDT-side OUT RPC service thread unexpected, and replied via MDC_REPLY_PORTAL, instead of OSC_REPLY_PORTAL on which the OSP is waiting for the reply. Then caused the OSP-side OUT RPC timeout and resend again and again.
The bad case also can happen when OSP wants to talk with MDT via OUT_PORTAL.
Because NDE I has already used the OUT RPC for talking among MDTs. To be compatible with the old version, we cannot change the MDT-side OUT RPC reply portal. So we have to chance OST-side OUT RPC reply portal to "MDC_REPLY_PORTAL". But it is strange for OST-side to use MDT-side reply portal.
2. The OUT RPC version is fixed on "LUSTRE_MDS_VERSION", in spite of the RPC is to MDT or to OST. Also confused others. We can re-define "tgt_out_handlers". But it may break the policy of Unified Target.
3. Pack multiple idempotent sub-requests into single OUT RPC. In general, the OUT RPC should not assume that the sub-requests are related with each other. So even if one sub-request failed to be executed, the others should not be ignored. But in current implementation, it is not. If the other sub-requests are not related with the failed one, then such behavior is unexpected. Unfortunately, it is not easy to judge whether one sub-request is related with the others within current OUT request format, especially consider to be compatible with DNE I.
4. Iteration via OUT. I found some client-side iteration framework in osp_md_object.c, but seems no server side handler. Do we have any plan to support that?