Details
- Improvement
- Resolution: Unresolved
- Minor
- None
- Lustre 2.6.0
- 3
- 11467
Description
1. The OUT RPC service threads on the MDT and on the OST use different reply portals, which confuses OUT RPC users.
On MDT-side, it is:
.psc_buf = {
        .bc_nbufs        = MDS_NBUFS,
        .bc_buf_size     = OUT_BUFSIZE,
        .bc_req_max_size = OUT_MAXREQSIZE,
        .bc_rep_max_size = OUT_MAXREPSIZE,
        .bc_req_portal   = OUT_PORTAL,
        .bc_rep_portal   = MDC_REPLY_PORTAL,
},
On OST-side, it is:
.psc_buf = {
        .bc_nbufs        = OST_NBUFS,
        .bc_buf_size     = OUT_BUFSIZE,
        .bc_req_max_size = OUT_MAXREQSIZE,
        .bc_rep_max_size = OUT_MAXREPSIZE,
        .bc_req_portal   = OUT_PORTAL,
        .bc_rep_portal   = OSC_REPLY_PORTAL,
},
When both an MDT and an OST run on the same physical server node (especially in VM test environments) and an OSP wants to talk to the OST via OUT_PORTAL, the OUT RPC may unexpectedly be handled by an MDT-side OUT RPC service thread. The reply is then sent via MDC_REPLY_PORTAL instead of OSC_REPLY_PORTAL, on which the OSP is waiting. As a result, the OSP-side OUT RPC times out and is resent again and again.
The same failure can also happen when an OSP wants to talk to an MDT via OUT_PORTAL.
DNE I already uses the OUT RPC for communication among MDTs. To stay compatible with the old version, we cannot change the MDT-side OUT RPC reply portal, so we have to change the OST-side OUT RPC reply portal to "MDC_REPLY_PORTAL". But it is strange for the OST side to use the MDT-side reply portal.
2. The OUT RPC version is fixed at "LUSTRE_MDS_VERSION", regardless of whether the RPC is sent to an MDT or to an OST. This also confuses others. We could re-define "tgt_out_handlers", but that may break the Unified Target policy.
3. Packing multiple idempotent sub-requests into a single OUT RPC. In general, the OUT RPC should not assume that the sub-requests are related to each other, so even if one sub-request fails, the others should not be ignored. The current implementation does not behave this way. If the other sub-requests are not related to the failed one, such behavior is unexpected. Unfortunately, it is not easy to judge whether one sub-request is related to the others within the current OUT request format, especially when compatibility with DNE I has to be kept.
4. Iteration via OUT. I found a client-side iteration framework in osp_md_object.c, but there seems to be no server-side handler. Do we have any plan to support that?
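The portal mismatch described in item 1 can be illustrated with a small toy model. This is only a sketch, not ptlrpc code: `struct toy_service`, `toy_dispatch()` and `toy_reply_arrives()` are hypothetical names, the numeric portal values are made up, and the "first registered service wins" dispatch is an assumption standing in for "the request may be picked up by the wrong service thread".

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative portal IDs; the numeric values are invented for this sketch. */
enum toy_portal {
        OUT_PORTAL = 1,
        MDC_REPLY_PORTAL,
        OSC_REPLY_PORTAL,
};

/* Toy stand-in for a service registered on a server node. */
struct toy_service {
        const char      *name;
        enum toy_portal  req_portal;   /* portal the service listens on */
        enum toy_portal  rep_portal;   /* portal the service replies on */
};

/*
 * Assumption: requests are matched by request portal only, so the first
 * service listening on that portal handles the request.
 */
static const struct toy_service *
toy_dispatch(const struct toy_service *svcs, size_t n, enum toy_portal req)
{
        for (size_t i = 0; i < n; i++)
                if (svcs[i].req_portal == req)
                        return &svcs[i];
        return NULL;
}

/* True if a client waiting on 'expected' would actually see the reply. */
static bool
toy_reply_arrives(const struct toy_service *svcs, size_t n,
                  enum toy_portal req, enum toy_portal expected)
{
        const struct toy_service *svc = toy_dispatch(svcs, n, req);

        return svc != NULL && svc->rep_portal == expected;
}
```

With an MDT-side and an OST-side service both listening on OUT_PORTAL on the same node, a client waiting on OSC_REPLY_PORTAL never sees a reply in this model. Once both services reply on MDC_REPLY_PORTAL and the client waits there, it no longer matters which service handled the request.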
Issue Links
- is blocked by
  - LU-7318 OUT: dynamic reply buffer (Resolved)
  - LU-7319 OUT: continue updates processing upon an error (Open)
- is blocking
  - LU-4009 Add ZIL support to osd-zfs (Open)
- is related to
  - LU-17818 LMR: Lustre Metadata Redundancy (Open)
  - LU-4690 sanity test_4: Expect error removing in-use dir /mnt/lustre/remote_dir (Resolved)
  - LU-12310 MDT Device-level Replication/Mirroring (Open)
  - LU-7426 DNE3: improve llog format for remote update llog (Open)
  - LU-7427 DNE3: multiple entries for BATCHID (Open)
- is related to
  - LU-3467 Unified request handler on OST (Resolved)
  - LU-3539 Change update RPC format (Resolved)
  - LU-7426 DNE3: improve llog format for remote update llog (Open)
I just checked the current master code, and this does not seem to be resolved yet; I am not sure about Nasf's patches. For DNE, it always fails immediately, which is good enough even for DNE2. For LFSCK, is this only for read-only updates like getattr? Hmm, there is padding in the OSP update request, so we can add the flag there.
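A per-request flag of the kind suggested in this comment might behave roughly as in the sketch below. It is not the real OUT update format: `struct toy_batch`, `TOY_FL_CONTINUE_ON_ERROR`, `TOY_RC_SKIPPED` and the handler functions are all hypothetical, and the assumption is only that a flags word (for example, carved out of existing padding) tells the server whether to keep executing independent sub-requests after one fails.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FL_CONTINUE_ON_ERROR 0x1u  /* hypothetical flag bit */
#define TOY_RC_SKIPPED           1     /* hypothetical "never executed" marker */

typedef int (*toy_handler_t)(void *arg);

struct toy_sub_request {
        toy_handler_t  handler;  /* executes one sub-request */
        void          *arg;
        int            rc;       /* per-sub-request result */
};

struct toy_batch {
        uint32_t                flags;  /* assumed to reuse padding space */
        size_t                  count;
        struct toy_sub_request *subs;
};

/* Sample handlers used to exercise the sketch. */
static int toy_ok(void *arg)   { (void)arg; return 0; }
static int toy_fail(void *arg) { (void)arg; return -EIO; }

/*
 * Execute every sub-request in order. Without the flag, processing stops
 * at the first failure and the rest are marked as skipped. With the flag,
 * all sub-requests run and each keeps its own result. Returns the first
 * error seen, or 0 if every executed sub-request succeeded.
 */
static int toy_batch_execute(struct toy_batch *b)
{
        int first_err = 0;

        for (size_t i = 0; i < b->count; i++) {
                struct toy_sub_request *s = &b->subs[i];

                if (first_err != 0 &&
                    !(b->flags & TOY_FL_CONTINUE_ON_ERROR)) {
                        s->rc = TOY_RC_SKIPPED;
                        continue;
                }
                s->rc = s->handler(s->arg);
                if (s->rc < 0 && first_err == 0)
                        first_err = s->rc;
        }
        return first_err;
}
```

Keeping the default as fail-fast matches the DNE behavior mentioned above, while a caller such as LFSCK could opt in to the continue-on-error mode for sub-requests it knows to be independent.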