[LU-6284] FLD read is not swabbed correctly Created: 25/Feb/15 Updated: 14/Mar/17 Resolved: 10/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl, ppc | ||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 17615 | ||||||||||||||||
| Description |
|
The RQF_FLD_READ RPC reply format is defined with "generic_data" (RQF_GENERIC_DATA) as the second field, but in fact this is always struct lu_seq_range_array in the two places that I see it used. Since RQF_FLD_READ does not have any swabber associated with generic data, the client will not swab the FLD reply data. That doesn't cause a problem for current code, because it always assumes MDT0000 for any FID lookup, and we haven't tested DNE + PPC client systems yet, nor do we have mixed-endian MDS/OSS combinations. However, this will break for DNE + PPC clients and should be fixed. |
| Comments |
| Comment by Andreas Dilger [ 01/Oct/15 ] |
|
This is likely causing problems for ppc_review-dne-part-[12] testing. |
| Comment by Peter Jones [ 12/Aug/16 ] |
|
Fan Yong Could you please help with this one? Thanks Peter |
| Comment by nasf (Inactive) [ 13/Aug/16 ] |
|
I have checked the current master and found that both the FLD-server and FLD-client have handled "lu_seq_range_array" properly as following: On the server side: int fld_server_read(const struct lu_env *env, struct lu_server_fld *fld,
struct lu_seq_range *range, void *data, int data_len)
{
...
if (rc > 0)
rc = 0;
out:
==> range_array_cpu_to_le(lsra, lsra);
...
}
On the client side: int fld_update_from_controller(const struct lu_env *env,
struct lu_server_fld *fld)
{
...
do {
rc = fld_client_rpc(fld->lsf_control_exp, range, FLD_READ,
&req);
if (rc != 0 && rc != -EAGAIN)
GOTO(out, rc);
LASSERT(req != NULL);
lsra = (struct lu_seq_range_array *)req_capsule_server_get(
&req->rq_pill, &RMF_GENERIC_DATA);
if (lsra == NULL)
GOTO(out, rc = -EPROTO);
==> range_array_le_to_cpu(lsra, lsra);
...
}
That means before the server replying FLD_READ RPC, it will convert the "lu_seq_range_array" as little-endian mode via "range_array_cpu_to_le()". And when the client receives the FLD_READ reply, it will converts the little-endian mode "lu_seq_range_array" to CPU mode via "range_array_le_to_cpu()". So there should be no cross-platforms interoperability trouble. |
| Comment by Andreas Dilger [ 14/Aug/16 ] |
|
Is there a reason we don't just register range_array_le_to_cpu() as the swabber for this buffer so that it is swabbed like normal RPC buffers as part of the RPC handling? |
| Comment by nasf (Inactive) [ 14/Aug/16 ] |
|
Honestly, I do not know the original reason for why not registered a swabber() as other RPCs do. Do you think it is good time to adjust that now? |
| Comment by Andreas Dilger [ 15/Aug/16 ] |
|
Hmm, it seems that range_array_le_to_cpu() would not be a proper function for swabbing the buffer, since this would convert the array to little endian instead of host endian. If this is registered as the swabbed function it would need to always swab the array if it is called, but it would only be called if swabbing is needed. |
| Comment by nasf (Inactive) [ 15/Aug/16 ] |
|
Yes, the current implementation will transfer the buffer as little endian on the network, and convert it as host endian in RAM when handle it. Comparing with the registered swabber solution that is only called when necessary, current implementation is NOT efficient, anyway it can work. If we replace it with the registered swabber as other RPCs do, that may cause some interoperability trouble if the peer (different host endian) still uses range_array_{le,cpu}to{cpu,le}. I think that if there will be some other on-wire protocol changes in the future, then it will be good time-point to adjust it. |
| Comment by Andreas Dilger [ 15/Aug/16 ] |
|
If we have only ever supported little-endian servers, then the only clients that would need to swab would be PPC clients, so I don't think this should cause any problems... |
| Comment by nasf (Inactive) [ 15/Aug/16 ] |
|
I may found why we did not registered a swabber() for 'FLD_READ' RPC. In theory, we can do that. But from the implementation view, it is not easy to be done within our current "struct req_msg_field" framework. Because the sequence range array is not fixed length, instead, its length depends on the 'lu_seq_range' count, that is unknown when prepare the RPC buffer. Generally, for such flexible length RPC usage, there will be a field in the RPC layout to indicate the data length. But for 'FLD_READ' RPC, we have no way to indicate the length unless we add new length filed that will broken the on-wire RPC protocol. |
| Comment by Andreas Dilger [ 02/Sep/16 ] |
|
I think the only thing that may be needed here is to add a comment at the swabbed function and at the struct definition that includes the explanation here about what is happening and why. |
| Comment by nasf (Inactive) [ 04/Sep/16 ] |
|
Here is the patch on master: |
| Comment by Gerrit Updater [ 10/Sep/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22309/ |
| Comment by Peter Jones [ 10/Sep/16 ] |
|
Landed for 2.9 |