Hi, Peter, in my former patches, I was trying to unpack the uid and gid from ptlrpc request to avoid change the protocol. When I look into the bugs that Mahmoud Hanafi mentioned, I noticed that, ptlrpc_request::rq_pill is initialized in ptlrpc_service_ops::so_hpreq_handler. However not every request use so_hpreq_handler.
mdt/mdt_mds.c: .so_hpreq_handler = ptlrpc_hpreq_handler,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
ost/ost_handler.c: .so_hpreq_handler = ptlrpc_hpreq_handler,
ost/ost_handler.c: .so_hpreq_handler = tgt_hpreq_handler,
ost/ost_handler.c: .so_hpreq_handler = NULL,
ost/ost_handler.c: .so_hpreq_handler = NULL,
That means, not every ptlrpc_request:rq_pill has been initialized before it is used in nrs_tbf.c. When the rq_pill is not initialized, using req_capsule_client_get() will trigger a LBUG. Ptlrpc requests from client to MDS, belongs to this case, So it caused a LBUG.
I think adding the uid and gid to ptlrpc protocol can easily address this issue and it is a better way to keep the logic simpler and easier to understand. Although it will introduce some duplication field in the ptlrpc request, and break the protocol, I think it is worth doing so.
. Our idea is as follows:
1. Add uid and gid to struct ptlrpc_body_v3 similar as jobid.
2. Combine the process logic of uid, gid, jobid together.
3. Handle uid and gid in nrs_tbf similar as jobid.
I uploaded the patch at https://review.whamcloud.com/#/c/27608/. And it need further improvement in coding to better merge the codes of jobid and uid/gid.
Hi, Andreas. I analyzed the ptlrpc codes and its processing logic on server side(both mdt and ost) these days. Initializing rq_pill in requests is a feasible solution, although doing this is tiring. I have done the related codes without breaking the original ptlrpc handling logic, and I have uploaded the patch.
In TBF, I think, we do not need limit every request. Only limiting the requests that have great influence on performance is enough (such as read, write, getattr, setattr etc). For others, (such connect, statfs, quotactl, ping), we might just let them go. Besides, if we need to control every request from client, I think, modifying the request protocol is inevitable and is also favorable in the long run.