[LU-3239] ofd_internal.h:518:ofd_info_init()) ASSERTION( info ) Created: 28/Apr/13 Updated: 02/May/13 Resolved: 02/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Minh Diep | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB | ||
| Environment: |
https://maloo.whamcloud.com/test_sets/0e21f9ba-afec-11e2-b8a3-52540035b04c |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 7954 | ||||||||
| Description |
|
Lustre: DEBUG MARKER: == parallel-scale test simul: simul == 22:08:33 (1367125713)
Call Trace:
LustreError: dumping log to /tmp/lustre-log.1367125716.19182
Call Trace:
LustreError: dumping log to /tmp/lustre-log.1367125719.19190

I set NRS on the MDS to crrn, on oss02 to orr, and on oss03 to trr, then ran parallel-scale. |
| Comments |
| Comment by Nathaniel Clark [ 29/Apr/13 ] |
|
Possibly related: a similar stack trace, but from the next ASSERTION line (sanityn test_71): https://maloo.whamcloud.com/test_sets/4c4286e0-b0de-11e2-bece-52540035b04c |
| Comment by Di Wang [ 29/Apr/13 ] |
|
It seems to me that NRS makes get_info step into OFD before it is initialized. |
| Comment by Andreas Dilger [ 29/Apr/13 ] |
|
Alex, should the OFD environment already be initialized by this point? Is there something that can easily be done to avoid this problem? |
| Comment by Di Wang [ 29/Apr/13 ] |
|
IMHO, we can simply add lu_env_refill in ofd_get_info. I will cook a patch now. |
| Comment by Di Wang [ 29/Apr/13 ] |
| Comment by Minh Diep [ 30/Apr/13 ] |
|
Wang Di, I hit this after running your patch: general protection fault: 0000 1 SMP |
| Comment by Mikhail Pershin [ 30/Apr/13 ] |
|
This is not about missing keys but about a context that is not initialized at all, because obd_get_info() is not called from ptlrpc_server_handle_request(). We need all the lu_context setup in ptlrpc_server_handle_req_in(), as is done in ptlrpc_server_handle_request(), since ptlrpc_server_handle_req_in() can now cause calls into the device stack. |
| Comment by Di Wang [ 30/Apr/13 ] |
|
yes, I forgot this is in handle_req_in, instead of handle_request. |
| Comment by Di Wang [ 30/Apr/13 ] |
|
I just updated the patch. To avoid the hassle in handle_req_in, ofd_get_info(KEY_FIEMAP) will initialize the env itself. Since ofd_get_info(KEY_FIEMAP) is the only use case from handle_req_in, adding env initialization in handle_req_in might not be worth it for now. |
| Comment by Mikhail Pershin [ 30/Apr/13 ] |
|
I tend to agree, if this is the only case. |
| Comment by Alex Zhuravlev [ 30/Apr/13 ] |
|
we do already have env in ptlrpc_main().. |
| Comment by Di Wang [ 30/Apr/13 ] |
|
well, those envs might be initialized too early before ofd stack setup. that is why we do env_refill in ost_handle? |
| Comment by Alex Zhuravlev [ 30/Apr/13 ] |
|
env_refill() is fine, but recreating the env from scratch on potentially every read/write is not, I think. |
| Comment by Di Wang [ 30/Apr/13 ] |
|
hmm, only for ofd_get_info, because get_info is the only use case from handle_req_in, where env(le_ses) is not initialized correctly. So I figured initializing env(le_ses) for all requests in req_in might not be worth it. Doing this env_init only in ofd_get_info should be enough. Besides, we cannot use env_refill here, because le_ses might be in some uninitialized state, as you can see from Minh's test. Hmm, I did not see ofd_get_info being called in the process of read/write. Am I missing something? |
| Comment by Alex Zhuravlev [ 30/Apr/13 ] |
|
get_info(KEY_FIEMAP) is used to order requests with regard to physical offset, AFAIK. |
| Comment by Di Wang [ 30/Apr/13 ] |
|
hmm, yes, env_refill is probably better. I changed it back to env_refill in the updated patch. |
| Comment by Andreas Dilger [ 01/May/13 ] |
|
Please do not close this bug until there is a test that exercises at least basic NRS policy functionality. |
| Comment by Peter Jones [ 02/May/13 ] |
|
Landed for 2.4 |