[LU-7453] osp_sync_interpret assertion Created: 20/Nov/15 Updated: 11/Feb/16 Resolved: 11/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Jesse Hanley | Assignee: | Alex Zhuravlev |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2.5.4-2.6.32_504.30.3.el6.atlas.x86_64.x86_64 |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Wednesday morning one of our production MDS nodes hit an assertion. Is this related to https://jira.hpdd.intel.com/browse/LU-5629? |
| Comments |
| Comment by Jian Yu [ 20/Nov/15 ] |
|
Hi Dmitry, It looks like the assertion and back traces in this ticket are the same as those in Could you please take a look at this ticket as well? Thank you. |
| Comment by Alex Zhuravlev [ 07/Dec/15 ] |
|
is it possible to get any logs for the case? the most valuable information at the moment is info on RPC (was it OST_DESTROY or OST_SETATTR). |
| Comment by Jian Yu [ 08/Dec/15 ] |
|
Hi Jesse, Does this assertion still occur? Is it possible to gather Lustre debug logs required by Alex above? Thank you. |
| Comment by Jesse Hanley [ 10/Dec/15 ] |
|
Hey Alex and Jian, Sorry for not responding sooner. We haven't seen this assertion since I submitted this. I don't have any debug logs from when it happened, but I do have a vmcore dump. I'll take a look and see if I can find anything relevant in it (not sure if it'll be there). |
| Comment by Jian Yu [ 10/Dec/15 ] |
|
Thank you very much, Jesse. |
| Comment by Jesse Hanley [ 10/Dec/15 ] |
|
From the backtrace, it looks like ptlrpcd called ptlrpcd_check, which called ptlrpc_check_set, which finally called osp_sync_interpret. If I followed the code correctly (very likely that I didn't - I'm not that familiar with the internals of Lustre), it looks like ptlrpc_req_interpret is only called from ptlrpc_set_destroy and ptlrpc_check_set. Am I right in assuming that ptlrpc_set_destroy corresponds to an OST_DESTROY call and ptlrpc_check_set corresponds to an OST_SETATTR call? If so, does that mean this is associated with an OST_SETATTR call? Sorry if this isn't much help; I'm still learning how to dig around core dumps. |
| Comment by Jian Yu [ 17/Dec/15 ] |
|
Hi Alex, |
| Comment by Alex Zhuravlev [ 17/Dec/15 ] |
|
there is a ticket with a binary and instructions: https://bugzilla.lustre.org/show_bug.cgi?id=13155 |
| Comment by Jesse Hanley [ 07/Jan/16 ] |
|
I've uploaded the log. Sorry for the delay. I've never seen this extension for crash before. Thanks for the info! |
| Comment by Patrick Farrell (Inactive) [ 07/Jan/16 ] |
|
Jesse - Just FYI, those ptlrpc functions aren't related to the higher level operations you're describing. They both act on sets of RPCs, so they don't have anything to do with SETATTR or DESTROY.
|
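To illustrate Patrick's point, here is a toy model in Python, not Lustre code. All names (`Request`, `check_set`, `sync_interpret`) are hypothetical stand-ins chosen to echo the discussion. It sketches, under those assumptions, how a routine that walks a set of requests can invoke each request's completion callback without knowing or caring which operation the request carries.

```python
from dataclasses import dataclass
from typing import Callable, List


# Toy model only: these names echo the thread but are NOT Lustre's real API.
@dataclass
class Request:
    opcode: str                              # e.g. "OST_DESTROY" or "OST_SETATTR"
    interpret: Callable[["Request"], str]    # per-request completion callback
    completed: bool = False


def sync_interpret(req: Request) -> str:
    # The callback is where the opcode matters, not the set-processing loop.
    return f"interpreted {req.opcode}"


def check_set(requests: List[Request]) -> List[str]:
    # Walk the whole set and run the callback of every completed request.
    # Nothing here branches on the opcode.
    results = []
    for req in requests:
        if req.completed:
            results.append(req.interpret(req))
    return results


rpc_set = [
    Request("OST_DESTROY", sync_interpret, completed=True),
    Request("OST_SETATTR", sync_interpret, completed=True),
    Request("OST_DESTROY", sync_interpret),  # still in flight, so skipped
]
results = check_set(rpc_set)
```

In this sketch the same loop reaches the callback for both operation types, which is why the caller in a backtrace does not identify the RPC; the request itself has to be inspected.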
| Comment by Jian Yu [ 13/Jan/16 ] |
|
Hi Alex, Jesse has uploaded the logs. Could you please investigate and suggest? Thank you. |
| Comment by Alex Zhuravlev [ 14/Jan/16 ] |
|
it was a resend for OST_DESTROY, though I don't have a good idea of the root cause yet. |
| Comment by Alex Zhuravlev [ 04/Feb/16 ] |
|
yes, we can add a bit more debugging; which branch should it target? |
| Comment by Jian Yu [ 04/Feb/16 ] |
|
Thank you, Alex. The server version is Lustre 2.5.4 (2.5.4-2.6.32_504.30.3.el6.atlas.x86_64.x86_64). |
| Comment by Alex Zhuravlev [ 04/Feb/16 ] |
|
any additional patches on top of that? |
| Comment by Peter Jones [ 11/Feb/16 ] |
|
As per ORNL ok to close - this just happened one time a long time ago |