[LU-3356] LBUG LustreError: 3202:0:(mds_open.c:1494:mds_mfd_close()) ASSERTION(pending_child->d_inode != NULL) failed Created: 18/May/13 Updated: 25/Nov/14 Resolved: 25/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frederik Ferner (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 8309 |
| Description |
|
We have now had the same LBUG twice in one month on the MDS for one of our Lustre file systems. The error in syslog on the MDS is this:
[bnh65367@cs04r-sc-mds03-02 ~]$ cat /proc/fs/lustre/version
This version has been running on these MDSs without any problems for quite some time. Without checking, I'm not entirely sure why we are running this version, but I believe it contains a fix for one issue we have seen frequently.
Unfortunately, we have so far not been able to identify a reproducer. After the LBUG and until today's fail-over, at least 4 clients were hanging on every access to the file system; other clients were fine. The logs are still available and we can upload them if it helps. |
| Comments |
| Comment by Peter Jones [ 19/May/13 ] |
|
Bobijam, could you please advise on this one? Thanks, Peter |
| Comment by Zhenyu Xu [ 20/May/13 ] |
|
Please upload the logs. |
| Comment by Dave Bond (Inactive) [ 21/May/13 ] |
|
/var/log/messages from server cs04r-sc-mds03-02 |
| Comment by Dave Bond (Inactive) [ 21/May/13 ] |
|
Lustre log files for cs04r-sc-mds03-02 |
| Comment by Zhenyu Xu [ 22/May/13 ] |
|
patch tracking at http://review.whamcloud.com/6412 |
| Comment by Frederik Ferner (Inactive) [ 30/May/13 ] |
|
I noticed that the patch fails very early (in lustre-initialization-1) and that the last update was a while ago. We have a maintenance window coming up next week. If there is a patch we should start testing, at least on our test file systems and maybe on the affected file systems, it would be good to have it by then. Thanks, |
| Comment by Zhenyu Xu [ 30/May/13 ] |
|
The test failure is due to the TT-1072 issue; I think you can test with this patch. |
| Comment by Peter Jones [ 30/May/13 ] |
|
Frederik, the TT project is not open because it tracks configuration issues in our test lab. So the failure means that the verification testing has not yet taken place, not that there is a problem with the patch. Peter |
| Comment by Peter Jones [ 25/Nov/14 ] |
|
Frederik, I think that this issue is no longer relevant since your upgrade to 2.5.x. Peter