[LU-5107] MDS oops during mount with latest lustre 2.5.1 snapshot Created: 27/May/14 Updated: 30/May/14 Resolved: 30/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.1 |
| Fix Version/s: | Lustre 2.5.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | James A Simmons | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | MDS server |
| Severity: | 3 |
| Rank (Obsolete): | 14091 | ||||||||
| Description |
|
With the latest 2.5.1 snapshot, when I attempt to bring up a file system I see the following bug on the MDS during the MDT mount. Because of this I can't currently mount a 2.5 file system for testing.
May 27 16:55:19 tick-dne-mds1 kernel: [ 546.512335] LustreError: 13869:0:(osp_dev.c:864:osp_prepare_fid_client()) ASSERTION( osp->opd |
| Comments |
| Comment by James Nunez (Inactive) [ 27/May/14 ] |
|
Di, would you please comment on this ticket? Thank you. |
| Comment by Di Wang [ 27/May/14 ] |
|
James, did you set up Lustre with a single MDT or with DNE? Are there any other console error messages? Could you tell me which build you are using? Is it a newly formatted FS? Do you have the dump log for this LBUG? Thank you. |
| Comment by James A Simmons [ 28/May/14 ] |
|
I tried a build with a few extra patches, then the tip of b2_5, and hit the same problem both times. Yes, it is a DNE setup with 3 MDS servers. When I encountered this error I was using an already formatted 2.5 file system. I later reformatted to make sure that was not the issue, but the MDS oops was still there. I found that reverting |
| Comment by Andreas Dilger [ 28/May/14 ] |
|
James, there are two patches on
Which one did you revert to fix the problem? |
| Comment by James A Simmons [ 28/May/14 ] |
|
I reverted patch 8997. |
| Comment by Di Wang [ 28/May/14 ] |
|
Hmm, there are some problems with 8997 when porting it to 2.5. Since we do not need the OSP (for an MDT) to allocate FIDs, osp_prepare_fid_client(d) needs to be moved after the if (d->opd_connect_mdt) check in osp_import_event().

```diff
diff --git a/lustre/osp/osp_dev.c b/lustre/osp/osp_dev.c
index a4a2f90..15f2ec0 100644
--- a/lustre/osp/osp_dev.c
+++ b/lustre/osp/osp_dev.c
@@ -1053,15 +1053,16 @@ static int osp_import_event(struct obd_device *obd, struct obd_import *imp,
 	case IMP_EVENT_ACTIVE:
 		d->opd_imp_active = 1;
-		if (osp_prepare_fid_client(d) != 0)
-			break;
-
 		if (d->opd_got_disconnected)
 			d->opd_new_connection = 1;
 		d->opd_imp_connected = 1;
 		d->opd_imp_seen_connected = 1;
 		if (d->opd_connect_mdt)
 			break;
+
+		if (osp_prepare_fid_client(d) != 0)
+			break;
+
 		wake_up(&d->opd_pre_waitq);
 		__osp_sync_check_for_work(d);
 		CDEBUG(D_HA, "got connected\n");
```

This should probably fix the problem; I will cook up a patch. |
| Comment by Di Wang [ 28/May/14 ] |
| Comment by James A Simmons [ 29/May/14 ] |
|
The patch appears to have resolved the issue. Thank you. |
| Comment by Andreas Dilger [ 29/May/14 ] |
|
The problem was caused by the backport of patch http://review.whamcloud.com/9875 to b2_5. |
| Comment by Peter Jones [ 30/May/14 ] |
|
Landed for 2.5.2. As I understand it, this issue only affected b2_5, so the fix is not needed on other branches. |