[LU-350] port bug24050 to master ("lustre_start" caused client nodes to fail to mount) Created: 20/May/11  Updated: 18/Aug/11  Resolved: 18/Aug/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Bugzilla ID: 24050
Rank (Obsolete): 4927

 Description   

For a newly formatted Lustre filesystem, the MDT must be started before the OSTs; otherwise clients cannot mount the filesystem, because the filesystem configuration is erased when the MDT registers itself with the MGS. Currently, the script tool "lustre_start"
uses the wrong order (it starts the OSTs before the MDT), so clients cannot mount.
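As an illustration only, here is a minimal C sketch of the start order a tool such as lustre_start would need to enforce on the first mount of a newly formatted filesystem: the MGS first, then the MDT, then the OSTs. The struct target type, the TGT_* kinds and order_for_first_start() are hypothetical names; they are not taken from lustre_start or the Lustre source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical target kinds; the numeric values encode the start order
 * MGS -> MDT -> OST needed on first mount of a new filesystem. */
enum target_kind { TGT_MGS = 0, TGT_MDT = 1, TGT_OST = 2 };

/* Hypothetical description of one server target to be mounted. */
struct target {
    const char *name;
    enum target_kind kind;
};

/* qsort() comparator: orders targets by kind only. */
static int cmp_start_order(const void *a, const void *b)
{
    const struct target *ta = a;
    const struct target *tb = b;

    return (int)ta->kind - (int)tb->kind;
}

/* Reorder targets in place so every MGS comes first, then MDTs, then OSTs.
 * qsort() is not stable, so targets of the same kind may end up in any
 * relative order; only the ordering between kinds is guaranteed. */
void order_for_first_start(struct target *targets, size_t n)
{
    if (n < 2)
        return;                 /* nothing to reorder */
    assert(targets != NULL);
    qsort(targets, n, sizeof(*targets), cmp_start_order);
}
```

Sorting by kind rather than by name keeps the rule independent of how targets happen to be named or listed in a configuration file.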



 Comments   
Comment by Brian Murrell (Inactive) [ 20/May/11 ]

So, is the bug here the wrong order being used in lustre_start or is the actual bug the requirement that the MDT be mounted first for a newly formatted filesystem?

To be sure, one has always been able to mount the servers in any order (for non-newly formatted filesystems at least) but I don't recall if that flexibility also applies/applied to newly formatted filesystems. We probably need somebody with a clearer picture (and/or memory) than I to chime in with an opinion on that in order to determine which is the real bug here.

Comment by Peter Jones [ 07/Jun/11 ]

http://review.whamcloud.com/#change,595

Comment by Hongchao Zhang [ 14/Jun/11 ]

Pasting comments from Gerrit here for future reference:

Oleg Drokin:

Huh? I think the entire point of starting MGS first then OSTs was so that MDS will start and be able to connect to all OSTs.

Why would MDT erase any config on MGS at all? That sounds like a different problem that needs to be solved differently I think.

Yu Jian:

I also questioned this in bug 24050 comments #30, #48, #51 and #57, and got a partial answer from Johann in comment #55. I think the fix to the lustre_start utility was just a workaround that does not fix the underlying Lustre issue. Lustre did support the MGS->OST->MDT start order before. So, Hongchao, could you please investigate this? We need to figure out the real issue.

Oleg Drokin:

The comment in Bugzilla indicates that this order only needs to be followed on first mount, which I guess I can believe, even though it is still somewhat strange that the first MDT connect would wipe config data.

Perhaps we just need to incorporate a real mount in (MGS->MDT->OST) order into our formatting scripts instead?

Comment by Hongchao Zhang [ 21/Jun/11 ]

This problem was introduced by the patch in bug 22464, which adds the "writeconf" option when formatting the disk, but it only causes a problem in b1_8.
In master, there is an additional condition, "mti->mti_stripe_index == 0", that determines whether the whole log can be erased.
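A simplified model of the two rules described in this comment may help; it only restates the comment and is not the actual MGS code. In b1_8, an MDT that registers with the writeconf flag set erases the whole filesystem log; master additionally requires mti_stripe_index == 0. The flag names REG_F_WRITECONF and REG_F_IS_MDT and both functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the registration flags a target
 * sends to the MGS; the names and values are assumptions. */
#define REG_F_WRITECONF 0x1u /* target was formatted with writeconf */
#define REG_F_IS_MDT    0x2u /* registering target is an MDT */

/* b1_8 as described: an MDT registering with writeconf set erases the
 * whole filesystem configuration log, including entries already written
 * for OSTs that registered earlier. */
bool erase_whole_log_b1_8(unsigned int flags)
{
    return (flags & REG_F_WRITECONF) != 0 && (flags & REG_F_IS_MDT) != 0;
}

/* master as described: the same test, additionally gated on the
 * registering target's mti_stripe_index being 0. */
bool erase_whole_log_master(unsigned int flags, unsigned int stripe_index)
{
    return erase_whole_log_b1_8(flags) && stripe_index == 0;
}
```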

However, when I tried to start Lustre in MGS-OST-MDS order on master, it still failed because of bugs in "mgs_steal_llog_for_mdt_from_client": the "index" and "svname" of the OST
are missing when the MDT calls "mgs_steal_llog_for_mdt_from_client" to add the corresponding OSCs to its LOV.
Lustre mounts successfully after applying the following patch:

diff --git a/lustre/mgs/mgs_llog.c b/lustre/mgs/mgs_llog.c
index a8bb398..1ce2f85 100644
--- a/lustre/mgs/mgs_llog.c
+++ b/lustre/mgs/mgs_llog.c
@@ -1124,7 +1124,12 @@ static int mgs_steal_llog_handler(struct llog_handle *llh,
marker = lustre_cfg_buf(lcfg, 1);
if (!strncmp(marker->cm_comment,"add osc",7) &&
(marker->cm_flags & CM_START)){
+ char *osc_svname;
+ name_create(&osc_svname, marker->cm_tgtname, "");
+
got_an_osc_or_mdc = 1;
+ strncpy(tmti->mti_svname, osc_svname,
+ sizeof(tmti->mti_svname));
rc = record_start_log(obd, &mdt_llh, mti->mti_svname);
rc = record_marker(obd, mdt_llh, fsdb, CM_START,
mti->mti_svname,"add osc(copied)");
@@ -1208,6 +1213,7 @@ static int mgs_steal_llog_handler(struct llog_handle *llh,
name_create_mdt_and_lov(&logname, &lovname, fsdb, mti->mti_stripe_index);
sprintf(mdt_index, "MDT%04x", mti->mti_stripe_index);

+ sscanf(lustre_cfg_buf(lcfg, 2), "%d", &tmti->mti_stripe_index);
mgs_write_log_osc_to_lov(obd, fsdb, tmti, logname,
mdt_index, lovname,
LUSTRE_SP_MDT, 0);
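The two additions in this patch fill in the temporary target info before each OSC record is copied into the MDT's LOV: the OST name, taken from the marker's cm_tgtname, and the OST index, parsed from lustre_cfg buffer 2. The standalone sketch below models just that step outside Lustre. struct tgt_info, copy_name() and fill_osc_target() are hypothetical, and the 64-byte name size is an assumption. Unlike a bare strncpy(), copy_name() always NUL-terminates the destination, and the name is copied directly rather than through an intermediate osc_svname string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the two fields of the MGS target
 * info that the patch fills in; the array size is an assumption. */
struct tgt_info {
    char svname[64];
    unsigned int stripe_index;
};

/* Copy src into dst and always NUL-terminate. strncpy() alone leaves dst
 * unterminated when src is dst_size bytes or longer. */
static void copy_name(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Fill in the OST name (from the "add osc" marker's target name) and its
 * index (from a decimal string, like lustre_cfg buffer 2 in the patch).
 * Returns 0 on success, or -1 without modifying tmti if the index string
 * does not start with a number. */
int fill_osc_target(struct tgt_info *tmti, const char *marker_tgtname,
                    const char *index_buf)
{
    unsigned int idx;

    if (sscanf(index_buf, "%u", &idx) != 1)
        return -1;
    copy_name(tmti->svname, sizeof(tmti->svname), marker_tgtname);
    tmti->stripe_index = idx;
    return 0;
}
```

Checking the sscanf() return value means a malformed index is reported instead of leaving stripe_index holding whatever value it had before.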

Comment by Build Master (Inactive) [ 08/Aug/11 ]

Integrated in lustre-master #246 on the following builds:

  • x86_64,client,el5,ofa
  • i686,client,el6,inkernel
  • x86_64,client,el5,inkernel
  • x86_64,server,el5,inkernel
  • x86_64,client,el6,inkernel
  • i686,server,el6,inkernel
  • x86_64,server,el6,inkernel
  • x86_64,client,sles11,inkernel
  • x86_64,server,el5,ofa
  • i686,client,el5,inkernel
  • x86_64,client,ubuntu1004,inkernel
  • i686,server,el5,ofa
  • i686,client,el5,ofa
  • i686,server,el5,inkernel

LU-350 fix bug in mgs to allow starting OST before MDT

Oleg Drokin : 3227436e11b4bc77ffd261e8f13adf905fae2353
Files :

  • lustre/mgs/mgs_llog.c
Generated at Sat Feb 10 01:06:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.