Bug
Resolution: Fixed
Medium
lustre_start_mgc() may leave the MGC OBD with an uninitialized cl_mgc_mgsexp if the connection to the MGS fails:
    rc = obd_connect(NULL, &exp, obd, uuid, data, NULL);
    if (rc) {
        CERROR("connect failed %d\n", rc);
        GOTO(out, rc);
    }
    ...
    obd->u.cli.cl_mgc_mgsexp = exp;    /* <-- skipped if rc != 0 */
out:
    /*
     * Keep the MGC info in the sb. Note that many lsi's can point
     * to the same mgc.
     */
    lsi->lsi_mgc = obd;    /* <-- while the OBD is kept in lsi */
out_free:
    mutex_unlock(&mgc_start_lock);
The mount process then exits with an error, but any other mount running in parallel and waiting on mgc_start_lock will get access to that MGC OBD if it has not been cleaned up yet:
    obd = class_name2obd(mgcname);
    if (obd && !obd->obd_stopping) {
        ...
        /* Re-using an existing MGC */
        atomic_inc(&obd->u.cli.cl_mgc_refcount);
That mount then returns from its own call to lustre_start_mgc() without an error, even though the MGC has no export, and continues with the mount. This can cause crashes like the following:
[ 184.387310] BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
...
[ 184.393875] RIP: 0010:server_lsi2mti+0x118/0x7b0 [obdclass]
when code accesses cl_mgc_mgsexp on the assumption that it is always set.
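The sequence above can be illustrated with a small, self-contained C model. It is a sketch under stated assumptions, not Lustre code: struct model_mgc, model_start_mgc(), model_connect() and the -ENOTCONN return value are all hypothetical, and the guard shown is only one possible way to avoid the crash, not necessarily the fix that was merged for this ticket. The two mounts are modelled as sequential calls because mgc_start_lock serializes them anyway.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of the MGC start-up race.
 * None of these types or functions exist in Lustre; they only mirror
 * the fields discussed in this report.
 */
struct model_export {
    int connected;
};

struct model_mgc {
    int registered;              /* findable by name, like class_name2obd() */
    int stopping;                /* like obd->obd_stopping */
    int refcount;                /* like cl_mgc_refcount */
    struct model_export *mgsexp; /* like cl_mgc_mgsexp */
};

static struct model_export model_mgs_export = { 1 };

/* Stand-in for obd_connect(): fails with connect_rc when it is non-zero. */
static int model_connect(int connect_rc, struct model_export **exp)
{
    if (connect_rc != 0)
        return connect_rc;
    *exp = &model_mgs_export;
    return 0;
}

/*
 * Models lustre_start_mgc() as described above. With check_export == 0
 * the reuse path trusts any registered MGC (the reported bug); with
 * check_export != 0 it refuses an MGC whose connect never succeeded.
 */
static int model_start_mgc(struct model_mgc *mgc, int connect_rc,
                           int check_export)
{
    struct model_export *exp = NULL;
    int rc;

    if (mgc->registered && !mgc->stopping) {
        /* Re-using an existing MGC */
        if (check_export && mgc->mgsexp == NULL)
            return -ENOTCONN; /* hypothetical choice of error code */
        mgc->refcount++;
        return 0;
    }

    /* First mount: the MGC becomes findable before the connect. */
    mgc->registered = 1;
    mgc->refcount = 1;

    rc = model_connect(connect_rc, &exp);
    if (rc != 0)
        goto out; /* mgsexp is left NULL, as in the report */

    mgc->mgsexp = exp;
out:
    return rc;
}
```

In the model, the first call fails the connect but leaves the MGC registered with mgsexp == NULL. A second call with check_export == 0 then returns 0 with no export, which is the state that later leads to the NULL pointer dereference; with check_export set, the second mount fails cleanly instead.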
Is related to: LU-13466 BUG: unable to handle kernel NULL pointer dereference in class_exp2cliimp (Resolved)