[LU-1432] Race condition between lprocfs_exp_setup() and lprocfs_free_per_client_stats() causes LBUG Created: 23/May/12 Updated: 29/Nov/12 Resolved: 02/Aug/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.1.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Hiroya Nozaki | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
FEFS based on Lustre 1.8.5, RIKEN K-computer |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4541 |
| Description |
|
We have repeatedly hit the LBUG below, even though we had already applied the patch that adds this assertion:
LASSERTF(atomic_read(&client_stat->nid_exp_ref_count) == 0,
         "nid %s:count %d\n", libcfs_nid2str(client_stat->nid),
         atomic_read(&client_stat->nid_exp_ref_count));
Today I finally found the race condition between lprocfs_exp_setup() and lprocfs_free_per_client_stats(). As far as I can tell, this LBUG can occur in both Lustre 1.8 and Lustre 2.x, so I decided to report it here. The scenario is as follows.
int target_handle_connect(struct ptlrpc_request *req, svc_handler_t handler)
{
...
if (!target || target->obd_stopping || !target->obd_set_up) {
LCONSOLE_ERROR_MSG(0x137, "UUID '%s' is not available "
" for connect (%s)\n", str,
!target ? "no target" :
(target->obd_stopping ? "stopping" :
"not set up"));
GOTO(out, rc = -ENODEV);
}
...
}
int lprocfs_exp_setup(struct obd_export *exp, lnet_nid_t *nid, int *newnid)
{
        ...
        old_stat = lustre_hash_findadd_unique(obd->obd_nid_stats_hash,
                                              nid, &new_stat->nid_hash);
        CDEBUG(D_INFO, "Found stats %p for nid %s - ref %d\n",
               old_stat, libcfs_nid2str(*nid),
               atomic_read(&new_stat->nid_exp_ref_count));

        /* we need to release old stats because lprocfs_exp_cleanup() hasn't
         * been and will never be called. */
        if (exp->exp_nid_stats) {
                nidstat_putref(exp->exp_nid_stats);
                exp->exp_nid_stats = NULL;
        }
        ...
}
The same problem can be reproduced by adding some waiting code to that part of lprocfs_exp_setup() and then running the umount command; I have actually triggered the problem on purpose this way. I am now creating a patch for the issue and will upload it soon, but it is for 1.8.8. |
| Comments |
| Comment by Hiroya Nozaki [ 23/May/12 ] |
| Comment by Hiroya Nozaki [ 06/Jun/12 ] |
|
I added some spin_trylock() calls in the above patch to fix the problem, but then realized that lustre_hash_findadd_unique() calls kmalloc() internally, so I have to replace the spinlocks with something else. It would be easy to fix by changing the spinlocks into semaphores and adding a new semaphore to struct obd_device, but I am not sure I am allowed to do that, since that struct is a really basic and important structure. |
| Comment by Hiroya Nozaki [ 11/Jun/12 ] |
|
I had been wondering why filter_cleanup() is executed while some export objects of obdfilter still exist, and I finally realized that this is not a race condition but a refcounting problem on the export object, almost the same as |
| Comment by Hiroya Nozaki [ 19/Jun/12 ] |
|
Hi Lai, this is the debug log you requested. PID 15133 is an ll_ost thread and PID 15400 is the umount thread; the environment is as below. In the end, the umount thread keeps printing messages like the following and never returns:

(obd_mount.c:1401:server_wait_finished()) Mount still busy with 7 refs after 90 secs.

So I think some request remains and |
| Comment by Lai Siyao [ 19/Jun/12 ] |
|
IMO this is not a bug; please see the log snippet below:

00000020:00020000:10:1340084622.090102:0:15134:0:(lprocfs_status.c:1733:lprocfs_exp_setup()) ===== WAIT END =====
...
00002000:00000001:2:1340084622.090148:0:15134:0:(filter.c:2554:filter_reconnect()) Process leaving (rc=0 : 0 : 0)
00010000:00000001:2:1340084622.090149:0:15134:0:(obd_class.h:775:obd_reconnect()) Process leaving (rc=0 : 0 : 0)
00010000:00000040:2:1340084622.090150:0:15134:0:(ldlm_lib.c:1020:target_handle_connect()) GETting export ffff81063ed80e00 : new refcount 2
...
00000100:00000040:2:1340084622.090167:0:15134:0:(connection.c:98:ptlrpc_connection_get()) conn=ffff8105f150cac0 refcount 3 to 192.168.128.38@o2ib
00000001:00000001:2:1340084622.090169:0:15134:0:(class_hash.c:238:lustre_hash_add()) Process entered
00000020:00000040:2:1340084622.090170:0:15134:0:(obd_config.c:1518:nid_export_get()) GETting export ffff81063ed80e00 : new refcount 3
00000020:00000001:2:1340084622.090172:0:15134:0:(obd_config.c:1520:nid_export_get()) Process leaving (rc=18446604462556974592 : -139611152577024 : ffff81063ed80e00)
00000001:00000001:2:1340084622.090174:0:15134:0:(class_hash.c:257:lustre_hash_add()) Process leaving

The cause of the umount hang is that the target has already run .cleanup (the client had disconnected earlier), but this 'connect' arrives after that, and the export is not in nid_hash, so the export is added into obd_nid_hash. That takes one refcount on the export, but since the server has already done .cleanup and is waiting in a loop in server_wait_finished(), it will hang there forever. However, server_wait_finished() checks for signals, so you may be able to send a SIGKILL to that process and manually run `umount ...` again; it may then finish successfully. |
| Comment by Hiroya Nozaki [ 19/Jun/12 ] |
|
But in that case there can be, and I have actually encountered, a situation where target_handle_connect() touches an obd_nid_hash that has already been freed by class_cleanup(). So we should remove the race condition this time. And... I am afraid I am still wondering whether the class_export_get() is right or not. The original code, "req->rq_export = export" instead of class_export_get(), looks obviously wrong to me as well, but it has survived until now and still exists in current Lustre 2.1, so I suspect there is a special reason for doing it that way. Do you agree? |
| Comment by Hiroya Nozaki [ 19/Jun/12 ] |
|
I added a new member, obd_conn_inprogress, to struct obd_device to count the number of threads currently performing a connect in target_handle_connect(), used as in the pseudocode below. So far it seems to work well, so I will upload the patch soon.

target_handle_connect()
{
        ...
        atomic_inc(&target->obd_conn_inprogress);
        /* do the connect */
        atomic_dec(&target->obd_conn_inprogress);
        ...
}

class_cleanup()
{
        ...
        /* wait for in-flight connects to drain before freeing the hash */
        while (atomic_read(&obd->obd_conn_inprogress) > 0)
                cfs_cond_resched();
        lustre_hash_exit(obd->obd_nid_hash);
        ...
}
|
| Comment by Lai Siyao [ 19/Jun/12 ] |
|
Hmm, one possible fix for this issue is to serialize target obd device cleanup against the connect operation. I'll consider it. As for the request not taking an export refcount in 2.x, that happens to be correct:
But fixing it the way I do is clearer; I will fix it in 2.x later. |
| Comment by Hiroya Nozaki [ 20/Jun/12 ] |
|
I uploaded the patch just now. I know it may not be enough, but I hope it helps you consider the issue. |
| Comment by Hiroya Nozaki [ 21/Jun/12 ] |
|
Hi Lai,

Jun 21 11:22:54.163835 rx200-040 [ERR.] FEFS 137-5: UUID 'fefs-OST0000_UUID' is not available for connect (stopping)
Jun 21 11:22:54.163846 rx200-040 [ERR.] FEFS 4485:0:(ldlm_lib.c:2094:target_send_reply_msg()) @@@ processing error (-19) req@ffff810c37d53400 x1405183678491029/t0 o8-><?>@<?>:0/0 lens 368/0 e 0 st 0 to 0 dl 1340245474 ot 0 ref 1 fl Interpret:/0/0 rc -19/0

I can provide the debug log for one of the export objects of fefs-OST0000_UUID. Thank you.

88016:00000020:00000040:2:1340264041.083772:0:6336:0:(obd_config.c:1459:uuid_export_get()) GETting export ffff81063d211400 : new refcount 3
88050:00002000:00000040:2:1340264041.083815:0:6336:0:(filter.c:1058:filter_init_server_data()) PUTting export ffff81063d211400 : new refcount 2
90103:00000020:00000040:10:1340264047.516715:0:6078:0:(obd_config.c:1459:uuid_export_get()) GETting export ffff81063d211400 : new refcount 3
90104:00000020:00000001:10:1340264047.516716:0:6078:0:(obd_config.c:1461:uuid_export_get()) Process leaving (rc=18446604462528205824 : -139611181345792 : ffff81063d211400)
90105:00000001:00000001:10:1340264047.516719:0:6078:0:(class_hash.c:467:lustre_hash_lookup()) Process leaving (rc=18446604462528205824 : -139611181345792 : ffff81063d211400)
90107:00010000:00080000:10:1340264047.516723:0:6078:0:(ldlm_lib.c:623:target_handle_reconnect()) connect export for UUID '5daed9bc-4c53-eeb7-ef59-cad4e3ca5ef7' at ffff81063d211400, cookie 0x1c8c621b54ad24a8
90115:00002000:00100000:10:1340264047.516741:0:6078:0:(filter.c:2444:filter_connect_internal()) fefs-OST0000: cli 5daed9bc-4c53-eeb7-ef59-cad4e3ca5ef7/ffff81063d211400 ocd_connect_flags: 0xe3440478 ocd_version: 1080500 ocd_grant: 2097152 ocd_index: 0
90119:00002000:00000020:10:1340264047.516753:0:6078:0:(filter_io.c:201:filter_grant_space_left()) fefs-OST0000: cli 5daed9bc-4c53-eeb7-ef59-cad4e3ca5ef7/ffff81063d211400 free: 17254846464 avail: 17254846464 grant 0 left: 17221083136 pending: 0
90120:00002000:00000020:10:1340264047.516757:0:6078:0:(filter_io.c:261:filter_grant()) fefs-OST0000: cli 5daed9bc-4c53-eeb7-ef59-cad4e3ca5ef7/ffff81063d211400 wants: 2097152 current grant 0 granting: 2097152
90121:00002000:00000020:10:1340264047.516760:0:6078:0:(filter_io.c:266:filter_grant()) fefs-OST0000: cli 5daed9bc-4c53-eeb7-ef59-cad4e3ca5ef7/ffff81063d211400 tot cached:0 granted:2097152 num_exports: 3
90122:00002000:00000020:10:1340264047.516765:0:6078:0:(filter.c:2470:filter_connect_internal()) fefs-OST0000: cli 5daed9bc-4c53-eeb7-ef59-cad4e3ca5ef7/ffff81063d211400 ocd_grant: 2097152 want: 2097152 left: 17221083136
90610:00010000:00000040:2:1340264056.529876:0:6078:0:(ldlm_lib.c:1032:target_handle_connect()) GETting export ffff81063d211400 : new refcount 4
90617:00000020:00000040:2:1340264056.529886:0:6078:0:(obd_config.c:1526:nid_export_get()) GETting export ffff81063d211400 : new refcount 5
90618:00000020:00000001:2:1340264056.529887:0:6078:0:(obd_config.c:1528:nid_export_get()) Process leaving (rc=18446604462528205824 : -139611181345792 : ffff81063d211400)
90693:00000100:00000040:2:1340264056.530005:0:6078:0:(service.c:638:ptlrpc_server_drop_request()) PUTting export ffff81063d211400 : new refcount 4
91015:00000020:00000080:13:1340264061.174390:0:6374:0:(obd_config.c:440:dump_exports()) fefs-OST0000: ffff81063d211400 5daed9bc-4c53-eeb7-ef59-cad4e3ca5ef7 192.168.128.38@o2ib 4 0 0: 0000000000000000
91021:00000020:00000040:13:1340264061.174402:0:6374:0:(genops.c:1223:class_disconnect_export_list()) GETting export ffff81063d211400 : new refcount 5
91022:00000020:00000040:13:1340264061.174404:0:6374:0:(genops.c:1241:class_disconnect_export_list()) GETting export ffff81063d211400 : new refcount 6
91023:00000020:00080000:13:1340264061.174406:0:6374:0:(genops.c:1245:class_disconnect_export_list()) fefs-OST0000: disconnecting export at 192.168.128.38@o2ib (ffff81063d211400), last request at 1340264041
91026:00002000:00000040:13:1340264061.174412:0:6374:0:(filter.c:2772:filter_disconnect()) GETting export ffff81063d211400 : new refcount 7
91048:00000020:00000040:13:1340264061.174452:0:6374:0:(obd_config.c:1537:nid_export_put()) PUTting export ffff81063d211400 : new refcount 6
91049:00000020:00000001:13:1340264061.174454:0:6374:0:(obd_config.c:1539:nid_export_put()) Process leaving (rc=18446604462528205824 : -139611181345792 : ffff81063d211400)
91050:00000001:00000001:13:1340264061.174457:0:6374:0:(class_hash.c:365:lustre_hash_del()) Process leaving (rc=18446604462528205824 : -139611181345792 : ffff81063d211400)
91051:00000020:00000040:13:1340264061.174460:0:6374:0:(lustre_handles.c:143:class_handle_unhash_nolock()) removing object ffff81063d211400 with handle 0x1c8c621b54ad24a8 from hash
91053:00000020:00000040:13:1340264061.174464:0:6374:0:(obd_config.c:1470:uuid_export_put()) PUTting export ffff81063d211400 : new refcount 5
91054:00000020:00000001:13:1340264061.174465:0:6374:0:(obd_config.c:1472:uuid_export_put()) Process leaving (rc=18446604462528205824 : -139611181345792 : ffff81063d211400)
91055:00000001:00000001:13:1340264061.174468:0:6374:0:(class_hash.c:365:lustre_hash_del()) Process leaving (rc=18446604462528205824 : -139611181345792 : ffff81063d211400)
91056:00000020:00000040:13:1340264061.174471:0:6374:0:(genops.c:976:class_unlink_export()) PUTting export ffff81063d211400 : new refcount 4
91057:00000020:00000040:13:1340264061.174473:0:6374:0:(genops.c:1193:class_disconnect()) PUTting export ffff81063d211400 : new refcount 3
91067:00002000:00000040:13:1340264061.174489:0:6374:0:(filter.c:2797:filter_disconnect()) PUTting export ffff81063d211400 : new refcount 2
91070:00000020:00080000:13:1340264061.174494:0:6374:0:(genops.c:1250:class_disconnect_export_list()) disconnected export at 192.168.128.38@o2ib (ffff81063d211400): rc 0
91071:00000020:00000040:13:1340264061.174496:0:6374:0:(genops.c:1251:class_disconnect_export_list()) PUTting export ffff81063d211400 : new refcount 1 |
| Comment by Lai Siyao [ 21/Jun/12 ] |
|
Could you upload the full log? I need to know where the export refcount is taken. |
| Comment by Hiroya Nozaki [ 21/Jun/12 ] |
|
Ah, the above comment was my mistake; I forgot to account for one put. I'm really sorry. |
| Comment by Lai Siyao [ 29/Jun/12 ] |
|
Patch for master is at: http://review.whamcloud.com/#change,3244 |
| Comment by Jay Lan (Inactive) [ 17/Jul/12 ] |
|
Can you officially land the above patch? We have been running it on our production systems for about two weeks and I would like to see it landed. BTW, a port to the b2_1 branch would be appreciated. |
| Comment by Peter Jones [ 02/Aug/12 ] |
|
Landed for 2.1.3 and 2.3 |
| Comment by Jay Lan (Inactive) [ 20/Sep/12 ] |
|
In the "ISSUE Links" section of this ticket, |