Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 1.8.6
-
None
-
lustre 1.8.3.0-6chaos
-
3
-
23499
-
5093
Description
A sysadmin was shutting down an MDS node cleanly in preparation for scheduled upgrades. During the umount of the MGS device, we hit the following assertion:
LustreError ... (lprocfs_status.c:1060:lprocfs_free_client_stats())
ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0) failed: count 1
And the stack trace was:
:obdclass:lprocfs_free_client_stats
:obdclass:lprocfs_free_per_client_stats
:mgs:lproc_mgs_cleanup
:mgs:mgs_cleanup
:obdclass:class_decref
:obdclass:class_export_destroy
:obdclass:obd_zombie_impexp_cull
:obdclass:obd_zombie_impexp_thread
We have seen this same assertion from OSTs as well. Some investigation was done in bug 23499, but there is not yet a solution.
Attachments
Activity
Comment |
[ Hi, nice to meet you all. This is my first time writing on Jira. I have a quick question about this issue, because the same problem still occurs even though I have already applied the patch lu-39v2-master.patch. Could you check whether the following case is possible? Suppose a client has multiple NIDs and always uses one NID to talk to a server until it gets a forced disconnect, for example from request timeouts. When the client then tries to connect to the server using another NID, the server-side lprocfs_exp_setup sets exp->exp_nid_stats to NULL even though nid_exp_ref_count is still greater than or equal to 1. In this case, I think the server will hang in lprocfs_free_client_stats at the LASSERT statement. Thank you for your cooperation. ] |
Fix Version/s | New: Lustre 1.8.6 [ 10022 ] | |
Fix Version/s | New: Lustre 2.1.0 [ 10021 ] | |
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Bugzilla ID | New: 23499 |
Attachment | New: lu-39v2-master.patch [ 10100 ] |
Affects Version/s | New: Lustre 1.8.6 [ 10022 ] | |
Affects Version/s | Original: Lustre 1.8.x [ 10010 ] |
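The scenario raised in the comment above can be modelled with a small user-space sketch. This is not the Lustre code: the struct and function names below (nid_stat_sketch, export_sketch, sketch_attach, sketch_detach, simulate_reconnect) are hypothetical stand-ins, and the sketch only assumes what the comment describes, namely that the export's stats pointer is reset to NULL on reconnect without the reference on the old NID's entry being dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-NID stats entry; not the Lustre struct. */
struct nid_stat_sketch {
    int ref_count;                      /* models nid_exp_ref_count */
};

/* Hypothetical stand-in for an export that may point at a stats entry. */
struct export_sketch {
    struct nid_stat_sketch *nid_stats;  /* models exp->exp_nid_stats */
};

/* Attach the export to a stats entry and take one reference on it. */
static void sketch_attach(struct export_sketch *exp, struct nid_stat_sketch *st)
{
    st->ref_count++;
    exp->nid_stats = st;
}

/* Clear the export's pointer; drop the reference first only if asked to. */
static void sketch_detach(struct export_sketch *exp, int drop_ref)
{
    if (drop_ref && exp->nid_stats != NULL)
        exp->nid_stats->ref_count--;
    exp->nid_stats = NULL;
}

/*
 * Model the scenario: connect via NID A, forced disconnect, reconnect via
 * NID B, then tear the export down. Returns the reference count left on
 * NID A's entry, which is the value a cleanup-time assertion would check.
 */
static int simulate_reconnect(int drop_ref_on_switch)
{
    struct nid_stat_sketch nid_a = { 0 };
    struct nid_stat_sketch nid_b = { 0 };
    struct export_sketch exp = { NULL };

    sketch_attach(&exp, &nid_a);              /* first connection uses NID A */
    sketch_detach(&exp, drop_ref_on_switch);  /* pointer reset on reconnect */
    sketch_attach(&exp, &nid_b);              /* reconnect uses NID B */
    sketch_detach(&exp, 1);                   /* normal teardown of the export */

    return nid_a.ref_count;
}
```

In this model, simulate_reconnect(0) leaves one reference on the old entry, which matches the "count 1" in the reported assertion, while simulate_reconnect(1) leaves none. Whether the real lprocfs_exp_setup path behaves this way is exactly the question the comment asks; the sketch does not confirm it.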
Integrated in
lustre-master-centos5 #159
LU-39 ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0) Oleg Drokin : 2a6045403fbd46bb6501df907f0321f5401924ba
Files :