Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.13.0
-
None
-
3
-
18,015
Description
I finally traced my debug kernel problems with later RHEL releases to RCU breakage of some sort.
The ldlm_locks slab is declared with SLAB_DESTROY_BY_RCU if that flag is defined. This goes back to bugzilla 18015 (https://bugzilla.lustre.org/show_bug.cgi?id=18015), a patch by BobiJam.
Now it appears that when we schedule a free in that slab and then destroy the slab, the actual free is delayed and runs after the slab has already been destroyed, despite rcu_barrier() being present.
This is a clear bug, and I will file a RH bugzilla ticket for it.
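To make the ordering problem concrete, here is a toy userspace model of it. This is only a sketch, not kernel code: `toy_cache`, `toy_defer_free`, `toy_barrier` and `toy_destroy` are hypothetical names, and `toy_barrier` merely stands in for what rcu_barrier() is expected to guarantee (all queued frees have run before the slab goes away).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, none is a kernel API. */
struct toy_cache {
	bool destroyed;     /* set once the cache itself has been torn down */
	int  pending_frees; /* frees queued behind a grace period */
	int  late_frees;    /* frees that ran after destruction (the bug) */
};

/* Queue a deferred free, as an RCU-typed slab does when releasing memory. */
static void toy_defer_free(struct toy_cache *c)
{
	assert(!c->destroyed);
	c->pending_frees++;
}

/* Run every queued free; counts the ones that arrive too late. */
static void toy_barrier(struct toy_cache *c)
{
	while (c->pending_frees > 0) {
		c->pending_frees--;
		if (c->destroyed)
			c->late_frees++; /* would touch freed memory in a kernel */
	}
}

static void toy_destroy(struct toy_cache *c)
{
	c->destroyed = true;
}

/* Expected order: the barrier drains the frees, then the cache is destroyed. */
int toy_teardown_expected(void)
{
	struct toy_cache c = {0};

	toy_defer_free(&c);
	toy_barrier(&c);
	toy_destroy(&c);
	return c.late_frees;
}

/* Order matching the report: the deferred free fires after the destroy. */
int toy_teardown_observed(void)
{
	struct toy_cache c = {0};

	toy_defer_free(&c);
	toy_destroy(&c);
	toy_barrier(&c); /* stands in for the delayed free finally running */
	return c.late_frees;
}
```

In this model `toy_teardown_expected()` reports no late frees, while `toy_teardown_observed()` reports one, which corresponds to the free landing on an already destroyed slab.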
Beyond that, I wonder how much we still need this flag nowadays, especially since newer kernels renamed it to SLAB_TYPESAFE_BY_RCU, which we do not detect, so in that case we simply do not set it.
Should we just convert ldlm_locks back into a normal slab?
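If the flag were kept instead, detecting both names could look roughly like the sketch below. The names `LDLM_SLAB_RCU_FLAG`, `ldlm_slab_rcu_flag` and `ldlm_slab_rcu_flag_report` are hypothetical, and the numeric value is only a placeholder so the snippet compiles in userspace; in a real build both macros would come from the kernel's slab header.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Placeholder for a newer kernel's slab header, so this sketch builds in
 * userspace. The value is an assumption for illustration only.
 */
#ifndef SLAB_TYPESAFE_BY_RCU
#define SLAB_TYPESAFE_BY_RCU 0x00080000UL
#endif

/* Prefer the new name, fall back to the old one, otherwise set nothing. */
#if defined(SLAB_TYPESAFE_BY_RCU)
#define LDLM_SLAB_RCU_FLAG SLAB_TYPESAFE_BY_RCU
#elif defined(SLAB_DESTROY_BY_RCU)
#define LDLM_SLAB_RCU_FLAG SLAB_DESTROY_BY_RCU
#else
#define LDLM_SLAB_RCU_FLAG 0UL
#endif

/* Flag value that would be passed when creating the ldlm_locks slab. */
unsigned long ldlm_slab_rcu_flag(void)
{
	return LDLM_SLAB_RCU_FLAG;
}

/* Print which flag was selected; here the simulated new name is picked. */
void ldlm_slab_rcu_flag_report(void)
{
	unsigned long flag = ldlm_slab_rcu_flag();

	assert(flag == SLAB_TYPESAFE_BY_RCU);
	printf("ldlm_locks slab RCU flag: 0x%lx\n", flag);
}
```

With only the old-name check in place, a kernel that defines just SLAB_TYPESAFE_BY_RCU falls through to the final branch and the slab is created without the flag, which is the silent behaviour described above.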
Attachments
Issue Links
- is duplicated by
-
LU-12454 parallel-scale-nfsv3 test racer_on_nfs crashes with ‘BUG: unable to handle kernel paging request’
-
- Resolved
-
- is related to
-
LU-17097 RCU stall caused by osc_quota_cleanup
-
- Resolved
-
- is related to
-
LU-11568 Get rid of SLAB_DESTROY_BY_RCU
-
- Resolved
-
-
LU-12374 client went down w/ panic during lustre_rmmod
-
- Resolved