Details
- Bug
- Resolution: Cannot Reproduce
- Major
- None
- Lustre 2.13.0
- None
- 3
- 18,015
- 9223372036854775807
Description
I finally traced my debug kernel problems with later RHEL releases to RCU breakage of some sort.
The ldlm_locks slab is declared with SLAB_DESTROY_BY_RCU if that flag is defined. This goes back to bugzilla 18015 (https://bugzilla.lustre.org/show_bug.cgi?id=18015), a patch by BobiJam.
Now it appears that when we schedule a free in that slab and then destroy the slab, the actual free is delayed and executes after the slab has already been destroyed, despite rcu_barrier() being present.
This is a clear bug, and I will file a RH bugzilla ticket for it.
In addition to that, I wonder how much we need that flag nowadays, especially considering that newer kernels renamed it to SLAB_TYPESAFE_BY_RCU, which we do not detect, so we simply do not set it in that case.
Should we just convert ldlm_locks back into a normal slab, I wonder?
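To illustrate the flag-detection question above, here is a minimal sketch of how either spelling of the flag could be picked up at compile time, should the flag be kept at all. It is not the existing Lustre code: the names LDLM_SLAB_RCU_FLAG and ldlm_slab_rcu_flag() are hypothetical, and the numeric value is only a stand-in so the snippet builds outside a kernel tree. In a real kernel build the definition would come from <linux/slab.h>.

```c
/* Stand-in definition so this sketch compiles in userspace.
 * Assumption: in a kernel build, <linux/slab.h> supplies the real macro
 * and this block is skipped. The value here is illustrative only. */
#ifndef __KERNEL__
#define SLAB_TYPESAFE_BY_RCU 0x00080000UL
#endif

/* Hypothetical compat macro: prefer the newer name, fall back to the
 * older one, and use no flag at all if neither is defined. */
#if defined(SLAB_TYPESAFE_BY_RCU)
# define LDLM_SLAB_RCU_FLAG SLAB_TYPESAFE_BY_RCU
#elif defined(SLAB_DESTROY_BY_RCU)
# define LDLM_SLAB_RCU_FLAG SLAB_DESTROY_BY_RCU
#else
# define LDLM_SLAB_RCU_FLAG 0UL
#endif

/* Hypothetical helper returning the flag that would be passed when
 * creating the ldlm_locks slab. */
unsigned long ldlm_slab_rcu_flag(void)
{
	return LDLM_SLAB_RCU_FLAG;
}
```

If the slab were converted back to a normal one, as the question above suggests, the macro would simply be dropped and no RCU flag passed at creation time.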
Attachments
Issue Links
- is duplicated by
  LU-12454 parallel-scale-nfsv3 test racer_on_nfs crashes with ‘BUG: unable to handle kernel paging request’ (Resolved)
- is related to
  LU-17097 RCU stall caused by osc_quota_cleanup (Resolved)
- is related to
  LU-11568 Get rid of SLAB_DESTROY_BY_RCU (Resolved)
  LU-12374 client went down w/ panic during lustre_rmmod (Resolved)
If you want to get a patch into SUSE (all upper-case these days), open an issue on bugzilla.suse.com and explain what and why. If you assign the issue to me (or put me on cc, or somehow let me know about it; I'm nfbrown@suse.com in bugzilla), I can expedite it.
Or you can ask me directly, then I can create the bugzilla issue myself.
What would be even better would be for these fixup patches to have been marked "Fixes: ......". Then I would have been alerted to them by our automated machinery. It's too late for that though...