Bug
Resolution: Fixed
Major
Lustre 2.2.0
None
3
4687
Found during IR testing at ORNL.
On MDS startup, soon after clients begin sending requests, all mdt_xx threads start consuming all available CPU.
We took a sysrq-t dump, and all of them are in grow_rqbd.
I checked the code: once a thread is in that state, it runs an unbreakable loop that performs 64 * num_online_cpus() (= 16 here) = 1024 allocations of 16k each, i.e. 16 MiB per thread.
The condition for entering that path (number of posted rqbds < nbuf_group/2) is racy,
so if 1000 threads check it at the same time, all 1000 enter and each performs those 1024 allocations.
We have a kdump log, but it still needs to be transferred.
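The race described above is a check-then-act problem: the threshold test and the grow are not atomic, so every thread that evaluates the test before anyone has posted new buffers goes on to grow. The C sketch below models this with POSIX threads. The pool structure, `NTHREADS`, `NBUF_GROUP`, the `growing` claim flag and the `run()` helper are all hypothetical and are neither Lustre's code nor its actual fix; only the 64-per-CPU factor and the 16 CPUs come from the report. Two barriers force the worst-case interleaving in which every check completes before any grow starts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical parameters; only NCPUS and the 64-per-CPU factor come from the report. */
#define NTHREADS   8               /* service threads hitting the check at once */
#define NCPUS      16              /* num_online_cpus() on the reported MDS */
#define BATCH      (64 * NCPUS)    /* buffers allocated by one grow loop: 1024 */
#define NBUF_GROUP 1024            /* hypothetical group size; threshold is NBUF_GROUP/2 */

struct pool {
    pthread_mutex_t lock;
    int  nposted;      /* request buffers currently posted */
    bool growing;      /* fixed mode only: one thread has claimed the grow */
    long grow_calls;   /* how many threads ran the grow loop */
    long allocations;  /* total buffers allocated (each stands in for 16 KiB) */
};

static struct pool pool;
static pthread_barrier_t barrier;
static bool use_fix;

static void *worker(void *arg)
{
    (void)arg;
    bool should_grow;

    pthread_barrier_wait(&barrier);              /* all threads arrive together */

    pthread_mutex_lock(&pool.lock);
    should_grow = pool.nposted < NBUF_GROUP / 2; /* the threshold check */
    if (use_fix) {
        /* Fix sketch: test-and-claim under the lock, so at most one grower. */
        should_grow = should_grow && !pool.growing;
        if (should_grow)
            pool.growing = true;
    }
    pthread_mutex_unlock(&pool.lock);            /* racy mode: lock dropped before acting */

    /* Worst-case interleaving: every check completes before any grow starts. */
    pthread_barrier_wait(&barrier);

    if (should_grow) {
        pthread_mutex_lock(&pool.lock);
        pool.grow_calls++;
        pool.allocations += BATCH;               /* simulated; nothing is malloc'd */
        pool.nposted += BATCH;
        pool.growing = false;
        pthread_mutex_unlock(&pool.lock);
    }
    return NULL;
}

/* Starts NTHREADS workers against an empty pool; returns the grow-loop count. */
static long run(bool fix)
{
    pthread_t tids[NTHREADS];

    pool = (struct pool){ .nposted = 0 };
    pthread_mutex_init(&pool.lock, NULL);
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    use_fix = fix;

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);

    pthread_barrier_destroy(&barrier);
    pthread_mutex_destroy(&pool.lock);
    return pool.grow_calls;
}
```

Under these assumptions, `run(false)` reports all 8 threads running the grow loop (8 × 1024 simulated allocations), mirroring the 1000-thread case at smaller scale, while `run(true)` reports exactly one. The claim flag is set in the same critical section as the check, so the decision to grow becomes atomic; the report itself does not say which approach the real fix took.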
Trackbacks

Changelog 2.1
Changes from version 2.1.1 to version 2.1.2 Server support for kernels: 2.6.18-308.4.1.el5 (RHEL5) 2.6.32-220.17.1.el6 (RHEL6) Client support for unpatched kernels: 2.6.18-308.4.1.el5 (RHEL5) 2.6.32-220.17.1....

Changelog 2.2
version 2.2.0 Support for networks: o2iblnd OFED 1.5.4 Server support for kernels: 2.6.32-220.4.2.el6 (RHEL6) Client support for unpatched kernels: 2.6.18-274.18.1.el5 (RHEL5) 2.6.32-220.4.2.el6 (RHEL6) 2.6.32.360....