LU-685: Wide busy lock in kiblnd_pool_alloc_node


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.2.0, Lustre 2.1.2

    Description

      Hi,

      At CEA they have experienced at least 3 hangs of an MDS in the following context:

      • most of the memory (> 99%) is allocated by the kernel, caching inodes + attributes (the system has 64 GB of memory)
      • 31 threads (out of 32 cores) are busy waiting with the following backtrace (all are kiblnd_sd_xx)

      _spin_lock
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_check_sends
      kiblnd_post_rx
      kiblnd_recv
      lnet_ni_recv
      lnet_recv_put
      lnet_parse
      kiblnd_handle_rx
      kiblnd_rx_complete
      kiblnd_complete
      kiblnd_scheduler
      kernel_thread

      • 1 thread (kiblnd_sd_11 in this case) is scheduled out (sleeping) with the following bt

      schedule
      cfs_schedule
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_check_sends
      kiblnd_post_rx
      kiblnd_recv
      lnet_ni_recv
      lnet_recv_put
      lnet_parse
      kiblnd_handle_rx
      kiblnd_rx_complete
      kiblnd_complete
      kiblnd_scheduler
      kernel_thread

      • 1 thread (router_checker) is busy waiting with the following bt (the 32nd thread)

      spin_lock
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_send
      lnet_ni_send
      lnet_send
      LNetGet
      lnet_router_checker
      kernel_thread

      • 1 thread (kiblnd_connd) is scheduled out with the following bt (the one owning the lock?)

      schedule_timeout
      __down
      down
      ldlm_pools_shrink
      ldlm_pools_cli_shrink
      shrink_slab
      do_try_to_free_pages
      __alloc_pages_nodemask
      kmem_getpages
      fallback_alloc
      ____cache_alloc_node
      __kmalloc
      cfs_alloc
      kiblnd_alloc_pages
      kiblnd_create_tx_pool
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_check_sends
      kiblnd_check_conns
      kiblnd_connd
      kernel_kthread

      So, having #cpu threads busy waiting on a spinlock is not ideal, especially when the lock owner seems to be scheduled out, waiting for something else to happen. At the moment it is not possible to verify whether the lock is the same for all threads because the crash dump is not accessible.
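
      The pattern suggested by these backtraces is sketched below. This is a simplified, hypothetical illustration (the structure and the grow_pool() helper are invented, it is not the actual kiblnd_pool_alloc_node() code): when the free list is empty, one thread takes the "grow the pool" path and performs a blocking kernel allocation that can end up in the slab shrinkers (ldlm_pools_shrink above), while every other scheduler thread drops the lock, yields and immediately retries, hammering the same spinlock for as long as that allocation sleeps.

      /* Simplified, hypothetical sketch of the contention pattern seen in the
       * backtraces -- not the actual kiblnd_pool_alloc_node() implementation. */
      #include <linux/spinlock.h>
      #include <linux/list.h>
      #include <linux/sched.h>

      struct pool_set {
              spinlock_t       ps_lock;        /* protects the free list */
              struct list_head ps_free;        /* free tx descriptors */
              int              ps_increasing;  /* a pool grow is in progress */
      };

      /* Hypothetical helper: blocking allocation of a new pool.  Under memory
       * pressure this is where the thread ends up in direct reclaim and the
       * ldlm_pools_shrink path of the backtrace above. */
      static void grow_pool(struct pool_set *ps)
      {
      }

      static struct list_head *pool_alloc_node(struct pool_set *ps)
      {
              struct list_head *node;
      again:
              spin_lock(&ps->ps_lock);         /* 31 threads spin here in the dump */
              if (!list_empty(&ps->ps_free)) {
                      node = ps->ps_free.next;
                      list_del(node);
                      spin_unlock(&ps->ps_lock);
                      return node;
              }

              if (ps->ps_increasing) {
                      /* Someone else is already growing the pool: drop the lock,
                       * yield and retry.  If the grower sleeps in reclaim, this
                       * turns into a busy loop on ps_lock across all threads. */
                      spin_unlock(&ps->ps_lock);
                      schedule();
                      goto again;
              }

              ps->ps_increasing = 1;
              spin_unlock(&ps->ps_lock);

              grow_pool(ps);                   /* may sleep for a long time */

              spin_lock(&ps->ps_lock);
              ps->ps_increasing = 0;
              spin_unlock(&ps->ps_lock);
              goto again;
      }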

      To work around this issue, we made a patch that allows choosing how many kiblnd scheduler threads are started, via a kernel module option. We set this number to 24 (for 32 cores).
      The issue reproduced, this time with 24 threads spinning on a spinlock, 4 kswapd threads waiting to acquire a lock in lu_cache_shrink, and most of the other threads stuck in congestion_wait.
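
      The workaround patch itself is not reproduced here; a minimal sketch of what such a tunable could look like follows (the parameter name nscheds and the helper are hypothetical, not the actual CEA patch):

      /* Hypothetical sketch of a module option limiting the number of kiblnd
       * scheduler threads started -- the option name is invented for illustration. */
      #include <linux/module.h>
      #include <linux/moduleparam.h>
      #include <linux/cpumask.h>

      static int nscheds;              /* 0 = default: one thread per online CPU */
      module_param(nscheds, int, 0444);
      MODULE_PARM_DESC(nscheds, "number of kiblnd scheduler threads to start");

      static int kiblnd_start_schedulers(void)
      {
              int nthrs = nscheds > 0 ? nscheds : num_online_cpus();
              int i;

              for (i = 0; i < nthrs; i++) {
                      /* kthread_run() of the kiblnd_scheduler loop would go here */
              }
              return 0;
      }

      Such an option would be set at module load time, e.g. "options ko2iblnd nscheds=24" in modprobe.conf (again, the option name is only illustrative).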

      Cheers,
      Sebastien.

    People

      Assignee: Lai Siyao
      Reporter: Sebastien Buisson (Inactive)

              Dates

                Created:
                Updated:
                Resolved: