
LU-7054: ib_cm scaling issue when Lustre clients connect to OSS

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.5.3
    • Labels: None
    • Environment: OFED3.5, MOFED241, and MOFED3.5
    • Severity: 3

    Description

      When a large number of Lustre clients (>3000) try connecting to an OSS/MDS at the same time, the ib_cm threads on the OSS/MDS are unable to service the incoming connections in time. Using ibdump we have seen server replies taking 30 seconds; by that time the clients have timed out the request and are retrying, which results in even more work for ib_cm.

      ib_cm is never able to catch up and usually requires a reboot of the server. Sometimes we have been able to recover by bringing the IB interface down (ifdown) to give ib_cm time to catch up, and then bringing it back up (ifup).

      Most of the threads will be in 'D' state; here is an example stack trace:

      0xffff88062f3c0aa0     1655        2  0    1   D  0xffff88062f3c1140  ib_cm/1
      sp                ip                Function (args)
      0xffff880627237a90 0xffffffff81559b50 thread_return
      0xffff880627237b58 0xffffffff8155b30e __mutex_lock_slowpath+0x13e (0xffff88062f76d260)
      0xffff880627237bc8 0xffffffff8155b1ab mutex_lock+0x2b (0xffff88062f76d260)
      0xffff880627237be8 0xffffffffa043f23e [rdma_cm]cma_disable_callback+0x2e (0xffff88062f76d000, unknown)
      0xffff880627237c18 0xffffffffa044440f [rdma_cm]cma_req_handler+0x8f (0xffff880365eec200, 0xffff880494844698)
      0xffff880627237d28 0xffffffffa0393e37 [ib_cm]cm_process_work+0x27 (0xffff880365eec200, 0xffff880494844600)
      0xffff880627237d78 0xffffffffa0394aaa [ib_cm]cm_req_handler+0x6ba (0xffff880494844600)
      0xffff880627237de8 0xffffffffa0395735 [ib_cm]cm_work_handler+0x145 (0xffff880494844600)
      0xffff880627237e38 0xffffffff81093f30 worker_thread+0x170 (0xffffe8ffffc431c0)
      0xffff880627237ee8 0xffffffff8109a106 kthread+0x96 (0xffff880627ae5da8)
      0xffff880627237f48 0xffffffff8100c20a child_rip+0xa (unknown, unknown)
      

      Using SystemTap I was able to get a trace of ib_cm; it shows that a great deal of time is spent in spin_lock_irq. See the attached file.
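
      The retry amplification described above is what the connection back-off work discussed in the comments below tries to damp. As a rough, generic sketch only (invented names; this is not the actual o2iblnd change), randomized exponential back-off on reconnect attempts keeps thousands of clients from retrying in lock-step:

      #include <linux/delay.h>
      #include <linux/random.h>
      #include <linux/types.h>

      static bool try_connect(void);  /* hypothetical: a single connection attempt */

      static void reconnect_with_backoff(void)
      {
              unsigned int window_ms = 100;              /* first retry window */
              const unsigned int max_window_ms = 30000;  /* cap the window at 30s */

              while (!try_connect()) {
                      /* Sleep a random amount within the current window so that
                       * many clients do not retry in lock-step, then widen the
                       * window for the next attempt. */
                      msleep((prandom_u32() % window_ms) + 1);
                      if (window_ms < max_window_ms)
                              window_ms *= 2;
              }
      }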

      Attachments

        1. load.pdf
          85 kB
        2. lustre-log.1445147654.68807.gz
          0.2 kB
        3. lustre-log.1445147717.68744.gz
          0.2 kB
        4. lustre-log.1445147754.68673.gz
          0.2 kB
        5. nbp8-os11.var.log.messages.oct.17.gz
          27 kB
        6. opensfs-HLDForSMPnodeaffinity-060415-1623-4.pdf
          564 kB
        7. read.pdf
          92 kB
        8. service104.+net+malloc.gz
          0.2 kB
        9. service115.+net.gz
          1.04 MB
        10. trace.ib_cm_1rack.out.gz
          759 kB
        11. write.pdf
          91 kB

        Issue Links

          Activity

            [LU-7054] ib_cm scaling issue when Lustre clients connect to OSS

            Patch has landed to master for 2.8.

            jgmitter Joseph Gmitter (Inactive) added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16470/
            Subject: LU-7054 o2iblnd: less intense allocating retry
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d96879f34ce229695566a3e5de1f5160f4c9ef02

            gerrit Gerrit Updater added a comment

            I updated LU-7224 with some tuning recommendations for this situation.

            We were looking at the code to see if FMR saves us on memory allocation. It does not; it might actually use a bit more memory. With Mellanox cards, we have not noticed much improvement in performance except in high-latency cases (like WAN conditions). It benefits TrueScale IB cards the most. Note: Mellanox has discontinued support for FMR in their latest mlx5-based cards.

            So, I'm not optimistic that FMR will help in this case. See LU-7224 for a summary of tuning suggestions.

            doug Doug Oucharek (Inactive) added a comment

            I think you are correct; the two issues are the same.

            The changes we have made (raising NTX, back-off, and interface credits) have allowed the OSS to recover and not get stuck with ib_cm in 'D' state. I think it happens less frequently now. We are going to test increasing NTX and interface credits by 2X. How much will enabling FMR help us?

            mhanafi Mahmoud Hanafi added a comment

            With regards to the two issues: could not the same cause be behind both? We are speculating that memory pool allocation is the cause of issue 2 (many reconnections). Could not the original problem be triggered by the same thing? With so many clients aggressively sending to the OSS, could it not be freezing on memory allocation for several seconds causing evictions?

            This is why I have been focusing so much on addressing the pool allocation problem. If the two problems are truly independent, then I suggest we start a different Jira ticket for problem 1 and use this ticket only for problem 2. That way the evidence behind each problem does not get mingled.

            With the changes made (raising NTX, back-off, and interface credits), has problem 2 happened again, or has its frequency just been reduced? If the frequency is only reduced, can NTX be increased even more?

            doug Doug Oucharek (Inactive) added a comment

            Just uploaded a new debug dump (ftp:/uploads/LU7054/sec.20151031.11.31.38.gz). The OSS experienced the issue; the first log entry, at 1446316297.011251, is a bulk timeout.

            00010000:00020000:11.0:1446316297.011251:0:20421:0:(ldlm_lib.c:2715:target_bulk_io()) @@@ timeout on bulk PUT after 150+0s  req@ffff8809c4ae0400 x1516094020471028/t0(0) o3->236aaab5-0544-9964-0bd9-544f78fd896b@10.151.11.64@o2ib:0/0 lens 488/432 e 2 to 0 dl 1446316318 ref 1 fl Interpret:/0/0 rc 0/0
            00010000:00020000:6.0:1446316297.011279:0:15047:0:(ldlm_lib.c:2715:target_bulk_io()) @@@ timeout on bulk PUT after 150+0s  req@ffff880cf9000c00 x1515275583687516/t0(0) o3->f89cd266-d025-1f42-d13b-d6a51a9b5e01@10.151.11.151@o2ib:0/0 lens 488/432 e 2 to 0 dl 1446316318 ref 1 fl Interpret:/0/0 rc 0/0
            
            mhanafi Mahmoud Hanafi added a comment

            I guess there are 2 major issues here:
            1. Why does the server get into the state that requires all clients to reconnect at the same time?
            2. The server handling and recovering from the connection storm.

            The connection back-off patch and raising the ntx and interface credits have helped with item 2, but I think we need more tuning here: enable FMR? Higher peer_credits? Higher ntx and interface credits? The pool allocation (LIBCFS_CPT_ALLOC) happens during this part.

            But the root cause is item 1: why do all the clients need to reconnect at the same time? We have ruled out the IB fabric because it isn't logging any errors during this event.

            From the logs I uploaded yesterday (sec.20151029.21.03.25.gz) the connections start at 20:59 (1446177579.541489). Can you see anything in the logs to indicate why?

            The load and utilization on the server were light before the connections started.
            I uploaded load.pdf, write.pdf, and read.pdf.

            mhanafi Mahmoud Hanafi added a comment

            Mahmoud: On Oct. 14 you indicated the time was spent in a for loop, and the code you copied includes the macro LIBCFS_CPT_ALLOC(). Was the time spent in this macro call?

            If the allocation code is freezing, there could be one of two reasons: memory has been exhausted, or memory has become so fragmented that we cannot get the number of contiguous pages we are requesting. I have seen our allocation macros freeze the calling kernel thread until such time as the memory can be allocated. This could be a very long time and absolutely needs to be avoided. This could explain the 12-13 second delays.

            So, it is important to figure out whether the freezing is in the allocation call.
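
            If the stall really is inside a large, physically contiguous allocation, one generic kernel-side mitigation (a sketch only, not the Lustre code and not the landed patch) is to make the first attempt non-blocking and fall back to virtually contiguous memory:

            #include <linux/slab.h>
            #include <linux/vmalloc.h>

            /* Sketch: try for physically contiguous memory without letting the
             * caller stall in reclaim/compaction; fall back to vmalloc, which
             * does not need high-order pages.  Free the result with kvfree(). */
            static void *alloc_big_buffer(size_t size)
            {
                    void *buf;

                    buf = kmalloc(size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
                    if (buf)
                            return buf;

                    return vmalloc(size);
            }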

            Looking at the pool code, I see something that worries me. When a new pool is allocated (not the first one), it is given a deadline of 300 seconds. Once the 300 seconds have expired, whenever buffers are returned to that pool we check whether any of its buffers are still in use; if none are, we free the pool and all associated memory. This potential "yoyo" effect could be causing a problem. It might be interesting to try setting the deadline value to something enormous so we never free allocated pools and just leave them around to be reused.
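
            For reference, a minimal sketch of the deadline-driven reclaim behaviour described above (names invented for illustration; this is not the kiblnd pool code):

            #include <linux/jiffies.h>
            #include <linux/list.h>
            #include <linux/slab.h>

            #define DEMO_POOL_DEADLINE_SECS 300  /* make this huge to keep pools around */

            struct demo_pool {
                    struct list_head  dp_list;      /* chained on the pool set */
                    unsigned long     dp_deadline;  /* jiffies after which it may be freed */
                    int               dp_inuse;     /* buffers currently handed out */
                    void             *dp_buffers;   /* the pool's memory */
            };

            /* Called whenever a buffer is returned to its pool. */
            static void demo_pool_put(struct demo_pool *pool)
            {
                    pool->dp_inuse--;

                    /* Past the deadline with nothing outstanding: free the whole
                     * pool, even though it may have to be reallocated again soon;
                     * this is the "yoyo" effect described above. */
                    if (pool->dp_inuse == 0 &&
                        time_after(jiffies, pool->dp_deadline)) {
                            list_del(&pool->dp_list);
                            kfree(pool->dp_buffers);
                            kfree(pool);
                    }
            }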

            That being said, if a large enough pool is allocated at initialization time, it never gets freed, which would sidestep this issue. Was the test to increase NTX suggested above ever tried? Do the servers have enough memory for a large pool allocation?

            If the system is truly running out of memory due to a large load, the peer_credits of the clients can be lowered from the default of 8 to something like 6 or 4. That will cause the large number of clients you have to back off, reducing the load on the servers and thereby forgoing the need for such a big pool allocation. Worst case for a server: number of clients * peer_credits = maximum number of tx buffers that could be used.
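
            For example, with the roughly 3000 clients mentioned in the description, the default peer_credits of 8 gives a worst case of about 3000 * 8 = 24,000 tx buffers on a server, while peer_credits of 4 halves that to 12,000.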

            Another thing to check is memory statistics once a system gets into trouble. On Linux, if you "echo m > /proc/sysrq-trigger" and look in /var/log/messages, there are some very useful memory statistics there. If you don't get such memory statistics, then SysRq is not enabled; you can enable it with "echo 1 > /proc/sys/kernel/sysrq". On newer kernels, you can also "cat /proc/buddyinfo" to get a subset of the same information.

            doug Doug Oucharek (Inactive) added a comment

            Uploaded 2 new debug logs and /var/log/messages to ftp:/uploads/LU7054/
            messages.gz
            sec.20151029.21.03.25.gz
            sec.20151029.21.05.55.gz

            At Oct 29 21:03:25 we got this in /var/log/messages:

            Oct 29 21:03:25 nbp8-oss14 kernel: [1828390.200665] LustreError: 68632:0:(ldlm_lib.c:2715:target_bulk_io()) @@@ timeout on bulk PUT after 150+0s  req@ffff8812e7162c00 x1516263417841768/t0(0) o3->d07e5b2b-77f6-1c68-f1fd-e6e4f8f614d7@10.151.57.146@o2ib:0/0 lens 4568/432 e 1 to 0 dl 1446177826 ref 1 fl Interpret:/0/0 rc 0/0
            

            We dumped debug logs to sec.20151029.21.03.25.gz; it shows all clients connecting.

            00000800:00000200:0.2:1446177579.541489:0:0:0:(o2iblnd_cb.c:3306:kiblnd_cq_completion()) conn[ffff8804e04ad400] (20)++
            ....
            00000800:00000200:1.0:1446177803.579278:0:11134:0:(o2iblnd_cb.c:993:kiblnd_check_sends()) conn[ffff880da89faa00] (31)--
            

            I don't understand why the debug logs don't show any other activity before this, other than:

            00000400 00000001 18.1 Thu Oct 29 20:59:39 PDT 2015 0 0 0 (watchdog.c 123 lcw_cb()) Process entered
            00000400 00000001 18.1 Thu Oct 29 20:59:39 PDT 2015 0 0 0 (watchdog.c 126 lcw_cb()) Process leaving
            

            Hoping you guys can make sense of it.

            mhanafi Mahmoud Hanafi added a comment

            Uploaded lustre debug dump to ftpsite:/uploads/LU7054/nbp9-oss16.ldebug.gz

            It shows a lock callback timer eviction at

            Oct 19 22:20:39 nbp9-oss16 kernel: LustreError: 11651:0:(ldlm_lockd.c:346:waiting_locks_callback()) ### lock callback timer expired after 151s: evicting client at 10.151.6.134@o2ib  ns: filter-nbp9-OST003f_UUID lock: ffff8800810877c0/0x15ed8b9f78013cb2 lrc: 3/0,0 mode: PW/PW res: [0x85f72a:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x60000000000020 nid: 10.151.6.134@o2ib remote: 0x810eb6843fa5e1d1 expref: 9 pid: 11647 timeout: 5261835188 lvb_type: 0
            

            From the debug dump we see that starting 151 seconds earlier (22:18:08) the server was running in

            o2iblnd_cb.c 3306 kiblnd_cq_completion()) conn[ffff880f0c5d7000] (20)++
            

            for a long time.

            Is it possible that the callback request was never sent?

            mhanafi Mahmoud Hanafi added a comment

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 14
