
Regression from LU-8057 causes loading of fld.ko to hang in 2.7.2

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Component/s: None
    • Environment: Lustre server nas-2.7.2-3nas running on CentOS 6.7
    • Severity: 3

    Description

      Since rebasing our nas-2.7.2-2nas build onto b2_7_fe to produce nas-2.7.2-3nas, we have found that loading the Lustre module fld.ko hangs. modprobe took 100% CPU time and could not be killed.

      I identified the culprit using git bisect:
      commit f23e22da88f07e95071ec76807aaa42ecd39e8ca
      Author: Amitoj Kaur Chawla <amitoj1606@gmail.com>
      Date: Thu Jun 16 23:12:03 2016 +0800

      LU-8057 ko2iblnd: Replace sg++ with sg = sg_next(sg)

      It was a b2_7_fe back-port of the following:
      Lustre-commit: d226464acaacccd240da43dcc22372fbf8cb04a6
      Lustre-change: http://review.whamcloud.com/19342
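      For reference, here is a minimal sketch of the pattern the LU-8057 change introduces (illustrative only; the function name is hypothetical, not the actual ko2iblnd code). With chained scatterlists, entries are not guaranteed to be contiguous in memory, so sg++ can walk off the end of one chunk; sg_next() follows the chain link correctly:

      #include <linux/scatterlist.h>

      /* Illustrative walk over a possibly chained scatterlist. */
      static void walk_sgl(struct scatterlist *sgl, int nents)
      {
              struct scatterlist *sg;
              int i;

              for (i = 0, sg = sgl; i < nents && sg != NULL; i++, sg = sg_next(sg)) {
                      /* inspect sg_page(sg), sg->offset, sg->length here
                       * instead of advancing with sg++ */
              }
      }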

          Activity

            [LU-8715] Regression from LU-8057 causes loading of fld.ko to hang in 2.7.2

            Why not. The problem is the LIBCFS_ALLOC and LIBCFS_FREE macros. Looking at the macros gave me a headache, so no patch from me. I need to get into the right mental state to tackle it.

            simmonsja James A Simmons added a comment

            James, can we do that fix under this ticket?

            doug Doug Oucharek (Inactive) added a comment

            I know exactly what your problem is. We saw this problem in the Lustre core some time ago and changed the OBD_ALLOC macros. The libcfs/LNet layer uses its own LIBCFS_ALLOC macros, which means that when allocations are more than two pages in size they hit the vmalloc spinlock serialization issue. We need a fix for libcfs much like the one Lustre received.

            simmonsja James A Simmons added a comment
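            For illustration only, a rough sketch of the OBD_ALLOC-style approach being suggested here, assuming a kmalloc-first fallback (the names are hypothetical, not the actual libcfs patch):

            #include <linux/mm.h>
            #include <linux/slab.h>
            #include <linux/string.h>
            #include <linux/vmalloc.h>

            /* Try kmalloc first so small and medium allocations never touch
             * the globally serialized vmalloc path; fall back to vmalloc
             * only when kmalloc cannot satisfy the request. */
            static inline void *libcfs_alloc_sketch(size_t size)
            {
                    void *ptr = kmalloc(size, GFP_NOFS | __GFP_NOWARN);

                    if (ptr == NULL)
                            ptr = vmalloc(size);
                    if (ptr != NULL)
                            memset(ptr, 0, size); /* mirror LIBCFS_ALLOC's zeroing */
                    return ptr;
            }

            static inline void libcfs_free_sketch(void *ptr)
            {
                    /* release via whichever allocator produced the pointer */
                    if (is_vmalloc_addr(ptr))
                            vfree(ptr);
                    else
                            kfree(ptr);
            }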

            perf top showed that during module load nearly all the time is spent in __vmalloc_node.

            Samples: 748K of event 'cycles', Event count (approx.): 53812402443
            Overhead  Shared Object            Symbol
              96.21%  [kernel]                 [k] __vmalloc_node
               0.91%  [kernel]                 [k] read_hpet
               0.28%  [kernel]                 [k] get_vmalloc_info
               0.26%  [kernel]                 [k] __write_lock_failed
               0.25%  [kernel]                 [k] __read_lock_failed
               0.05%  [kernel]                 [k] apic_timer_interrupt
               0.05%  [kernel]                 [k] _spin_lock
               0.04%  perf                     [.] dso__find_symbol
               0.03%  [kernel]                 [k] find_busiest_group
               0.03%  [kernel]                 [k] clear_page_c
               0.03%  [kernel]                 [k] page_fault
               0.03%  [kernel]                 [k] memset
               0.02%  [kernel]                 [k] rcu_process_gp_end
               0.02%  perf                     [.] perf_evsel__parse_sample
               0.02%  [kernel]                 [k] sha_transform
               0.02%  [kernel]                 [k] native_write_msr_safe
            
            mhanafi Mahmoud Hanafi added a comment

            We have >12,000 clients. We do see some servers consume all the credits.

            mhanafi Mahmoud Hanafi added a comment

            @Bruno Faccini: Yes, I can reproduce the problem on our freshly rebooted Lustre servers by running 'modprobe fld'.

            jaylan Jay Lan (Inactive) added a comment

            Hi Doug,

            Can you please have a look into the issue since it relates to the LU-8057 change?

            Thanks.
            Joe

            jgmitter Joseph Gmitter (Inactive) added a comment

            The fix is correct and it fixes a real bug. What this change did was expose another problem in the ko2iblnd driver. I have to ask: is your system really consuming all those credits? I don't think the IB driver queue pair depth is big enough to handle all those credits.

            simmonsja James A Simmons added a comment - edited

            Module load time before was about 2-5 minutes because we have large ntx values
            (options ko2iblnd ntx=125536 credits=62768 fmr_pool_size=31385).
            But after the patch it takes >20 minutes.

            mhanafi Mahmoud Hanafi added a comment
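            (A rough illustration of why large ntx values magnify this: module load performs one descriptor allocation per tx, so with ntx=125536 roughly 125k allocations run back to back, all contending on the same global vmalloc lock if each takes the vmalloc path. Hypothetical names; the real ko2iblnd startup code differs in detail.)

            #include <linux/errno.h>
            #include <linux/vmalloc.h>

            /* Simplified stand-in for the LND's tx descriptor setup loop. */
            static int alloc_tx_descs(void **txs, int ntx, size_t tx_size)
            {
                    int i;

                    for (i = 0; i < ntx; i++) {
                            txs[i] = vmalloc(tx_size); /* serialized on the vmalloc lock */
                            if (txs[i] == NULL)
                                    return -ENOMEM;
                    }
                    return 0;
            }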

            Well, both the failure and the suspected cause look surprising.
            Do you mean that the fld.ko module load simply hangs on a fresh system when running "modprobe lustre"?

            bfaccini Bruno Faccini (Inactive) added a comment

            People

              Assignee:
              ashehata Amir Shehata (Inactive)
              Reporter:
              jaylan Jay Lan (Inactive)
              Votes:
              0
              Watchers:
              9

              Dates

                Created:
                Updated:
                Resolved: