
Regression from LU-8057 causes loading of fld.ko to hang in 2.7.2

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Component/s: None
    • Environment: Lustre server nas-2.7.2-3nas running on CentOS 6.7
    • Severity: 3

    Description

      Since rebasing our nas-2.7.2-2nas build onto b2_7_fe to produce nas-2.7.2-3nas, we have found that loading the Lustre module fld.ko hangs. modprobe took 100% CPU time and could not be killed.

      I identified the culprit using git bisect:
      commit f23e22da88f07e95071ec76807aaa42ecd39e8ca
      Author: Amitoj Kaur Chawla <amitoj1606@gmail.com>
      Date: Thu Jun 16 23:12:03 2016 +0800

      LU-8057 ko2iblnd: Replace sg++ with sg = sg_next(sg)

      It was a b2_7_fe back-port of the following:
      Lustre-commit: d226464acaacccd240da43dcc22372fbf8cb04a6
      Lustre-change: http://review.whamcloud.com/19342
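      For reference, here is a minimal sketch of the pattern the LU-8057 change introduces (illustrative only; the function name is hypothetical, not the actual ko2iblnd code). With chained scatterlists, entries are not guaranteed to be contiguous in memory, so sg++ can walk off the end of one chunk; sg_next() follows the chain link correctly:

      #include <linux/scatterlist.h>

      /* Illustrative walk over a possibly chained scatterlist. */
      static void walk_sgl(struct scatterlist *sgl, int nents)
      {
              struct scatterlist *sg;
              int i;

              for (i = 0, sg = sgl; i < nents && sg != NULL; i++, sg = sg_next(sg)) {
                      /* inspect sg_page(sg), sg->offset, sg->length here
                       * instead of advancing with sg++ */
              }
      }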

          Activity

            [LU-8715] Regression from LU-8057 causes loading of fld.ko to hang in 2.7.2

            Why not. The problem is the LIBCFS_ALLOC and LIBCFS_FREE macros. Looking at the macros gave me a headache, so no patch from me. I need to get into the right mental state to tackle it.

            simmonsja James A Simmons added a comment

            James, can we do that fix under this ticket?

            doug Doug Oucharek (Inactive) added a comment

            I know exactly what your problem is. We saw this problem in the Lustre core some time ago and changed the OBD_ALLOC macros. The libcfs/LNet layer uses its own LIBCFS_ALLOC macros, which means that when allocations are more than two pages in size they hit the vmalloc spinlock serialization issue. We need a fix for libcfs much like the one Lustre received.

            simmonsja James A Simmons added a comment
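            For illustration only, a rough sketch of the OBD_ALLOC-style approach being suggested here, assuming a kmalloc-first fallback (the names are hypothetical, not the actual libcfs patch):

            #include <linux/mm.h>
            #include <linux/slab.h>
            #include <linux/string.h>
            #include <linux/vmalloc.h>

            /* Try kmalloc first so small and medium allocations never touch
             * the globally serialized vmalloc path; fall back to vmalloc
             * only when kmalloc cannot satisfy the request. */
            static inline void *libcfs_alloc_sketch(size_t size)
            {
                    void *ptr = kmalloc(size, GFP_NOFS | __GFP_NOWARN);

                    if (ptr == NULL)
                            ptr = vmalloc(size);
                    if (ptr != NULL)
                            memset(ptr, 0, size); /* mirror LIBCFS_ALLOC's zeroing */
                    return ptr;
            }

            static inline void libcfs_free_sketch(void *ptr)
            {
                    /* release via whichever allocator produced the pointer */
                    if (is_vmalloc_addr(ptr))
                            vfree(ptr);
                    else
                            kfree(ptr);
            }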

            perf top showed that during module load nearly all the time is spent in __vmalloc_node.

            Samples: 748K of event 'cycles', Event count (approx.): 53812402443
            Overhead  Shared Object            Symbol
              96.21%  [kernel]                 [k] __vmalloc_node
               0.91%  [kernel]                 [k] read_hpet
               0.28%  [kernel]                 [k] get_vmalloc_info
               0.26%  [kernel]                 [k] __write_lock_failed
               0.25%  [kernel]                 [k] __read_lock_failed
               0.05%  [kernel]                 [k] apic_timer_interrupt
               0.05%  [kernel]                 [k] _spin_lock
               0.04%  perf                     [.] dso__find_symbol
               0.03%  [kernel]                 [k] find_busiest_group
               0.03%  [kernel]                 [k] clear_page_c
               0.03%  [kernel]                 [k] page_fault
               0.03%  [kernel]                 [k] memset
               0.02%  [kernel]                 [k] rcu_process_gp_end
               0.02%  perf                     [.] perf_evsel__parse_sample
               0.02%  [kernel]                 [k] sha_transform
               0.02%  [kernel]                 [k] native_write_msr_safe
            
            mhanafi Mahmoud Hanafi added a comment

            We have >12,000 clients. We do see some servers consume all the credits.

            mhanafi Mahmoud Hanafi added a comment

            @Bruno Faccini: Yes, I can reproduce the problem on our freshly rebooted Lustre servers by running 'modprobe fld'.

            jaylan Jay Lan (Inactive) added a comment

            Hi Doug,

            Can you please have a look into the issue since it relates to the LU-8057 change?

            Thanks.
            Joe

            jgmitter Joseph Gmitter (Inactive) added a comment

            The fix is correct and it fixes a real bug. What this change did was expose another problem in the ko2iblnd driver. I have to ask: is your system really consuming all those credits? I don't think the IB driver queue pair depth is big enough to handle all those credits.

            simmonsja James A Simmons added a comment - edited

            Module load time before was about 2-5 minutes because we have large ntx values
            (options ko2iblnd ntx=125536 credits=62768 fmr_pool_size=31385).
            But after the patch it takes >20 minutes.

            mhanafi Mahmoud Hanafi added a comment
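            (A rough illustration of why large ntx values magnify this: module load performs one descriptor allocation per tx, so with ntx=125536 roughly 125k allocations run back to back, all contending on the same global vmalloc lock if each takes the vmalloc path. Hypothetical names; the real ko2iblnd startup code differs in detail.)

            #include <linux/errno.h>
            #include <linux/vmalloc.h>

            /* Simplified stand-in for the LND's tx descriptor setup loop. */
            static int alloc_tx_descs(void **txs, int ntx, size_t tx_size)
            {
                    int i;

                    for (i = 0; i < ntx; i++) {
                            txs[i] = vmalloc(tx_size); /* serialized on the vmalloc lock */
                            if (txs[i] == NULL)
                                    return -ENOMEM;
                    }
                    return 0;
            }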

            Well, both the failure and the suspected cause look surprising.
            Do you mean that the fld.ko module load simply hangs on a fresh system when running "modprobe lustre"?

            bfaccini Bruno Faccini (Inactive) added a comment

            People

              Assignee:
              ashehata Amir Shehata (Inactive)
              Reporter:
              jaylan Jay Lan (Inactive)
              Votes:
              0
              Watchers:
              9

              Dates

                Created:
                Updated:
                Resolved: