[LU-6285] Assert fails in staging client module crashes kernel if CPUMASK_OFFSTACK set

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.8.0, Lustre 2.9.0
    • Affects Version/s: Lustre 2.4.0
    • None
    • 3
    • 17617

    Description

      Enabling CONFIG_CPUMASK_OFFSTACK in stock kernel 3.18.0 causes the staging ptlrpc module to emit the message

      LustreError: 1203:0:(service.c:2796:ptlrpc_hr_init()) ASSERTION( hrp->hrp_nthrs > 0 ) failed:

      followed by a backtrace and a kernel lockup when the module is loaded. I'll attach my dmesg dump and the .config file I used. I picked version 2.4.0 above as there doesn't seem to be any way to indicate the staging client version.

      Attachments

        1. bad (110 kB)
        2. config (145 kB)

          Activity

            pjones Peter Jones added a comment -

            Bulk of work landed for 2.9. Amir, please open a new ticket to track the landing of http://review.whamcloud.com/#/c/19106/

            gerrit Gerrit Updater added a comment -

            Amir Shehata (amir.shehata@intel.com) uploaded a new patch: http://review.whamcloud.com/19106
            Subject: LU-6285 ptlrpc: Correctly calculate hrp->hrp_nthrs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2d6210842775e4fa7b0c7a6bce1dde8e948e56c9

            ashehata Amir Shehata (Inactive) added a comment -

            There is still an issue that can trigger the assert.

            cpu_pattern can specify exactly one CPU in a partition, e.g.
            "0[0]", which means CPT0 contains only CPU 0. If CPU 0 has
            hyperthreading enabled, this combination results in

            weight = cfs_cpu_ht_nsiblings(0);
            hrp->hrp_nthrs = cfs_cpt_weight(ptlrpc_hr.hr_cpt_table, i);
            hrp->hrp_nthrs /= weight;

            evaluating to 0 (integer division), where

            cfs_cpt_weight(ptlrpc_hr.hr_cpt_table, i) == 1
            weight == 2

            Therefore, divide by weight only if

            hrp->hrp_nthrs >= weight

            This avoids the assert:

            LASSERT(hrp->hrp_nthrs > 0);
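The arithmetic in the comment above can be sketched as a small user-space C program. This is only an illustration under stated assumptions, not the patch itself: `hr_nthrs_unguarded` and `hr_nthrs_guarded` are hypothetical helper names, the two integer parameters stand in for the results of `cfs_cpt_weight()` and `cfs_cpu_ht_nsiblings()`, and the standard `assert()` stands in for `LASSERT()`. Both inputs are assumed to be at least 1.

```c
#include <assert.h>

/*
 * Hypothetical helpers for illustration only (not Lustre code).
 *   cpt_weight  - CPUs in the partition, standing in for cfs_cpt_weight()
 *   ht_siblings - hyperthread siblings, standing in for cfs_cpu_ht_nsiblings()
 * Both are assumed to be >= 1.
 */

/* Unconditional division: a 1-CPU partition with 2 siblings yields 0. */
static int hr_nthrs_unguarded(int cpt_weight, int ht_siblings)
{
	return cpt_weight / ht_siblings;
}

/* Divide only when the partition has at least ht_siblings CPUs. */
static int hr_nthrs_guarded(int cpt_weight, int ht_siblings)
{
	int nthrs = cpt_weight;

	if (nthrs >= ht_siblings)
		nthrs /= ht_siblings;

	/* Stand-in for LASSERT(hrp->hrp_nthrs > 0). */
	assert(nthrs > 0);
	return nthrs;
}
```

With a partition weight of 1 and 2 siblings, the unguarded form truncates to 0, while the guarded form leaves the count at 1, so the check on the thread count holds.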

            simmonsja James A Simmons added a comment -

            Please close this ticket.

            twhitehead Tyson Whitehead (Inactive) added a comment -

            Excellent! Thanks everyone.

            simmonsja James A Simmons added a comment -

            All the patches have landed. We can close this ticket.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13926/
            Subject: LU-6285 libcfs: get rid of deprecated cpumask function usage
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3b3233792869e706fe1ebfb6605d93fbc0d0d63c

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13925/
            Subject: LU-6285 ptlrpc: Get rid of cpus_* calls as deprecated
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 61787e1cea610ba38ba917b73db0d43589c029df

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13954/
            Subject: LU-6285: o2iblnd: Do not use cpus_weight, it's deprecated
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b95db057d0501fb19f807cddf3a8ba3f7f47cb1a

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13905/
            Subject: LU-6285 ptlrpc: Do not recalculate siblings of CPU 0 in a loop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0eb4582d87e32dd3e5491e13ba659e625624bfe7

            People

              Assignee: green Oleg Drokin
              Reporter: twhitehead Tyson Whitehead (Inactive)