Lustre / LU-9089

performance-sanity test_4: OpenFabrics vendor limiting the amount of physical memory that can be registered


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0
    • Component/s: None
    • Environment: onyx-64-67, Full Group test,
      master branch, v2.9.52, b3499, zfs,
      CentOS Linux 7 clients
    • Severity: 3

    Description

      performance-sanity, test_4 TIMEOUT

      Access to logs: https://testing.hpdd.intel.com/test_sets/095753c6-e5e9-11e6-b6d4-5254006e85c2

      Also seen in November 2016 (DCO-6144).

      Note: Timeout issues for this test have been seen since 2011 and have most frequently been associated with LU-1357, which attributes the timeouts to the use of VMs. In this ticket, the tests ran on physical hardware.

      From test_log:

      + su mpiuser sh -c "/usr/lib64/compat-openmpi16/bin/mpirun -mca boot ssh -machinefile /tmp/mdsrate-create-large.machines -np 1 /usr/lib64/lustre/tests/mdsrate --create --time 600 --nfiles 52671 --dir /mnt/lustre/mdsrate/single --filefmt 'f%%d' "
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      WARNING: It appears that your OpenFabrics subsystem is configured to only
      allow registering part of your physical memory.  This can cause MPI jobs to
      run with erratic performance, hang, and/or crash.
      
      This may be caused by your OpenFabrics vendor limiting the amount of
      physical memory that can be registered.  You should investigate the
      relevant Linux kernel module parameters that control how much physical
      memory can be registered, and increase them to allow registering all
      physical memory on your machine.
      
      See this Open MPI FAQ item for more information on these Linux kernel module
      parameters:
      
          http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
      
        Local host:              onyx-66.onyx.hpdd.intel.com
        Registerable memory:     32768 MiB
        Total memory:            49110 MiB
      
      Your MPI job will continue, but may be behave poorly and/or hang.
      --------------------------------------------------------------------------
      
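      For reference, the Open MPI FAQ linked in the warning attributes this limit, on Mellanox mlx4 hardware, to the mlx4_core module parameters log_num_mtt and log_mtts_per_seg, with registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE. The 32768 MiB reported above is consistent with, for example, log_num_mtt=20 and log_mtts_per_seg=3 on 4 KiB pages (2^20 * 2^3 * 4 KiB = 32 GiB). Below is a minimal sketch of checking and raising the limit, assuming an mlx4 HCA on the client; the parameter names and formula come from the FAQ, and the values are illustrative, not read from onyx-66:

      # Current limits (only meaningful for mlx4_core-driven HCAs)
      cat /sys/module/mlx4_core/parameters/log_num_mtt
      cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

      # For ~48 GiB nodes with 4 KiB pages, log_num_mtt=24 and
      # log_mtts_per_seg=1 allow 2^24 * 2^1 * 4 KiB = 128 GiB,
      # i.e. more than all physical memory on the node.
      echo "options mlx4_core log_num_mtt=24 log_mtts_per_seg=1" \
          > /etc/modprobe.d/mlx4_core.conf
      # Reload mlx4_core (or reboot) for the new values to take effect.

      # Also confirm the locked-memory ulimit is not the limiting factor;
      # it should report "unlimited" for MPI jobs over InfiniBand.
      ulimit -l

      If ulimit -l is not unlimited, the memlock setting in /etc/security/limits.conf on the affected clients is a second place to look.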


            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Casper (jcasper)
