[LU-9089] performance-sanity test_4 OpenFabrics vendor limiting the amount of physical memory Created: 08/Feb/17 Updated: 21/May/21 Resolved: 21/May/21
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: | onyx-64-67, Full Group test |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
performance-sanity, test_4 TIMEOUT

Access to logs: https://testing.hpdd.intel.com/test_sets/095753c6-e5e9-11e6-b6d4-5254006e85c2

Also seen in November 2016 (DCO-6144).

Note: Timeout issues for this test have been seen since 2011 and have most frequently been associated with

From test_log:

+ su mpiuser sh -c "/usr/lib64/compat-openmpi16/bin/mpirun -mca boot ssh -machinefile /tmp/mdsrate-create-large.machines -np 1 /usr/lib64/lustre/tests/mdsrate --create --time 600 --nfiles 52671 --dir /mnt/lustre/mdsrate/single --filefmt 'f%%d' "
--------------------------------------------------------------------------
A deprecated MCA parameter value was specified in an MCA parameter
file. Deprecated MCA parameters should be avoided; they may disappear
in future releases.
Deprecated parameter: plm_rsh_agent
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.
This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.
See this Open MPI FAQ item for more information on these Linux kernel module
parameters:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Local host: onyx-66.onyx.hpdd.intel.com
Registerable memory: 32768 MiB
Total memory: 49110 MiB
Your MPI job will continue, but may behave poorly and/or hang.
--------------------------------------------------------------------------
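The warning's figures (32768 MiB registerable vs. 49110 MiB total) can be reproduced from the Linux kernel module parameters the Open MPI FAQ entry describes. As a sketch, assuming a Mellanox mlx4 HCA (the ticket does not name the adapter; other vendors use different parameters), the registration ceiling is 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE:

```shell
#!/bin/sh
# Hypothetical check of the mlx4_core registered-memory limit, per the
# Open MPI FAQ entry linked in the warning above. Assumes an mlx4 HCA;
# falls back to illustrative defaults when the module is not loaded.

PARAM_DIR=/sys/module/mlx4_core/parameters

log_num_mtt=$(cat "$PARAM_DIR/log_num_mtt" 2>/dev/null || echo 20)
log_mtts_per_seg=$(cat "$PARAM_DIR/log_mtts_per_seg" 2>/dev/null || echo 3)
page_size=$(getconf PAGESIZE)

# Registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
reg_bytes=$(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
reg_mib=$(( reg_bytes / 1024 / 1024 ))

total_kib=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
total_mib=$(( total_kib / 1024 ))

echo "Registerable: ${reg_mib} MiB, Total: ${total_mib} MiB"
if [ "$reg_mib" -lt "$total_mib" ]; then
    echo "WARNING: raise log_num_mtt so the product covers all physical RAM"
fi
```

With the illustrative defaults above (log_num_mtt=20, log_mtts_per_seg=3, 4 KiB pages) the formula gives 2^35 bytes = 32768 MiB, which matches the "Registerable memory" value reported on onyx-66, consistent with the node running stock module defaults against ~48 GiB of RAM.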