[LU-9993] Open MPI detected an unexpected PSM2 error in opening an endpoint: Created: 15/Sep/17  Updated: 12/Aug/22  Resolved: 12/Aug/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Abe Assignee: Sonia Sharma (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

 Comments   
Comment by Abe [ 15/Sep/17 ]

In a cluster of 8 clients, we are seeing that the number of contexts available on the Intel Omnipath
cards has been exceeded, and the mpirun jobs are terminated.
On the main client, sbb-client1, the number of free contexts is 6, while the other clients
have a value of 20.
[root@sbb-client1 benchmark]# cat /sys/class/infiniband/hfi*/nfreectxts
6
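
To compare this value across clients, a small script along the following lines could be run on each node. This is only a sketch: it assumes nothing beyond the sysfs path shown above, and the function name `read_free_contexts` is made up for illustration.

```python
from pathlib import Path


def read_free_contexts(sysfs_root="/sys/class/infiniband"):
    """Return {device_name: free_context_count} for every hfi* device found.

    The nfreectxts file name and location are taken from the command above;
    devices without that file are simply skipped.
    """
    counts = {}
    for path in sorted(Path(sysfs_root).glob("hfi*/nfreectxts")):
        counts[path.parent.name] = int(path.read_text().strip())
    return counts


if __name__ == "__main__":
    for device, free in read_free_contexts().items():
        print(f"{device}: {free} free contexts")
```

Running it on sbb-client1 and on one of the other clients should make the 6 versus 20 difference easy to confirm.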

Is there a way to increase the number of free contexts on the main client?
Intel OPA version being used:
IntelOPA-IFS.RHEL73-x86_64.10.3.0.0.81

opainfo
hfi1_0:1 PortGID:0xfe80000000000000:00117501016749c3
PortState: Active
LinkSpeed Act: 25Gb En: 25Gb
LinkWidth Act: 4 En: 4
LinkWidthDnGrd ActTx: 4 Rx: 4 En: 3,4
LCRC Act: 14-bit En: 14-bit,16-bit,48-bit Mgmt: True
LID: 0x0000000e-0x0000000e SM LID: 0x00000003 SL: 0
QSFP: PassiveCu, 2m Hitachi Metals P/N IQSFP26C-20 Rev 02
Xmit Data: 1834754 MB Pkts: 248869099
Recv Data: 1834542 MB Pkts: 244279528
Link Quality: 5 (Excellent)

sbb-client1.5361Calculation of subcontext count exceeded maximum supported
[sbb-client1:05361] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
[sbb-client1:05362] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5362Calculation of subcontext count exceeded maximum supported
[sbb-client1:05366] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5222Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5222Calculation of subcontext count exceeded maximum supported
[sbb-client1:05222] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5345Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5345Calculation of subcontext count exceeded maximum supported
[sbb-client1:05345] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5252Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5252Calculation of subcontext count exceeded maximum supported
[sbb-client1:05252] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5223Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5223Calculation of subcontext count exceeded maximum supported
[sbb-client1:05223] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5354Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5354Calculation of subcontext count exceeded maximum supported
[sbb-client1:05354] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5313Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5313Calculation of subcontext count exceeded maximum supported
[sbb-client1:05313] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5143Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5143Calculation of subcontext count exceeded maximum supported
[sbb-client1:05143] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5346Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5346Calculation of subcontext count exceeded maximum supported
[sbb-client1:05346] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5205Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5205Calculation of subcontext count exceeded maximum supported
[sbb-client1:05205] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5245Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5245Calculation of subcontext count exceeded maximum supported
[sbb-client1:05245] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5356Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5356Calculation of subcontext count exceeded maximum supported
[sbb-client1:05356] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5263Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5263Calculation of subcontext count exceeded maximum supported
[sbb-client1:05263] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5202Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5202Calculation of subcontext count exceeded maximum supported
[sbb-client1:05202] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5311Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5311Calculation of subcontext count exceeded maximum supported
[sbb-client1:05311] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5159Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5159Calculation of subcontext count exceeded maximum supported
[sbb-client1:05159] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5269Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5269Calculation of subcontext count exceeded maximum supported
[sbb-client1:05269] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5343Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5343Calculation of subcontext count exceeded maximum supported
[sbb-client1:05343] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5314Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5314Calculation of subcontext count exceeded maximum supported
[sbb-client1:05314] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5163Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5163Calculation of subcontext count exceeded maximum supported
[sbb-client1:05163] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5369Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5369Calculation of subcontext count exceeded maximum supported
[sbb-client1:05369] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5389Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5389Calculation of subcontext count exceeded maximum supported
[sbb-client1:05389] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5392Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5392Calculation of subcontext count exceeded maximum supported
[sbb-client1:05392] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5385Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5385Calculation of subcontext count exceeded maximum supported
[sbb-client1:05385] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5383Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5383Calculation of subcontext count exceeded maximum supported
[sbb-client1:05383] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5381Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5381Calculation of subcontext count exceeded maximum supported
[sbb-client1:05381] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5373Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5373Calculation of subcontext count exceeded maximum supported
[sbb-client1:05373] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5387Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5387Calculation of subcontext count exceeded maximum supported
[sbb-client1:05387] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5376Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5376Calculation of subcontext count exceeded maximum supported
[sbb-client1:05376] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
sbb-client1.5359Calculation of subcontext count exceeded maximum supported (err=8)
sbb-client1.5359Calculation of subcontext count exceeded maximum supported
[sbb-client1:05359] Open MPI detected an unexpected PSM2 error in opening an endpoint: Calculation of subcontext count exceeded maximum supported
[sbb-client1:05150] *** Process received signal ***
[sbb-client1:05150] Signal: Aborted (6)
[sbb-client1:05150] Signal code: (-6)
[sbb-client1:05150] [ 0] /lib64/libpthread.so.0(+0xf370)[0x7fe4c9b01370]
[sbb-client1:05150] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fe4c97661d7]
[sbb-client1:05150] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fe4c97678c8]
[sbb-client1:05150] [ 3] /lib64/libpsm2.so.2(psm2_error_defer+0x6d)[0x7fe4c2a7687d]
[sbb-client1:05150] [ 4] /lib64/libpsm2.so.2(+0x12713)[0x7fe4c2a76713]
[sbb-client1:05150] [ 5] /lib64/libpsm2.so.2(+0xe902)[0x7fe4c2a72902]
[sbb-client1:05150] [ 6] /lib64/libpsm2.so.2(+0x10a22)[0x7fe4c2a74a22]
[sbb-client1:05150] [ 7] /lib64/libpsm2.so.2(psm2_ep_open+0x430)[0x7fe4c2a739b0]
[sbb-client1:05150] [ 8] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_module_init+0x12c)[0x7fe4c2cdb59c]
[sbb-client1:05150] [ 9] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/openmpi/mca_mtl_psm2.so(+0x285a)[0x7fe4c2cdb85a]
[sbb-client1:05150] [10] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(ompi_mtl_base_select+0x91)[0x7fe4c9d969f1]
[sbb-client1:05150] [11] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/openmpi/mca_pml_cm.so(+0x3b6c)[0x7fe4c3b73b6c]
[sbb-client1:05150] [12] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(mca_pml_base_select+0x399)[0x7fe4c9d9e9a9]
[sbb-client1:05150] [13] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(ompi_mpi_init+0x4d4)[0x7fe4c9d534a4]
[sbb-client1:05150] [14] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(MPI_Init+0x16b)[0x7fe4c9d7305b]
[sbb-client1:05150] [15] /mnt/lustre/IOR/src/C/IOR[0x402567]
[sbb-client1:05150] [16] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe4c9752b35]
[sbb-client1:05150] [17] /mnt/lustre/IOR/src/C/IOR[0x402419]
[sbb-client1:05150] *** End of error message ***
[sbb-client1:05187] *** Process received signal ***
[sbb-client1:05187] Signal: Aborted (6)
[sbb-client1:05187] Signal code: (-6)
[sbb-client1:05187] [ 0] /lib64/libpthread.so.0(+0xf370)[0x7fdf7050a370]
[sbb-client1:05187] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fdf7016f1d7]
[sbb-client1:05187] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fdf701708c8]
[sbb-client1:05187] [ 3] /lib64/libpsm2.so.2(psm2_error_defer+0x6d)[0x7fdf6947f87d]
[sbb-client1:05187] [ 4] /lib64/libpsm2.so.2(+0x12713)[0x7fdf6947f713]
[sbb-client1:05187] [ 5] /lib64/libpsm2.so.2(+0xe902)[0x7fdf6947b902]
[sbb-client1:05187] [ 6] /lib64/libpsm2.so.2(+0x10a22)[0x7fdf6947da22]
[sbb-client1:05187] [ 7] /lib64/libpsm2.so.2(psm2_ep_open+0x430)[0x7fdf6947c9b0]
[sbb-client1:05187] [ 8] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_module_init+0x12c)[0x7fdf696e459c]
[sbb-client1:05187] [ 9] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/openmpi/mca_mtl_psm2.so(+0x285a)[0x7fdf696e485a]
[sbb-client1:05187] [10] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(ompi_mtl_base_select+0x91)[0x7fdf7079f9f1]
[sbb-client1:05187] [11] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/openmpi/mca_pml_cm.so(+0x3b6c)[0x7fdf6a57cb6c]
[sbb-client1:05187] [12] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(mca_pml_base_select+0x399)[0x7fdf707a79a9]
[sbb-client1:05187] [13] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(ompi_mpi_init+0x4d4)[0x7fdf7075c4a4]
[sbb-client1:05187] [14] /usr/mpi/gcc/openmpi-1.10.4-hfi/lib64/libmpi.so.12(MPI_Init+0x16b)[0x7fdf7077c05b]
[sbb-client1:05187] [15] /mnt/lustre/IOR/src/C/IOR[0x402567]
[sbb-client1:05187] [16] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fdf7015bb35]
[sbb-client1:05187] [17] /mnt/lustre/IOR/src/C/IOR[0x402419]
[sbb-client1:05187] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 8 with PID 5150 on node sbb-client1 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
error: get_param: param_path 'osd-zfs/tempkm-OST0001/brw_stats': No such file or directory
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Killed by signal 15.
Killed by signal 15.
Killed by signal 15.
Killed by signal 15.
[root@sbb-client1 benchmark]# exit
exit
Script done, file is sep11-mpirun-context-fixed
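
To count how many processes failed to open an endpoint, the log above can be summarized with something like the following. It is a rough sketch that assumes the `[host:pid] Open MPI detected an unexpected PSM2 error ...` line format seen in this output; the names `ERROR_RE` and `failed_processes` are hypothetical.

```python
import re

# Matches the Open MPI lines in the log above, e.g.
# [sbb-client1:05361] Open MPI detected an unexpected PSM2 error in opening an endpoint: <reason>
ERROR_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] Open MPI detected an unexpected "
    r"PSM2 error in opening an endpoint: (?P<reason>.+)$"
)


def failed_processes(log_text):
    """Return {host: set of PIDs} for processes that reported the PSM2 endpoint error."""
    failures = {}
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            failures.setdefault(match["host"], set()).add(int(match["pid"]))
    return failures


if __name__ == "__main__":
    import sys

    # Usage (assumed): python summarize.py < captured-mpirun-output
    for host, pids in failed_processes(sys.stdin.read()).items():
        print(f"{host}: {len(pids)} processes failed to open a PSM2 endpoint")
```

The per-host count can then be compared with the nfreectxts value reported on that host.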

[root@sbb-client1 benchmark]# vi /etc/modprobe.d/hfi1.conf
[root@sbb-client1 benchmark]#

thanks,
Abe

Comment by Abe [ 21/Sep/17 ]

Has this ticket been assigned yet?

thanks,
Abe

Comment by Abe [ 25/Sep/17 ]

Hi,
This issue has been duplicated to SUP-13, and SUP-13 has not been resolved.
Can someone please assign this ticket to the right team (Intel Omnipath support) to help resolve it?

thanks,
Abe

Comment by Peter Jones [ 06/Oct/17 ]

Sonia

Is this an LNET issue or should Abe follow up with OPA support?

Peter

Comment by Abe [ 06/Oct/17 ]

This is an issue with OPA; the OPA tech support alias is no longer valid.

thanks,
Abe

Comment by Peter Jones [ 07/Oct/17 ]

Abe

Could you please elaborate as to what mechanism you are using and how the behaviour has changed?

Peter
