[LU-13086] 3c7aca747 LU-12395 breaks compatibility mpi tests with mpich Created: 18/Dec/19  Updated: 17/Sep/21  Resolved: 17/Sep/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Elena Gryaznova Assignee: Elena Gryaznova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-13348 Landing LU-12395 broke an MPICH support Resolved
Related
is related to LU-12395 Failed dependencies while installing ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Commit 3c7aca747 (LU-12395) adds the "--oversubscribe" MPI run option, which causes the MPI tests built with mpich to fail:

[mpiexec@fre1307] match_arg (./utils/args/args.c:160): unrecognized argument oversubscribe
[mpiexec@fre1307] HYDU_parse_array (./utils/args/args.c:175): argument matching returned error
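
One possible workaround, shown only as a minimal sketch, is to drop the flag before handing the options to a launcher that rejects it. The helper name strip_oversubscribe is hypothetical and is not part of the Lustre test framework; it assumes the options arrive as separate words.

```shell
# Hypothetical helper, not taken from the Lustre test framework.
# Prints the given launcher options with every "--oversubscribe" removed.
# The body runs in a subshell "( ... )" so its variables do not leak.
strip_oversubscribe() (
    out=""
    for opt in "$@"; do
        # Skip the flag that the mpich launcher does not recognize.
        [ "$opt" = "--oversubscribe" ] && continue
        # Append the option, adding a space only between entries.
        out="${out:+$out }$opt"
    done
    printf '%s\n' "$out"
)
```

For example, strip_oversubscribe --oversubscribe -np 4 prints "-np 4". Options that contain spaces would need array handling, which this sketch does not attempt.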


 Comments   
Comment by Elena Gryaznova [ 18/Dec/19 ]

Andreas,
Jian,
we need your advice on how to fix this. The only ways I see are to remove this option or to make it optional.

Please advise.
Thanks.

Comment by Andreas Dilger [ 11/Jan/20 ]

Added Minh, since he was the author of that patch.

Elena, I don't mind making this optional. It seems we could also specify it as part of $MPIRUN_OPTIONS from the environment, but maybe this has some side effect that I'm not aware of?
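
A minimal sketch of the environment-based approach, assuming the test scripts read MPIRUN_OPTIONS and only need a fallback when the caller has not set it. The function name resolve_mpirun_options and the choice of default are illustrative assumptions, not the contents of either patch.

```shell
# Illustrative only: resolve_mpirun_options is a hypothetical name.
# If MPIRUN_OPTIONS is set in the environment (even to an empty string),
# use it as-is; otherwise fall back to a default with --oversubscribe.
resolve_mpirun_options() (
    default_opts="--oversubscribe"
    # "${VAR-word}" substitutes word only when VAR is unset, so an
    # explicitly empty MPIRUN_OPTIONS disables the default flag.
    printf '%s\n' "${MPIRUN_OPTIONS-$default_opts}"
)
```

Under this scheme an mpich user could export MPIRUN_OPTIONS="" to run without the flag, while an unset variable keeps the current behavior.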

Comment by Gerrit Updater [ 03/Apr/20 ]

Elena Gryaznova (c17455@cray.com) uploaded a new patch: https://review.whamcloud.com/38130
Subject: LU-13086 tests: restore compatibility with mpich
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fc31416862a5c6a066ee62429e896ffa1062c557

Comment by Cory Spitz [ 13/May/20 ]

mdiep, we could use your assistance with some questions in the review of https://review.whamcloud.com/#/c/38130/. Thanks!

Comment by Cory Spitz [ 20/May/20 ]

mdiep and jamesanunez can you get together and reconcile your opinions about --oversubscribe and where you set it? James, it sounds like you can't get --oversubscribe in your env with the patch as-is. Is that right? If so, will you and Minh both be happy if it is moved to cfg/local.sh? Please update and clarify your comments in the Gerrit review. Thanks!

Comment by Gerrit Updater [ 21/May/20 ]

Elena Gryaznova (c17455@cray.com) uploaded a new patch: https://review.whamcloud.com/38689
Subject: LU-13086 tests: restore compatibility with mpich
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0ad3e4795d6490a630432950184b42f8a83811e3

Comment by Cory Spitz [ 22/Jun/20 ]

mdiep and jamesanunez, we now have two options, https://review.whamcloud.com/#/c/38130/ and https://review.whamcloud.com/#/c/38689/, but the reviews have gone stale. Can you please state which approach you want to proceed with and why?

Comment by Cory Spitz [ 09/Jul/20 ]

mdiep, colmstea, and jamesanunez, with the activity at https://review.whamcloud.com/#/c/38689/ are we to assume that it is the preferred direction? And should https://review.whamcloud.com/#/c/38130/ be abandoned?

Comment by Cory Spitz [ 21/Jul/20 ]

mdiep, colmstea, and jamesanunez, so https://review.whamcloud.com/#/c/38130/ should be abandoned? And in https://review.whamcloud.com/#/c/38689/ Elena has proposed to "totally get rid of --oversubscribe". Can you agree?

Comment by Andreas Dilger [ 22/Jul/20 ]

spitzcor, I've abandoned 38130 and updated 38689 to address the minor defect therein. It needs a second review and testing to finish before it can land. It looks like a reasonable compromise to include --oversubscribe so that the testing works out-of-the-box for RHEL (which is far-and-away the most common distro used with Lustre), but still allow the external config file to specify different options based on the MPI version.
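
As a rough sketch of how an external config file might pick options per MPI implementation: the function below chooses a default from the launcher's version banner. The function name and the banner substring are assumptions, as is the premise that the installed Open MPI is recent enough to accept --oversubscribe; none of this is taken from patch 38689.

```shell
# Sketch only. The "Open MPI" banner substring is an assumption about
# typical "--version" output; verify it against the installed MPI.
# Intended use: default_mpirun_options "$(mpirun --version 2>&1)"
default_mpirun_options() {
    case "$1" in
        # Open MPI launchers: keep the flag so oversubscribed runs work.
        *"Open MPI"*) printf '%s\n' "--oversubscribe" ;;
        # Anything else (for example mpich): pass no extra options.
        *) printf '\n' ;;
    esac
}
```

Taking the banner text as an argument keeps the function free of any dependency on an installed launcher, so it can be exercised without MPI present.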

Comment by Gerrit Updater [ 23/Dec/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41082
Subject: LU-13086 config: add --oversubscribe to MPIRUN_OPTIONS
Project: private/autotest
Branch: master
Current Patch Set: 1
Commit: cca5447c7ee27ec2c7bd916f6a634ef8371b5cbd

Comment by Cory Spitz [ 04/Feb/21 ]

I don't have permissions to view https://review.whamcloud.com/#/c/41082/.

Comment by Cory Spitz [ 22/Feb/21 ]

Is https://review.whamcloud.com/#/c/41082/ supposed to be a replacement for https://review.whamcloud.com/#/c/38689/ ?

Comment by Gerrit Updater [ 02/Sep/21 ]

"Charlie Olmstead <charlie@whamcloud.com>" merged in patch https://review.whamcloud.com/41082/
Subject: LU-13086 config: add --oversubscribe to MPIRUN_OPTIONS
Project: private/autotest
Branch: master
Current Patch Set:
Commit: c01e4db25d8a47a9c27cada04d8e3b4ea83292d4

Comment by Charlie Olmstead [ 02/Sep/21 ]

AT Patch has been deployed

Comment by Gerrit Updater [ 17/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/38689/
Subject: LU-13086 tests: restore compatibility with mpich
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e418f47688facf07f2e9bd6535b71d484af4f8ac

Comment by Peter Jones [ 17/Sep/21 ]

Landed for 2.15

Generated at Sat Feb 10 02:58:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.