[LU-16126] Lustre 2.15.51 mdtest fails with MPI_Abort errors while adjusting max_rpcs_in_progress and using a large number of clients Created: 30/Aug/22 Updated: 23/Jan/23 Resolved: 07/Sep/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Petros Koutoupis | Assignee: | Andreas Dilger |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I have been trying to tune the max_rpcs_in_progress parameter on the MDTs in my system.

Environment: on the MDTs, the osp.lustre-MDT0001-osp-MDT0000.max_rpcs_in_progress parameter defaults to 0. If I change this value to anything above 0 (including 1) and run mdtest with 16 PPN per client across all 512 clients, I start to see MPI errors:

V-1: Rank 0 Line 2565 Operation Duration Rate
V-1: Rank 0 Line 2566 --------- -------- ----
V-1: Rank 0 Line 1957 main: * iteration 1 *
V-2: Rank 0 Line 1966 main (for j loop): making o.testdir, '/lustre/pkoutoupis/testdir..1mdt.0/test-dir.0-0'
V-1: Rank 0 Line 1883 Entering create_remove_directory_tree on /lustre/pkoutoupis/testdir..1mdt.0/test-dir.0-0, currDepth = 0...
V-2: Rank 0 Line 1889 Making directory '/lustre/pkoutoupis/testdir..1mdt.0/test-dir.0-0/mdtest_tree.0.0/'
V-1: Rank 0 Line 1883 Entering create_remove_directory_tree on /lustre/pkoutoupis/testdir..1mdt.0/test-dir.0-0/mdtest_tree.0.0/, currDepth = 1...
V-1: Rank 0 Line 2033 V-1: main: Tree creation : 2.250 sec, 0.444 ops/sec
delaying 30 seconds . . .
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7328
srun: error: mo0627: tasks 7328-7343: Exited with exit code 255
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7329
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7330
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7331
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7332
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7333

It seems like I may have narrowed this down to commit 23028efcae. |
| Comments |
| Comment by Olaf Faaland [ 30/Aug/22 ] |
|
Hi Petros,
Since this commit landed on master after 2.15.0, the "2.15" in the summary confused me and may confuse others (when someone says "2.15", I assume they mean something on branch b2_15). If you change the description and the summary of this ticket to say "2.15.50" or "master post-2.15.0", that would help. Thanks. |
| Comment by James A Simmons [ 30/Aug/22 ] |
|
Do you have kernel logs? I do wonder if this is related to LU-16065. |
| Comment by Andreas Dilger [ 30/Aug/22 ] |
|
Note that we don't regularly test with 64 MDTs, so there may be some kind of thread starvation happening if all of the MDTs are sending RPCs to each other and the service thread count is not high enough. Even before debug logs, getting the console logs from the client and MDS nodes would be useful (e.g. if clients are being evicted, stacks are dumped, etc.). |
| Comment by Petros Koutoupis [ 07/Sep/22 ] |
|
I went back to collect more data and, unfortunately, I am unable to reproduce the original issue. Closing this ticket unless I can observe the problem again. |
| Comment by Nathan Rutman [ 16/Nov/22 ] |
|
Another problem: you can't get it back to 0:

[root@cslmo4902 ~]# lctl get_param *.*.max_rpcs_in_progress
osp.testfs-MDT0000-osp-MDT0001.max_rpcs_in_progress=0
osp.testfs-MDT0001-osp-MDT0000.max_rpcs_in_progress=0
[root@cslmo4902 ~]# lctl set_param osp.testfs-MDT0000-osp-MDT0001.max_rpcs_in_progress=100
osp.testfs-MDT0000-osp-MDT0001.max_rpcs_in_progress=100
[root@cslmo4902 ~]# lctl set_param osp.testfs-MDT0000-osp-MDT0001.max_rpcs_in_progress=0
error: set_param: setting /sys/fs/lustre/osp/testfs-MDT0000-osp-MDT0001/max_rpcs_in_progress=0: Numerical result out of range |
| Comment by Petros Koutoupis [ 16/Nov/22 ] |
|
Nathan, I did notice that but forgot about that quirk until now. Thank you for refreshing my memory. |
| Comment by James A Simmons [ 16/Nov/22 ] |
|
That is the same behavior as for mdc.*.max_rpcs_in_progress. It's due to the test if (max > OBD_MAX_RIF_MAX || max < 1) in obd_set_max_rpcs_in_flight(). The lowest value you can set it back to is 1, which is the default; zero would mean "don't send anything", which we don't want.
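
A minimal standalone sketch of that test (the value of OBD_MAX_RIF_MAX here is an assumption for illustration; the real constant lives in the Lustre headers, and the real function also adjusts other client state) shows why lctl reports "Numerical result out of range": anything outside [1, OBD_MAX_RIF_MAX] comes back as -ERANGE.

#include <errno.h>
#include <stdio.h>
#include <string.h>

#define OBD_MAX_RIF_MAX 512   /* assumed value, for illustration only */

/* Mirrors the quoted test from obd_set_max_rpcs_in_flight():
 * reject any value outside [1, OBD_MAX_RIF_MAX] with -ERANGE. */
static int set_max_rpcs_in_flight(unsigned int *cur, unsigned int max)
{
	if (max > OBD_MAX_RIF_MAX || max < 1)
		return -ERANGE;
	*cur = max;
	return 0;
}

int main(void)
{
	unsigned int cur = 1;

	/* 0 trips the "max < 1" branch, matching Nathan's transcript */
	if (set_max_rpcs_in_flight(&cur, 0) == -ERANGE)
		printf("setting 0: %s\n", strerror(ERANGE));
	/* prints: setting 0: Numerical result out of range */
	return 0;
}

So once the value has been raised, writing 0 back can never succeed. |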
| Comment by Andreas Dilger [ 16/Nov/22 ] |
|
So maybe the issue here is that the max_rpcs_in_progress value shouldn't show "0" initially? |
| Comment by James A Simmons [ 16/Nov/22 ] |
|
Yeah, the initial value is wrong.
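
Purely as an illustration of the shape of such a fix (the field and constant names below are assumptions about the OSP code, and this is not the actual change; that is the Gerrit patch in the next comment), the idea would be to initialize the tunable to a nonzero default at device setup so it never reports 0:

/* Hypothetical sketch, not the actual patch. Stand-in type so the
 * snippet compiles on its own; the real structures live in the OSP
 * sources. */
#define OSP_MAX_RPCS_IN_PROGRESS 4096   /* assumed default cap */

struct osp_device_stub {
	unsigned int opd_sync_max_rpcs_in_progress;
};

/* Give the device a sane nonzero default before the tunable is read */
static void osp_rpc_limit_init(struct osp_device_stub *osp)
{
	if (osp->opd_sync_max_rpcs_in_progress == 0)
		osp->opd_sync_max_rpcs_in_progress = OSP_MAX_RPCS_IN_PROGRESS;
}

That would also make the displayed default consistent with the [1, max] range the setter enforces. |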
| Comment by Gerrit Updater [ 17/Nov/22 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49182 |
| Comment by James A Simmons [ 23/Jan/23 ] |
|
I took another look at this, and in my testing I see:

/usr/src/lustre-2.15.52/lustre/tests# ../utils/lctl get_param *.*.max_rpcs_*

Nathan, what version of Lustre are you running where you see a setting of zero?
|