[LU-17251] parallel-scale test_rr_alloc: max/min OST objects (2800 : 923) too different Created: 01/Nov/23 Updated: 20/Dec/23 |
|
| Status: | In Progress |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alex Deiter | Assignee: | Alex Deiter |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
parallel-scale test_rr_alloc: max/min OST objects (2800 : 923) too different
RHEL 8.8 x86_64 master/2.15.58.130
Failed session: https://testing.whamcloud.com/test_sessions/cde52bbc-3bcd-40b6-b9ad-fa35f8bc4deb
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n lod.lustre-MDT*.qos_threshold_rr=100 osp.lustre-OST*-osc-MDT*.create_count=3052
CMD: onyx-82vm10 /usr/sbin/lctl set_param -n lod.lustre-MDT*.qos_threshold_rr=100 osp.lustre-OST*-osc-MDT*.create_count=3052
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n lod.lustre-MDT*.qos_threshold_rr=100 osp.lustre-OST*-osc-MDT*.create_count=3052
CMD: onyx-82vm10 /usr/sbin/lctl set_param -n lod.lustre-MDT*.qos_threshold_rr=100 osp.lustre-OST*-osc-MDT*.create_count=3052
CMD: onyx-82vm9 /usr/sbin/lctl get_param -n debug
CMD: onyx-82vm10,onyx-82vm1.onyx.whamcloud.com,onyx-82vm2,onyx-82vm5,onyx-82vm9 /usr/sbin/lctl set_param -n debug=0
CMD: onyx-82vm1.onyx.whamcloud.com,onyx-82vm2 /usr/sbin/lctl set_param debug=\"super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck\"
CMD: onyx-82vm10,onyx-82vm5,onyx-82vm9 /usr/sbin/lctl set_param debug=\"super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck\"
CMD: onyx-82vm9 /usr/sbin/lctl get_param -n debug
CMD: onyx-82vm10,onyx-82vm1.onyx.whamcloud.com,onyx-82vm2,onyx-82vm5,onyx-82vm9 /usr/sbin/lctl set_param -n debug=0
 - unlinked 0 (time 1698827219 ; total 0 ; last 0)
total: 1032 unlinks in 1 seconds: 1032.000000 unlinks/second
CMD: onyx-82vm1.onyx.whamcloud.com,onyx-82vm2 /usr/sbin/lctl set_param debug=\"super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck\"
CMD: onyx-82vm10,onyx-82vm5,onyx-82vm9 /usr/sbin/lctl set_param debug=\"super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck\"
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0000-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0000-osc-MDT0000.prealloc_next_id
Warning: test may fail from too few objs on OST0
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0001-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0001-osc-MDT0000.prealloc_next_id
Warning: test may fail from too few objs on OST1
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0002-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0002-osc-MDT0000.prealloc_next_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0003-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0003-osc-MDT0000.prealloc_next_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0004-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0004-osc-MDT0000.prealloc_next_id
Warning: test may fail from too few objs on OST4
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0005-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0005-osc-MDT0000.prealloc_next_id
Warning: test may fail from too few objs on OST5
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0006-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0006-osc-MDT0000.prealloc_next_id
Warning: test may fail from too few objs on OST6
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0007-osc-MDT0000.prealloc_last_id
CMD: onyx-82vm9 lctl get_param -n osp.lustre-OST0007-osc-MDT0000.prealloc_next_id
Warning: test may fail from too few objs on OST7
+ chmod 0777 /mnt/lustre
drwxrwxrwx 4 root root 4096 Nov 1 08:26 /mnt/lustre
+ su mpiuser bash -c "/usr/lib64/openmpi/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh --oversubscribe -np 22 /usr/lib64/openmpi/bin/rr_alloc /tmp/rr_alloc_mntpt/lustre/drr_alloc.parallel-scale/f 555 2 "
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n lod.lustre-MDT0000-mdtlov.qos_threshold_rr=17%
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0000-osc-MDT0000.create_count=128
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0001-osc-MDT0000.create_count=64
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0002-osc-MDT0000.create_count=64
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0003-osc-MDT0000.create_count=64
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0004-osc-MDT0000.create_count=128
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0005-osc-MDT0000.create_count=128
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0006-osc-MDT0000.create_count=128
CMD: onyx-82vm9 /usr/sbin/lctl set_param -n osp.lustre-OST0007-osc-MDT0000.create_count=64
parallel-scale test_rr_alloc: @@@@@@ FAIL: max/min OST objects (2230 : 1144) too different
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6727:error()
= /usr/lib64/lustre/tests/functions.sh:1133:run_rr_alloc()
= /usr/lib64/lustre/tests/parallel-scale.sh:163:test_rr_alloc()
= /usr/lib64/lustre/tests/test-framework.sh:7067:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:7123:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:6953:run_test()
= /usr/lib64/lustre/tests/parallel-scale.sh:165:main() |
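For readers unfamiliar with the failure message, the kind of check that produces "max/min OST objects (max : min) too different" can be sketched as below. This is only an illustration: the function name check_ost_balance and the percentage tolerance are hypothetical, and the actual comparison in run_rr_alloc() (functions.sh) may use a different rule.

```shell
#!/bin/bash
# Sketch only: check_ost_balance and its tolerance argument are hypothetical,
# not the logic used by run_rr_alloc() in functions.sh.
#
# usage: check_ost_balance <max_diff_pct> <objects_on_ost>...
check_ost_balance() {
    local max_diff_pct=$1
    shift
    (( $# > 0 )) || return 2    # need at least one per-OST count

    local min=$1 max=$1 count
    for count in "$@"; do
        if (( count < min )); then min=$count; fi
        if (( count > max )); then max=$count; fi
    done

    # Fail when the spread exceeds max_diff_pct percent of the largest count.
    if (( (max - min) * 100 > max * max_diff_pct )); then
        echo "max/min OST objects ($max : $min) too different"
        return 1
    fi
    echo "max/min OST objects ($max : $min) balanced"
}
```

With an assumed tolerance of 10%, the counts from the failing run (2230 and 1144) are rejected, while counts such as 100, 95 and 98 pass.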
| Comments |
| Comment by Gerrit Updater [ 01/Nov/23 ] |
|
"Alex Deiter <alex.deiter@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52940 |
| Comment by Andreas Dilger [ 02/Nov/23 ] |
|
I think this is also the same as |
| Comment by Gerrit Updater [ 03/Nov/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52968 |
| Comment by Andreas Dilger [ 03/Nov/23 ] |
|
Deiter, I think my patch is complementary to yours. Yours improves the test script, and the wait loop is still needed since my patch does not wait for the precreates to finish before returning from set_param. However, the "createmany/unlinkmany" dance is no longer needed, and maybe never was needed, and is counter-productive in my opinion. |
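A wait loop of the kind discussed here could look roughly like the sketch below. It is not taken from either patch: the function name wait_precreated, the one-second poll interval, and the assumption that the number of available precreated objects equals prealloc_last_id minus prealloc_next_id are all illustrative. Only the two osp parameters themselves appear in the log above.

```shell
#!/bin/bash
# Sketch only: wait_precreated is a hypothetical helper, not code from
# patch 52940 or 52968. It assumes available objects are
# prealloc_last_id - prealloc_next_id for the given OSP device.
#
# usage: wait_precreated <osp_device> <wanted_objects> [timeout_seconds]
wait_precreated() {
    local osp=$1 wanted=$2 timeout=${3:-30}
    local last_id next_id elapsed=0

    while (( elapsed <= timeout )); do
        last_id=$(lctl get_param -n "osp.$osp.prealloc_last_id") || return 2
        next_id=$(lctl get_param -n "osp.$osp.prealloc_next_id") || return 2

        # Enough objects are precreated: stop waiting.
        if (( last_id - next_id >= wanted )); then
            return 0
        fi

        sleep 1
        elapsed=$((elapsed + 1))
    done

    echo "timed out: $osp has $((last_id - next_id)) objects, want $wanted" >&2
    return 1
}
```

For example, `wait_precreated lustre-OST0000-osc-MDT0000 128 60` would be run on the MDS after setting create_count and before starting rr_alloc.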
| Comment by Alex Deiter [ 03/Nov/23 ] |
|
Hello adilger, thank you very much for the patch and the detailed explanation! |
| Comment by Gerrit Updater [ 18/Nov/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52968/ |
| Comment by Gerrit Updater [ 26/Nov/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53245 |
| Comment by Gerrit Updater [ 20/Dec/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53245/ |