[LU-2453] parallel-scale test_write_disjoint: invalid file size 723793 instead of 827192 = 103399 * 8 Created: 10/Dec/12 Updated: 06/May/13 Resolved: 06/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.4 |
| Fix Version/s: | Lustre 2.1.4 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Branch: b2_1 |
||
| Severity: | 3 | ||||||||
| Bugzilla ID: | 2,304 | ||||||||
| Rank (Obsolete): | 5793 | ||||||||
| Description |
|
The parallel-scale test write_disjoint failed as follows:

== parallel-scale test write_disjoint: write_disjoint ================================================ 14:32:34 (1355005954)
OPTIONS:
WRITE_DISJOINT=/usr/lib64/lustre/tests/write_disjoint
clients=fat-intel-3vm5,fat-intel-3vm6.lab.whamcloud.com
wdisjoint_THREADS=4
wdisjoint_REP=10000
MACHINEFILE=/tmp/parallel-scale.machines
fat-intel-3vm5
fat-intel-3vm6.lab.whamcloud.com
+ /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000
+ chmod 0777 /mnt/lustre
drwxrwxrwx 5 root root 4096 Dec 8 14:32 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/1.4-gcc/bin/mpirun -mca boot ssh -np 8 -machinefile /tmp/parallel-scale.machines /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000 "
--------------------------------------------------------------------------
[[22376,1],3]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: fat-intel-3vm6.lab.whamcloud.com

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
loop 0: chunk_size 103399
rank 3, loop 0: invalid file size 723793 instead of 827192 = 103399 * 8
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
rank 4, loop 0: invalid file size 723793 instead of 827192 = 103399 * 8
rank 7, loop 0: invalid file size 723793 instead of 827192 = 103399 * 8
--------------------------------------------------------------------------

Maloo report: https://maloo.whamcloud.com/test_sets/bfc081dc-41bf-11e2-a653-52540035b04c |
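The figures in the failure message can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the helper name `diagnose_size` is hypothetical and is not part of the Lustre test suite. It takes the chunk size and process count from the log (`chunk_size 103399`, `-np 8`) and shows how many whole chunks the reported size corresponds to.

```python
def diagnose_size(observed: int, chunk_size: int, nprocs: int) -> dict:
    """Compare an observed file size with the size expected when each of
    nprocs processes contributes one chunk of chunk_size bytes."""
    expected = chunk_size * nprocs
    # How many complete chunks fit in the observed size, and what is left over.
    whole_chunks, remainder = divmod(observed, chunk_size)
    return {
        "expected": expected,
        "missing_bytes": expected - observed,
        "whole_chunks": whole_chunks,
        "remainder": remainder,
    }


# Values copied from the failure message in the log above.
report = diagnose_size(observed=723793, chunk_size=103399, nprocs=8)
print(report)
```

With these inputs the reported size is exactly seven whole chunks with no remainder, so the size seen by ranks 3, 4 and 7 was one full chunk short of the expected eight. The arithmetic alone does not show which chunk was missing or why.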
| Comments |
| Comment by Peter Jones [ 11/Dec/12 ] |
|
Lai is looking into this one |
| Comment by Lai Siyao [ 11/Dec/12 ] |
|
Strange, I can't reproduce it in my setup (RHEL 6). I'll test on RHEL 5 tomorrow. |
| Comment by Lai Siyao [ 12/Dec/12 ] |
|
I reproduced it on rhel5, and http://review.whamcloud.com/#change,4482 for |
| Comment by Jinshan Xiong (Inactive) [ 12/Dec/12 ] |
|
I applied patch of |
| Comment by Jian Yu [ 12/Dec/12 ] |
|
Please backport the patch of |
| Comment by Peter Jones [ 12/Dec/12 ] |
|
Yujian,
The port seems to be here: http://review.whamcloud.com/#change,4818
Peter |
| Comment by Lai Siyao [ 06/May/13 ] |
|
landed |