[LU-2452] parallel-scale test_write_append_truncate: trunc-after-APPEND bad [435936-441669]/[0x6a6e0-0x6bd45] != c
Created: 10/Dec/12  Updated: 13/Dec/12  Resolved: 13/Dec/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.4
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: Yang Sheng
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Lustre Branch: b2_1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/148
Distro/Arch: RHEL5.8/x86_64 (kernel version: 2.6.18-308.20.1.el5)
Network: TCP (1GigE)


Issue Links:
Duplicate
duplicates LU-2453 parallel-scale test_write_disjoint: i... Resolved
Related
is related to LU-2304 Test failure sanityn test_16: dual-mo... Resolved
Severity: 3
Rank (Obsolete): 5792

 Description   

The parallel-scale test write_append_truncate failed as follows:

== parallel-scale test write_append_truncate: write_append_truncate ================================== 14:32:14 (1355005934)
OPTIONS:
clients=fat-intel-3vm5,fat-intel-3vm6.lab.whamcloud.com 
write_REP=10000
write_THREADS=8
MACHINEFILE=/tmp/parallel-scale.machines
fat-intel-3vm5
fat-intel-3vm6.lab.whamcloud.com
+ write_append_truncate -n 10000 /mnt/lustre/d0.write_append_truncate/f0.wat
+ chmod 0777 /mnt/lustre
drwxrwxrwx 4 root root 4096 Dec  8 14:32 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/1.4-gcc/bin/mpirun -mca boot ssh -np 16 -machinefile /tmp/parallel-scale.machines write_append_truncate -n 10000 /mnt/lustre/d0.write_append_truncate/f0.wat "
--------------------------------------------------------------------------
[[21817,1],7]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: fat-intel-3vm6.lab.whamcloud.com

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
r= 0: create /mnt/lustre/d0.write_append_truncate/f0.wat, max size: 3703701, seed 1355005938: No such file or directory
r= 0 l=0000: WR A 1233669/0x12d305, AP a  845367/0x0ce637, TR@ 1479192/0x169218
r= 0 l=0002: trunc-after-APPEND bad [435936-441669]/[0x6a6e0-0x6bd45] != c
r= 0 l=0002: WR C  435936/0x06a6e0, AP c   47201/0x00b861, TR@  441670/0x06bd46
000000   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
*
06a6e0 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
06bd40 nul nul nul nul nul nul
06bd46
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 1.

Maloo report: https://maloo.whamcloud.com/test_sets/bfc081dc-41bf-11e2-a653-52540035b04c



 Comments   
Comment by Peter Jones [ 11/Dec/12 ]

Yang Sheng is looking into this one.

Comment by Jinshan Xiong (Inactive) [ 12/Dec/12 ]

This is a duplicate of LU-2304. We need to port the patch here.

Comment by Jian Yu [ 13/Dec/12 ]

This is a duplicate of LU-2453, which is in turn a duplicate of LU-2304.
