[LU-701] parallel-scale test_write_disjoint fails due to invalid file size Created: 21/Sep/11  Updated: 06/Sep/13  Resolved: 06/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 2.4.0, Lustre 1.8.7, Lustre 2.5.0
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Minh Diep
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: TCP
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_1_0_0_RC2
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.6.1.el6_lustre.g65156ed.x86_64)
Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/
Network: TCP


Issue Links:
Duplicate: duplicates LU-3027, "Failure on test suite parallel-scale ..." (Resolved)
Severity: 3
Rank (Obsolete): 5514

Description

v2_1_0_0_RC2 testing: MPI_ABORT for an unknown reason. There are no console or syslog logs at all in the report (a Maloo bug?)

Report: https://maloo.whamcloud.com/test_sets/44dc4934-e440-11e0-9909-52540025f9af

== parallel-scale test write_disjoint: write_disjoint == 14:43:05 (1316554985)
OPTIONS:
WRITE_DISJOINT=/usr/lib64/lustre/tests/write_disjoint
clients=fat-intel-1vm1,fat-intel-1vm2
wdisjoint_THREADS=4
wdisjoint_REP=10000
MACHINEFILE=/tmp/parallel-scale.machines
fat-intel-1vm1
fat-intel-1vm2
+ /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000
UUID Inodes IUsed IFree IUse% Mounted on
lustre-MDT0000_UUID 5000040 86 4999954 0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID 167552 10974 156578 7% /mnt/lustre[OST:0]
lustre-OST0001_UUID 167552 11326 156226 7% /mnt/lustre[OST:1]
lustre-OST0002_UUID 167552 3807 163745 2% /mnt/lustre[OST:2]
lustre-OST0003_UUID 167552 4830 162722 3% /mnt/lustre[OST:3]
lustre-OST0004_UUID 167552 3806 163746 2% /mnt/lustre[OST:4]
lustre-OST0005_UUID 167552 3646 163906 2% /mnt/lustre[OST:5]
lustre-OST0006_UUID 167552 3806 163746 2% /mnt/lustre[OST:6]

filesystem summary: 5000040 86 4999954 0% /mnt/lustre

+ chmod 0777 /mnt/lustre
drwxrwxrwx 7 root root 4096 Sep 20 14:43 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun -mca boot ssh -mca btl tcp,self -np 8 -machinefile /tmp/parallel-scale.machines /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000 "
loop 0: chunk_size 103399
loop 1000: chunk_size 69125
loop 2000: chunk_size 104360
loop 3000: chunk_size 11295
loop 4000: chunk_size 77918
loop 5000: chunk_size 27295
loop 6000: chunk_size 42065
loop 7000: chunk_size 82749
loop 8000: chunk_size 94370
loop 9000: chunk_size 107226
loop 9371: chunk_size 25301, file size was 202408
rank 5, loop 9372: invalid file size 801136 instead of 915584 = 114448 * 8
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 5 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 5 with PID 30944 on
node fat-intel-1vm2 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[fat-intel-1vm2.lab.whamcloud.com][[61908,1],7][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[fat-intel-1vm1.lab.whamcloud.com][[61908,1],4][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
UUID Inodes IUsed IFree IUse% Mounted on
lustre-MDT0000_UUID 5000040 87 4999953 0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID 167552 10974 156578 7% /mnt/lustre[OST:0]
lustre-OST0001_UUID 167552 11326 156226 7% /mnt/lustre[OST:1]
lustre-OST0002_UUID 167552 3806 163746 2% /mnt/lustre[OST:2]
lustre-OST0003_UUID 167552 4830 162722 3% /mnt/lustre[OST:3]
lustre-OST0004_UUID 167552 3806 163746 2% /mnt/lustre[OST:4]
lustre-OST0005_UUID 167552 3646 163906 2% /mnt/lustre[OST:5]
lustre-OST0006_UUID 167552 3806 163746 2% /mnt/lustre[OST:6]

filesystem summary: 5000040 87 4999953 0% /mnt/lustre

parallel-scale test_write_disjoint: @@@@@@ FAIL: write_disjoint failed! 1
Dumping lctl log to /logdir/test_logs/2011-09-19/lustre-mixed-el6-x86_64_283_-7f6a2ad2c9e0/parallel-scale.test_write_disjoint.*.1316557553.log
Resetting fail_loc on all nodes...done.



Comments
Comment by Jian Yu [ 23/Sep/11 ]

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: TCP (1GigE)
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_1_0_0_RC2
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.6.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/283/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/

The write_disjoint test passed in a manual run: https://maloo.whamcloud.com/test_sets/af1b916c-e5bf-11e0-9909-52540025f9af

Comment by Andreas Dilger [ 30/May/13 ]

I just noticed in the "full" runs that test_write_disjoint is one of the few tests that is consistently failing, and this bug is listed as the cause.

The MPI_ABORT is not the cause of this problem, just a symptom. When write_disjoint detects a data consistency error, it prints an error message and then calls MPI_Abort() to exit.

The real problem is that either the output file is not being written correctly or the DLM locks are caching the file size incorrectly, resulting in an inconsistent file size being reported to the application:

loop 90: chunk_size 62460, file size was 499680
rank 4, loop 91: invalid file size 203532 instead of 232608 = 29076 * 8
rank 2, loop 91: invalid file size 203532 instead of 232608 = 29076 * 8
rank 6, loop 91: invalid file size 203532 instead of 232608 = 29076 * 8
rank 0, loop 91: invalid file size 203532 instead of 232608 = 29076 * 8
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 

In this case the MPI_ABORT is expected, and we need to find out why the test is failing. For better or worse, it seems to fail on virtually every test run, so it will hopefully not be too complex to debug. Almost certainly we will need to gather more debug logs from the client nodes (lctl set_param debug="+vfstrace +rpctrace +dlmtrace" at a minimum).
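
For reference, the check that produces the "invalid file size" lines above has roughly the following shape. This is a simplified sketch of the logic in lustre/tests/write_disjoint, not the actual source: the fixed chunk size and the omitted error checking are illustrative, and the real test re-truncates the file and picks a new random chunk_size every loop, then also verifies the chunk contents.

/* Sketch: N ranks each write one disjoint chunk_size-sized chunk into a
 * shared file, then every rank expects the file to be exactly
 * chunk_size * nproc bytes long. A stale cached size on one client shows
 * up here as the "invalid file size" error seen in the logs. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nproc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    const char *fname = "/mnt/lustre/d0.write_disjoint/file";
    size_t chunk_size = 114448;         /* the real test randomizes this per loop */
    char *buf = malloc(chunk_size);
    struct stat st;

    memset(buf, 'A' + rank, chunk_size);

    int fd = open(fname, O_RDWR | O_CREAT, 0666);

    /* each rank writes its own disjoint chunk at offset rank * chunk_size */
    pwrite(fd, buf, chunk_size, (off_t)rank * chunk_size);

    /* wait until every rank has finished writing its chunk */
    MPI_Barrier(MPI_COMM_WORLD);

    /* every rank now expects size == chunk_size * nproc */
    fstat(fd, &st);
    if (st.st_size != (off_t)(chunk_size * (size_t)nproc)) {
        fprintf(stderr, "rank %d: invalid file size %lld instead of %lld = %zu * %d\n",
                rank, (long long)st.st_size,
                (long long)(chunk_size * (size_t)nproc), chunk_size, nproc);
        MPI_Abort(MPI_COMM_WORLD, -1);  /* kills all ranks, as seen in the log */
    }

    close(fd);
    free(buf);
    MPI_Finalize();
    return 0;
}

Running this step with the debug flags above enabled on both clients, then dumping the kernel debug buffer (lctl dk) afterwards, should show where the stale size is coming from.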

Comment by Andreas Dilger [ 06/Sep/13 ]

Duplicate of LU-3027, which has a landed patch.
