[LU-413] performance-sanity test_8: rank 0: open(f124836) error: Input/output error Created: 14/Jun/11  Updated: 23/Apr/14  Resolved: 23/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.8, Lustre 1.8.6, Lustre 1.8.9
Fix Version/s: Lustre 2.1.0, Lustre 1.8.7

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: Johann Lombardi (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre Branch: v1_8_6_RC2
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/80/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/40/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED, kernel version: 2.6.32-131.2.1.el6)
RHEL5/x86_64(server, OFED 1.5.3.1, kernel version: 2.6.18-238.12.1.el5_lustre)


Severity: 3
Bugzilla ID: 23206
Rank (Obsolete): 4972

 Description   

performance-sanity test_8 failed as follows:

===== mdsrate-stat-large.sh Test preparation: creating 125125 files.
+ /usr/lib64/lustre/tests/mdsrate --create --dir /mnt/lustre/mdsrate --nfiles 125125 --filefmt 'f%%d'
UUID                      Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID       415069          50      415019   0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       125184          89      125095   0% /mnt/lustre[OST:0]
lustre-OST0001_UUID       125184          89      125095   0% /mnt/lustre[OST:1]
lustre-OST0002_UUID       125184          89      125095   0% /mnt/lustre[OST:2]
lustre-OST0003_UUID       125184          89      125095   0% /mnt/lustre[OST:3]
lustre-OST0004_UUID       125184          89      125095   0% /mnt/lustre[OST:4]
lustre-OST0005_UUID       125184          89      125095   0% /mnt/lustre[OST:5]

filesystem summary:       415069          50      415019   0% /mnt/lustre

+ chmod 0777 /mnt/lustre
drwxrwxrwx 5 root root 4096 Jun 13 13:41 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun  -np 2 -machinefile /tmp/mdsrate-stat-large.machines /usr/lib64/lustre/tests/mdsrate --create --dir /mnt/lustre/mdsrate --nfiles 125125 --filefmt 'f%%d' "
0: client-10-ib starting at Mon Jun 13 13:49:30 2011
rank 0: open(f124836) error: Input/output error
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 4468 on
node client-10-ib exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
rank 1: open(f124837) error: Input/output error
UUID                      Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID       500096      124886      375210  25% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       125184      124985         199 100% /mnt/lustre[OST:0]
lustre-OST0001_UUID       125184      125184           0 100% /mnt/lustre[OST:1]
lustre-OST0002_UUID       125184      125184           0 100% /mnt/lustre[OST:2]
lustre-OST0003_UUID       125184      123961        1223  99% /mnt/lustre[OST:3]
lustre-OST0004_UUID       125184      124633         551 100% /mnt/lustre[OST:4]
lustre-OST0005_UUID       125184      124377         807  99% /mnt/lustre[OST:5]

filesystem summary:       500096      124886      375210  25% /mnt/lustre

status        script            Total(sec) E(xcluded) S(low) 
------------------------------------------------------------------------------------
test-framework exiting on error
 performance-sanity test_8: @@@@@@ FAIL: test_8 failed with 1

Dmesg on the MDS node:

Lustre: DEBUG MARKER: ===== mdsrate-stat-large.sh Test preparation: creating 125125 files.
Lustre: 8659:0:(lov_qos.c:459:qos_shrink_lsm()) using fewer stripes for object 278662: old 6 new 5
Lustre: 8681:0:(lov_qos.c:459:qos_shrink_lsm()) using fewer stripes for object 278663: old 6 new 5
Lustre: 8663:0:(lov_qos.c:459:qos_shrink_lsm()) using fewer stripes for object 279300: old 6 new 5
Lustre: 8663:0:(lov_qos.c:459:qos_shrink_lsm()) Skipped 636 previous similar messages
Lustre: 8662:0:(lov_qos.c:459:qos_shrink_lsm()) using fewer stripes for object 280599: old 6 new 3
Lustre: 8662:0:(lov_qos.c:459:qos_shrink_lsm()) Skipped 1298 previous similar messages
LustreError: 8685:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode 281132: rc = -5
LustreError: 8685:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
LustreError: 8681:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode 281132: rc = -5
LustreError: 8681:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
Lustre: DEBUG MARKER: performance-sanity test_8: @@@@@@ FAIL: test_8 failed with 1

Dmesg on the OSS node:

Lustre: DEBUG MARKER: ===== mdsrate-stat-large.sh Test preparation: creating 125125 files.
LustreError: 25861:0:(filter.c:3449:filter_precreate()) create failed rc = -28
LustreError: 27807:0:(filter.c:3449:filter_precreate()) create failed rc = -28
LustreError: 27804:0:(filter.c:3449:filter_precreate()) create failed rc = -28
LustreError: 27804:0:(filter.c:3449:filter_precreate()) Skipped 2 previous similar messages
Lustre: DEBUG MARKER: performance-sanity test_8: @@@@@@ FAIL: test_8 failed with 1
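
For readers unfamiliar with the "rc = -N" convention in the dmesg output above, the values are negated errno codes. The following is a minimal sketch, not part of the Lustre test suite, that decodes them with the Python standard library; the helper name decode_rc is hypothetical, and the errno numbering assumes Linux.

```python
import errno
import os
import re

# Matches the "rc = -N" suffix that the log lines above use for return codes.
RC_PATTERN = re.compile(r"rc = -(\d+)")


def decode_rc(line):
    """Return (errno_name, message) for the first 'rc = -N' in a log line.

    Returns None when the line carries no return code.
    decode_rc is a hypothetical helper name used only for this illustration.
    """
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# Two lines copied from the MDS and OSS dmesg output in this ticket.
mds_line = ("LustreError: 8685:0:(mds_open.c:826:mds_finish_open()) "
            "mds_create_objects: rc = -5")
oss_line = ("LustreError: 25861:0:(filter.c:3449:filter_precreate()) "
            "create failed rc = -28")

print(decode_rc(mds_line))
print(decode_rc(oss_line))
```

On Linux, -28 decodes to ENOSPC on the OSS side and -5 decodes to EIO on the MDS side, which matches the "Input/output error" that mdsrate reports on the client.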

Maloo report: https://maloo.whamcloud.com/test_sets/9b2e5a46-964f-11e0-9a27-52540025f9af

This is a known issue: bug 23206.



 Comments   
Comment by Andreas Dilger [ 14/Jun/11 ]

I think a prime issue here is that the "mdsrate_inodes_available()" function is incorrectly assuming that it can create min(num_ost_objects) files across all of the OSTs with wide striping. The MDS does not allocate objects perfectly evenly to avoid waiting for slow OSTs.

I've uploaded http://review.whamcloud.com/#change,941 for master, and http://review.whamcloud.com/942 for b1_8.

It would also be a good idea to land the patch from bugzilla 23206 from Dmitry. That may also resolve the issue, but has a much higher risk and is not suitable for 1.8.6-RC.
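
The point about uneven object allocation can be illustrated with a rough sketch. It is not the logic of the patches linked above (their contents are not shown in this ticket); it only assumes that a test should request fewer files than the ideal min-free-inodes bound. The function names and the 90 percent margin are hypothetical, and the sample input is the pre-test "lfs df -i" style output from the description.

```python
DF_OUTPUT = """\
UUID                      Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID       415069          50      415019   0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       125184          89      125095   0% /mnt/lustre[OST:0]
lustre-OST0001_UUID       125184          89      125095   0% /mnt/lustre[OST:1]
lustre-OST0002_UUID       125184          89      125095   0% /mnt/lustre[OST:2]
lustre-OST0003_UUID       125184          89      125095   0% /mnt/lustre[OST:3]
lustre-OST0004_UUID       125184          89      125095   0% /mnt/lustre[OST:4]
lustre-OST0005_UUID       125184          89      125095   0% /mnt/lustre[OST:5]

filesystem summary:       415069          50      415019   0% /mnt/lustre
"""


def parse_ost_free_inodes(df_output):
    """Map each OST UUID to its IFree column; MDT and summary rows are skipped."""
    free = {}
    for line in df_output.splitlines():
        fields = line.split()
        # Data rows look like: UUID Inodes IUsed IFree IUse% Mounted-on
        if len(fields) >= 6 and "-OST" in fields[0]:
            free[fields[0]] = int(fields[3])
    return free


def safe_file_count(ost_free, stripe_count, margin_percent=90):
    """Estimate how many files can be created without exhausting any OST.

    The ideal bound assumes objects are spread perfectly evenly, so the
    fullest OST limits the total. margin_percent is a hypothetical safety
    factor that leaves room for the uneven allocation described above.
    """
    if not ost_free or stripe_count <= 0:
        raise ValueError("need at least one OST and a positive stripe count")
    ideal = min(ost_free.values()) * len(ost_free) // stripe_count
    return ideal * margin_percent // 100


ost_free = parse_ost_free_inodes(DF_OUTPUT)
print(safe_file_count(ost_free, stripe_count=6))
```

With six OSTs and a stripe count of 6, the ideal bound equals the smallest IFree value (125095), which is already below the 125125 files the test requested. The reduced estimate stays under the point where the run in the description first failed (f124836).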

Comment by Peter Jones [ 16/Jun/11 ]

Andreas has provided initial patches for this. We can reassign if another engineer takes over this effort.

Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » i686,client,el5,ofa #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » i686,server,el5,ofa #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Jun/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #192
LU-413 limit used inodes for performance tests

Oleg Drokin : fc73791d9bd7e71538a96f8700a8cca737598e1a
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 26/Jul/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #113
LU-413 limit used inodes for performance tests

Johann Lombardi : 930243348131214ede3376790dbcdab50335d3ee
Files :

  • lustre/tests/test-framework.sh
Comment by Andreas Dilger [ 26/Jul/11 ]

Landed to both master and b1_8.

Comment by Andreas Dilger [ 26/Jul/11 ]

Reopening the issue because it also tracks the landing of the bug 23206 patch from Bugzilla.

Assign to Johann for further reassignment.

Comment by Jian Yu [ 16/May/12 ]

Lustre Tag: v1_8_8_WC1_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/195/
Distro/Arch: RHEL5.8/x86_64 (kernel version: 2.6.18-308.4.1.el5)
Network: TCP (1GigE)
ENABLE_QUOTA=yes

performance-sanity test_8 failed with the same issue:
https://maloo.whamcloud.com/test_sets/58f9a3c8-9d1b-11e1-8587-52540035b04c

Comment by Jian Yu [ 16/May/12 ]

Lustre Tag: v1_8_8_WC1_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/195/
Distro/Arch: RHEL5.8/x86_64(server), RHEL6.2/x86_64(client)
Network: TCP (1GigE)
ENABLE_QUOTA=yes

The compilebench tests in parallel-scale-nfsv{3,4} also failed with this issue:
https://maloo.whamcloud.com/test_sets/bce1ce86-9b8d-11e1-a0a0-52540035b04c
https://maloo.whamcloud.com/test_sets/07c83eac-9b8f-11e1-a0a0-52540035b04c

Comment by Jian Yu [ 31/May/12 ]

Lustre client: 1.8.8-wc1
Lustre server: v2_1_2_RC2

performance-sanity test_8 failed with the same issue:
https://maloo.whamcloud.com/test_sets/502c3ec4-aa84-11e1-bd84-52540035b04c
https://maloo.whamcloud.com/test_sets/b2ccef9a-aa59-11e1-971d-52540035b04c

Comment by Vladimir V. Saveliev [ 18/Sep/12 ]

http://review.whamcloud.com/4025

Comment by Jian Yu [ 14/Feb/13 ]

Lustre Tag: v1_8_9_WC1_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/256
Distro/Arch: RHEL5.9/x86_64(server), RHEL6.3/x86_64(client)
Network: TCP (1GigE)
ENABLE_QUOTA=yes

The compilebench test in parallel-scale{,-nfsv3,-nfsv4}.sh all hit the same issue:
https://maloo.whamcloud.com/test_sets/2e38af52-7683-11e2-bc2f-52540035b04c
https://maloo.whamcloud.com/test_sets/bae3afec-7683-11e2-bc2f-52540035b04c
https://maloo.whamcloud.com/test_sets/1b49d334-7684-11e2-bc2f-52540035b04c

Comment by Bruno Faccini (Inactive) [ 14/Feb/13 ]

The failures come from -28 (ENOSPC) errors in filter_precreate() on the OSTs, so the compilebench tests must guard against running out of available inodes, as is already done for performance-sanity.

I also see in the ticket history that the compilebench tests in parallel-scale-nfsv{3,4} already failed in May 2012. Do we remember what was done at the time? Some cleanup beforehand, or a change in the number of OSTs or inodes?

On the other hand, if we want to do the same pre-check in our compilebench-based tests, we need to be able to estimate how many files compilebench creates, but I can't find it in our sources. Does it come from the public benchmark sources?

Comment by Jian Yu [ 20/Feb/13 ]

Hi Bruno,

FYI...

The original compilebench source is in https://oss.oracle.com/~mason/compilebench/.
We build it on Jenkins under http://build.whamcloud.com/job/toolkit/.
It is wrapped for the auster test suite by run_compilebench() in lustre/tests/functions.sh.

In addition, I found that Vladimir Saveliev uploaded a patch at http://review.whamcloud.com/4025.

Comment by Jian Yu [ 20/Feb/13 ]

Lustre Tag: v1_8_9_WC1_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/258
Distro/Arch: RHEL5.9/x86_64(server), RHEL6.3/x86_64(client)
Network: IB (in-kernel OFED)
ENABLE_QUOTA=yes

The compilebench test in parallel-scale{,-nfsv3,-nfsv4}.sh all hit the same issue:
https://maloo.whamcloud.com/test_sets/a770271e-7b6b-11e2-a4de-52540035b04c
https://maloo.whamcloud.com/test_sets/e4a40466-7b6b-11e2-a4de-52540035b04c
https://maloo.whamcloud.com/test_sets/02613f96-7b6c-11e2-a4de-52540035b04c
