[LU-6826] sanity test_71: No space left on device
Created: 09/Jul/15  Updated: 07/Oct/16  Resolved: 10/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: Di Wang
Resolution: Fixed
Votes: 0
Labels: dne2
Environment:

server and client: lustre-master build # 3094 RHEL7


Issue Links:
Related
is related to LU-6831 The ticket for tracking all DNE2 bugs Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d8d84160-25eb-11e5-866a-5254006e85c2.

The sub-test test_71 failed with the following error:

dbench failed!

test log

== sanity test 71: Running dbench on lustre (don't segment fault) ==================================== 21:00:19 (1436389219)
copying /usr/share/dbench/client.txt to /mnt/lustre/d71.sanity/client.txt
copying necessary libs to /mnt/lustre/d71.sanity
lib64/libpopt.so.0
tar: lib64: Cannot mkdir: No space left on device
tar: lib64/libpopt.so.0: Cannot open: No such file or directory
lib64/libc.so.6
tar: lib64: Cannot mkdir: No space left on device
tar: lib64/libc.so.6: Cannot open: No such file or directory
lib64/ld-linux-x86-64.so.2
tar: lib64: Cannot mkdir: No space left on device
tar: lib64/ld-linux-x86-64.so.2: Cannot open: No such file or directory
tar: Exiting with failure status due to previous errors
status        script            Total(sec) E(xcluded) S(low) 
------------------------------------------------------------------------------------

test-framework exiting on error
 sanity test_71: @@@@@@ FAIL: dbench failed! 


 Comments   
Comment by Sarah Liu [ 15/Jul/15 ]

Another instance was hit with RHEL 6.6 server and client:

https://testing.hpdd.intel.com/test_sets/a4e56756-2696-11e5-8b33-5254006e85c2

Comment by Oleg Drokin [ 16/Jul/15 ]

OK, this problem actually has nothing to do with running out of space. The DNE code simply allows setdirstripe to request more MDTs than exist in the system.
Every subsequent mkdir in that directory then fails:

[root@centos6-9 tests]# mkdir /mnt/lustre/test/dir
[root@centos6-9 tests]# rm -rf /mnt/lustre/test
[root@centos6-9 tests]# mkdir /mnt/lustre/test
[root@centos6-9 tests]# ../utils/lfs setdirstripe -D -c1 /mnt/lustre/test
[root@centos6-9 tests]# mkdir /mnt/lustre/test/dir
mkdir: cannot create directory `/mnt/lustre/test/dir': No space left on device

Note that -cX, where X is greater than 1, is also accepted.

And, coincidentally, this is exactly what test_71 does:

test_71() {
        test_mkdir -p $DIR/$tdir
        $LFS setdirstripe -D -c$MDSCOUNT $DIR/$tdir
        sh rundbench -C -D $DIR/$tdir 2 || error "dbench failed!"
}
run_test 71 "Running dbench on lustre (don't segment fault) ===="

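So for a non-DNE (single-MDT) configuration, where $MDSCOUNT is 1 and the test therefore runs "setdirstripe -D -c1", this completely breaks any directory that has such a default striping applied.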

Comment by Andreas Dilger [ 28/Aug/15 ]

The "-c1" case should be handled internally by the DNE kernel code to just create a non-striped directory. Doing anything else doesn't make sense.
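To make the behaviour requested in the comments above concrete, here is a minimal shell sketch of the validation rule. A stripe count larger than the number of MDTs, or a starting MDT index that names no existing MDT, is rejected up front, and a count of 0 or 1 falls back to a plain, non-striped directory. This is not the LOD kernel code from patch 16130. The function name check_dir_stripe, its arguments, the "plain"/"striped"/"EINVAL" results, and the convention that an offset of -1 means "let the MDS choose" are all assumptions made for illustration.

```shell
# Hypothetical sketch of the stripe_count/offset validation rule discussed
# in this ticket. NOT the actual LOD code; names and outputs are illustrative.
#
# usage: check_dir_stripe COUNT OFFSET MDT_COUNT
#   COUNT      requested directory stripe count (assumed integer >= 0)
#   OFFSET     requested starting MDT index; -1 is assumed to mean
#              "let the MDS pick"
#   MDT_COUNT  number of MDTs actually present in the filesystem
# Prints "plain", "striped" or "EINVAL"; returns 1 only for "EINVAL".
check_dir_stripe() {
	count=$1
	offset=$2
	mdt_count=$3

	# An explicit starting index must name an existing MDT (0..MDT_COUNT-1).
	if [ "$offset" -ne -1 ] &&
	   { [ "$offset" -lt 0 ] || [ "$offset" -ge "$mdt_count" ]; }; then
		echo "EINVAL"
		return 1
	fi

	# More stripes than MDTs can never be satisfied: refuse the layout now
	# instead of letting every later mkdir fail.
	if [ "$count" -gt "$mdt_count" ]; then
		echo "EINVAL"
		return 1
	fi

	# A count of 0 or 1 leaves nothing to stripe over: plain directory.
	if [ "$count" -le 1 ]; then
		echo "plain"
		return 0
	fi

	echo "striped"
	return 0
}
```

For example, on a single-MDT filesystem "check_dir_stripe 1 -1 1" prints "plain" (the test_71 case), while "check_dir_stripe 2 -1 1" prints "EINVAL". Under this rule a bad layout would be refused when it is set, rather than surfacing later as a misleading ENOSPC from mkdir.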

Comment by Gerrit Updater [ 29/Aug/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16130
Subject: LU-6826 lod: validate stripe_count and offset
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8e79c7c47dfc6e5a6230c440a88a8b26fa1cfe37

Comment by Gerrit Updater [ 10/Sep/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16130/
Subject: LU-6826 lod: validate stripe_count and offset
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 46ebfdd558dbe57db6cf51351246ca81bd38e4c9

Comment by Joseph Gmitter (Inactive) [ 10/Sep/15 ]

Landed for 2.8.0

Generated at Sat Feb 10 02:03:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.