[LU-6813] sanity-benchmark test_iozone: iozone (1) failed Created: 08/Jul/15  Updated: 12/May/16  Resolved: 23/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File iozone.lctl.tar     Text File iozone_dk.log     File lu6813-good-bad-log.tgz    
Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/37318204-250a-11e5-8009-5254006e85c2.

The sub-test test_iozone failed with the following error:

iozone (1) failed

Test log follows. This problem was hit in multiple configurations.

stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         3847216     512

Sanity check failed. Do not deploy this filesystem in a production environment !
 sanity-benchmark test_iozone: @@@@@@ FAIL: iozone (1) failed 

MDS console log; it looks like there is not enough space:

18:15:50:Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 11:04:05 (1436205845)
18:15:51:Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1846488kB available, using 3847216kB file size
18:15:51:Lustre: DEBUG MARKER: min OST has 1846488kB available, using 3847216kB file size
18:15:51:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-benchmark test_iozone: @@@@@@ FAIL: iozone \(1\) failed 
18:15:51:Lustre: DEBUG MARKER: sanity-benchmark test_iozone: @@@@@@ FAIL: iozone (1) failed
18:15:51:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2015-07-06/lustre-master-el6_6-x86_64-vs-lustre-master-sles11sp3-x86_64--full--2_12_1__3092__-70321095799900-131112/sanity-benchmark.test_iozone.debug_log.$(hostname -s).1436205847.log;


 Comments   
Comment by Andreas Dilger [ 08/Jul/15 ]

I don't see any reports in the MDS or OSS logs that relate to running out of space. It looks like this might be a data corruption issue, since there aren't errors in any of the logs. The message "min OST has 1846488kB available, using 3847216kB file size" just reports how large the iozone file is. The file is striped across all OSTs, so it shouldn't consume all of the space on even the smallest OST, since each OST gets only 1/7 of the file data (549602KB, 29% of the free space).
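
The per-OST figure in this comment can be checked with a little arithmetic. The following is a minimal sketch, assuming the 7 OSTs implied by the "1/7" figure and an even stripe distribution; the variable names are illustrative only.

```python
# Figures taken from the "min OST has ... available" debug marker.
file_size_kb = 3847216      # iozone file size
min_ost_free_kb = 1846488   # free space on the smallest OST
ost_count = 7               # assumption: implied by "1/7 of the file data"

# With even striping, each OST stores roughly an equal share of the file.
per_ost_kb = file_size_kb // ost_count
share_of_free = per_ost_kb / min_ost_free_kb

print(f"{per_ost_kb} KB per OST, {share_of_free:.1%} of the smallest OST's free space")
```

The per-OST share comes out at 549602 KB, just under 30% of the free space on the smallest OST, so the file size alone does not explain the failure.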

Comment by Vinayak (Inactive) [ 07/Sep/15 ]

Hello Andreas,

We have also hit this issue quite frequently in our testing. I am attaching the logs to this ticket. Please let me know if you need any extra information from our side.

Comment by Zhenyu Xu [ 15/Sep/15 ]

Found this with strace:

open("/mnt/lustre/d0.iozone/iozone", O_WRONLY|O_CREAT, 0) = 3
stat("/mnt/lustre/d0.iozone/iozone", {st_mode=S_IFREG, st_size=0, ...}) = 0
ftruncate(3, 0) = -1 EACCES (Permission denied)
write(1, "\n\nSanity check failed. Do not de"..., 82

Comment by Zhenyu Xu [ 15/Sep/15 ]

The iozone test file is created by open() with mode 0100000 (S_IFREG only); there are no read/write/execute permission bits in it.

00000004:00000002:1.0:1442290645.210840:0:15269:0:(mdt_open.c:1226:mdt_reint_open()) I am going to open [0x200000401:0x11:0x0]/(iozone->[0x200000401:0x15:0x0]) cr_flag=0102 mode=0100000 msg_flag=0x0

Comment by Zhenyu Xu [ 15/Sep/15 ]

commit c8d5aa14e50be2a85491783f169a8f4e646b9594 changed the object create mode logic.

Wang Di, can you give it a look? In your commit, the MDS does not use its umask to create object and client does not pass the object's mode in the RPC, and client is depending on MDS to set the new object's mode with its umask.

Comment by Zhenyu Xu [ 15/Sep/15 ]

The client creates the file with a strange mode:

00000080:00000001:1.0:1442292147.237309:0:28484:0:(namei.c:526:ll_lookup_it()) Process entered
00000080:00200000:1.0:1442292147.237309:0:28484:0:(namei.c:533:ll_lookup_it()) VFS Op:name=iozone, dir=[0x200000401:0x1:0x0](ffff880016c0bc40), intent=open|creat
00000080:00000010:1.0:1442292147.237311:0:28484:0:(llite_lib.c:2436:ll_prep_md_op_data()) kmalloced 'op_data': 312 at ffff8800286e1a00.
00000080:00020000:1.0:1442292147.237312:0:28484:0:(namei.c:561:ll_lookup_it()) create mode 0100000

whereas a normal touch creates a file with a normal mode:

00000080:00000001:0.0:1442298104.737478:0:28661:0:(namei.c:526:ll_lookup_it()) Process entered
00000080:00200000:0.0:1442298104.737478:0:28661:0:(namei.c:533:ll_lookup_it()) VFS Op:name=touchfile, dir=[0x200000401:0x1:0x0](ffff880016c0bc40), intent=open|creat
00000080:00000010:0.0:1442298104.737480:0:28661:0:(llite_lib.c:2436:ll_prep_md_op_data()) kmalloced 'op_data': 312 at ffff8800352f2400.
00000080:00020000:0.0:1442298104.737482:0:28661:0:(namei.c:561:ll_lookup_it()) create mode 0100666
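
The two create modes logged by ll_lookup_it() can be decoded to make the difference explicit. This is a small illustrative sketch using Python's standard stat module; the variable names are assumptions made for the example and do not come from the Lustre source.

```python
import stat

# Create modes logged by ll_lookup_it() for the two cases above.
iozone_mode = 0o100000  # file created by iozone
touch_mode = 0o100666   # file created by touch

for name, mode in (("iozone", iozone_mode), ("touch", touch_mode)):
    file_type = "regular file" if stat.S_ISREG(mode) else "other"
    perm_bits = stat.S_IMODE(mode)  # strips the file-type bits
    print(f"{name}: {file_type}, permissions {perm_bits:04o} ({stat.filemode(mode)})")
```

Both values describe a regular file; the iozone one carries no permission bits at all, while the touch one carries 0666.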

Comment by Di Wang [ 15/Sep/15 ]

"Wang Di, can you give it a look? In your commit, the MDS does not use its umask to create object and client does not pass the object's mode in the RPC,"

Hmm, mdd_acl_init() does use the MDS's umask to fix the mode. Thanks.

Comment by Zhenyu Xu [ 15/Sep/15 ]

The strace of iozone shows that iozone creates the test file "iozone" without setting its file mode:

open("/mnt/lustre/d0.iozone/iozone", O_WRONLY|O_CREAT, 0) = 3

The client then packs its create mode as 0100000, with no permission bits in it.

In mdd_acl_init(), la->la_mode is still just 0100000, and the file is created with an empty mode, so the truncate fails the permission check.

Comment by Zhenyu Xu [ 16/Sep/15 ]

$ git bisect bad
6acf93339ad3297f2e5c659f2269c05df6198f74 is the first bad commit
commit 6acf93339ad3297f2e5c659f2269c05df6198f74
Author: Jinshan Xiong <jinshan.xiong@intel.com>
Date: Sun Jun 21 14:29:20 2015 -0400

LU-5823 llite: Remove access of stripe in ll_setattr_raw

Comment by Zhenyu Xu [ 17/Sep/15 ]

Wrote test code that does an open followed by an ftruncate, ran it as a normal user, and gathered the logs with and without the 6acf9333 patch.
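
The test code itself is not included in the ticket text. A minimal sketch of such a reproducer, written here in Python and mirroring the open flags and mode from the strace output, might look like the following; the function name and its return convention are illustrative assumptions, not the original test.

```python
import os


def open_then_ftruncate(path):
    """Create `path` with mode 0 and truncate it through the open descriptor.

    Returns 0 on success, or the errno reported by ftruncate().
    """
    # Same flags and mode as the iozone call seen in the strace output.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0)
    try:
        os.ftruncate(fd, 0)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
```

On a local POSIX filesystem this returns 0 for a newly created file, because write access is checked when the file is opened and ftruncate() only requires a descriptor opened for writing. Run as a normal user against a new file on an affected Lustre client, it would be expected to return errno.EACCES, matching the strace output in this ticket.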

Comment by Gerrit Updater [ 17/Sep/15 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/16462
Subject: LU-6813 llite: omit to update wire data
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6d32529f5c853f8a2e800fc588dc396be9b07405

Comment by Vinayak (Inactive) [ 21/Sep/15 ]

We ran test_iozone with the patch submitted by Bobi Jam applied, and the test passed.

	File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              kB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
          926070     512   40674   66809   161970   151712  164405   61944                                                          

iozone test complete.
debug=0x33f0484
Resetting fail_loc on all nodes...done.
PASS iozone (103s)

Comment by Gerrit Updater [ 22/Sep/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16462/
Subject: LU-6813 llite: omit to update wire data
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b937be892a3dc68dd2fe3f248608937a8a79d424

Comment by Peter Jones [ 23/Sep/15 ]

Landed for 2.8
