[LU-9826] conf-sanity: test 32b failed with 1 Created: 03/Aug/17  Updated: 24/Aug/17  Resolved: 17/Aug/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Sarah Liu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9888 conf-sanity test_32b: test 32b failed... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Jinshan Xiong <jinshan.xiong@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b60d81b2-770c-11e7-8db2-5254006e85c2.

Detail message about the failure

unlink(/tmp/t32/mnt/lustre/t32_qf_1) error: No such file or directory
total: 1 unlinks in 0 seconds: inf unlinks/second
CMD: trevis-5vm12 test -f /tmp/t32/list
CMD: trevis-5vm12 test -f /tmp/t32/list2
/tmp/t32/mnt/lustre/striped_dir_old /usr/lib64/lustre/tests
CMD: trevis-5vm12 cat /tmp/t32/list2
/usr/lib64/lustre/tests
CMD: trevis-5vm12 /usr/sbin/lctl set_param -n osd*.*.force_sync=1
dd: failed to open '/tmp/t32/mnt/lustre/tmp_file': No space left on device
 conf-sanity test_32b: @@@@@@ FAIL: dd failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4968:error_noexit()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2140:t32_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2349:test_32b()
  = /usr/lib64/lustre/tests/test-framework.sh:5256:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5295:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5142:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2353:main()


 Comments   
Comment by Jinshan Xiong (Inactive) [ 03/Aug/17 ]

it failed at:

                sync; sleep 5; sync                                             
                $r $LCTL set_param -n osd*.*.force_sync=1                       
                dd if=/dev/zero of=$tmp/mnt/lustre/tmp_file bs=10k count=10

The log said:

00000004:00000001:0.0:1501803264.544507:0:21733:0:(osp_dev.c:738:osp_statfs()) Process entered
00000004:00001000:0.0:1501803264.544508:0:21733:0:(osp_dev.c:774:osp_statfs()) t32fs-OST0000-osc-MDT0000: 20128 blocks, 5504 free, 32 avail, 1836 files, 1349 free files
00000004:00000001:0.0:1501803264.544509:0:21733:0:(osp_dev.c:775:osp_statfs()) Process leaving (rc=0 : 0 : 0)
00020000:00000001:0.0:1501803264.544509:0:21733:0:(lod_qos.c:205:lod_statfs_and_check()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
00020000:00000001:0.0:1501803264.544510:0:21733:0:(lod_qos.c:296:lod_qos_statfs_update()) Process leaving
00020000:00000010:0.0:1501803264.544511:0:21733:0:(lod_qos.c:2067:lod_qos_prep_create()) kmalloced 'stripe': 8 at ffff8800a8dec110.
00020000:00001000:0.0:1501803264.544512:0:21733:0:(lod_qos.c:2074:lod_qos_prep_create()) tgt_count 1 stripenr 1
00020000:00000001:0.0:1501803264.544512:0:21733:0:(lod_qos.c:1443:lod_alloc_qos()) Process entered
00020000:00000001:0.0:1501803264.544513:0:21733:0:(lod_qos.c:1464:lod_alloc_qos()) Process leaving via out_nolock (rc=18446744073709551605 : -11 : 0xfffffffffffffff5)
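
The rc fields in the trace above print the same value three ways: as an unsigned 64-bit number, as a signed decimal, and in hex. A minimal sketch of that conversion, assuming Linux errno numbering (28 is ENOSPC, 11 is EAGAIN). The helper name decode_rc is illustrative and not part of Lustre:

```python
# Linux errno values that appear in the trace (assumed Linux numbering).
ERRNO_NAMES = {11: "EAGAIN", 28: "ENOSPC"}

def decode_rc(unsigned_rc):
    """Reinterpret an unsigned 64-bit rc as signed and name the errno."""
    if unsigned_rc >= (1 << 63):
        signed = unsigned_rc - (1 << 64)   # two's-complement wrap-around
    else:
        signed = unsigned_rc
    if signed < 0:
        name = ERRNO_NAMES.get(-signed, "unknown")
    else:
        name = "OK"
    return signed, name

print(decode_rc(18446744073709551588))  # rc logged by lod_statfs_and_check()
print(decode_rc(18446744073709551605))  # rc logged by lod_alloc_qos()
```

Read this way, lod_statfs_and_check() returned -28 (ENOSPC) for the OST and lod_alloc_qos() then left with -11 (EAGAIN).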

And:

00002000:00000024:1.0:1501803265.168716:0:23940:0:(ofd_obd.c:740:ofd_statfs()) blocks cached 0 granted 20447232 pending 0 free 22544384 avail 20447232
00002000:00000020:1.0:1501803265.168717:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli 07f7f756-bbb3-3f62-5432-0eeecfceef6b/ffff8801225b6400 dirty 0 pend 0 grant 131072
00002000:00000020:1.0:1501803265.168719:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli t32fs-MDT0000-mdtlov_UUID/ffff880132ffac00 dirty 0 pend 0 grant 0
00002000:00000020:1.0:1501803265.168720:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli t32fs-MDT0001-mdtlov_UUID/ffff880127b2b400 dirty 0 pend 0 grant 0
00002000:00000020:1.0:1501803265.168721:0:23940:0:(tgt_grant.c:144:tgt_check_export_grants()) t32fs-OST0000: processing self export: 20316160 0 0
00002000:00000020:1.0:1501803265.168722:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli t32fs-OST0000_UUID/ffff880063d06000 dirty 0 pend 0 grant 20316160
00002000:00000020:1.0:1501803265.168723:0:23940:0:(ofd_obd.c:760:ofd_statfs()) 629 blocks: 172 free, 1 avail; 1836 objects: 1349 free; state 0
00002000:00000001:1.0:1501803265.168724:0:23940:0:(ofd_obd.c:794:ofd_statfs()) Process leaving
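
The numbers in the two statfs traces can be cross-checked with plain arithmetic. The sketch below only re-adds values copied from the log lines. The 128 KiB OFD block size is inferred from "free 22544384" divided by "172 free"; it is not read from any Lustre source:

```python
# Per-export grants from tgt_check_export_grants(), in bytes.
export_grants = {
    "07f7f756-bbb3-3f62-5432-0eeecfceef6b": 131072,
    "t32fs-MDT0000-mdtlov_UUID": 0,
    "t32fs-MDT0001-mdtlov_UUID": 0,
    "t32fs-OST0000_UUID": 20316160,   # the OST's self export
}
granted = 20447232   # ofd_statfs(): "granted"
free = 22544384      # ofd_statfs(): "free"

total_grants = sum(export_grants.values())
ofd_block = free // 172      # inferred OFD block size in bytes
osp_ratio = 20128 // 629     # osp_statfs() blocks per ofd_statfs() block

print(total_grants == granted)   # per-export grants add up to "granted"
print(ofd_block, osp_ratio)
print(172 * osp_ratio)           # should match "5504 free" in osp_statfs()
```

Under these assumptions the per-export grants add up exactly to the "granted" figure, and the osp_statfs() and ofd_statfs() lines describe the same state in different block units.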

The OFD said it has granted most of the space to clients, but that does not seem to be true:

[root@centos7 ~]# cat /proc/fs/lustre/osc/t32fs-OST0000-osc-ffff880131f58000/cur_grant_bytes 
0

So when it failed, it thought there was no space available on the corresponding OST. However, there still seems to be plenty of space left:

UUID                   1K-blocks        Used   Available Use% Mounted on
t32fs-MDT0000_UUID         80512        5376       73088   7% /tmp/t32/mnt/lustre[MDT:0]
t32fs-MDT0001_UUID         80640        5120       73472   7% /tmp/t32/mnt/lustre[MDT:1]
t32fs-OST0000_UUID         80512       58496       19968  75% /tmp/t32/mnt/lustre[OST:0]

filesystem_summary:        80512       58496       19968  75% /tmp/t32/mnt/lustre

[root@centos7 ~]# zpool list -v t32fs-ost1
NAME             SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
t32fs-ost1       190M  57.4M   133M         -    26%    30%  1.00x  ONLINE  -
  /tmp/t32/ost   190M  57.4M   133M         -    26%    30%

I guess this is a bug in how statfs works on osd-zfs.
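
One way to reconcile the space listing with the ENOSPC: the 19968 KiB reported as available for t32fs-OST0000_UUID equals the "avail" byte count in the ofd_statfs() trace, and the same trace reports that amount as already granted. A small sketch using only numbers quoted in this comment (the variable names are illustrative):

```python
df_available_kib = 19968       # "Available" column for t32fs-OST0000_UUID
ofd_avail_bytes = 20447232     # ofd_statfs(): "avail"
ofd_granted_bytes = 20447232   # ofd_statfs(): "granted"

df_available_bytes = df_available_kib * 1024
ungranted_bytes = ofd_avail_bytes - ofd_granted_bytes

print(df_available_bytes == ofd_avail_bytes)   # same figure, different units
print(ungranted_bytes)                         # space not covered by grants
```

If this reading is right, the space exists but is fully accounted for by grants, which would explain the ENOSPC without statfs itself being wrong. This is an interpretation of the logged values, not a confirmed diagnosis.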

Comment by Joseph Gmitter (Inactive) [ 04/Aug/17 ]

Alex,

Do you have any opinion on this?

Thanks.
Joe

Comment by Alex Zhuravlev [ 04/Aug/17 ]

will try to check locally..

Comment by Gerrit Updater [ 07/Aug/17 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/28408
Subject: LU-9826 tests: Do not run con-sanity 32b with ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 91b437c635de5c5421964c40005e49aaa1cdca0e

Comment by Alex Zhuravlev [ 08/Aug/17 ]

disk2_7-zfs.tar.bz2 seems to contain a very small filesystem:
Upgrading from disk2_7-zfs.tar.bz2, created with:
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
t32fs-mdt1   190M  6.26M   184M         -    11%     3%  1.00x  ONLINE  -
t32fs-ost1   190M  57.7M   132M         -    24%    30%  1.00x  ONLINE  -
Filesystem        1K-blocks   Used  Available  Use%  Mounted on
t32fs-ost1/ost1       80512  58496      19968   75%  /tmp/t32/mnt/ost
virtual image:
rw------ 1 root root 204800000 Dec 18 2015 ost

Upgrading from disk2_4-zfs.tar.bz2, created with:
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
t32fs-mdt1   190M  4.50M   186M         -      -     2%  1.00x  ONLINE  -
t32fs-ost1   288M  20.0M   268M         -      -     6%  1.00x  ONLINE  -
Filesystem        1K-blocks   Used  Available  Use%  Mounted on
t32fs-ost1/ost1      146944  19712     125184   14%  /tmp/t32/mnt/ost
virtual image:
rw------ 1 root root 307200000 Dec 1 2015 ost

i.e. 2.4 disk image is ~100MB larger.
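
The "~100MB" figure can be checked from the two ost image file sizes listed above. A quick sketch; the dictionary keys are just labels for the two tarballs:

```python
# Sizes of the "ost" virtual image files, in bytes, as listed above.
image_bytes = {
    "disk2_7-zfs": 204800000,
    "disk2_4-zfs": 307200000,
}

diff = image_bytes["disk2_4-zfs"] - image_bytes["disk2_7-zfs"]

print(diff)                     # difference in bytes
print(round(diff / 10**6, 1))   # in decimal megabytes
print(round(diff / 2**20, 1))   # in MiB, the unit zpool list uses for "M"
```

The MiB figure lines up with the zpool SIZE column, where t32fs-ost1 is 288M for the 2.4 image and 190M for the 2.7 image.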

Comment by Alex Zhuravlev [ 08/Aug/17 ]

Also, from what I heard before, ZFS-0.7 reserves more space for remove operations. I think that is a possible reason why the test can't pass with the 2.7 image on ZFS-0.7.

Comment by Jinshan Xiong (Inactive) [ 08/Aug/17 ]

Alex - do you have a theory to explain why the granted bytes on the OFD don't match the number on the client side? I guess that would be the actual problem.

Comment by Alex Zhuravlev [ 08/Aug/17 ]

because we take a grant for precreates as well.

Comment by Sarah Liu [ 08/Aug/17 ]

Hello,

If we recreate the image, how big should the disk image be? Besides 2.7, are there any other images that need updating?

Comment by Alex Zhuravlev [ 08/Aug/17 ]

Sarah, the 2.4 image is 300MB and the test works fine with that image.

Comment by Sarah Liu [ 08/Aug/17 ]

Thanks Alex, I will recreate the 2.7 one and update this ticket.

Comment by Gerrit Updater [ 09/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28408/
Subject: LU-9826 tests: Do not run conf-sanity 32b with ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d564becbe58f7fb349cbe19a7f23e447278310d5

Comment by Joseph Gmitter (Inactive) [ 09/Aug/17 ]

Assigning to Sarah for the recreation of the 2.7 image.

Comment by Minh Diep [ 09/Aug/17 ]

Landed in 2.11

Comment by James Nunez (Inactive) [ 09/Aug/17 ]

Reopening because Sarah is working on a fix/ updating the disk image for ZFS 0.7.0 (probably 0.7.1).

Comment by Gerrit Updater [ 10/Aug/17 ]

Wei Liu (wei3.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/28464
Subject: LU-9826 test: Recreate disk2_7-zfs image to have a bigger file system
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f2cdd71f3bb9c1885172089e0afd9b534eb5fb4b

Comment by Gerrit Updater [ 11/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28504
Subject: LU-9826 tests: Do not run conf-sanity 32b with ZFS
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 773715574a41b3f57c68f8b31ca61a4e8f9e4b30

Comment by Gerrit Updater [ 16/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28568
Subject: LU-9826 test: Recreate disk2_7-zfs image to have a bigger file system
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 191a90dbc8563204b05fea50a91408d4d95bcccc

Comment by Gerrit Updater [ 17/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28464/
Subject: LU-9826 test: Recreate disk2_7-zfs image to have a bigger file system
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6ab9866730c1bd2344869502586fc322f6ed7b71

Comment by Peter Jones [ 17/Aug/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 18/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28504/
Subject: LU-9826 tests: Do not run conf-sanity 32b with ZFS
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: ee0b6826d606be8d7edeb2210a87ba27d8ff76cc
