[LU-9826] conf-sanity: test 32b failed with 1 Created: 03/Aug/17 Updated: 24/Aug/17 Resolved: 17/Aug/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.1, Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Sarah Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Jinshan Xiong <jinshan.xiong@intel.com>

Please provide additional information about the failure here.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b60d81b2-770c-11e7-8db2-5254006e85c2.

Detail message about the failure

unlink(/tmp/t32/mnt/lustre/t32_qf_1) error: No such file or directory
total: 1 unlinks in 0 seconds: inf unlinks/second
CMD: trevis-5vm12 test -f /tmp/t32/list
CMD: trevis-5vm12 test -f /tmp/t32/list2
/tmp/t32/mnt/lustre/striped_dir_old /usr/lib64/lustre/tests
CMD: trevis-5vm12 cat /tmp/t32/list2
/usr/lib64/lustre/tests
CMD: trevis-5vm12 /usr/sbin/lctl set_param -n osd*.*.force_sync=1
dd: failed to open '/tmp/t32/mnt/lustre/tmp_file': No space left on device
 conf-sanity test_32b: @@@@@@ FAIL: dd failed
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4968:error_noexit()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2140:t32_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2349:test_32b()
  = /usr/lib64/lustre/tests/test-framework.sh:5256:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5295:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5142:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2353:main() |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 03/Aug/17 ] |
|
it failed at: sync; sleep 5; sync
$r $LCTL set_param -n osd*.*.force_sync=1
dd if=/dev/zero of=$tmp/mnt/lustre/tmp_file bs=10k count=10
The log said:

00000004:00000001:0.0:1501803264.544507:0:21733:0:(osp_dev.c:738:osp_statfs()) Process entered
00000004:00001000:0.0:1501803264.544508:0:21733:0:(osp_dev.c:774:osp_statfs()) t32fs-OST0000-osc-MDT0000: 20128 blocks, 5504 free, 32 avail, 1836 files, 1349 free files
00000004:00000001:0.0:1501803264.544509:0:21733:0:(osp_dev.c:775:osp_statfs()) Process leaving (rc=0 : 0 : 0)
00020000:00000001:0.0:1501803264.544509:0:21733:0:(lod_qos.c:205:lod_statfs_and_check()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
00020000:00000001:0.0:1501803264.544510:0:21733:0:(lod_qos.c:296:lod_qos_statfs_update()) Process leaving
00020000:00000010:0.0:1501803264.544511:0:21733:0:(lod_qos.c:2067:lod_qos_prep_create()) kmalloced 'stripe': 8 at ffff8800a8dec110.
00020000:00001000:0.0:1501803264.544512:0:21733:0:(lod_qos.c:2074:lod_qos_prep_create()) tgt_count 1 stripenr 1
00020000:00000001:0.0:1501803264.544512:0:21733:0:(lod_qos.c:1443:lod_alloc_qos()) Process entered
00020000:00000001:0.0:1501803264.544513:0:21733:0:(lod_qos.c:1464:lod_alloc_qos()) Process leaving via out_nolock (rc=18446744073709551605 : -11 : 0xfffffffffffffff5)

And:

00002000:00000024:1.0:1501803265.168716:0:23940:0:(ofd_obd.c:740:ofd_statfs()) blocks cached 0 granted 20447232 pending 0 free 22544384 avail 20447232
00002000:00000020:1.0:1501803265.168717:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli 07f7f756-bbb3-3f62-5432-0eeecfceef6b/ffff8801225b6400 dirty 0 pend 0 grant 131072
00002000:00000020:1.0:1501803265.168719:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli t32fs-MDT0000-mdtlov_UUID/ffff880132ffac00 dirty 0 pend 0 grant 0
00002000:00000020:1.0:1501803265.168720:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli t32fs-MDT0001-mdtlov_UUID/ffff880127b2b400 dirty 0 pend 0 grant 0
00002000:00000020:1.0:1501803265.168721:0:23940:0:(tgt_grant.c:144:tgt_check_export_grants()) t32fs-OST0000: processing self export: 20316160 0 0
00002000:00000020:1.0:1501803265.168722:0:23940:0:(tgt_grant.c:150:tgt_check_export_grants()) t32fs-OST0000: cli t32fs-OST0000_UUID/ffff880063d06000 dirty 0 pend 0 grant 20316160
00002000:00000020:1.0:1501803265.168723:0:23940:0:(ofd_obd.c:760:ofd_statfs()) 629 blocks: 172 free, 1 avail; 1836 objects: 1349 free; state 0
00002000:00000001:1.0:1501803265.168724:0:23940:0:(ofd_obd.c:794:ofd_statfs()) Process leaving

The OFD said it has granted most of the space to the client, but that does not seem to be true:

[root@centos7 ~]# cat /proc/fs/lustre/osc/t32fs-OST0000-osc-ffff880131f58000/cur_grant_bytes
0

So when it failed, it thought there was no space available at the corresponding OST. However, it seems there is still plenty of space out there:

UUID                   1K-blocks        Used   Available Use% Mounted on
t32fs-MDT0000_UUID         80512        5376       73088   7% /tmp/t32/mnt/lustre[MDT:0]
t32fs-MDT0001_UUID         80640        5120       73472   7% /tmp/t32/mnt/lustre[MDT:1]
t32fs-OST0000_UUID         80512       58496       19968  75% /tmp/t32/mnt/lustre[OST:0]
filesystem_summary:        80512       58496       19968  75% /tmp/t32/mnt/lustre

[root@centos7 ~]# zpool list -v t32fs-ost1
NAME             SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
t32fs-ost1       190M  57.4M   133M         -    26%    30%  1.00x  ONLINE  -
  /tmp/t32/ost   190M  57.4M   133M         -    26%    30%

I guess this is a bug in how statfs works on osd-zfs. |
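
For readers unfamiliar with the debug-log format, the two large rc values in the trace above are negative errno codes printed as unsigned 64-bit integers. The following is a minimal sketch of that conversion, not part of the Lustre sources; the helper name decode_rc and the small errno table are made up for illustration, and the errno names are the Linux values only.

```python
# Linux errno names for the two codes that appear in the trace above.
# (Hypothetical lookup table for this example; other platforms number
# these errors differently.)
LINUX_ERRNO = {11: "EAGAIN", 28: "ENOSPC"}


def decode_rc(raw):
    """Reinterpret an unsigned 64-bit rc as signed and name the errno."""
    # Values with the top bit set are negative in two's complement.
    signed = raw - (1 << 64) if raw >= (1 << 63) else raw
    if signed >= 0:
        return signed, "OK"
    return signed, LINUX_ERRNO.get(-signed, "unknown")


# rc values copied from the lod_qos.c lines in the log.
for func, raw in [("lod_statfs_and_check", 18446744073709551588),
                  ("lod_alloc_qos", 18446744073709551605)]:
    signed, name = decode_rc(raw)
    print(f"{func}: rc={signed} ({name})")
```

Read this way, lod_statfs_and_check() returned -28 (ENOSPC) for the only OST, and lod_alloc_qos() then left with -11 (EAGAIN), which is consistent with the "No space left on device" error that dd reported.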
| Comment by Joseph Gmitter (Inactive) [ 04/Aug/17 ] |
|
Alex, Do you have any opinion on this? Thanks. |
| Comment by Alex Zhuravlev [ 04/Aug/17 ] |
|
will try to check locally.. |
| Comment by Gerrit Updater [ 07/Aug/17 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/28408 |
| Comment by Alex Zhuravlev [ 08/Aug/17 ] |
|
disk2_7-zfs.tar.bz2 seems to contain a very small filesystem: Upgrading from disk2_4-zfs.tar.bz2, created with: i.e. the 2.4 disk image is ~100MB larger. |
| Comment by Alex Zhuravlev [ 08/Aug/17 ] |
|
Also, from what I heard before, ZFS-0.7 reserves more space for remove operations. I think that is a possible reason why we can't pass with the 2.7 image using ZFS-0.7. |
| Comment by Jinshan Xiong (Inactive) [ 08/Aug/17 ] |
|
Alex - do you have a theory to explain why the granted bytes on the OFD don't match the number on the client side? I guess that would be the actual problem. |
| Comment by Alex Zhuravlev [ 08/Aug/17 ] |
|
because we take a grant for precreates as well. |
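
As a rough cross-check of this explanation against the tgt_check_export_grants() lines quoted earlier in the ticket, the per-export grants can be summed and compared with the "granted" total from ofd_statfs(). This is only an arithmetic sketch over numbers copied from the log. Reading the self export's grant as the precreate reservation is an assumption based on the comment above, not something the log states.

```python
# Per-export grant values copied from the tgt_check_export_grants() lines.
grants = {
    "07f7f756-bbb3-3f62-5432-0eeecfceef6b": 131072,   # the Lustre client
    "t32fs-MDT0000-mdtlov_UUID": 0,
    "t32fs-MDT0001-mdtlov_UUID": 0,
    "t32fs-OST0000_UUID": 20316160,                   # the OST's self export
}

# "granted" as printed by ofd_statfs() in the same log excerpt.
reported_granted = 20447232

total = sum(grants.values())
self_share = grants["t32fs-OST0000_UUID"] / total

print(f"sum of per-export grants: {total}")
print(f"matches ofd_statfs granted: {total == reported_granted}")
print(f"share held by self export: {self_share:.1%}")
```

The per-export values add up exactly to the reported total, and nearly all of it sits on the self export rather than on the client, so a small or zero cur_grant_bytes on the client does not by itself contradict the OFD's accounting.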
| Comment by Sarah Liu [ 08/Aug/17 ] |
|
Hello, if we recreate the image, how big should the disk image be? Besides 2.7, are there any other images that need updating? |
| Comment by Alex Zhuravlev [ 08/Aug/17 ] |
|
Sarah, 2.4 image is 300MB and the test works fine with that image. |
| Comment by Sarah Liu [ 08/Aug/17 ] |
|
Thanks Alex, I will recreate the 2.7 one and update this ticket. |
| Comment by Gerrit Updater [ 09/Aug/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28408/ |
| Comment by Joseph Gmitter (Inactive) [ 09/Aug/17 ] |
|
Assigning to Sarah for the recreation of the 2.7 image. |
| Comment by Minh Diep [ 09/Aug/17 ] |
|
Landed in 2.11 |
| Comment by James Nunez (Inactive) [ 09/Aug/17 ] |
|
Reopening because Sarah is working on a fix/ updating the disk image for ZFS 0.7.0 (probably 0.7.1). |
| Comment by Gerrit Updater [ 10/Aug/17 ] |
|
Wei Liu (wei3.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/28464 |
| Comment by Gerrit Updater [ 11/Aug/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28504 |
| Comment by Gerrit Updater [ 16/Aug/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28568 |
| Comment by Gerrit Updater [ 17/Aug/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28464/ |
| Comment by Peter Jones [ 17/Aug/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 18/Aug/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28504/ |