[LU-13753] sanityn test_51b: 'file size is 1024, should be 3145728' Created: 07/Jul/20  Updated: 13/Sep/23  Resolved: 27/Aug/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Qian Yingjin
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10934 integrate statx() API with Lustre Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for liuying <emoly.liu@intel.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/abe98c66-5293-41c4-a72b-c317b11bb2e2

test_51b failed with the following error:

== sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 12:39:07 (1594039147)
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.00191303 s, 535 kB/s
fail_loc=0x1404
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 1.74443 s, 601 kB/s
1024
 sanityn test_51b: @@@@@@ FAIL: file size is 1024, should be 3145728 

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_51b - file size is 1024, should be 3145728
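
The expected size in the error message follows from simple arithmetic if the second dd writes its 1 MiB block at a 2 MiB offset (2097152 + 1048576 = 3145728). The log does not show the dd arguments, so the seek=2 offset below is an assumption inferred from the expected size, and the temporary file only stands in for the Lustre file. A minimal sketch on a local filesystem:

```shell
#!/bin/sh
# Sketch only: reproduces the size arithmetic on a local temp file,
# not on Lustre. The seek=2 offset is an assumption inferred from the
# expected size (3145728 = 3 * 1048576); the log does not show it.
f=$(mktemp)

# First write: 1 KiB at offset 0, matching "1024 bytes copied" in the log.
dd if=/dev/zero of="$f" bs=1024 count=1 2>/dev/null
size_before=$(wc -c < "$f" | tr -d ' ')

# Second write: 1 MiB at an assumed 2 MiB offset, without truncating.
dd if=/dev/zero of="$f" bs=1048576 seek=2 count=1 conv=notrunc 2>/dev/null
size_after=$(wc -c < "$f" | tr -d ' ')

echo "before=$size_before after=$size_after"
rm -f "$f"
```

Under that assumption, the reported size of 1024 matches the file size before the second write rather than after it.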



 Comments   
Comment by Chris Horn [ 07/Jul/20 ]

+1 on master: https://testing.whamcloud.com/test_sessions/23b5849c-c8b2-4516-883c-5198009482b8

Comment by Andreas Dilger [ 29/Jul/20 ]

+1 on master https://testing.whamcloud.com/test_sets/fe7f9404-be7c-43d7-8915-6a9db3002a43

Comment by James Nunez (Inactive) [ 29/Jul/20 ]

It looks like the majority of sanityn test 51b failures are in DNE/ZFS testing and started on 28 May 2020, but we do have some failures that are ZFS with no DNE, like https://testing.whamcloud.com/test_sets/7e857919-f4e3-49fc-abda-4d67275bf425.

This test is failing ~20% of the time in autotest patch/review testing.

The early failures were attributed to ticket LU-10934.
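
An intermittent failure rate like the one above can be estimated locally by repeating a run and counting the failures. The sketch below is generic shell; count_failures is a hypothetical helper, not part of the Lustre test framework, and the commented invocation assumes a configured Lustre test environment.

```shell
#!/bin/sh
# count_failures N CMD...: run CMD N times and print how many runs
# exited with a non-zero status. Hypothetical helper for illustration.
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Example (assumes a configured Lustre test environment):
#   count_failures 20 env FSTYPE=zfs ONLY=51b sh sanityn.sh
```

A small number of passing local runs does not rule out a failure that shows up only around 20% of the time in autotest, so a larger N gives a more useful estimate.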

Comment by John Hammond [ 29/Jul/20 ]

qian_wc This test was modified by https://review.whamcloud.com/36674 LU-10934 llite: integrate statx() API with Lustre. Could you revert the changes to the test and ensure that it passes consistently? In general we should not rewrite tests in this way, because it makes it difficult to analyze test results over time. Instead, please add a new test based on the old test. Once the test issues are addressed, please ensure that the new test passes consistently.

Comment by Gerrit Updater [ 29/Jul/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39533
Subject: LU-13753 tests: stop running sanityn 51b for ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0858686a91606ac6be812d6ed18c0fb302337959

Comment by Gerrit Updater [ 29/Jul/20 ]

The patch https://review.whamcloud.com/39533 is not a fix, but should be used if we can't find a fix for this test in a timely manner.

Comment by Qian Yingjin [ 30/Jul/20 ]

Hi John,

I changed test_51b because the original version could not pass with the patch https://review.whamcloud.com/36674 LU-10934 llite: integrate statx() API with Lustre.

The above patch avoids some unnecessary glimpse lock calls for files without any stripe (i.e. an empty file created with MCREATE). For these files, the size returned from the MDT is strictly correct, so no RPCs to the Lustre OSTs are needed to obtain the file size.

Comment by Qian Yingjin [ 11/Aug/20 ]

I tested it locally with a ZFS backend, and it passed:

[root@qian tests]# FSTYPE=zfs ONLY="51b" REFORMAT="yes" sh sanityn.sh 
qian: executing check_logdir /tmp/test_logs/1597136195
Logging to shared log directory: /tmp/test_logs/1597136195
qian: executing yml_node
IOC_LIBCFS_GET_NI error 22: Invalid argument
Client: 2.13.55.16
MDS: 2.13.55.16
OSS: 2.13.55.16
excepting tests: 28
skipping tests SLOW=no: 33a
Stopping clients: qian /mnt/lustre (opts:-f)
Stopping clients: qian /mnt/lustre2 (opts:-f)
qian: executing set_hostid
Loading modules from /root/work/STATX/lustre-release/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: lustre-mdt1/mdt1
Format ost1: lustre-ost1/ost1
Format ost2: lustre-ost2/ost2
Checking servers environments
Checking clients qian environments
Loading modules from /root/work/STATX/lustre-release/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Starting mds1: -o localrecov  lustre-mdt1/mdt1 /mnt/lustre-mds1
Commit the device label on lustre-mdt1/mdt1
Started lustre-MDT0000
Starting ost1: -o localrecov  lustre-ost1/ost1 /mnt/lustre-ost1
Commit the device label on lustre-ost1/ost1
Started lustre-OST0000
Starting ost2: -o localrecov  lustre-ost2/ost2 /mnt/lustre-ost2
Commit the device label on lustre-ost2/ost2
Started lustre-OST0001
Starting client: qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre
Starting client qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre
Started clients qian: 
192.168.150.131@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr,lazystatfs,encrypt)
Starting client: qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre2
Starting client qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre2
Started clients qian: 
192.168.150.131@tcp:/lustre on /mnt/lustre2 type lustre (rw,flock,user_xattr,lazystatfs,encrypt)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8e409657d800.idle_timeout=debug
osc.lustre-OST0000-osc-ffff8e409657e800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8e409657d800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8e409657e800.idle_timeout=debug
setting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90s for 'procname_uid'
Updated after 7s: want 'procname_uid' got 'procname_uid'
disable quota as required
lod.lustre-MDT0000-mdtlov.mdt_hash=crush
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00664703 s, 158 MB/s
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f7574]

== sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 16:57:29 (1597136249)
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000503275 s, 2.0 MB/s
fail_loc=0x1404
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0147413 s, 71.1 MB/s
3145728
Resetting fail_loc on all nodes...done.
PASS 51b (7s)
cleanup: ======================================================
== sanityn test complete, duration 61 sec ============================================================ 16:57:36 (1597136256)
Stopping clients: qian /mnt/lustre (opts:-f)
Stopping client qian /mnt/lustre opts:-f
Stopping clients: qian /mnt/lustre2 (opts:-f)
Stopping client qian /mnt/lustre2 opts:-f


[root@qian tests]# FSTYPE=zfs ONLY="51b" MDSCOUNT=2 REFORMAT="yes" sh sanityn.sh
== sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 16:59:20 (1597136360)
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000356896 s, 2.9 MB/s
fail_loc=0x1404
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00559126 s, 188 MB/s
3145728
Resetting fail_loc on all nodes...done.
PASS 51b (7s)
cleanup: ======================================================
== sanityn test complete, duration 35 sec ============================================================ 16:59:27 (1597136367)

Comment by Qian Yingjin [ 11/Aug/20 ]

Andreas' patch https://review.whamcloud.com/#/c/38947/ provides a solution by increasing the timeout for sanityn test_51b.

Comment by Qian Yingjin [ 14/Aug/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38947
Subject: LU-10934 tests: increase timeout for sanityn test_51b
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a00f3f63fa7e8d3ba9a32dcf4da3de83b6dcdcb3

Comment by Andreas Dilger [ 27/Aug/20 ]

The patch to fix test_51b is landed.

Comment by Sergey Cheremencev [ 23/Dec/21 ]

Hit this again on master (review-dne-zfs-part-5): https://testing.whamcloud.com/test_sets/e23e4426-f555-4f86-b61e-7c1051c2a974

Comment by Chris Horn [ 25/Jan/22 ]

+1 on master https://testing.whamcloud.com/test_sets/2b34efa6-e2f5-4797-ab11-e9e3d18082eb
