[LU-13753] sanityn test_51b: 'file size is 1024, should be 3145728' Created: 07/Jul/20 Updated: 13/Sep/23 Resolved: 27/Aug/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Qian Yingjin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for liuying <emoly.liu@intel.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/abe98c66-5293-41c4-a72b-c317b11bb2e2

test_51b failed with the following error:

== sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 12:39:07 (1594039147)
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.00191303 s, 535 kB/s
fail_loc=0x1404
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 1.74443 s, 601 kB/s
1024
sanityn test_51b: @@@@@@ FAIL: file size is 1024, should be 3145728

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
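The numbers in the failure line can be checked with a little arithmetic. The sketch below parses the FAIL message and compares it with the two dd writes in the log: the observed size equals the first write (1024 bytes), and the expected size equals three times the second write. The log does not show how the test reaches 3 MiB, so the 2 MiB starting offset used here is an assumption, and `parse_size_failure` is a hypothetical helper name.

```python
import re

FAIL_LINE = "sanityn test_51b: @@@@@@ FAIL: file size is 1024, should be 3145728"


def parse_size_failure(line):
    """Return (actual, expected) from a 'file size is X, should be Y' message."""
    m = re.search(r"file size is (\d+), should be (\d+)", line)
    if m is None:
        raise ValueError("not a size-mismatch failure line")
    return int(m.group(1)), int(m.group(2))


actual, expected = parse_size_failure(FAIL_LINE)

FIRST_WRITE = 1024       # bytes copied by the first dd in the log
SECOND_WRITE = 1048576   # bytes copied by the second dd in the log
# Assumption (not shown in the log): the second write starts at a 2 MiB offset.
ASSUMED_OFFSET = 2 * 1024 * 1024

# The stale size matches the file as it was before the second write.
print(actual == FIRST_WRITE)
# The expected size matches the end of the second write under the assumption.
print(expected == ASSUMED_OFFSET + SECOND_WRITE)
```

Under that assumption, the failure means the size check returned the file size from before the second write rather than after it.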
| Comments |
| Comment by Chris Horn [ 07/Jul/20 ] |
|
+1 on master: https://testing.whamcloud.com/test_sessions/23b5849c-c8b2-4516-883c-5198009482b8 |
| Comment by Andreas Dilger [ 29/Jul/20 ] |
|
+1 on master https://testing.whamcloud.com/test_sets/fe7f9404-be7c-43d7-8915-6a9db3002a43 |
| Comment by James Nunez (Inactive) [ 29/Jul/20 ] |
|
It looks like the majority of sanityn test 51b failures are in DNE/ZFS testing and started on 28 May 2020, but we do have some failures that are ZFS with no DNE, like https://testing.whamcloud.com/test_sets/7e857919-f4e3-49fc-abda-4d67275bf425. This test is failing ~20% of the time in autotest patch/review testing. The early failures were attributed to ticket |
| Comment by John Hammond [ 29/Jul/20 ] |
|
qian_wc: this test was modified by https://review.whamcloud.com/36674 |
| Comment by Gerrit Updater [ 29/Jul/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39533 |
| Comment by Gerrit Updater [ 29/Jul/20 ] |
|
The patch https://review.whamcloud.com/39533 is not a fix, but should be used if we can't find a fix for this test in a timely manner. |
| Comment by Qian Yingjin [ 30/Jul/20 ] |
|
Hi John, I changed test_51b because it could not pass with the patch https://review.whamcloud.com/36674. That patch avoids some unnecessary glimpse lock calls for files without any stripes (i.e. an empty file created with MCREATE). For this kind of file, the size returned from the MDT is strictly correct, so no RPCs to the Lustre OSTs are needed to obtain the file size. |
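The optimization described in this comment can be pictured as a simple decision on the client side. The sketch below is only a toy model, not Lustre client code: `FileAttrs`, `stripe_count` and `needs_glimpse` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileAttrs:
    """Hypothetical stand-in for what a client knows after asking the MDT."""
    mdt_size: int      # size reported by the MDT
    stripe_count: int  # 0 means the file has no stripes (e.g. created by MCREATE)


def needs_glimpse(attrs: FileAttrs) -> bool:
    """Toy rule: only files that have stripes need a size query to the OSTs."""
    return attrs.stripe_count > 0


empty = FileAttrs(mdt_size=0, stripe_count=0)
striped = FileAttrs(mdt_size=0, stripe_count=2)

print(needs_glimpse(empty))    # no stripes: the MDT size is trusted
print(needs_glimpse(striped))  # stripes exist: OSTs must be asked
```

In this model, a file that gains a layout between the MDT reply and the size check moves from the first case to the second, which is the situation test_51b is meant to exercise.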
| Comment by Qian Yingjin [ 11/Aug/20 ] |
|
I tested it with ZFS backend locally, it passed:

[root@qian tests]# FSTYPE=zfs ONLY="51b" REFORMAT="yes" sh sanityn.sh
qian: executing check_logdir /tmp/test_logs/1597136195
Logging to shared log directory: /tmp/test_logs/1597136195
qian: executing yml_node
IOC_LIBCFS_GET_NI error 22: Invalid argument
Client: 2.13.55.16
MDS: 2.13.55.16
OSS: 2.13.55.16
excepting tests: 28
skipping tests SLOW=no: 33a
Stopping clients: qian /mnt/lustre (opts:-f)
Stopping clients: qian /mnt/lustre2 (opts:-f)
qian: executing set_hostid
Loading modules from /root/work/STATX/lustre-release/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: lustre-mdt1/mdt1
Format ost1: lustre-ost1/ost1
Format ost2: lustre-ost2/ost2
Checking servers environments
Checking clients qian environments
Loading modules from /root/work/STATX/lustre-release/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
Setup mgs, mdt, osts
Starting mds1: -o localrecov lustre-mdt1/mdt1 /mnt/lustre-mds1
Commit the device label on lustre-mdt1/mdt1
Started lustre-MDT0000
Starting ost1: -o localrecov lustre-ost1/ost1 /mnt/lustre-ost1
Commit the device label on lustre-ost1/ost1
Started lustre-OST0000
Starting ost2: -o localrecov lustre-ost2/ost2 /mnt/lustre-ost2
Commit the device label on lustre-ost2/ost2
Started lustre-OST0001
Starting client: qian: -o user_xattr,flock qian@tcp:/lustre /mnt/lustre
Starting client qian: -o user_xattr,flock qian@tcp:/lustre /mnt/lustre
Started clients qian:
192.168.150.131@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr,lazystatfs,encrypt)
Starting client: qian: -o user_xattr,flock qian@tcp:/lustre /mnt/lustre2
Starting client qian: -o user_xattr,flock qian@tcp:/lustre /mnt/lustre2
Started clients qian:
192.168.150.131@tcp:/lustre on /mnt/lustre2 type lustre (rw,flock,user_xattr,lazystatfs,encrypt)
Using TIMEOUT=20
osc.lustre-OST0000-osc-ffff8e409657d800.idle_timeout=debug
osc.lustre-OST0000-osc-ffff8e409657e800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8e409657d800.idle_timeout=debug
osc.lustre-OST0001-osc-ffff8e409657e800.idle_timeout=debug
setting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90s for 'procname_uid'
Updated after 7s: want 'procname_uid' got 'procname_uid'
disable quota as required
lod.lustre-MDT0000-mdtlov.mdt_hash=crush
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00664703 s, 158 MB/s
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f7574]
== sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 16:57:29 (1597136249)
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000503275 s, 2.0 MB/s
fail_loc=0x1404
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0147413 s, 71.1 MB/s
3145728
Resetting fail_loc on all nodes...done.
PASS 51b (7s)
cleanup: ======================================================
== sanityn test complete, duration 61 sec ============================================================ 16:57:36 (1597136256)
Stopping clients: qian /mnt/lustre (opts:-f)
Stopping client qian /mnt/lustre opts:-f
Stopping clients: qian /mnt/lustre2 (opts:-f)
Stopping client qian /mnt/lustre2 opts:-f

[root@qian tests]# FSTYPE=zfs ONLY="51b" MDSCOUNT=2 REFORMAT="yes" sh sanityn.sh
== sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 16:59:20 (1597136360)
1+0 records in
1+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000356896 s, 2.9 MB/s
fail_loc=0x1404
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00559126 s, 188 MB/s
3145728
Resetting fail_loc on all nodes...done.
PASS 51b (7s)
cleanup: ======================================================
== sanityn test complete, duration 35 sec ============================================================ 16:59:27 (1597136367) |
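When comparing several runs like the ones above, it can help to pull only the result lines out of the console output. The sketch below is a minimal, hypothetical helper (`summarize` is not part of the Lustre test framework) that recognizes the two line shapes quoted in this ticket: `PASS 51b (7s)` and `sanityn test_51b: @@@@@@ FAIL: ...`. It assumes the log has one message per line.

```python
import re

# Line shapes taken from the logs quoted in this ticket.
PASS_RE = re.compile(r"PASS (\S+) \((\d+)s\)")
FAIL_RE = re.compile(r"test_(\S+): @@@@@@ FAIL: (.*)")


def summarize(log_text):
    """Return a list of (test, status, detail) tuples found in the log text."""
    results = []
    for line in log_text.splitlines():
        m = PASS_RE.search(line)
        if m:
            results.append((m.group(1), "PASS", m.group(2) + "s"))
            continue
        m = FAIL_RE.search(line)
        if m:
            results.append((m.group(1), "FAIL", m.group(2).strip()))
    return results


sample = (
    "Resetting fail_loc on all nodes...done.\n"
    "PASS 51b (7s)\n"
    "sanityn test_51b: @@@@@@ FAIL: file size is 1024, should be 3145728\n"
)
for entry in summarize(sample):
    print(entry)
```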
| Comment by Qian Yingjin [ 11/Aug/20 ] |
|
Andreas' patch https://review.whamcloud.com/#/c/38947/ provides the solution by increasing the timeout for sanityn test_51b. |
| Comment by Qian Yingjin [ 14/Aug/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38947 |
| Comment by Andreas Dilger [ 27/Aug/20 ] |
|
The patch to fix test_51b has landed. |
| Comment by Sergey Cheremencev [ 23/Dec/21 ] |
|
Hit this again on master (review-dne-zfs-part-5): https://testing.whamcloud.com/test_sets/e23e4426-f555-4f86-b61e-7c1051c2a974 |
| Comment by Chris Horn [ 25/Jan/22 ] |
|
+1 on master https://testing.whamcloud.com/test_sets/2b34efa6-e2f5-4797-ab11-e9e3d18082eb |