Lustre / LU-13753

sanityn test_51b: 'file size is 1024, should be 3145728'

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.14.0
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for liuying <emoly.liu@intel.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/abe98c66-5293-41c4-a72b-c317b11bb2e2

      test_51b failed with the following error:

      == sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 12:39:07 (1594039147)
      1+0 records in
      1+0 records out
      1024 bytes (1.0 kB) copied, 0.00191303 s, 535 kB/s
      fail_loc=0x1404
      1+0 records in
      1+0 records out
      1048576 bytes (1.0 MB) copied, 1.74443 s, 601 kB/s
      1024
       sanityn test_51b: @@@@@@ FAIL: file size is 1024, should be 3145728 
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanityn test_51b - file size is 1024, should be 3145728
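
For reference, the expected size in the failure message is exactly 3 MiB: 3145728 = 3 × 1048576. The shell sketch below reproduces that arithmetic with dd on an ordinary local file. It assumes, as a guess that is not taken from sanityn.sh, that the second dd writes its single 1 MiB block at a 2 MiB offset (seek=2) after the initial 1024-byte write. Under that assumption, a stat that reports 1024 is returning the size from before the second write landed.

```shell
# Sketch only: models the two writes seen in the test_51b log on a local file.
# The seek=2 offset is an ASSUMPTION chosen to match the expected 3145728-byte
# size; it is not copied from sanityn.sh.
set -e

f=$(mktemp)
trap 'rm -f "$f"' EXIT

# First write: 1024 bytes, as in "1024 bytes (1.0 kB) copied"
dd if=/dev/zero of="$f" bs=1024 count=1 2>/dev/null
echo "after first write:  $(stat -c %s "$f")"

# Second write: one 1 MiB block at an assumed 2 MiB offset; conv=notrunc keeps
# the existing data instead of truncating the file at the seek point
dd if=/dev/zero of="$f" bs=1M count=1 seek=2 conv=notrunc 2>/dev/null
echo "after second write: $(stat -c %s "$f")"

# (2 + 1) * 1048576 = 3145728, the size the test expects
echo "expected:           $(( (2 + 1) * 1048576 ))"
```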

          Activity

            hornc Chris Horn added a comment - +1 on master https://testing.whamcloud.com/test_sets/2b34efa6-e2f5-4797-ab11-e9e3d18082eb
            scherementsev Sergey Cheremencev added a comment - Faced it again on master (review-dne-zfs-part-5): https://testing.whamcloud.com/test_sets/e23e4426-f555-4f86-b61e-7c1051c2a974

            adilger Andreas Dilger added a comment -

            The patch to fix test_51b has landed.

            qian_wc Qian Yingjin added a comment - - edited

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38947
            Subject: LU-10934 tests: increase timeout for sanityn test_51b
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a00f3f63fa7e8d3ba9a32dcf4da3de83b6dcdcb3

            qian_wc Qian Yingjin added a comment -

            Andreas' patch https://review.whamcloud.com/#/c/38947/ provides a fix by increasing the timeout for sanityn test_51b.

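
The timing in the logs fits a race: in the failing run the 1 MiB dd took 1.74443 s, while in the two local ZFS runs below it took 0.0147413 s and 0.00559126 s. The sketch below is a deliberately simplified model, not Lustre code. It assumes the background stat's glimpse is held for a fixed delay by fail_loc=0x1404 and only sees the new size if the dd finishes within that delay. The delay value (DELAY=1) and the function name observed_size are hypothetical; neither is the timeout used by the test or by the patch.

```shell
# Simplified model, NOT Lustre code. Assumption: the glimpse is held for
# "delay" seconds; if the layout-changing dd finishes first, the restarted
# glimpse reports the new size, otherwise it reports the old 1024 bytes.
observed_size() {
    delay=$1        # seconds the glimpse is held (hypothetical)
    write_time=$2   # seconds the 1 MiB dd took, taken from the logs
    awk -v d="$delay" -v w="$write_time" \
        'BEGIN { if (w + 0 < d + 0) print 3145728; else print 1024 }'
}

DELAY=1  # hypothetical delay, chosen only for illustration
for t in 1.74443 0.0147413 0.00559126; do
    echo "dd took ${t}s -> stat reports $(observed_size "$DELAY" "$t")"
done
```

In this model, raising the delay (for example, observed_size 3 1.74443) turns the slow case into 3145728, which is the intuition behind increasing the timeout.
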
            qian_wc Qian Yingjin added a comment -

            I tested it locally with a ZFS backend, and it passed:

            [root@qian tests]# FSTYPE=zfs ONLY="51b" REFORMAT="yes" sh sanityn.sh 
            qian: executing check_logdir /tmp/test_logs/1597136195
            Logging to shared log directory: /tmp/test_logs/1597136195
            qian: executing yml_node
            IOC_LIBCFS_GET_NI error 22: Invalid argument
            Client: 2.13.55.16
            MDS: 2.13.55.16
            OSS: 2.13.55.16
            excepting tests: 28
            skipping tests SLOW=no: 33a
            Stopping clients: qian /mnt/lustre (opts:-f)
            Stopping clients: qian /mnt/lustre2 (opts:-f)
            qian: executing set_hostid
            Loading modules from /root/work/STATX/lustre-release/lustre/tests/..
            detected 2 online CPUs by sysfs
            Force libcfs to create 2 CPU partitions
            quota/lquota options: 'hash_lqs_cur_bits=3'
            Formatting mgs, mds, osts
            Format mds1: lustre-mdt1/mdt1
            Format ost1: lustre-ost1/ost1
            Format ost2: lustre-ost2/ost2
            Checking servers environments
            Checking clients qian environments
            Loading modules from /root/work/STATX/lustre-release/lustre/tests/..
            detected 2 online CPUs by sysfs
            Force libcfs to create 2 CPU partitions
            Setup mgs, mdt, osts
            Starting mds1: -o localrecov  lustre-mdt1/mdt1 /mnt/lustre-mds1
            Commit the device label on lustre-mdt1/mdt1
            Started lustre-MDT0000
            Starting ost1: -o localrecov  lustre-ost1/ost1 /mnt/lustre-ost1
            Commit the device label on lustre-ost1/ost1
            Started lustre-OST0000
            Starting ost2: -o localrecov  lustre-ost2/ost2 /mnt/lustre-ost2
            Commit the device label on lustre-ost2/ost2
            Started lustre-OST0001
            Starting client: qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre
            Starting client qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre
            Started clients qian: 
            192.168.150.131@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr,lazystatfs,encrypt)
            Starting client: qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre2
            Starting client qian:  -o user_xattr,flock qian@tcp:/lustre /mnt/lustre2
            Started clients qian: 
            192.168.150.131@tcp:/lustre on /mnt/lustre2 type lustre (rw,flock,user_xattr,lazystatfs,encrypt)
            Using TIMEOUT=20
            osc.lustre-OST0000-osc-ffff8e409657d800.idle_timeout=debug
            osc.lustre-OST0000-osc-ffff8e409657e800.idle_timeout=debug
            osc.lustre-OST0001-osc-ffff8e409657d800.idle_timeout=debug
            osc.lustre-OST0001-osc-ffff8e409657e800.idle_timeout=debug
            setting jobstats to procname_uid
            Setting lustre.sys.jobid_var from disable to procname_uid
            Waiting 90s for 'procname_uid'
            Updated after 7s: want 'procname_uid' got 'procname_uid'
            disable quota as required
            lod.lustre-MDT0000-mdtlov.mdt_hash=crush
            1+0 records in
            1+0 records out
            1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00664703 s, 158 MB/s
            running as uid/gid/euid/egid 500/500/500/500, groups:
             [touch] [/mnt/lustre/d0_runas_test/f7574]
            
            == sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 16:57:29 (1597136249)
            1+0 records in
            1+0 records out
            1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000503275 s, 2.0 MB/s
            fail_loc=0x1404
            1+0 records in
            1+0 records out
            1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0147413 s, 71.1 MB/s
            3145728
            Resetting fail_loc on all nodes...done.
            PASS 51b (7s)
            cleanup: ======================================================
            == sanityn test complete, duration 61 sec ============================================================ 16:57:36 (1597136256)
            Stopping clients: qian /mnt/lustre (opts:-f)
            Stopping client qian /mnt/lustre opts:-f
            Stopping clients: qian /mnt/lustre2 (opts:-f)
            Stopping client qian /mnt/lustre2 opts:-f
            
            
            [root@qian tests]# FSTYPE=zfs ONLY="51b" MDSCOUNT=2 REFORMAT="yes" sh sanityn.sh
            == sanityn test 51b: layout lock: glimpse should be able to restart if layout changed ================ 16:59:20 (1597136360)
            1+0 records in
            1+0 records out
            1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000356896 s, 2.9 MB/s
            fail_loc=0x1404
            1+0 records in
            1+0 records out
            1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00559126 s, 188 MB/s
            3145728
            Resetting fail_loc on all nodes...done.
            PASS 51b (7s)
            cleanup: ======================================================
            == sanityn test complete, duration 35 sec ============================================================ 16:59:27 (1597136367)
            
            
            qian_wc Qian Yingjin added a comment -

            Hi John,

            The reason I changed test_51b is that it could not pass with https://review.whamcloud.com/36674 (LU-10934 llite: integrate statx() API with Lustre).

            That patch avoids unnecessary glimpse lock requests for files without any stripes (such as an empty file created with MCREATE). For such files the size returned by the MDT is already exact, so no RPCs to the Lustre OSTs are needed to obtain the file size.
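
A hypothetical sketch of the size-lookup rule described above, written as a shell function only to illustrate it (this is not the llite implementation, and size_source is an invented name): a file with no stripes has no OST objects, so the MDT's size is used directly and no glimpse is sent. As an inference from the logs rather than a statement from the patch, this would also explain why the current test_51b writes 1024 bytes before setting fail_loc=0x1404: the file then has a layout, so a glimpse is actually issued and can be delayed.

```shell
# Hypothetical illustration of the decision, NOT llite code.
size_source() {
    stripe_count=$1
    if [ "$stripe_count" -eq 0 ]; then
        echo "mdt"      # no OST objects: the MDT size is exact, no glimpse RPC
    else
        echo "glimpse"  # data lives on OSTs: size must be gathered by glimpse
    fi
}

size_source 0   # e.g. an empty file created with MCREATE
size_source 1   # a file with data on one OST
```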

            gerrit Gerrit Updater added a comment - - edited

            The patch https://review.whamcloud.com/39533 is not a fix, but should be used if we can't find a fix for this test in a timely manner.

            gerrit Gerrit Updater added a comment -

            James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39533
            Subject: LU-13753 tests: stop running sanityn 51b for ZFS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0858686a91606ac6be812d6ed18c0fb302337959

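
Patch 39533 is described as stopping sanityn 51b on ZFS. A minimal sketch of such a backend guard follows; it does not reproduce the patch. It assumes an FSTYPE value like the one used in the local runs above (FSTYPE=zfs), and skip_test and guard_51b are hypothetical stand-ins rather than test-framework functions.

```shell
# Hypothetical stand-in for a test-framework skip helper.
skip_test() {
    echo "SKIP: $1"
}

# Hypothetical guard: skip test_51b on ZFS backends, run it otherwise.
guard_51b() {
    fstype=$1
    if [ "$fstype" = "zfs" ]; then
        skip_test "LU-13753: sanityn test_51b is not run on ZFS"
        return 0
    fi
    echo "running sanityn test_51b"
}

guard_51b "${FSTYPE:-ldiskfs}"
```
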
            jhammond John Hammond added a comment -

            qian_wc, this test was modified by https://review.whamcloud.com/36674 (LU-10934 llite: integrate statx() API with Lustre). Could you revert the changes to the test and ensure that it passes consistently? In general we should not rewrite tests in this way, because it makes it difficult to analyze test results over time. Instead, please add a new test based on the old test. Once the test issues are addressed, please ensure that the new test passes consistently.

            People

              Assignee: qian_wc Qian Yingjin
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9
