Lustre / LU-7746

skip test of new functionality on upstream client

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/37f487fc-cbab-11e5-a59a-5254006e85c2.

      The sub-test test_11 failed with the following error:

      error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/d11.sanity-scrub/mds1' (3): Invalid argument
      (1) Fail to mkdir /mnt/lustre/d11.sanity-scrub/mds1
      

      The upstream client reports a Lustre build version of 2.3.64 (though it has bug fixes from after that point) and does not contain any of the DNE1/2 or HSM functionality, among other things.
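
The upstream client reports 2.3.64, so a test for newer functionality can compare the client version against a minimum before it runs. Below is a minimal shell sketch. It assumes a hypothetical `encode_version` helper that packs "X.Y.Z" into one integer. This is not the test framework's own version code, and the 2.5.0 threshold is only an illustrative placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not from test-framework.sh): pack a dotted
# version "X.Y.Z" into one integer, X<<16 | Y<<8 | Z, so versions can
# be compared numerically. Assumes decimal components without leading
# zeros, each below 256; missing components count as 0.
encode_version() {
	local IFS=.
	# Unquoted on purpose: split "X.Y.Z" on the dots into $1 $2 $3.
	set -- $1
	echo $(( (${1:-0} << 16) | (${2:-0} << 8) | ${3:-0} ))
}

# Example: gate a feature test on an illustrative minimum version.
min=2.5.0
if [ "$(encode_version 2.3.64)" -lt "$(encode_version "$min")" ]; then
	echo "client 2.3.64 is older than $min: skip"
fi
```

The upstream client also carries fixes from after 2.3.64. A version gate like this can therefore skip tests the client would actually pass. Where possible, probing for the feature itself is more precise.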

      Info required for matching: sanity-scrub 11
      Info required for matching: sanity 27E
      Info required for matching: sanity 29
      Info required for matching: sanity 48b
      Info required for matching: sanity 48c
      Info required for matching: sanity 48d
      Info required for matching: sanity 56w
      Info required for matching: sanity 101f
      Info required for matching: sanity 102a
      Info required for matching: sanity 102b
      Info required for matching: sanity 124c

          Activity

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28718/
            Subject: LU-7746 tests: skip tests for older (upstream) client
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a8d33a29e77e102505ee3916782dc697ad121ff8

            Comment above added by Gerrit Updater (gerrit).

            I pushed a patch to fs/linux-staging, based on the current tip of staging-next (v4.13-rc5-519-g30b7b04 "staging: lustre: lnet: cleanup paths for all LNet headers"), and hit the following failures:
            https://testing.hpdd.intel.com/test_sets/e759b6c0-8989-11e7-b50a-5254006e85c2

            sanity: FAIL: test_27z FF stripe count 1 != 0  (PFL=fixed)
            sanity: FAIL: test_27D llapi_layout_test failed (PFL=fixed)
            sanity: FAIL: test_29 No mdc lock count
            sanity: FAIL: test_77c no checksum dump file on Client
            sanity: FAIL: test_101g unable to set max_pages_per_rpc=16M (16M=fixed)
            sanity: FAIL: test_102a /mnt/lustre/f102a.sanity missing 3 trusted.name xattrs
            sanity: FAIL: test_102b can't get trusted.lov from /mnt/lustre/f102b.sanity 
            sanity: FAIL: test_102n setxattr invalid 'trusted.lov' success
            sanity: FAIL: test_103a run_acl_subtest cp failed
            sanity: FAIL: test_125 setfacl /mnt/lustre/d125 failed
            sanity: FAIL: test_154B decode linkea /mnt/lustre/d154B.sanity/f154B.sanity failed
            sanity: FAIL: test_154a setfacl /mnt/lustre/.lustre/fid/[0x200002b12:0xf:0x0] failed.'
            sanity: FAIL: test_154g llapi_path2fid failed for '/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766' (fixed)
            sanity: FAIL: test_160a MKDIR changelog mask count != 1
            sanity: FAIL: test_160c TRUNC changelog mask count != 1
            sanity: FAIL: test_160e changelog_clear failed with 2, expected 22 (EINVAL)
            sanity: FAIL: test_161c flag is not 0x1
            sanity: FAIL: test_161d create should be blocked
            sanity: FAIL: test_162a path looked up "" instead of "d162a.sanity/d2/p/q/r/slink"
            sanity: FAIL: test_205 Wrong changelog jobid count 0 != 9
            sanity: FAIL: test_215 cannot read lnet.stats
            sanity: FAIL: test_226a cannot get path of FIFO by /mnt/lustre /mnt/lustre/d226a.sanity/fifo
            sanity: FAIL: test_242 ls /mnt/lustre/d242.sanity failed
            sanity: FAIL: test_251 short write happened
            sanity: FAIL: test_405 One layout swap locked test failed (fixed)
            sanity: FAIL: test_410 no inode match (fixed)
            sanity: FAIL: test_900 VFS: Busy inodes after unmount of lustre. Self-destruct in 5 seconds.
            

            I've pushed a patch to skip a few of the tests based on the client version, but it looks like there are some real problems in the upstream client's handling of xattrs, changelogs, and some files in /sys.

            Comment above added by Andreas Dilger (adilger).
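
The comment above describes skipping subtests based on the client version. That can be expressed as a guard at the top of each affected subtest. Below is a minimal sketch that uses GNU `sort -V` for the comparison. `version_lt`, `test_example`, `CLIENT_VERSION` and the 2.5.0 threshold are all hypothetical and are not taken from the patch or from sanity.sh.

```shell
#!/bin/sh
# version_lt A B: succeed if version A sorts strictly before version B.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
	[ "$1" != "$2" ] &&
		[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical subtest with a version guard at the top. CLIENT_VERSION
# and the 2.5.0 threshold are placeholders, not values from sanity.sh.
test_example() {
	if version_lt "$CLIENT_VERSION" 2.5.0; then
		echo "SKIP: client $CLIENT_VERSION too old for test_example"
		return 0
	fi
	echo "running test_example"
}

CLIENT_VERSION=2.3.64
test_example
```

Comparing with `sort -V` avoids hand-rolled arithmetic and orders multi-digit components correctly (2.9.0 sorts before 2.10.0).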

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/28718
            Subject: LU-7746 tests: skip tests for older (upstream) client
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ac4f6fc8db0211e18c59ebd828931a067b36f17c

            Comment above added by Gerrit Updater (gerrit).

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19663/
            Subject: LU-7746 tests: some fixes for sanity.sh with upstream kernel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c67a7f39cabbe8ef6d7c5a340b501aedaa748be6

            Comment above added by Gerrit Updater (gerrit).

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/19663
            Subject: LU-7746 tests: some fixes for sanity.sh with upstream kernel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a5056f96b9d8c0b4f6175faf1ee5f7f48c4f6138

            Comment above added by Gerrit Updater (gerrit).

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18442
            Subject: LU-7746 tests: skip a few tests for upstream kernel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a91a02f8676dcc3a124531a4dfb21371ed3ea7b5

            Comment above added by Gerrit Updater (gerrit).

            In my own testing, these are the sanity tests I see fail:

            sanity: FAIL: test_29 test_29 failed with 1
            sanity: FAIL: test_53 can not match last_seq/last_id for OST-osc
            sanity: FAIL: test_101f misses too much pages!
            sanity: FAIL: test_102a /lustre/lustre/f102a.sanity missing 3 trusted.name xattrs
            sanity: FAIL: test_102ha saving trusted.big on /lustre/lustre/f102ha.sanity failed
            sanity: FAIL: test_124c 205 locks are not canceled
            sanity: FAIL: test_129 exceeded dir size limit 20480(1) : 24576 bytes
            sanity: FAIL: test_134b createmany finished incorrectly!
            sanity: FAIL: test_154f expected parent: [0x2000032e2:0x1f:0x0]/foo1, got:
            sanity: FAIL: test_154g test_154g failed with 1
            sanity: FAIL: test_205 Wrong changelog jobid count 0 != 9
            sanity: FAIL: test_208 get lease error
            sanity: FAIL: test_215 cannot read lnet.stats
            sanity: FAIL: test_220 test_220 failed with 6
            sanity: FAIL: test_300m set default stripes dir error
            sanity: FAIL: test_402 setdirstripe -i 0 failed

            Comment above added by James A Simmons (simmonsja).

            I have pushed a patch to LU-7144 to diagnose why sanity-scrub and sanity-lfsck are being run for staging-next tests.

            Comment above added by Andreas Dilger (adilger).

            Andreas Dilger (adilger) added a comment (edited):

            Now that the LU-5030 patches have landed, running sanity.sh from master still produces a few errors, but the results are already pretty good:
            Subtest passes: 393/405

            The long "====" padding should be removed from the test descriptions.

            == sanity test 27E: check that default extended attribute size properly increases ==================== 21:09:00 (1454620140)
            error: set_param: opening /sys/fs/lustre/llite/lustre-ffff88006feee000/default_easize: Permission denied
             sanity test_27E: @@@@@@ FAIL: lctl set_param failed 
            

            I don't think the default_easize functionality is available in the upstream client. This test could either be skipped, or a patch could be ported to the kernel.
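
Instead of failing on "Permission denied", test 27E could probe for the parameter first and skip when the parameter cannot be set. Below is a minimal sketch under stated assumptions. The `/sys/fs/lustre/llite` path comes from the log above. The `have_writable_default_easize` helper and the mode-bit check are hypothetical and are not what sanity.sh does.

```shell
#!/bin/sh
# Hypothetical probe: succeed if any llite instance under the given
# directory (default /sys/fs/lustre/llite, the path seen in the log)
# has a default_easize file with a write permission bit set. Mode bits
# are checked with GNU find's -perm /222 rather than [ -w ], because
# [ -w ] is typically true for root even on a 0444 file.
have_writable_default_easize() {
	local llite_dir=${1:-/sys/fs/lustre/llite}
	[ -n "$(find "$llite_dir" -mindepth 2 -maxdepth 2 \
		-name default_easize -perm /222 -print -quit 2>/dev/null)" ]
}

if ! have_writable_default_easize; then
	echo "SKIP: default_easize is not writable on this client"
fi
```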

            == sanity test 29: IT_GETATTR regression  ============================================================ 21:09:06 (1454620146)
            first d29
            total 0
            -rw-r--r-- 1 root root 0 Feb  4 21:09 foo
            No mdc lock count
             sanity test_29: @@@@@@ FAIL: test_29 failed with 1 
            

            Not sure of the cause. This test also needs a proper "error" message instead of "failed with 1".
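
Below is a sketch of the "proper error" pattern for test 29. The message the log already prints ("No mdc lock count") becomes the failure reason itself. `error` here is a hypothetical stand-in for the framework's helper. The lock count is passed in as an argument so the sketch runs without a Lustre mount.

```shell
#!/bin/sh
# Hypothetical stand-in for the framework's error helper: print the
# reason and return non-zero. The real helper does more than this.
error() {
	echo "FAIL: $*" >&2
	return 1
}

# Sketch of the test_29 check. The lock count is passed in as $1 so the
# sketch runs without Lustre; an empty value produces a descriptive
# failure instead of a bare "failed with 1".
check_mdc_lock_count() {
	local count=$1
	if [ -z "$count" ]; then
		error "No mdc lock count"
		return 1
	fi
	echo "mdc lock count: $count"
}
```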

            == sanity test 48b: Access removed working dir (should return errors)================================= 21:16:25 (1454620585)
            cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
             sanity test_48b: @@@@@@ FAIL: 'cd .' worked after removing cwd 
            

            Not sure about this one. Missing fix or kernel bug?

            == sanity test 48c: Access removed working subdir (should return errors) ============================= 21:16:29 (1454620589)
            cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
             sanity test_48c: @@@@@@ FAIL: 'cd .' worked after removing cwd 
            

            This test only worked for patched clients; it can be deleted completely.

            cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
             sanity test_48d: @@@@@@ FAIL: 'cd .' worked after recreate parent 
            

            This test only worked for patched clients; it can be deleted completely.

            == sanity test 56w: check lfs_migrate -c stripe_count works ========================================== 21:38:08 (1454621888)
            /usr/bin/lfs_migrate -y -c 6 /mnt/lustre/d56w.sanityw/file1
            /mnt/lustre/d56w.sanityw/file1: /usr/bin/lfs: /mnt/lustre/d56w.sanityw/file1: dataversion changed during copy, migration aborted
            cannot put lease: Invalid argument (22)
            error: migrate: migrate file '/mnt/lustre/d56w.sanityw/file1' failed
            falling back to rsync-based migration
            done
            /usr/bin/lfs migrate -i 0 /mnt/lustre/d56w.sanityw/migr_1_ost
            /usr/bin/lfs: /mnt/lustre/d56w.sanityw/migr_1_ost: dataversion changed
            

            Handling this separately in LU-7747 since the test should not be skipped.

            == sanity test 101f: check read-ahead for max_read_ahead_whole_mb ==================================== 21:53:19 (1454622799)
            Cancel LRU locks on lustre client to flush the client cache
            Reset readahead stats
            Random 4K reads on 2M file for 1000 times
            
            1.061329s, 3.85931MB/s
            checking missing pages
             sanity test_101f: @@@@@@ FAIL: misses too much pages!
            

            Not sure about this one.

            == sanity test 102a: user xattr test ================================================================= 21:53:25 (1454622805)
            set/get xattr...
            trusted.name1="value1"
            user.author1="author1"
            listxattr...
             sanity test_102a: @@@@@@ FAIL: /mnt/lustre/f102a.sanity missing 3 trusted.name xattrs
            

            Not sure.

            == sanity test 102b: getfattr/setfattr for trusted.lov EAs =========================================== 21:53:28 (1454622808)
            get/set/list trusted.lov xattr ...
             sanity test_102b: @@@@@@ FAIL: can't get trusted.lov from /mnt/lustre/f102b.sanity 
            

            Not sure.

            == sanity test 124c: LRUR cancel very aged locks ===================================================== 22:01:49 (1454623309)
            total: 100 creates in 0.26 seconds: 379.45 creates/second
            unused=205, max_age=36000000, recalc_p=10
            ldlm.namespaces.lustre-MDT0000-mdc-ffff8800729fd000.lru_max_age=1000
            sleep 20 seconds...
             sanity test_124c: @@@@@@ FAIL: 205 locks are not canceled 
            

            This should either be skipped, or the LRU aging fix ported to the kernel.

            == sanity test 133f: Check for LBUGs/Oopses/unreadable files in /proc ================================ 22:03:46 (1454623426)
            16:04:38:[ 5252.564061] LustreError: 21743:0:(obd_class.h:1087:obd_statfs()) obd_statfs: dev 0 no operation
            16:04:38:[ 5526.640178] kworker/dying (67) used greatest stack depth: 10616 bytes left
            

            Filed LU-7748 for this, since it seems like a bug in the upstream kernel /sys handling, though it should probably be skipped in the short term.
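
This comment marks several subtests (27E, 124c, 133f) as candidates for skipping in the short term. Below is a minimal sketch of a per-client exception list. The `UPSTREAM_EXCEPT` variable and `is_excepted` helper are hypothetical. Any exception mechanism the real test framework provides is not reproduced here.

```shell
#!/bin/sh
# Hypothetical per-client exception list built from the subtests this
# comment suggests skipping for now (27E, 124c, 133f).
UPSTREAM_EXCEPT="27E 124c 133f"

# is_excepted TESTNUM: succeed if TESTNUM matches a whole entry in the
# list, so "124" does not match "124c".
is_excepted() {
	for t in $UPSTREAM_EXCEPT; do
		[ "$t" = "$1" ] && return 0
	done
	return 1
}

for num in 27E 29 124c 133f; do
	if is_excepted "$num"; then
		echo "test_$num: SKIP (upstream client)"
	else
		echo "test_$num: run"
	fi
done
```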


            People

              wc-triage WC Triage
              maloo Maloo
              Votes: 0
              Watchers: 9
