[LU-7746] skip test of new functionality on upstream client

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/37f487fc-cbab-11e5-a59a-5254006e85c2.

      The sub-test test_11 failed with the following error:

      error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/d11.sanity-scrub/mds1' (3): Invalid argument
      (1) Fail to mkdir /mnt/lustre/d11.sanity-scrub/mds1
      

      The upstream client reports a Lustre build version of 2.3.64 (though it has bug fixes from after that point) and does not contain any of the DNE1/2 or HSM functionality, among other things.
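
One way to act on the version mismatch described above is to gate tests on the version the client reports. The following is a minimal bash sketch, not the test suite's actual code: `ver_to_int` and `client_too_old` are hypothetical helpers, the 8-bits-per-field packing is an assumption, and 2.4.0 is only a placeholder for whichever release introduced the feature under test. Because the upstream client carries later bug fixes while still reporting 2.3.64, a check like this is at best a rough proxy for feature availability.

```shell
#!/bin/bash
# Hypothetical helper: pack "major.minor.patch" into one comparable integer.
# The 8-bits-per-field packing is an assumption made for this sketch.
ver_to_int() {
    local major minor patch
    IFS=. read -r major minor patch _ <<< "$1"
    echo $(( (${major:-0} << 16) | (${minor:-0} << 8) | ${patch:-0} ))
}

# Hypothetical predicate: succeeds when the client version is older than
# the version that introduced a feature, i.e. the test should be skipped.
client_too_old() {
    local client_version=$1 required_version=$2
    [ "$(ver_to_int "$client_version")" -lt "$(ver_to_int "$required_version")" ]
}

# 2.3.64 is the build version reported by the upstream client in this ticket;
# 2.4.0 is a placeholder for "the release that added the feature".
if client_too_old 2.3.64 2.4.0; then
    echo "SKIP: client 2.3.64 is older than 2.4.0"
fi
```

A test would call the predicate before exercising the new functionality and return early when it succeeds.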

      Info required for matching: sanity-scrub 11
      Info required for matching: sanity 27E
      Info required for matching: sanity 29
      Info required for matching: sanity 48b
      Info required for matching: sanity 48c
      Info required for matching: sanity 48d
      Info required for matching: sanity 56w
      Info required for matching: sanity 101f
      Info required for matching: sanity 102a
      Info required for matching: sanity 102b
      Info required for matching: sanity 124c

          Activity

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19663/
            Subject: LU-7746 tests: some fixes for sanity.sh with upstream kernel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c67a7f39cabbe8ef6d7c5a340b501aedaa748be6

            gerrit Gerrit Updater added a comment

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/19663
            Subject: LU-7746 tests: some fixes for sanity.sh with upstream kernel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a5056f96b9d8c0b4f6175faf1ee5f7f48c4f6138

            gerrit Gerrit Updater added a comment

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18442
            Subject: LU-7746 tests: skip a few tests for upstream kernel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a91a02f8676dcc3a124531a4dfb21371ed3ea7b5

            gerrit Gerrit Updater added a comment

            In my own testing, these are the sanity tests I see fail:

            sanity: FAIL: test_29 test_29 failed with 1
            sanity: FAIL: test_53 can not match last_seq/last_id for OST-osc
            sanity: FAIL: test_101f misses too much pages!
            sanity: FAIL: test_102a /lustre/lustre/f102a.sanity missing 3 trusted.name xattrs
            sanity: FAIL: test_102ha saving trusted.big on /lustre/lustre/f102ha.sanity failed
            sanity: FAIL: test_124c 205 locks are not canceled
            sanity: FAIL: test_129 exceeded dir size limit 20480(1) : 24576 bytes
            sanity: FAIL: test_134b createmany finished incorrectly!
            sanity: FAIL: test_154f expected parent: [0x2000032e2:0x1f:0x0]/foo1, got:
            sanity: FAIL: test_154g test_154g failed with 1
            sanity: FAIL: test_205 Wrong changelog jobid count 0 != 9
            sanity: FAIL: test_208 get lease error
            sanity: FAIL: test_215 cannot read lnet.stats
            sanity: FAIL: test_220 test_220 failed with 6
            sanity: FAIL: test_300m set default stripes dir error
            sanity: FAIL: test_402 setdirstripe -i 0 failed

            simmonsja James A Simmons added a comment
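
A failure list in the `sanity: FAIL: test_<id> ...` form shown above can be reduced to a space-separated list of subtest IDs, for example to seed an exclusion list. This is a minimal bash sketch under stated assumptions: `failed_subtests` and `EXCEPT_LIST` are hypothetical names chosen for illustration, and the sample input is three lines copied from the comment above.

```shell
#!/bin/bash
# Print the subtest IDs (e.g. "29", "101f") found in
# "sanity: FAIL: test_<id> ..." lines, space-separated on one line.
failed_subtests() {
    sed -n 's/^ *sanity: FAIL: test_\([0-9][0-9A-Za-z]*\) .*/\1/p' |
        paste -sd ' ' -
}

# Three lines taken from the failure list in this ticket.
sample_log='sanity: FAIL: test_29 test_29 failed with 1
sanity: FAIL: test_101f misses too much pages!
sanity: FAIL: test_124c 205 locks are not canceled'

# EXCEPT_LIST is a hypothetical variable name used only in this sketch.
EXCEPT_LIST=$(printf '%s\n' "$sample_log" | failed_subtests)
echo "EXCEPT_LIST=\"$EXCEPT_LIST\""
```

Lines that do not match the pattern are ignored, so the filter can be fed a full test log.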

            Have pushed a patch to LU-7144 to diagnose why sanity-scrub and sanity-lfsck are being run for staging-next tests.

            adilger Andreas Dilger added a comment
            adilger Andreas Dilger added a comment (edited)

            Running sanity.sh from master, now that the LU-5030 patches have landed, still produces a few errors, but is already in pretty good shape:
            Subtest passes: 393/405

            The long "====" padding should be removed from the test descriptions.
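
Until the descriptions themselves are changed, the padding can be stripped when reading logs. The sketch below is one possible post-filter in bash, assuming the padding is always a run of five or more `=` characters; `strip_padding` is a hypothetical name, not part of the test framework.

```shell
#!/bin/bash
# Collapse a run of five or more "=" (plus surrounding spaces) into a single
# space. The two-character "==" prefix of each header is left untouched.
strip_padding() {
    sed -E 's/ *={5,} */ /'
}

header='== sanity test 29: IT_GETATTR regression  ============ 21:09:06 (1454620146)'
printf '%s\n' "$header" | strip_padding
```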

            == sanity test 27E: check that default extended attribute size properly increases ==================== 21:09:00 (1454620140)
            error: set_param: opening /sys/fs/lustre/llite/lustre-ffff88006feee000/default_easize: Permission denied
             sanity test_27E: @@@@@@ FAIL: lctl set_param failed 
            

            I don't think the default_easize functionality is available. The test could either be skipped, or a patch ported to the kernel.

            == sanity test 29: IT_GETATTR regression  ============================================================ 21:09:06 (1454620146)
            first d29
            total 0
            -rw-r--r-- 1 root root 0 Feb  4 21:09 foo
            No mdc lock count
             sanity test_29: @@@@@@ FAIL: test_29 failed with 1 
            

            Not sure of cause. This also needs a proper "error" instead of "failed with 1".
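
The difference between a bare non-zero return and a descriptive failure can be illustrated as follows. This is a generic bash sketch, not the actual test_29 code: the `error` function here is a stand-in defined for the example, and `check_lock_count` is a hypothetical check loosely modeled on the "No mdc lock count" line in the log.

```shell
#!/bin/bash
# Stand-in error reporter, defined only for this sketch: print a
# descriptive failure line to stderr and return non-zero.
error() {
    echo "FAIL: $*" >&2
    return 1
}

# Hypothetical check: report why the test fails instead of just returning 1.
check_lock_count() {
    local count=$1
    if [ -z "$count" ]; then
        error "no mdc lock count available" || return 1
    fi
    echo "lock count: $count"
}

check_lock_count "" || true   # reports the descriptive failure on stderr
check_lock_count 3            # succeeds and prints the count
```

With a message like this, the summary line names the cause rather than reporting only the exit status.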

            == sanity test 48b: Access removed working dir (should return errors)================================= 21:16:25 (1454620585)
            cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
             sanity test_48b: @@@@@@ FAIL: 'cd .' worked after removing cwd 
            

            Not sure about this one. Missing fix or kernel bug?

            == sanity test 48c: Access removed working subdir (should return errors) ============================= 21:16:29 (1454620589)
            cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
             sanity test_48c: @@@@@@ FAIL: 'cd .' worked after removing cwd 
            

            This test only ever worked on patched clients, so it can be deleted completely.

            cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
             sanity test_48d: @@@@@@ FAIL: 'cd .' worked after recreate parent 
            

            This test only ever worked on patched clients, so it can be deleted completely.

            == sanity test 56w: check lfs_migrate -c stripe_count works ========================================== 21:38:08 (1454621888)
            /usr/bin/lfs_migrate -y -c 6 /mnt/lustre/d56w.sanityw/file1
            /mnt/lustre/d56w.sanityw/file1: /usr/bin/lfs: /mnt/lustre/d56w.sanityw/file1: dataversion changed during copy, migration aborted
            cannot put lease: Invalid argument (22)
            error: migrate: migrate file '/mnt/lustre/d56w.sanityw/file1' failed
            falling back to rsync-based migration
            done
            /usr/bin/lfs migrate -i 0 /mnt/lustre/d56w.sanityw/migr_1_ost
            /usr/bin/lfs: /mnt/lustre/d56w.sanityw/migr_1_ost: dataversion changed
            

            Handling this separately in LU-7747 since the test should not be skipped.

            == sanity test 101f: check read-ahead for max_read_ahead_whole_mb ==================================== 21:53:19 (1454622799)
            Cancel LRU locks on lustre client to flush the client cache
            Reset readahead stats
            Random 4K reads on 2M file for 1000 times
            
            1.061329s, 3.85931MB/s
            checking missing pages
             sanity test_101f: @@@@@@ FAIL: misses too much pages!
            

            Not sure about this one.

            == sanity test 102a: user xattr test ================================================================= 21:53:25 (1454622805)
            set/get xattr...
            trusted.name1="value1"
            user.author1="author1"
            listxattr...
             sanity test_102a: @@@@@@ FAIL: /mnt/lustre/f102a.sanity missing 3 trusted.name xattrs
            

            Not sure.

            == sanity test 102b: getfattr/setfattr for trusted.lov EAs =========================================== 21:53:28 (1454622808)
            get/set/list trusted.lov xattr ...
             sanity test_102b: @@@@@@ FAIL: can't get trusted.lov from /mnt/lustre/f102b.sanity 
            

            Not sure.

            == sanity test 124c: LRUR cancel very aged locks ===================================================== 22:01:49 (1454623309)
            total: 100 creates in 0.26 seconds: 379.45 creates/second
            unused=205, max_age=36000000, recalc_p=10
            ldlm.namespaces.lustre-MDT0000-mdc-ffff8800729fd000.lru_max_age=1000
            sleep 20 seconds...
             sanity test_124c: @@@@@@ FAIL: 205 locks are not canceled 
            

            This should be skipped, or the LRU aging fix ported to the kernel.

            == sanity test 133f: Check for LBUGs/Oopses/unreadable files in /proc ================================ 22:03:46 (1454623426)
            16:04:38:[ 5252.564061] LustreError: 21743:0:(obd_class.h:1087:obd_statfs()) obd_statfs: dev 0 no operation
            16:04:38:[ 5526.640178] kworker/dying (67) used greatest stack depth: 10616 bytes left
            

            Filed LU-7748 for this, since it seems like a bug in the upstream kernel /sys handling, though it should probably be skipped in the short term.


            People

              wc-triage WC Triage
              maloo Maloo