[LU-7746] skip test of new functionality on upstream client Created: 05/Feb/16  Updated: 26/Feb/18  Resolved: 13/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: staging

Issue Links:
Related
is related to LU-7092 Interop 2.7.0<->master sanity test_13... Resolved
is related to LU-7144 Interop 2.7.0<->master- sanity-scrub ... Resolved
is related to LU-9679 Prepare lustre for adoption into the ... Resolved
is related to LU-7748 sanity test_133f: upstream kernel tim... Open
is related to LU-7782 sanity-scrub test_2: NULL pointer der... Resolved
is related to LU-7747 sanity test_56w: dataversion changed ... Resolved
is related to LU-8737 Lustre upstream client tree for testi... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/37f487fc-cbab-11e5-a59a-5254006e85c2.

The sub-test test_11 failed with the following error:

error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/d11.sanity-scrub/mds1' (3): Invalid argument
(1) Fail to mkdir /mnt/lustre/d11.sanity-scrub/mds1

The upstream client reports a Lustre build version of 2.3.64 (though it has bug fixes from after that point) and does not contain any of the DNE1/2 or HSM functionality, among other things.

Info required for matching: sanity-scrub 11
Info required for matching: sanity 27E
Info required for matching: sanity 29
Info required for matching: sanity 48b
Info required for matching: sanity 48c
Info required for matching: sanity 48d
Info required for matching: sanity 56w
Info required for matching: sanity 101f
Info required for matching: sanity 102a
Info required for matching: sanity 102b
Info required for matching: sanity 124c



 Comments   
Comment by Andreas Dilger [ 05/Feb/16 ]

Running sanity.sh from master, now that the LU-5030 patches have landed, still produces a few errors, but the results are already pretty good:
Subtest passes: 393/405

The long "====" padding should be removed from the test descriptions.
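
To illustrate the intended result only, the padding can be stripped from an existing log with a small sed filter. This is a sketch that post-processes test output; it is not the change to the test framework itself, and the function name strip_banner_padding is made up for this example.

```shell
# Hypothetical helper: collapse the long "=====" filler in a test banner
# line to a single space. The leading "== " marker is left alone because
# it has fewer than five "=" characters.
strip_banner_padding() {
    sed 's/ *=\{5,\} */ /'
}

# Example: filter one banner line taken from the log in this ticket.
printf '%s\n' \
    '== sanity test 29: IT_GETATTR regression  ============ 21:09:06 (1454620146)' |
    strip_banner_padding
```

The filter only touches runs of five or more "=" characters, so other lines of test output pass through unchanged.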

== sanity test 27E: check that default extended attribute size properly increases ==================== 21:09:00 (1454620140)
error: set_param: opening /sys/fs/lustre/llite/lustre-ffff88006feee000/default_easize: Permission denied
 sanity test_27E: @@@@@@ FAIL: lctl set_param failed 

I don't think the default_easize functionality is available in the upstream client. The test could either be skipped, or the patch ported to the kernel.

== sanity test 29: IT_GETATTR regression  ============================================================ 21:09:06 (1454620146)
first d29
total 0
-rw-r--r-- 1 root root 0 Feb  4 21:09 foo
No mdc lock count
 sanity test_29: @@@@@@ FAIL: test_29 failed with 1 

Not sure of cause. This also needs a proper "error" instead of "failed with 1".

== sanity test 48b: Access removed working dir (should return errors)================================= 21:16:25 (1454620585)
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
 sanity test_48b: @@@@@@ FAIL: 'cd .' worked after removing cwd 

Not sure about this one. Missing fix or kernel bug?

== sanity test 48c: Access removed working subdir (should return errors) ============================= 21:16:29 (1454620589)
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
 sanity test_48c: @@@@@@ FAIL: 'cd .' worked after removing cwd 

This test only worked for patched clients; it can be deleted completely.

cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
 sanity test_48d: @@@@@@ FAIL: 'cd .' worked after recreate parent 

This test only worked for patched clients; it can be deleted completely.

== sanity test 56w: check lfs_migrate -c stripe_count works ========================================== 21:38:08 (1454621888)
/usr/bin/lfs_migrate -y -c 6 /mnt/lustre/d56w.sanityw/file1
/mnt/lustre/d56w.sanityw/file1: /usr/bin/lfs: /mnt/lustre/d56w.sanityw/file1: dataversion changed during copy, migration aborted
cannot put lease: Invalid argument (22)
error: migrate: migrate file '/mnt/lustre/d56w.sanityw/file1' failed
falling back to rsync-based migration
done
/usr/bin/lfs migrate -i 0 /mnt/lustre/d56w.sanityw/migr_1_ost
/usr/bin/lfs: /mnt/lustre/d56w.sanityw/migr_1_ost: dataversion changed

Handling this separately in LU-7747 since the test should not be skipped.

== sanity test 101f: check read-ahead for max_read_ahead_whole_mb ==================================== 21:53:19 (1454622799)
Cancel LRU locks on lustre client to flush the client cache
Reset readahead stats
Random 4K reads on 2M file for 1000 times

1.061329s, 3.85931MB/s
checking missing pages
 sanity test_101f: @@@@@@ FAIL: misses too much pages!

Not sure about this one.

== sanity test 102a: user xattr test ================================================================= 21:53:25 (1454622805)
set/get xattr...
trusted.name1="value1"
user.author1="author1"
listxattr...
 sanity test_102a: @@@@@@ FAIL: /mnt/lustre/f102a.sanity missing 3 trusted.name xattrs

Not sure.

== sanity test 102b: getfattr/setfattr for trusted.lov EAs =========================================== 21:53:28 (1454622808)
get/set/list trusted.lov xattr ...
 sanity test_102b: @@@@@@ FAIL: can't get trusted.lov from /mnt/lustre/f102b.sanity 

Not sure.

== sanity test 124c: LRUR cancel very aged locks ===================================================== 22:01:49 (1454623309)
total: 100 creates in 0.26 seconds: 379.45 creates/second
unused=205, max_age=36000000, recalc_p=10
ldlm.namespaces.lustre-MDT0000-mdc-ffff8800729fd000.lru_max_age=1000
sleep 20 seconds...
 sanity test_124c: @@@@@@ FAIL: 205 locks are not canceled 

Should be skipped, or LRU aging fix ported to kernel.

== sanity test 133f: Check for LBUGs/Oopses/unreadable files in /proc ================================ 22:03:46 (1454623426)
16:04:38:[ 5252.564061] LustreError: 21743:0:(obd_class.h:1087:obd_statfs()) obd_statfs: dev 0 no operation
16:04:38:[ 5526.640178] kworker/dying (67) used greatest stack depth: 10616 bytes left

Filed LU-7748 for this, since it seems like a bug in the upstream kernel /sys handling, though it should probably be skipped in the short term.

Comment by Andreas Dilger [ 11/Feb/16 ]

Have pushed a patch to LU-7144 to diagnose why sanity-scrub and sanity-lfsck are being run for staging-next tests.

Comment by James A Simmons [ 12/Feb/16 ]

In my own testing, these are the sanity tests I see fail:

sanity: FAIL: test_29 test_29 failed with 1
sanity: FAIL: test_53 can not match last_seq/last_id for OST-osc
sanity: FAIL: test_101f misses too much pages!
sanity: FAIL: test_102a /lustre/lustre/f102a.sanity missing 3 trusted.name xattrs
sanity: FAIL: test_102ha saving trusted.big on /lustre/lustre/f102ha.sanity failed
sanity: FAIL: test_124c 205 locks are not canceled
sanity: FAIL: test_129 exceeded dir size limit 20480(1) : 24576 bytes
sanity: FAIL: test_134b createmany finished incorrectly!
sanity: FAIL: test_154f expected parent: [0x2000032e2:0x1f:0x0]/foo1, got:
sanity: FAIL: test_154g test_154g failed with 1
sanity: FAIL: test_205 Wrong changelog jobid count 0 != 9
sanity: FAIL: test_208 get lease error
sanity: FAIL: test_215 cannot read lnet.stats
sanity: FAIL: test_220 test_220 failed with 6
sanity: FAIL: test_300m set default stripes dir error
sanity: FAIL: test_402 setdirstripe -i 0 failed
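
For comparing failure lists like the one above between runs, the test names can be pulled out of the "FAIL:" lines. The following is a minimal sketch, assuming the "suite: FAIL: test_NAME message" layout shown in this ticket; the function name failed_tests is hypothetical.

```shell
# Hypothetical helper: read "suite: FAIL: test_NAME message" lines on
# stdin and print the unique test names, one per line, in C-locale order.
failed_tests() {
    awk '$2 == "FAIL:" { print $3 }' | LC_ALL=C sort -u
}

# Example: two lines copied from the list above.
failed_tests <<'EOF'
sanity: FAIL: test_29 test_29 failed with 1
sanity: FAIL: test_215 cannot read lnet.stats
EOF
```

The sorted output of two runs can then be compared with standard tools such as comm or diff.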

Comment by Gerrit Updater [ 13/Feb/16 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18442
Subject: LU-7746 tests: skip a few tests for upstream kernel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a91a02f8676dcc3a124531a4dfb21371ed3ea7b5

Comment by Gerrit Updater [ 20/Apr/16 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/19663
Subject: LU-7746 tests: some fixes for sanity.sh with upstream kernel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a5056f96b9d8c0b4f6175faf1ee5f7f48c4f6138

Comment by Gerrit Updater [ 22/Apr/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19663/
Subject: LU-7746 tests: some fixes for sanity.sh with upstream kernel
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c67a7f39cabbe8ef6d7c5a340b501aedaa748be6

Comment by Gerrit Updater [ 25/Aug/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/28718
Subject: LU-7746 tests: skip tests for older (upstream) client
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ac4f6fc8db0211e18c59ebd828931a067b36f17c

Comment by Andreas Dilger [ 25/Aug/17 ]

I pushed a patch to fs/linux-staging (based on the current tip of staging-next, v4.13-rc5-519-g30b7b04 "staging: lustre: lnet: cleanup paths for all LNet headers") and hit the following failures:
https://testing.hpdd.intel.com/test_sets/e759b6c0-8989-11e7-b50a-5254006e85c2

sanity: FAIL: test_27z FF stripe count 1 != 0  (PFL=fixed)
sanity: FAIL: test_27D llapi_layout_test failed (PFL=fixed)
sanity: FAIL: test_29 No mdc lock count
sanity: FAIL: test_77c no checksum dump file on Client
sanity: FAIL: test_101g unable to set max_pages_per_rpc=16M (16M=fixed)
sanity: FAIL: test_102a /mnt/lustre/f102a.sanity missing 3 trusted.name xattrs
sanity: FAIL: test_102b can't get trusted.lov from /mnt/lustre/f102b.sanity 
sanity: FAIL: test_102n setxattr invalid 'trusted.lov' success
sanity: FAIL: test_103a run_acl_subtest cp failed
sanity: FAIL: test_125 setfacl /mnt/lustre/d125 failed
sanity: FAIL: test_154B decode linkea /mnt/lustre/d154B.sanity/f154B.sanity failed
sanity: FAIL: test_154a setfacl /mnt/lustre/.lustre/fid/[0x200002b12:0xf:0x0] failed.'
sanity: FAIL: test_154g llapi_path2fid failed for '/mnt/lustre/d154g.sanity/llapi_fid_test_name_9585766' (fixed)
sanity: FAIL: test_160a MKDIR changelog mask count != 1
sanity: FAIL: test_160c TRUNC changelog mask count != 1
sanity: FAIL: test_160e changelog_clear failed with 2, expected 22 (EINVAL)
sanity: FAIL: test_161c flag is not 0x1
sanity: FAIL: test_161d create should be blocked
sanity: FAIL: test_162a path looked up "" instead of "d162a.sanity/d2/p/q/r/slink"
sanity: FAIL: test_205 Wrong changelog jobid count 0 != 9
sanity: FAIL: test_215 cannot read lnet.stats
sanity: FAIL: test_226a cannot get path of FIFO by /mnt/lustre /mnt/lustre/d226a.sanity/fifo
sanity: FAIL: test_242 ls /mnt/lustre/d242.sanity failed
sanity: FAIL: test_251 short write happened
sanity: FAIL: test_405 One layout swap locked test failed (fixed)
sanity: FAIL: test_410 no inode match (fixed)
sanity: FAIL: test_900 VFS: Busy inodes after unmount of lustre. Self-destruct in 5 seconds.

I've pushed a patch to skip a few of the tests based on the client version, but there appear to be some real problems in the upstream client's handling of xattrs, changelogs, and some files in /sys.
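
Skipping a test based on the client version could be sketched roughly as follows. This is an illustration only, not the code from the patch: the helper names version_to_code and client_older_than are made up here, and the 2.10.0 threshold is a placeholder rather than the version that gates any particular test. The only value taken from this ticket is 2.3.64, the version the upstream client reports.

```shell
# Hypothetical helper: pack "major.minor.patch" into a single integer so
# that two versions can be compared numerically. Runs in a subshell so the
# IFS change does not leak into the caller.
version_to_code() {
    (
        IFS=.
        set -- $1
        echo $(( ${1:-0} * 65536 + ${2:-0} * 256 + ${3:-0} ))
    )
}

# Hypothetical helper: succeed when the client version ($1) is older than
# the version that introduced the feature under test ($2).
client_older_than() {
    [ "$(version_to_code "$1")" -lt "$(version_to_code "$2")" ]
}

client_version=2.3.64      # version reported by the upstream client
required_version=2.10.0    # placeholder threshold for illustration

if client_older_than "$client_version" "$required_version"; then
    echo "SKIP: client $client_version older than $required_version"
fi
```

Comparing packed integers avoids the string-comparison trap where "2.10" sorts before "2.3".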

Comment by Gerrit Updater [ 13/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28718/
Subject: LU-7746 tests: skip tests for older (upstream) client
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a8d33a29e77e102505ee3916782dc697ad121ff8

Comment by Peter Jones [ 13/Sep/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 31/Jan/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31111
Subject: LU-7746 tests: skip tests for older (upstream) client
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 96a70aea9c5f2fa84194e981a4bf6843074d7c56

Comment by Gerrit Updater [ 26/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31111/
Subject: LU-7746 tests: skip tests for older (upstream) client
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: e159c070bdf539adba8863a08c6c65cc46bab4ca
