[LU-7069] Interop 2.5.3<->master DNE: sanity test_65a/test_65e failed: FAIL: lverify failed Created: 01/Sep/15 Updated: 03/Feb/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client: 2.5.3 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity test_65a and test_65e failed as follows:
== sanity test 65a: directory with no stripe info ====================== 19:57:23 (1441076243)
Lustre: DEBUG MARKER: == sanity test 65a: directory with no stripe info ====================== 19:57:23 (1441076243)
mkdir 1 for /mnt/lustre/d65a.sanity
file1 stripe count 4 != dir 1
default stripe 1, ost count 4
sanity test_65a: @@@@@@ FAIL: lverify failed
Lustre: DEBUG MARKER: sanity test_65a: @@@@@@ FAIL: lverify failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4343:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4374:error()
= sanity.sh:4955:test_65a()
= /usr/lib64/lustre/tests/test-framework.sh:4613:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:4648:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4516:run_test()
= sanity.sh:4957:main()
Dumping lctl log to /home/w3liu/toro_home/test_logs/sanity.test_65a.*.1441076244.log
FAIL 65a (8s)
== sanity test 65e: directory setstripe defaults ========================= 19:57:36 (1441076256)
Lustre: DEBUG MARKER: == sanity test 65e: directory setstripe defaults ========================= 19:57:36 (1441076256)
mkdir 1 for /mnt/lustre/d65e.sanity
(Default) /mnt/lustre/d65e.sanity
file1 stripe count 4 != dir 1
default stripe 1, ost count 4
sanity test_65e: @@@@@@ FAIL: lverify failed
Lustre: DEBUG MARKER: sanity test_65e: @@@@@@ FAIL: lverify failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4343:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4374:error()
= sanity.sh:5007:test_65e()
= /usr/lib64/lustre/tests/test-framework.sh:4613:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:4648:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4516:run_test()
= sanity.sh:5009:main()
Dumping lctl log to /home/w3liu/toro_home/test_logs/sanity.test_65e.*.1441076257.log
FAIL 65e (2s) |
| Comments |
| Comment by Andreas Dilger [ 01/Sep/15 ] |
|
Is this just a case where the test needs to be skipped because the client doesn't handle striped directories? |
| Comment by Di Wang [ 01/Sep/15 ] |
|
It is a 2.5.3 client, so it should create a remote directory instead of a striped directory.
mkdir 1 for /mnt/lustre/d65a.sanity
file1 stripe count 4 != dir 1
Hmm, the file1 stripe count is 4, but d65a.sanity has no default stripe EA. Maybe it uses the global default stripe EA to create the file? It also seems the test script itself has this problem: when there is no default EA on the directory, it still compares the file stripe count with the directory default stripe EA, when it should probably compare the file stripes with the global default stripe. And clearly, if we set the global default stripe to 2, test_65a will also fail. So maybe we only need to fix the test script for 65a and 65e.
[root@testnode tests]# cat /proc/fs/lustre/lov/lustre-MDT0000-mdtlov/stripecount
1
[root@testnode tests]# cat /proc/fs/lustre/lov/lustre-MDT000^C
[root@testnode tests]# echo 2 > /proc/fs/lustre/lov/lustre-MDT0000-mdtlov/stripecount
[root@testnode tests]# cat /proc/fs/lustre/lov/lustre-MDT0000-mdtlov/stripecount
2
[root@testnode tests]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg_testnode-lv_root
27228028 13741748 12096508 54% /
tmpfs 4006664 0 4006664 0% /dev/shm
/dev/sda1 487652 62375 399677 14% /boot
192.168.1.31:/Users/wangdi/work
243358976 195038592 48064384 81% /work
/dev/loop0 283512 2204 261396 1% /mnt/mds1
/dev/loop1 771012 17572 711920 3% /mnt/ost1
/dev/loop2 771012 17572 711920 3% /mnt/ost2
testnode@tcp:/lustre 1542024 35144 1423840 3% /mnt/lustre
[root@testnode tests]# ONLY=65a sh sanity.sh
Logging to shared log directory: /tmp/test_logs/1440995597
testnode: Checking config lustre mounted on /mnt/lustre
Checking servers environments
Checking clients testnode environments
Using TIMEOUT=20
disable quota as required
osd-ldiskfs.track_declares_assert=1
running as uid/gid/euid/egid 500/500/500/500, groups:
[touch] [/mnt/lustre/d0_runas_test/f5317]
excepting tests: 76 42a 42b 42c 42d 45 51d 68b
skipping tests SLOW=no: 24o 24D 27m 64b 68 71 77f 78 115 124b
preparing for tests involving mounts
mke2fs 1.42.12.wc1 (15-Sep-2014)
debug=-1
== sanity test 65a: directory with no stripe info ====================== 21:33:17 (1440995597)
file1 stripe count 2 != dir 1
default stripe 1, ost count 2
sanity test_65a: @@@@@@ FAIL: lverify failed
Trace dump:
= /work/lustre-release_new/lustre/tests/test-framework.sh:4748:error_noexit()
= /work/lustre-release_new/lustre/tests/test-framework.sh:4779:error()
= sanity.sh:5274:test_65a()
= /work/lustre-release_new/lustre/tests/test-framework.sh:5026:run_one()
= /work/lustre-release_new/lustre/tests/test-framework.sh:5063:run_one_logged()
= /work/lustre-release_new/lustre/tests/test-framework.sh:4880:run_test()
= sanity.sh:5276:main()
Dumping lctl log to /tmp/test_logs/1440995597/sanity.test_65a.*.1440995598.log
Dumping logs only on local client.
FAIL 65a (1s)
|
| Comment by Andreas Dilger [ 26/Jan/18 ] |
|
I think ll_dirstripe_verify needs to be updated to read the default stripe_count and stripe_size from the $MOUNT directory rather than from /proc/fs/lustre/lov/*/stripecount and .../stripesize. That will allow these tests to pass when the root directory has a different layout than what is stored by default in /proc. |
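A minimal shell sketch of the lookup order suggested above, for illustration only. The helper name `expected_stripe_count` is hypothetical and is not part of ll_dirstripe_verify or test-framework.sh. It assumes that `lfs getstripe -d -c` prints a single number for a directory, and that an unset directory default shows up as 0 or as empty output.

```shell
# Sketch only: expected_stripe_count is a hypothetical helper name.
# LFS can be overridden to point at a specific lfs binary.
LFS=${LFS:-lfs}

expected_stripe_count() {
    local dir=$1 root=$2 count

    # Default stripe count stored on the directory itself
    # (-d: report the directory only, do not list its files).
    count=$($LFS getstripe -d -c "$dir" 2>/dev/null)

    # Assumption: an unset default is reported as 0 or as empty output.
    # In that case fall back to the filesystem root ($MOUNT) instead of
    # reading /proc/fs/lustre/lov/*/stripecount.
    if [ -z "$count" ] || [ "$count" = "0" ]; then
        count=$($LFS getstripe -d -c "$root" 2>/dev/null)
    fi

    echo "$count"
}
```

A test could then call `expected_stripe_count "$DIR/$tdir" "$MOUNT"` and compare the result with the stripe count of the newly created file.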
| Comment by Jian Yu [ 03/Feb/18 ] |
|
Hi Andreas, |
| Comment by Andreas Dilger [ 03/Feb/18 ] |
|
If the root directory does not have a composite layout, then "lfs getstripe -c" and "lfs getstripe -S" will return either the values from the root directory, or if there are none then the global defaults will be returned. That is the behaviour that we want for ll_dirstripe_verify.
If the root directory has a composite layout, then the composite layout should be used for new file creation. This is much more complex, however, since the file's "stripe count" is a function of the file size, while the default layout on the root will always have "size == 0". However, it also looks like these tests only use "touch" to create the file, so that should be OK. I gave this a quick test, and I thought that "lfs getstripe -[cS]" on a composite file would return the value on the last initialized component, but it seems to be taking it from the last component, which will make this more complex to handle.
I'm not against fixing the composite layout problem also, but as a starting point it would be good to fix the problem with the existing plain layout, and we can work on fixing composite layouts next.
Another option would be to get rid of ll_dirstripe_verify completely, and use the "--yaml" option to dump the layout in a more-parsable format, and use that to compare the layouts (after removing the file-unique parts like FIDs, etc). |
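For the plain-layout case, the check itself reduces to comparing the new file's stripe count with the expected default. The sketch below is illustrative only: `verify_stripe_count` is a hypothetical helper, and it assumes that a default of -1 means "stripe over all OSTs" and that a default larger than the OST count is capped at the OST count. Composite layouts are not handled.

```shell
# Sketch only: verify_stripe_count is a hypothetical helper name.
# Arguments: actual file stripe count, expected default stripe count,
# and the number of OSTs in the filesystem.
verify_stripe_count() {
    local actual=$1 expected=$2 ost_count=$3

    # Assumption: -1 requests all OSTs, and a larger-than-available
    # default is capped at the number of OSTs.
    if [ "$expected" -eq -1 ] || [ "$expected" -gt "$ost_count" ]; then
        expected=$ost_count
    fi

    if [ "$actual" -ne "$expected" ]; then
        echo "file stripe count $actual != expected $expected" >&2
        return 1
    fi
}
```

With the values from the failing run, `verify_stripe_count 4 1 4` returns non-zero, while `verify_stripe_count 4 -1 4` succeeds, which would be the outcome if the root default were -1.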