[LU-7069] Interop 2.5.3<->master DNE: sanity test_65a/test_65e failed: FAIL: lverify failed
Created: 01/Sep/15
Updated: 03/Feb/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Sarah Liu
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

client: 2.5.3
server: lustre-master build #3142 RHEL6.6 DNE


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_65a and test_65e failed as follows:

== sanity test 65a: directory with no stripe info ====================== 19:57:23 (1441076243)
Lustre: DEBUG MARKER: == sanity test 65a: directory with no stripe info ====================== 19:57:23 (1441076243)
mkdir 1 for /mnt/lustre/d65a.sanity
file1 stripe count 4 != dir 1

default stripe 1, ost count 4
 sanity test_65a: @@@@@@ FAIL: lverify failed 
Lustre: DEBUG MARKER: sanity test_65a: @@@@@@ FAIL: lverify failed
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4343:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4374:error()
  = sanity.sh:4955:test_65a()
  = /usr/lib64/lustre/tests/test-framework.sh:4613:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4648:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4516:run_test()
  = sanity.sh:4957:main()
Dumping lctl log to /home/w3liu/toro_home/test_logs/sanity.test_65a.*.1441076244.log
FAIL 65a (8s)

== sanity test 65e: directory setstripe defaults ========================= 19:57:36 (1441076256)
Lustre: DEBUG MARKER: == sanity test 65e: directory setstripe defaults ========================= 19:57:36 (1441076256)
mkdir 1 for /mnt/lustre/d65e.sanity
(Default) /mnt/lustre/d65e.sanity
file1 stripe count 4 != dir 1

default stripe 1, ost count 4
 sanity test_65e: @@@@@@ FAIL: lverify failed 
Lustre: DEBUG MARKER: sanity test_65e: @@@@@@ FAIL: lverify failed
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4343:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4374:error()
  = sanity.sh:5007:test_65e()
  = /usr/lib64/lustre/tests/test-framework.sh:4613:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4648:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4516:run_test()
  = sanity.sh:5009:main()
Dumping lctl log to /home/w3liu/toro_home/test_logs/sanity.test_65e.*.1441076257.log
FAIL 65e (2s)


 Comments   
Comment by Andreas Dilger [ 01/Sep/15 ]

Is this just a case where the test needs to be skipped because the client doesn't handle striped directories?

Comment by Di Wang [ 01/Sep/15 ]

It is a 2.5.3 client, so it should create a remote directory instead of a striped directory.

mkdir 1 for /mnt/lustre/d65a.sanity
file1 stripe count 4 != dir 1

Hmm, file1 has a stripe count of 4, but d65a.sanity has no default stripe EA. Maybe it uses the global default stripe EA to create the file? It also seems the test script itself has a problem: when there is no default EA on the directory, it still compares the file's stripe count with the directory's default stripe EA, when it should probably compare it with the global default stripe count. And clearly, if we set the global default stripe count to 2, test_65a will also fail. So maybe we only need to fix the test script for 65a and 65e.

[root@testnode tests]# cat /proc/fs/lustre/lov/lustre-MDT0000-mdtlov/stripecount 
1
[root@testnode tests]# cat /proc/fs/lustre/lov/lustre-MDT000^C
[root@testnode tests]# echo 2 > /proc/fs/lustre/lov/lustre-MDT0000-mdtlov/stripecount 
[root@testnode tests]# cat /proc/fs/lustre/lov/lustre-MDT0000-mdtlov/stripecount 
2
[root@testnode tests]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_testnode-lv_root
                      27228028  13741748  12096508  54% /
tmpfs                  4006664         0   4006664   0% /dev/shm
/dev/sda1               487652     62375    399677  14% /boot
192.168.1.31:/Users/wangdi/work
                     243358976 195038592  48064384  81% /work
/dev/loop0              283512      2204    261396   1% /mnt/mds1
/dev/loop1              771012     17572    711920   3% /mnt/ost1
/dev/loop2              771012     17572    711920   3% /mnt/ost2
testnode@tcp:/lustre   1542024     35144   1423840   3% /mnt/lustre
[root@testnode tests]# ONLY=65a sh sanity.sh
Logging to shared log directory: /tmp/test_logs/1440995597
testnode: Checking config lustre mounted on /mnt/lustre
Checking servers environments
Checking clients testnode environments
Using TIMEOUT=20
disable quota as required
osd-ldiskfs.track_declares_assert=1
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f5317]
excepting tests: 76 42a 42b 42c 42d 45 51d 68b
skipping tests SLOW=no: 24o 24D 27m 64b 68 71 77f 78 115 124b
preparing for tests involving mounts
mke2fs 1.42.12.wc1 (15-Sep-2014)

debug=-1


== sanity test 65a: directory with no stripe info ====================== 21:33:17 (1440995597)
file1 stripe count 2 != dir 1

default stripe 1, ost count 2
 sanity test_65a: @@@@@@ FAIL: lverify failed 
  Trace dump:
  = /work/lustre-release_new/lustre/tests/test-framework.sh:4748:error_noexit()
  = /work/lustre-release_new/lustre/tests/test-framework.sh:4779:error()
  = sanity.sh:5274:test_65a()
  = /work/lustre-release_new/lustre/tests/test-framework.sh:5026:run_one()
  = /work/lustre-release_new/lustre/tests/test-framework.sh:5063:run_one_logged()
  = /work/lustre-release_new/lustre/tests/test-framework.sh:4880:run_test()
  = sanity.sh:5276:main()
Dumping lctl log to /tmp/test_logs/1440995597/sanity.test_65a.*.1440995598.log
Dumping logs only on local client.
FAIL 65a (1s)
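
The comparison suggested in this comment can be sketched as a small shell helper. This is only an illustration, not the actual ll_dirstripe_verify or sanity.sh logic: the function name expected_stripe_count and its three arguments are hypothetical, and the cap at the OST count is an assumption that matches the runs above (4 stripes on 4 OSTs, 2 stripes on 2 OSTs).

```shell
# Hypothetical helper: print the stripe count a newly created file is
# expected to get.
#   $1 = default stripe count stored on the directory (0 = no default EA)
#   $2 = filesystem-wide (global) default stripe count
#   $3 = number of OSTs
expected_stripe_count() {
    local dir_count=$1 global_count=$2 ost_count=$3
    local expected=$dir_count

    # No default EA on the directory: fall back to the global default.
    if [ "$expected" -eq 0 ]; then
        expected=$global_count
    fi
    # -1 means "stripe over all OSTs".
    if [ "$expected" -eq -1 ]; then
        expected=$ost_count
    fi
    # Assumption: a file never gets more stripes than there are OSTs.
    if [ "$expected" -gt "$ost_count" ]; then
        expected=$ost_count
    fi

    echo "$expected"
}
```

With the values from the local reproduction (no default EA on the directory, global default 2, 2 OSTs), the helper prints 2, which is the stripe count the file actually had.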

Comment by Andreas Dilger [ 26/Jan/18 ]

I think ll_dirstripe_verify needs to be updated to read the default stripe_count and stripe_size from the $MOUNT directory rather than from /proc/fs/lustre/lov/*/stripecount and .../stripesize. That will allow these tests to pass when the root directory has a different layout than what is stored by default in /proc.

Comment by Jian Yu [ 03/Feb/18 ]

Hi Andreas,
If the root directory has a composite layout and different components have different stripe options, which ones should be used for the comparison?

Comment by Andreas Dilger [ 03/Feb/18 ]

If the root directory does not have a composite layout, then "lfs getstripe -c" and "lfs getstripe -S" will return either the values from the root directory, or if there is none then the global defaults will be returned. That is the behaviour that we want for ll_dirstripe_verify.
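
A minimal sketch of reading the defaults this way, for the plain-layout case only. The function name root_default_layout is hypothetical, and the LFS and MOUNT variables are assumed to be set the way the test framework sets them (the fallbacks here are placeholders). Whether an extra option is needed to keep the output limited to the directory itself has not been checked here.

```shell
LFS=${LFS:-lfs}
MOUNT=${MOUNT:-/mnt/lustre}

# Hypothetical helper: print "<stripe_count> <stripe_size>" for the
# filesystem root, as reported by "lfs getstripe -c" and "lfs getstripe -S".
# Per the comment above, these return the global defaults when the root
# directory has no layout of its own.
root_default_layout() {
    local count size

    count=$($LFS getstripe -c "$MOUNT") || return 1
    size=$($LFS getstripe -S "$MOUNT") || return 1

    echo "$count $size"
}
```

The two values could then replace the ones currently read from /proc/fs/lustre/lov/*/stripecount and .../stripesize.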

If the root directory has a composite layout, then the composite layout should be used for new file creation. This is much more complex, however, since the file's "stripe count" is a function of the file size, while the default layout on the root will always have "size == 0". However, it also looks like these tests only use "touch" to create the file, so that should be OK.

I gave this a quick test, and I thought that "lfs getstripe -[cS]" on a composite file would return the value on the last initialized component, but it seems to be taking it from the last component, which will make this more complex to handle.

I'm not against fixing the composite layout problem also, but as a starting point it would be good to fix the problem with the existing plain layout, and we can work on fixing composite layouts next.

Another option would be to get rid of ll_dirstripe_verify completely, and use the "--yaml" option to dump the layout in a more-parsable format, and use that to compare the layouts (after removing the file-unique parts like FIDs, etc).
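
One possible shape for that comparison, as a rough sketch. The helper names strip_unique and same_layout and the UNIQUE_KEYS variable are hypothetical. Only "fid" is taken from the comment above as a file-unique key; any other keys that need to be dropped would have to be added after looking at real "--yaml" output.

```shell
# Pattern (extended regex, case-insensitive) for lines considered
# file-unique. Assumption: only FID lines are filtered by default.
UNIQUE_KEYS=${UNIQUE_KEYS:-fid}

# Read a layout dump on stdin and drop the file-unique lines.
strip_unique() {
    grep -viE "$UNIQUE_KEYS"
}

# Succeed when two layout dumps, passed as strings, are identical once
# the file-unique lines have been removed.
same_layout() {
    [ "$(printf '%s\n' "$1" | strip_unique)" = \
      "$(printf '%s\n' "$2" | strip_unique)" ]
}
```

A test could then capture the "--yaml" output for the file and for the directory it was created in, and pass both strings to same_layout.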
