[LU-7497] conf-sanity test_32b: FAIL: list verification failed and test_32b failed with 4
Created: 30/Nov/15  Updated: 23/Mar/17  Resolved: 23/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Saurabh Tandan (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: SELinux
Environment:

1 Client, 1 MDT, 2 OST (master)


Attachments:
conf-sanity.test_32b.debug_log.eagle-52vm5.1448327304.log
conf-sanity.test_32b.debug_log.eagle-52vm5.1448327407.log
conf-sanity.test_32b.debug_log.eagle-52vm5.1448327511.log
conf-sanity.test_32b.debug_log.eagle-52vm5.1448327619.log
conf-sanity.test_32b.debug_log.eagle-52vm5.1448327634.log
conf-sanity.test_32b.dmesg.eagle-52vm5.1448327304.log
conf-sanity.test_32b.dmesg.eagle-52vm5.1448327407.log
conf-sanity.test_32b.dmesg.eagle-52vm5.1448327511.log
conf-sanity.test_32b.dmesg.eagle-52vm5.1448327619.log
conf-sanity.test_32b.dmesg.eagle-52vm5.1448327634.log
conf-sanity.test_32b.test_log.eagle-52vm5.log
Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity test_32b failed on an SELinux-enabled client with the following error:

conf-sanity test_32b: @@@@@@ FAIL: list verification failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4812:error_noexit()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2069:t32_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2203:test_32b()
  = /usr/lib64/lustre/tests/test-framework.sh:5090:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5127:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4992:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2207:main()
Dumping lctl log to /tmp/test_logs/2015-11-23/162346/conf-sanity.test_32b.*.1448327619.log
eagle-52vm5: Host key verification failed.
eagle-52vm5: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
eagle-52vm5: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
eagle-52vm3: Host key verification failed.
eagle-52vm3: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
eagle-52vm3: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
eagle-52vm2: Host key verification failed.
eagle-52vm2: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
eagle-52vm2: rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
 conf-sanity test_32b: @@@@@@ FAIL: test_32b failed with 4 



 Comments   
Comment by Joseph Gmitter (Inactive) [ 01/Dec/15 ]

Hi Saurabh,
This looks to be a test change as well. Can you please look into this?
Thanks.
Joe

Comment by Andreas Dilger [ 09/Sep/16 ]

Getting lots of these failures on master:
https://testing.hpdd.intel.com/test_sets/e61527be-7615-11e6-8a8c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/2496eed4-757e-11e6-8afd-5254006e85c2
https://testing.hpdd.intel.com/test_sets/10f454ec-75fc-11e6-b08e-5254006e85c2
https://testing.hpdd.intel.com/test_sets/1660ae68-74bb-11e6-b08e-5254006e85c2
https://testing.hpdd.intel.com/test_sets/9569d428-73e4-11e6-8a8c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/550c1ca2-73d9-11e6-8a8c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/82d686e2-615d-11e6-aa74-5254006e85c2
:
:

Comment by Niu Yawei (Inactive) [ 12/Sep/16 ]

There are two kinds of failures in the Maloo links above:

1. "list verification failed" (ZFS testing only):

+144115205255725229 -rwxr-xr-x 1 0 0 0 1282163949 acpid
+144115205255725271 -rwxr-xr-x 1 0 0 0 1298801889 munge
+144115205255725264 -rwxr-xr-x 1 0 0 0 1309000537 oddjobd
+144115205255725247 -rwxr-xr-x 1 0 0 0 1311100147 haldaemon
+144115205255725219 -rwxr-xr-x 1 0 0 0 1324140347 rngd
+144115205255725215 -rwxr-xr-x 1 0 0 0 1333467218 portreserve
+144115205255725245 -rwxr-xr-x 1 0 0 0 1342512140 psacct
+144115205255725235 -rwxr-xr-x 1 0 0 0 1347555868 messagebus
+144115205255725243 -rwxr-xr-x 1 0 0 0 1353412378 saslauthd
+144115205255725233 -rwxr-xr-x 1 0 0 0 1361485582 smartd
...

From the log we can see that the sizes of all regular files have somehow become zero, while the sizes of directories and symlinks are still correct. I suspect something goes wrong when retrieving the file layout for these regular files: I didn't find any activity in the OST log, which means no glimpse was issued to the OST. (The client and MDT logs are truncated, so this is only a reasonable conjecture.)
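A minimal sketch of the check described above: scan the diff lines from the list verification and report regular files whose size is zero, skipping directories and symlinks. It assumes the listing fields are inode, mode, link count, uid, gid, size, mtime and name, which is inferred from the excerpt only; conf-sanity.sh may produce them differently. The helper names `parse_line` and `zero_size_regular_files` are hypothetical and not part of the Lustre test framework.

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    inode: int
    mode: str
    size: int
    name: str


def parse_line(line: str) -> Entry:
    """Parse one listing line.

    Assumed (hypothetical) field order, inferred from the excerpt:
        inode mode nlink uid gid size mtime name
    """
    text = line.strip()
    # Drop a leading diff marker ("+" or "-") that precedes the inode number.
    if text[:1] in ("+", "-") and text[1:2].isdigit():
        text = text[1:]
    # maxsplit=7 keeps names containing spaces (or "a -> b") in one field.
    fields = text.split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"unexpected listing line: {line!r}")
    inode, mode, _nlink, _uid, _gid, size, _mtime, name = fields
    return Entry(int(inode), mode, int(size), name)


def zero_size_regular_files(lines: Iterable[str]) -> List[str]:
    """Return names of regular files (mode starting with '-') with size 0."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        entry = parse_line(line)
        # Directories ('d') and symlinks ('l') are skipped, since only
        # regular files appear to have lost their sizes.
        if entry.mode.startswith("-") and entry.size == 0:
            result.append(entry.name)
    return result


if __name__ == "__main__":
    # Two lines copied from the failure excerpt above.
    sample = [
        "+144115205255725229 -rwxr-xr-x 1 0 0 0 1282163949 acpid",
        "+144115205255725271 -rwxr-xr-x 1 0 0 0 1298801889 munge",
    ]
    print(zero_size_regular_files(sample))  # ['acpid', 'munge']
```

Filtering on the mode's first character mirrors the observation that only regular files, whose size comes from the OST objects via the layout, are affected, while directories and symlinks keep sizes stored on the MDT.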

Perhaps a ZFS expert could take a look to see whether there is a known defect that makes ZFS fail to read the xattr storing the file layout? (Is there any known ZFS interoperability issue?)

2. "verify quota failed" (for 2.7 image only):

Disk quotas for user 60000 (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/tmp/t32/mnt/lustre
                  30720       0   40960       -       1       0       4       -
t32fs-MDT0000_UUID
                      0       -       0       -       1       -       0       -
t32fs-OST0000_UUID
                  30720       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
ispace, act:1, exp:3
 conf-sanity test_32b: @@@@@@ FAIL: verify quota failed 

That's a test environment problem: the client version is b2_7_fe and the server version is b2_8_fe, but the test unexpectedly tried the "disk2_7-ldiskfs.tar.bz2" image. That is bound to fail, because the b2_7_fe test script doesn't support multiple MDTs, while the disk2_7 image has two MDTs. I suppose the disk2_7 image is left over from a previous installation?

Comment by Andreas Dilger [ 03/Mar/17 ]

+1 for ZFS https://testing.hpdd.intel.com/test_sets/2cb46a64-ff72-11e6-9df3-5254006e85c2

Comment by Jinshan Xiong (Inactive) [ 11/Mar/17 ]

It turned out that the tarball 'disk2_4-zfs.tar.bz2' has the MGS enabled on the OST device, so starting the OST failed because the MGS had already been started along with the MDT.

I'm referring to the test failure with the error message 'test_32b failed with 1'.

Comment by Gerrit Updater [ 11/Mar/17 ]

Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/25940
Subject: LU-7497 tests: Fix test failure in conf-sanity 32b
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 831542123aaa9bafe48e0d5adcc7b89857570993

Comment by Gerrit Updater [ 23/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25940/
Subject: LU-7497 tests: Fix test failure in conf-sanity 32b
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c9f12540412a429b824dbc83c1c8b00c8affe28a

Comment by Peter Jones [ 23/Mar/17 ]

Landed for 2.10

Comment by Jinshan Xiong (Inactive) [ 23/Mar/17 ]

There could be other test failures with conf-sanity:32, so let's reopen this ticket if more are seen.

Generated at Sat Feb 10 02:09:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.