[LU-5986] conf-sanity test_83, test_84: failed to start OST
Created: 05/Dec/14  Updated: 27/Nov/15  Resolved: 12/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

FSTYPE=zfs


Issue Links:
Related
is related to LU-6029 conf-sanity test_84: recovery_duratio... Resolved
is related to LU-4119 recovery time hard doesn't limit reco... Resolved
is related to LU-5729 VFS: Busy inodes after unmount of loo... Resolved
Severity: 3
Rank (Obsolete): 16703

 Description   

Patch http://review.whamcloud.com/9078 for LU-4119 introduced a regression in conf-sanity test 83 under a ZFS configuration:

start ost1 service on onyx-36vm8
CMD: onyx-36vm8 mkdir -p /mnt/ost1
CMD: onyx-36vm8 zpool list -H lustre-ost1 >/dev/null 2>&1 ||
			zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost1
Starting ost1:   lustre-ost1/ost1 /mnt/ost1
CMD: onyx-36vm8 mkdir -p /mnt/ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/ost1
onyx-36vm8: mount.lustre: mount lustre-ost1/ost1 at /mnt/ost1 failed: No such file or directory
onyx-36vm8: Is the MGS specification correct?
onyx-36vm8: Is the filesystem name correct?
onyx-36vm8: If upgrading, is the copied client log valid? (see upgrade docs)
Start of lustre-ost1/ost1 on ost1 failed 2

On the OSS node:

07:55:10:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/ost1
07:55:10:LustreError: 2367:0:(obd_mount_server.c:1168:server_register_target()) lustre-OST0000: error registering with the MGS: rc = -2 (not fatal)
07:55:10:LustreError: 13a-8: Failed to get MGS log lustre-OST0000 and no local copy.

On the MDS node:

08:56:17:Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname 		                           lustre-mdt1/mdt1 2>/dev/null
08:56:17:Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
08:56:17:LustreError: 13b-9: lustre-OST0000 claims to have registered, but this MGS does not know about it, preventing registration.
08:56:17:LustreError: 13b-9: lustre-OST0001 claims to have registered, but this MGS does not know about it, preventing registration.

Maloo reports:
https://testing.hpdd.intel.com/test_sets/77dccc26-6ccf-11e4-960c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/f6eaf896-717d-11e4-b5de-5254006e85c2
https://testing.hpdd.intel.com/test_sets/04731068-7c08-11e4-bdab-5254006e85c2
https://testing.hpdd.intel.com/test_sets/bb09e096-7c0c-11e4-bdab-5254006e85c2

Info required for matching: conf-sanity 83



 Comments   
Comment by Oleg Drokin [ 05/Dec/14 ]

So I think the patch did not even consider there being such a thing as zfs.

Hopefully the fix is a straightforward change to the test.
If not, please ping me and I'll revert the LU-4119 patch instead.

Comment by Nathaniel Clark [ 08/Dec/14 ]

Also the LU-4119 patch added a second test_83.

LU-5729 patch http://review.whamcloud.com/12325 added:

run_test 83 "ENOSPACE on OST doesn't cause message VFS: \
Busy inodes after unmount ..."

LU-4119 patch http://review.whamcloud.com/9078 added:

run_test 83 "check recovery_hard_time"

Each patch added its own test_83 function.
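A minimal bash sketch of why two test_83 functions collide: bash does not reject a second definition of the same function name, and the later test_83() silently replaces the earlier one for every subsequent call. The run_test stub below is hypothetical and far simpler than the real test-framework helper, and the function bodies are placeholders standing in for the two tests; this is not the actual conf-sanity.sh code.

```shell
#!/bin/bash
# Hypothetical, simplified stand-in for the test framework's run_test:
# it calls whatever test_<N> function is defined at the moment of the call.
run_test() {
    local num=$1
    shift
    echo "test_$num ($*): $(test_$num)"
}

# First definition (placeholder body standing in for the LU-5729 test).
test_83() { echo "ENOSPC check"; }
run_test 83 "ENOSPACE on OST"

# Second definition with the same name (placeholder for the LU-4119 test).
# Bash accepts it without any warning and replaces the earlier body.
test_83() { echo "recovery_hard_time check"; }
run_test 83 "check recovery_hard_time"

# From here on, only the second body exists under the name test_83.
test_83
```

In this sketch both results are reported under the same number 83, so they cannot be told apart by number, and any later caller of test_83 gets only the second body. Giving one of the tests a distinct number, as patch 12984 does, removes the ambiguity.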

Comment by Gerrit Updater [ 08/Dec/14 ]

James Simmons (uja.ornl@gmail.com) uploaded a new patch: http://review.whamcloud.com/12984
Subject: LU-5986 test: fix conflicting conf-sanity 83 test.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 829e13cad6f84706d3cd5e203fd7bdf6fd18ff19

Comment by James A Simmons [ 08/Dec/14 ]

Let's see whether a simple fix corrects this.

Comment by Nathaniel Clark [ 08/Dec/14 ]

I'm testing an identical patch locally, and it appears to fix it.

Comment by Andreas Dilger [ 08/Dec/14 ]

Nathaniel, does the simple patch in 12984 to create separate test numbers also fix the ZFS problem, or does that still need a second patch?

Comment by Gerrit Updater [ 09/Dec/14 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/12984/
Subject: LU-5986 test: fix conflicting conf-sanity 83 test.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: feaeafe8186fafa4cb0c8180bd9746c4a7af809f

Comment by Nathaniel Clark [ 09/Dec/14 ]

It still seems to fail on ZFS. Test 84 passed in local testing, but I didn't run the full conf-sanity suite; I only ran test_84. I think test_84 expects a setup that it doesn't get on ZFS because test 83 is skipped.

Comment by Andreas Dilger [ 09/Dec/14 ]

Nathaniel, it looks like this may be the last major blocker for enforcing review-zfs. Any chance of getting a patch for this quickly?

Comment by Gerrit Updater [ 10/Dec/14 ]

Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/13016
Subject: LU-5986 test: Ensure correct start for conf-sanity/84
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a63e8110da2689482fbd2a112d34a7f69f555678

Comment by Gerrit Updater [ 11/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13016/
Subject: LU-5986 test: Ensure correct start for conf-sanity/84
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f2e76a69233a2fd348ae1f8af3c937048695a4a1

Comment by Andreas Dilger [ 12/Dec/14 ]

Patch landed to master for 2.7.0.

Generated at Sat Feb 10 01:56:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.