[LU-8816] test-framework needs to reload SPL module after set hostid Created: 09/Nov/16  Updated: 13/Jul/17  Resolved: 18/Nov/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Blocker
Reporter: Minh Diep Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-7134 Ensure ZFS hostid protection if servi... Resolved
is related to LU-8694 ZFS format fails when /etc/hostid is ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Recent failover testing has shown failures:

https://testing.hpdd.intel.com/test_sets/9560d0c2-a174-11e6-a031-5254006e85c2
https://testing.hpdd.intel.com/test_sets/f1680146-a58f-11e6-95cc-5254006e85c2



 Comments   
Comment by Gerrit Updater [ 09/Nov/16 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: http://review.whamcloud.com/23684
Subject: LU-8816 test: reload SPL module after set_hostid
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9e93a6a219ac22af49449ea147cb25d1af53c4c8

Comment by Nathaniel Clark [ 15/Nov/16 ]

There's an issue with ZFS code that doesn't call initial load of hostid (SPL::zone_gethostid()) until creation or importation of a pool.

Comment by Saurabh Tandan (Inactive) [ 15/Nov/16 ]

After Minh's comment above, I retested and testing still could not continue. It fails with the following error message:

11:45:05:onyx-37vm7: mkfs.lustre FATAL: spl_hostid not set. See mkfs.lustre(8)
11:45:05:onyx-37vm7: mkfs.lustre FATAL: mkfs failed 22
11:45:05:onyx-37vm7: mkfs.lustre: exiting with 22 (Invalid argument)
11:45:05:
11:45:05:   Permanent disk data:
11:45:05:Target:     lustre:MDT0000
11:45:05:Index:      0
11:45:05:Lustre FS:  lustre
11:45:05:Mount type: zfs
11:45:05:Flags:      0x65
11:45:05:              (MDT MGS first_time update )
11:45:05:Persistent mount opts: 
11:45:05:Parameters: failover.node=10.2.4.158@tcp sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/usr/sbin/l_getidentity
11:45:05:

Result - https://testing.hpdd.intel.com/test_sets/e141da18-ab74-11e6-986b-5254006e85c2

Comment by Peter Jones [ 16/Nov/16 ]

As I understand it, this issue is going to be handled in a DCO ticket.

Comment by Gerrit Updater [ 16/Nov/16 ]

Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/23804
Subject: LU-8816 utils: Check /etc/hostid instead of failing for ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a30b786793ff47cd65a07d6112a4e3cffbeb385a

Comment by Minh Diep [ 16/Nov/16 ]

Peter,

Per Nate: "There's an issue with ZFS code that doesn't call initial load of hostid (SPL::zone_gethostid()) until creation or importation of a pool."

So this is still an issue even after adding the hostid during node provisioning.

Comment by Nathaniel Clark [ 17/Nov/16 ]

I've uploaded a patch to fix the ZFS portion of mkfs.lustre and tunefs.lustre.
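
For readers following along, here is a rough sketch of the fallback idea named in the patch subject ("Check /etc/hostid instead of failing for ZFS"). It is not the code from the patch. It assumes that the SPL hostid is exposed as a decimal value in /sys/module/spl/parameters/spl_hostid, that /etc/hostid holds a 4-byte integer in native byte order, and that a value of 0 means "not set". The function names are hypothetical.

```python
import struct
from pathlib import Path

# Assumed locations; both paths are parameters so the logic can be
# exercised against ordinary files.
SPL_HOSTID_PARAM = Path("/sys/module/spl/parameters/spl_hostid")
ETC_HOSTID = Path("/etc/hostid")


def read_spl_hostid(path=SPL_HOSTID_PARAM):
    """Return the hostid reported by the SPL module, or 0 if unavailable."""
    try:
        # Assumption: the parameter is printed as a decimal number.
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return 0


def read_etc_hostid(path=ETC_HOSTID):
    """Return the hostid stored in the hostid file, or 0 if unavailable."""
    try:
        raw = path.read_bytes()
    except OSError:
        return 0
    if len(raw) < 4:
        return 0
    # Assumption: 4 bytes, native byte order, unsigned.
    return struct.unpack("=I", raw[:4])[0]


def effective_hostid(spl_path=SPL_HOSTID_PARAM, etc_path=ETC_HOSTID):
    """Prefer the SPL value; fall back to the hostid file when SPL reports 0."""
    hostid = read_spl_hostid(spl_path)
    if hostid == 0:
        hostid = read_etc_hostid(etc_path)
    return hostid
```

With this shape, a caller would only report "hostid not set" when effective_hostid() returns 0, that is, when neither the module parameter nor the file yields a value.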

Comment by Gerrit Updater [ 18/Nov/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23804/
Subject: LU-8816 utils: Check /etc/hostid instead of failing for ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 13a18655ac57b3f22813be7b44c627a2fb1c2396

Comment by Peter Jones [ 18/Nov/16 ]

Landed for 2.9

Generated at Sat Feb 10 02:20:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.