[LU-12298] Cannot Start Lustre 2.12.1 Server Created: 14/May/19  Updated: 08/Jul/20  Resolved: 16/May/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.1
Fix Version/s: Lustre 2.13.0, Lustre 2.12.2

Type: Question/Request Priority: Blocker
Reporter: Neale Petrillo (Inactive) Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: ORNL
Environment:

CentOS 7.6 with kernel 3.10.0-957.10.1.el7
Mellanox 100Gb Ethernet w/ RoCE


Issue Links:
Related
is related to LU-8384 convert startup scripts to systemd un... Resolved

 Description   

We are trying to stand up a new Lustre instance based on version 2.12.1. The DKMS packages downloaded from the Whamcloud website install without error, but there doesn't seem to be a way to start the Lustre service. /usr/lib/systemd/system/lustre.service appears to only unmount a filesystem, and there is no longer an /etc/rc.d/init.d/lustre.

 

We found that removing both lnet.service and lustre.service from /usr/lib/systemd/system/ and recopying /etc/rc.d/init.d/lnet and /etc/rc.d/init.d/lustre from a working Lustre 2.10.3 system allowed us to start the system and mount the OSTs that we created with mkfs.lustre.
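The check described above can be sketched as a small script. This is a minimal sketch, not a supported tool: it only tests whether the two paths named in this report exist for each service, and the function names `startup_sources` and `shadowed_or_missing` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path

# Directories taken from the description above, relative to a filesystem root.
UNIT_DIR = Path("usr/lib/systemd/system")
INIT_DIR = Path("etc/rc.d/init.d")


def startup_sources(root, names=("lnet", "lustre")):
    """Report, per service, whether a native unit and/or a SysV script exists."""
    root = Path(root)
    report = {}
    for name in names:
        report[name] = {
            "unit": (root / UNIT_DIR / f"{name}.service").is_file(),
            "sysv": (root / INIT_DIR / name).is_file(),
        }
    return report


def shadowed_or_missing(report):
    """Names whose SysV script is absent or sits behind a native unit file."""
    return sorted(n for n, s in report.items() if s["unit"] or not s["sysv"])
```

Calling `startup_sources("/")` on a server and passing the result to `shadowed_or_missing` lists the services for which the init.d script would not be the one that runs, which matches the situation reported here for `lustre`.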

 

Creating an MGS/MDS with mkfs.lustre succeeds, and zpool list then reports the expected values. The server is using the in-kernel OFED for Mellanox ConnectX 100GbE cards. The packages we installed were:

 

libuutil1-0.7.13-1.el7.x86_64.rpm
libzfs2-0.7.13-1.el7.x86_64.rpm
libzfs2-devel-0.7.13-1.el7.x86_64.rpm
libzpool2-0.7.13-1.el7.x86_64.rpm
lustre-2.12.1-1.el7.x86_64.rpm
lustre-osd-zfs-mount-2.12.1-1.el7.x86_64.rpm
lustre-resource-agents-2.12.1-1.el7.x86_64.rpm
lustre-zfs-dkms-2.12-1.el7.x86_64.rpm
spl-0.7.13-1.el7.x86_64.rpm
spl-debuginfo-0.7.13-1.el7.x86_64.rpm
spl-dkms-0.7.13-1.el7.x86_64.rpm
zfs-0.7.13-1.el7.x86_64.rpm
zfs-debuginfo-0.7.13-1.el7.x86_64.rpm
zfs-dkms-0.7.13-1.el7.x86_64.rpm
zfs-test-0.7.13-1.el7.x86_64.rpm

 

What am I missing here? Thanks!

 



 Comments   
Comment by Jesse Hanley [ 15/May/19 ]

We've hit this at ORNL with our 2.12.1 testing.

It appears commit 420d8c09887ff178508be0434373f74b5ef7ae6e for LU-8384 broke this. Previously, RHEL-family installations included the SysV init.d lustre script. This allowed systemd-sysv-generator to generate a systemd-managed service for it.

That commit introduced a systemd unit file for Lustre, but with none of the functionality (currently, the exec statement is only ExecStart=/bin/true, so pools are never imported, targets never started, etc.). Systemd will prefer this unit file, and there's no SysV file for it to fall back to. Additionally, that unit file attempts a full lustre_rmmod on service stop, which should probably only be done within the service stop of lnet. It currently hangs the service on a busy system.
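To make the no-op behaviour concrete, here is a minimal sketch that scans unit-file text and flags a service whose ExecStart lines all run a "true" binary. The `SAMPLE_UNIT` text is a hypothetical stand-in built from the two facts stated above (ExecStart=/bin/true, lustre_rmmod on stop); it is not the shipped lustre.service, and the `/usr/sbin/lustre_rmmod` path is an assumption. The parser is deliberately simple and ignores systemd details such as drop-in overrides and an empty `ExecStart=` that resets the list.

```python
def unit_commands(unit_text, key):
    """Collect the values of every `key=` line in the [Service] section."""
    values, section = [], None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            name, _, value = line.partition("=")
            if name.strip() == key:
                values.append(value.strip())
    return values


NOOP_COMMANDS = {"/bin/true", "/usr/bin/true"}


def starts_nothing(unit_text):
    """True if the unit has ExecStart lines and every one of them is a no-op."""
    cmds = unit_commands(unit_text, "ExecStart")
    # Strip systemd's optional command prefixes before comparing.
    return bool(cmds) and all(c.lstrip("-@+!:") in NOOP_COMMANDS for c in cmds)


# Hypothetical unit text for illustration only.
SAMPLE_UNIT = """\
[Unit]
Description=Hypothetical stand-in for the shipped lustre.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/sbin/lustre_rmmod
"""
```

With this sample, `starts_nothing(SAMPLE_UNIT)` is true while `unit_commands(SAMPLE_UNIT, "ExecStop")` still returns the module-unload command, which is the asymmetry described in this comment: nothing is started, but stop tries to unload everything.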

Comment by Neale Petrillo (Inactive) [ 15/May/19 ]

Oh good, it's not just me! We tried building from source using the --disable-client flag but ran into trouble building the RPMs afterward, then stopped debugging there. We managed to get a working version by copying the init.d script from version 2.12.0 into the 2.12.1 init.d area and deleting the systemd service file.

Do you think this should be marked as a regression?

Comment by Peter Jones [ 15/May/19 ]

Nathaniel

Can you please investigate?

Thanks

Peter

Comment by Nathaniel Clark [ 15/May/19 ]

Reverting LU-8384 should be okay.  It would be good to get the systemd compatibility block into lustre sysvinit.
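As an illustration only: assuming the "compatibility block" refers to an LSB-style init info header (the comment block that systemd-sysv-generator reads to derive ordering for a SysV script), the sketch below extracts such a header from script text. The `SAMPLE_SCRIPT` contents and its field values are hypothetical and are not taken from the actual patch.

```python
def parse_init_info(script_text):
    """Return the LSB init-info fields of a SysV script as {keyword: [values]}."""
    fields, inside = {}, False
    for raw in script_text.splitlines():
        line = raw.strip()
        if line == "### BEGIN INIT INFO":
            inside = True
        elif line == "### END INIT INFO":
            break
        elif inside and line.startswith("#") and ":" in line:
            keyword, _, value = line[1:].partition(":")
            fields[keyword.strip()] = value.split()
    return fields


# Hypothetical header for illustration; the real script may differ.
SAMPLE_SCRIPT = """\
#!/bin/bash
### BEGIN INIT INFO
# Provides:          lustre
# Required-Start:    $network lnet
# Required-Stop:     $network lnet
# Default-Start:     3 5
# Default-Stop:      0 1 6
### END INIT INFO
"""
```

A script without such a header yields an empty dict from `parse_init_info`, which is a quick way to tell whether an installed init script carries the block at all.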

Comment by Nathaniel Clark [ 15/May/19 ]

Revert of LU-8384 for b2_12

https://review.whamcloud.com/#/c/34868/

Comment by Gerrit Updater [ 15/May/19 ]

Nathaniel Clark (nclark@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34869
Subject: LU-12298 init: Add init info to lustre sysvinit script
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 0f8bcefbfcd6858e571ad75a99690240e6c9cf27

Comment by Gerrit Updater [ 15/May/19 ]

Nathaniel Clark (nclark@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34873
Subject: LU-12298 init: Add init info to lustre sysvinit script
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7dcc7f13bd978c1f38302715cced9061d71f8899

Comment by Philip B Curtis [ 16/May/19 ]

Tested this patch on our TDS at ORNL and it looks like it resolves this issue.

Comment by Gerrit Updater [ 16/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34869/
Subject: LU-12298 init: Add init info to lustre sysvinit script
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6e91b341f3d3d3afeef46e1256e35564f8c9d0d1

Comment by Gerrit Updater [ 16/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34873/
Subject: LU-12298 init: Add init info to lustre sysvinit script
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cd8a5f533d9df8aef4976d1a4c741d0f3673dd05

Comment by Peter Jones [ 16/May/19 ]

Landed for 2.13 and 2.12.2
