[LU-12298] Cannot Start Lustre 2.12.1 Server
| Created: | 14/May/19 | Updated: | 08/Jul/20 | Resolved: | 16/May/19 |
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.1 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.2 |
| Type: | Question/Request | Priority: | Blocker |
| Reporter: | Neale Petrillo (Inactive) | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ORNL |
| Environment: | CentOS 7.6 with kernel 3.10.0-957.10.1.el7 |
| Description |
We are trying to stand up a new Lustre instance based on version 2.12.1. The DKMS packages downloaded from the Whamcloud website install without error, but there doesn't seem to be a way to start the Lustre service. /usr/lib/systemd/system/lustre.service appears only to unmount a filesystem, and there is no longer an /etc/rc.d/init.d/lustre script.
We found that removing both lnet.service and lustre.service from /usr/lib/systemd/system/ and copying /etc/rc.d/init.d/lnet and /etc/rc.d/init.d/lustre over from a working Lustre 2.10.3 system allowed us to start Lustre and mount the OSTs that we had created with mkfs.lustre.
Creating an MGS/MDS with mkfs.lustre succeeds, and zpool list reports the expected values. The server uses the in-kernel OFED stack for its Mellanox ConnectX 100GbE cards. The packages we installed were:
libuutil1-0.7.13-1.el7.x86_643.rpm
What am I missing here? Thanks!
| Comments |
| Comment by Jesse Hanley [ 15/May/19 ] |
We've hit this at ORNL in our 2.12.1 testing. It appears the cause is commit 420d8c09887ff178508be0434373f74b5ef7ae6e for …. That commit introduced a systemd unit file for Lustre, but with none of the functionality of the old init script: currently the only exec statement is ExecStart=/bin/true, so pools are never imported, targets are never started, etc. Systemd will prefer this unit file, and there is no SysV script for it to fall back to. Additionally, the unit file runs a full lustre_rmmod when the service is stopped, which should probably only be done when stopping the lnet service. On a busy system this currently hangs the service.
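The diagnosis in this comment rests on two pieces of systemd behavior: a native unit file hides a SysV init script of the same name, and a unit whose ExecStart is /bin/true "starts" successfully while doing nothing. The Python 3 sketch below is a hypothetical diagnostic for both; it is not a Lustre or systemd tool. The helper names (exec_start_is_noop, effective_source), the simplified two-directory search order, and the example unit text are all illustrative assumptions. In particular, EXAMPLE_UNIT is a stand-in built from the description above, not the lustre.service actually shipped in 2.12.1.

```python
"""Hypothetical diagnostic for a no-op systemd unit hiding a SysV script.

All names and the example unit below are illustrative assumptions; this
is not the lustre.service shipped with Lustre 2.12.1.
"""
import configparser
from pathlib import Path

# Commands that exit 0 without doing any work.
NOOP_COMMANDS = {"/bin/true", "/usr/bin/true"}

# Simplified search order (assumption): admin overrides first, then
# vendor units. Real systemd also searches /run/systemd/system and others.
UNIT_DIRS = (Path("/etc/systemd/system"), Path("/usr/lib/systemd/system"))
SYSV_DIR = Path("/etc/rc.d/init.d")

# Hypothetical stand-in modeled on the behavior described in the comment:
# start does nothing, stop unloads modules. The lustre_rmmod path is assumed.
EXAMPLE_UNIT = """\
[Unit]
Description=Lustre (illustrative stand-in)

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/sbin/lustre_rmmod
"""


def parse_unit(text: str) -> configparser.ConfigParser:
    """Parse simple systemd unit text with configparser.

    strict=False tolerates repeated keys (the last value wins, which
    simplifies systemd's list semantics), interpolation=None leaves
    specifiers such as %i untouched, and optionxform=str keeps key
    names case-sensitive as systemd does.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",), strict=False, interpolation=None
    )
    parser.optionxform = str
    parser.read_string(text)
    return parser


def exec_start_is_noop(text: str) -> bool:
    """Return True if [Service] ExecStart only runs a no-op command."""
    unit = parse_unit(text)
    # fallback also covers a missing [Service] section.
    cmd = unit.get("Service", "ExecStart", fallback="").strip()
    # Drop systemd's special executable prefixes such as "-" or "+".
    argv = cmd.lstrip("-@+!:").split()
    return bool(argv) and argv[0] in NOOP_COMMANDS


def effective_source(name: str, unit_dirs=UNIT_DIRS, sysv_dir: Path = SYSV_DIR) -> str:
    """Report which definition systemd would use for `name` (simplified).

    A native <name>.service hides a SysV script of the same name, so the
    init script only acts as a fallback when no unit file exists.
    """
    for directory in unit_dirs:
        unit = directory / f"{name}.service"
        if unit.is_file():
            return f"native unit: {unit}"
    script = sysv_dir / name
    if script.is_file():
        return f"SysV script: {script}"
    return "not found"


if __name__ == "__main__":
    print("ExecStart is a no-op:", exec_start_is_noop(EXAMPLE_UNIT))
    # Result depends on the files present on the host.
    print("lustre resolves to:", effective_source("lustre"))
```

Parsing with configparser only covers simple units; it does not implement systemd's full syntax (for example, repeated ExecStart= lines form a list in systemd, whereas here the last one wins). On an affected 2.12.1 server one would expect effective_source("lustre") to report the native unit under /usr/lib/systemd/system, which is consistent with the init scripts only taking effect after the unit files were removed.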
| Comment by Neale Petrillo (Inactive) [ 15/May/19 ] |
Oh good, it's not just me! We tried building from source with the --disable-client flag but ran into trouble building the RPMs afterward, so we stopped debugging there. We managed to get a working setup by copying the init.d script from version 2.12.0 into the 2.12.1 init.d directory and deleting the systemd service file. Do you think this should be marked as a regression?
| Comment by Peter Jones [ 15/May/19 ] |
Nathaniel, can you please investigate? Thanks, Peter
| Comment by Nathaniel Clark [ 15/May/19 ] |
Reverting |
| Comment by Nathaniel Clark [ 15/May/19 ] |
Revert of |
| Comment by Gerrit Updater [ 15/May/19 ] |
Nathaniel Clark (nclark@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34869
| Comment by Gerrit Updater [ 15/May/19 ] |
Nathaniel Clark (nclark@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34873
| Comment by Philip B Curtis [ 16/May/19 ] |
Tested this patch on our TDS at ORNL, and it looks like it resolves this issue.
| Comment by Gerrit Updater [ 16/May/19 ] |
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34869/
| Comment by Gerrit Updater [ 16/May/19 ] |
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34873/
| Comment by Peter Jones [ 16/May/19 ] |
Landed for 2.13 and 2.12.2.