Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.13.0, Lustre 2.12.6
-
None
-
CentOS 7.6
-
3
Description
This is related to LU-7668. The Lustre Manual, in section 14.9.3, "Removing an OST from the File System", recommends using lctl conf_param ost_name.osc.active=0 to permanently disable OSTs.
We are trying to permanently disable 12 old empty OSTs on our Oak filesystem. We used commands like these:
lctl conf_param oak-OST0000.osc.active=0
Lustre logs seem to indicate it works OK:
20000000:02000400:16.0:1612760209.795689:0:334624:0:(mgs_llog.c:3964:mgs_write_log_param()) Permanently deactivating oak-OST0000
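For reference, here is a minimal sketch of how the same command could be generated for a range of OSTs. It only prints the lctl conf_param lines so they can be reviewed before being run on the MGS. The helper name gen_deactivate_cmds and the index range in the usage comment are assumptions for illustration; the actual indices of the 12 OSTs are not listed in this ticket.

```shell
# Hypothetical helper: print one "lctl conf_param" line per OST index.
# Arguments: filesystem name, first OST index (decimal), number of OSTs.
# OST names use a 4-digit hexadecimal index, e.g. oak-OST000a for index 10.
gen_deactivate_cmds() {
    fsname=$1
    first=$2
    count=$3
    i=$first
    while [ "$i" -lt $((first + count)) ]; do
        printf 'lctl conf_param %s-OST%04x.osc.active=0\n' "$fsname" "$i"
        i=$((i + 1))
    done
}

# Example (assumed range, review the output before running it on the MGS):
#   gen_deactivate_cmds oak 0 12
```

Printing rather than executing keeps this a dry run, since conf_param changes are written to the configuration log permanently.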
On already mounted clients, lctl dl shows the OBD status inactive:
[root@oak-rbh01 ~]# lctl dl | grep oak-OST0000
9 IN osc oak-OST0000-osc-ffff9125e10c3800 f532ae1d-6c67-fa34-deaa-5a130b24844f 4
Also, lfs df works as expected for already mounted clients:
[root@oak-rbh01 ~]# lfs df -v /oak | grep OST0000
OST0000 : inactive device
However, we have observed the following when using Lustre 2.12.6 after client remount:
- the OBD state as reported by lctl dl comes back to UP instead of IN
[root@oak-h01v10 ~]# lctl dl | grep oak-OST0000
9 UP osc oak-OST0000-osc-ffff9c5b6b90f800 523b8803-837d-acf8-a8e6-aae2d47585ac 3
- the OSC state, however, is properly set to 0
[root@oak-h01v10 ~]# cat /sys/fs/lustre/osc/oak-OST0000-osc-ffff9c5b6b90f800/active
0
- lfs check osts reports the following error:
[root@oak-h01v10 ~]# lfs check osts
lfs check: error: check 'oak-OST0000-osc-ffff9c5b6b90f800': Cannot allocate memory (12)
...
- lfs df shows the following error for the permanently deactivated OST:
OST0000 : Invalid argument
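The mismatch described above (lctl dl reporting UP while the OSC active file contains 0) could be detected on a client with something like the following sketch. The function names are hypothetical, and the column layout is assumed from the lctl dl output shown in this ticket (index, status, type, device name, UUID, refcount).

```shell
# Hypothetical helper: read "lctl dl" output on stdin and print the status
# column (e.g. UP or IN) of each OSC device belonging to the given OST.
osc_dl_state() {
    awk -v ost="$1" '$3 == "osc" && index($4, ost "-osc-") == 1 { print $2 }'
}

# Hypothetical helper: succeed when the device list says UP although the
# OSC "active" value is 0, which is the inconsistency seen after remount.
# Arguments: status from lctl dl, content of the sysfs "active" file.
osc_state_mismatch() {
    [ "$1" = "UP" ] && [ "$2" = "0" ]
}

# Example on a client (the glob assumes a single mount of the filesystem):
#   state=$(lctl dl | osc_dl_state oak-OST0000)
#   active=$(cat /sys/fs/lustre/osc/oak-OST0000-osc-*/active)
#   osc_state_mismatch "$state" "$active" && echo "oak-OST0000: UP but inactive"
```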
I'm attaching client logs of a remounting client. We can see that the OST is disabled:
00020000:01000000:0.0:1612805236.515621:0:2155:0:(lov_obd.c:166:lov_connect_obd()) not connecting OSC oak-OST0000_UUID; administratively disabled
It looks like at some point, the status of the OBD is not updated properly at mount time and this seems to be causing the confusion. Ideally, we would like to see the same behavior after client remount (IN in lctl dl and lfs df -v showing inactive device). Any ideas on how best to fix/improve this? Thanks!
Issue Links
- is related to LU-7668 permanently remove deactivated OSTs from configuration log (Resolved)