Details
- Question/Request
- Resolution: Fixed
- Minor
- Lustre 2.5.5
Description
Greetings,
We had an OST which was physically damaged recently on our Lustre 2.5.5 system. We were able to deactivate new file creation on the OST from the MDS (using lctl --device data-OST0036-osc-MDT0000 deactivate), and lfs_migrate the data off, but there were still quota problems when contacting the damaged OST. So, we tried to disable the OST from the client side as well.
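For reference, the drain procedure described above can be sketched as follows; the mount point (/mnt/data) is an assumption, while the device and UUID names are taken from this report:

```shell
# 1. Stop new object creation on the damaged OST (run on the MDS):
lctl --device data-OST0036-osc-MDT0000 deactivate

# 2. Find every file with objects on that OST and migrate them to the
#    remaining OSTs (run on a client with the filesystem mounted):
lfs find /mnt/data --obd data-OST0036_UUID | lfs_migrate -y
```

Reads of existing objects on the OST still work while deactivated, which is what allows lfs_migrate to copy the data off.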
That worked, but now there are stray messages from our MDS warning of “slow creates” to this supposedly disabled OST, and filesystem creates are now very slow:
Jul 2 08:40:21 mds1 kernel: Lustre: data-OST0036-osc-MDT0000: slow creates, last=[0x100360000:0xe4f61:0x0], next=[0x100360000:0xe4f61:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=-19
We have tried all of the following on the MDS to fix this:
lctl --device data-OST0036-osc-MDT0000 deactivate
lctl conf_param data-OST0036-osc-MDT0000.osc.active=0
lctl conf_param data-OST0036.osc.active=0
lctl set_param osp.data-OST0036-osc-MDT0000.active=0
lctl set_param osp.data-OST0036-*.max_create_count=0
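One way to confirm whether the last command actually took effect is to read back the OSP precreate counters on the MDS; these parameter names are from the Lustre 2.5 osp proc tree and are an assumption here:

```shell
# Confirm object precreation toward this OST is really disabled:
lctl get_param osp.data-OST0036-osc-MDT0000.max_create_count
lctl get_param osp.data-OST0036-osc-MDT0000.create_count

# The precreate status should show an error (e.g. -19/ENODEV) for a
# dead OST, matching the status=-19 in the slow-creates message:
lctl get_param osp.data-OST0036-osc-MDT0000.prealloc_status
```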
On clients, the OST is disabled, and the logs show “Lustre: setting import data-OST0036_UUID INACTIVE by administrator request”:
client$ lctl get_param osc.*-OST0036*.active
osc.data-OST0036-osc-ffff882023331800.active=0
The MDS also believes this OST is inactive:
mds$ cat /proc/fs/lustre/osp/data-OST0036-osc-MDT0000/active
0
However, the slow creates message persists on the MDS, about one every 10 minutes, always with the same "last" and "next" IDs. Is there something we have missed, or some other way this should have been resolved to permanently remove this OST?
We have not yet tried standing up a new OST at the same index, or restarting the MDS.
(Update: Standing up a new, blank OST to replace the defunct one, and setting it back to active, cleaned this up. It still would be nice to know the proper way to handle this situation, though.)
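For anyone hitting the same problem, the replacement step in the update above can be sketched roughly as below. The device path, fsname, MGS NID, and mount point are assumptions; index 54 is 0x36, matching OST0036, and 20000 is the usual default max_create_count:

```shell
# Format a replacement OST at the same index (run on the new OSS).
# --replace tells the MGS this reuses an existing index rather than
# registering a brand-new target:
mkfs.lustre --ost --reformat --replace --index=54 \
    --fsname=data --mgsnode=mgs@tcp0 /dev/sdX
mount -t lustre /dev/sdX /mnt/data-ost0036

# Then reactivate it on the MDS and restore object precreation:
lctl --device data-OST0036-osc-MDT0000 activate
lctl set_param osp.data-OST0036-osc-MDT0000.max_create_count=20000
```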
Thanks for any advice you may have,
Chris