[LU-15430] Index cannot be reused after permanently removing OST Created: 11/Jan/22  Updated: 12/Jan/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Jiahao Li Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Epic/Theme: mgs
Severity: 3
Epic: server
Rank (Obsolete): 9223372036854775807

 Description   

When I follow the manual procedure to permanently remove an OST:
1. mds: lctl set_param osp.lustre-OST0130*.max_create_count=0
2. client: lfs find ./ --ost 304 | lfs_migrate -y
3. mgs: lctl conf_param lustre-OST0130.osc.active=0
4. oss: umount /dev/ost304_dev
Then I execute lfs df /mnt/<mountpoint> on the client and see that OST index 304 (OST0130) has disappeared, but its record can still be seen with lctl dl | grep OST0130.
Finally, I execute lctl --device MGS llog_print lustre-client | egrep "OST0130" on the MGS to obtain the llog indexes of the OST0130 records, and then use lctl --device MGS llog_cancel lustre-client <index> to cancel all of them.

 

#lctl --device MGS llog_print muzitest-client | grep OST0130
- { index: 80, event: attach, device: muzitest-OST0130-osc, type: osc, UUID: muzitest-clilov_UUID }
- { index: 81, event: setup, device: muzitest-OST0130-osc, UUID: muzitest-OST0130_UUID, node: 10.0.0.48@tcp }
- { index: 83, event: add_conn, device: muzitest-OST0130-osc, node: 10.0.0.48@tcp }
- { index: 84, event: add_osc, device: muzitest-clilov, ost: muzitest-OST0130_UUID, index: 304, gen: 1 }
- { index: 185, event: conf_param, device: muzitest-OST0130-osc, parameter: osc.active=0 } 

#lctl --device MGS llog_cancel muzitest-client 185
index 185 was canceled.
#lctl --device MGS llog_cancel muzitest-client 84
index 84 was canceled.
#lctl --device MGS llog_cancel muzitest-client 83
index 83 was canceled.
#lctl --device MGS llog_cancel muzitest-client 81
index 81 was canceled.
#lctl --device MGS llog_cancel muzitest-client 80
index 80 was canceled.

Then I execute lctl dl | grep OST0130 on the client and see that the OST0130 record is gone.
Now suppose I want to restore OST0130. After the OSS runs mount.lustre -o max_sectors_kb=128 /dev/vdb /mnt/lustre_OST0130 and the MGS runs lctl conf_param muzitest-OST0130.osc.active=1, the client still does not see OST0130 come back, and if I unmount and remount the client I get the following error:

mount.lustre 10.0.0.32:/lustre /mnt/lustre/
mount.lustre: mount 10.0.0.32:/lustre at /mnt/lustre failed: Invalid argument
This may have multiple causes.
Is 'lustre' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info. 

Checking with lctl --device MGS llog_print muzitest-client | grep OST0130, I see that a new llog record has been generated. After cancelling that record the client can mount again, but OST0130 still cannot be restored. What should I do in this situation? Thank you very much.

 Comments   
Comment by Etienne Aujames [ 12/Jan/22 ]

Hello,

The procedure you tried comes from LU-7668; you can find additional information there.

It seems you removed OST0130 from the client configuration but not from the MDT configurations ($fsname-MDTxxxx) on the MGS.
You can verify that by executing the following on the MGS:

lctl --device MGS llog_print muzitest-MDT0000 | egrep  "OST0130|10.0.0.48@tcp"
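
If OST0130 (or 10.0.0.48@tcp) records do show up there, they would presumably need the same llog_cancel treatment you applied to the client log, for example (the <index> placeholder is illustrative; take the real value from the llog_print output above):

lctl --device MGS llog_cancel muzitest-MDT0000 <index>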

To list the MGS's configuration files you can use (note there is a bug in that tool in 2.12.4: LU-13609):

lctl --device MGS llog_catlist

If you have a backup of "muzitest-client" (taken before removing the target), you can restore it by mounting your MGT as ldiskfs and copying the file back into the CONFIGS directory.
If not, you can follow the "--replace" procedure: https://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lustremaint.restore_ost.
If you mess up your configuration, you will have to run a "--writeconf": https://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#lustremaint.regenerateConfigLogs (a rough sketch of those steps is below).
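
For reference only, the writeconf regeneration looks roughly like this (the device paths, mount points and fsname below are placeholders, not values from your system; please follow the exact ordering in the manual section above):

umount /mnt/lustre                                   (on every client)
umount /mnt/lustre_MDT0000                           (on the MDS)
umount /mnt/lustre_OST<NNNN>                         (on every OSS, for every OST)
tunefs.lustre --writeconf /dev/mdt_dev               (MDT first)
tunefs.lustre --writeconf /dev/ost_dev               (then every OST)
mount -t lustre /dev/mgt_dev /mnt/mgt                (MGT first, if it was stopped)
mount -t lustre /dev/mdt_dev /mnt/lustre_MDT0000     (then the MDT(s))
mount -t lustre /dev/ost_dev /mnt/lustre_OST<NNNN>   (then the OSTs)
mount -t lustre 10.0.0.32:/<fsname> /mnt/lustre      (clients last)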

I am not an expert on that subject, so please be careful and double check this information.
