[LU-16232] emergency llog cleanup server scripts Created: 11/Oct/22 Updated: 31/Aug/23 Resolved: 30/May/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Mikhail Pershin | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Update llog or changelog files can become corrupted, and at present we simply remove or truncate them. Scripts are needed to remove a corrupted llog properly, together with all of its plain llogs, so that no orphaned data remains on the server. |
| Comments |
| Comment by Mikhail Pershin [ 12/Oct/22 ] |
|
Copied from EX-4969: For both scripts, the steps to clean up problematic llogs are:
# bash remove_(changelog|updatelog) -n <ldiskfs_mount>
# bash remove_(changelog|updatelog) -n -z /tmp/llogs_saved <ldiskfs_mount>
# ls -ali /tmp/llogs_saved.tar.gz
# tar -tf /tmp/llogs_saved.tar.gz
# bash remove_(changelog|updatelog) <ldiskfs_mount>
Note: for better llog compression, xz can be used as well; pass it to the script via the GZIP environment variable:
# GZIP=xz bash remove_(changelog|updatelog) -n -z /tmp/llogs_saved <ldiskfs_mount>
The archive name will then end with .xz instead of .gz |
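The sequence above can be sketched as a small helper that only prints the commands in order, so the plan can be reviewed before anything is run on a server. This is an illustration under stated assumptions, not part of the landed scripts: the function names archive_name and plan_cleanup, the mount point /mnt/mdt0, and the ".tar.xz" suffix for the xz case are hypothetical (the comment above only says the name ends with .xz). Only the -n and -z options quoted in the comment are used.

```shell
#!/usr/bin/env bash
# Sketch only: prints the cleanup sequence, does not execute it.

# Derive the archive file name from the base path passed to -z.
# Assumption: with GZIP=xz the suffix becomes ".tar.xz"; the ticket
# only states that the name ends with .xz instead of .gz.
archive_name() {
    local base=$1 compressor=${2:-gzip}
    case "$compressor" in
        xz) echo "${base}.tar.xz" ;;
        *)  echo "${base}.tar.gz" ;;
    esac
}

# Print the steps from the comment, in order.
# $1 script name, $2 ldiskfs mount point, $3 archive base, $4 compressor
plan_cleanup() {
    local script=$1 mnt=$2 base=$3 compressor=${4:-gzip}
    local archive env_prefix=""
    archive=$(archive_name "$base" "$compressor")
    if [ "$compressor" = "xz" ]; then
        env_prefix="GZIP=xz "
    fi
    echo "bash $script -n $mnt"
    echo "${env_prefix}bash $script -n -z $base $mnt"
    echo "ls -ali $archive"
    echo "tar -tf $archive"
    echo "bash $script $mnt"
}

# Hypothetical mount point, for illustration only.
plan_cleanup remove_updatelog /mnt/mdt0 /tmp/llogs_saved xz
```

Printing rather than executing keeps the final, destructive call (the one without -n) a deliberate manual step after the archive listing has been checked.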
| Comment by Gerrit Updater [ 12/Oct/22 ] |
|
"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48838 |
| Comment by Gerrit Updater [ 02/Nov/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48838/ |
| Comment by Peter Jones [ 02/Nov/22 ] |
|
Landed for 2.16 |
| Comment by Andreas Dilger [ 23/Dec/22 ] |
|
In further discussion, it was noted that it would be useful to allow a "-m <mdt_index>" argument to the script so that it selectively deletes only the update logs for the specified MDT index. Otherwise, if the script is run on a remote MDT, it will delete the update logs for all other MDTs, and those MDTs will then have issues during operation and/or later recovery. |
| Comment by Chris Hunter (Inactive) [ 09/Jan/23 ] |
|
I receive an error message when using the script:
./remove_updatelog.lu16232.sh: line 100: syntax error near unexpected token `<'
./remove_updatelog.lu16232.sh: line 100: ` read -r -d '' -a OPT_MDTS < <(hexdump -v -e '2/8 " %16x" 2/8 "\n"' $catlist |'
It appears to be an issue with the shell redirection, where the script uses
read -r -d '' -a OPT_MDTS < <(hexdump -v -e '2/8 " %16x" 2/8 "\n"' $catlist | awk '{print "[0x"$2":0x"$1":0x0]"}')
instead of
read -r -d '' -a OPT_MDTS <<< $(hexdump -v -e '2/8 " %16x" 2/8 "\n"' $catlist | awk '{print "[0x"$2":0x"$1":0x0]"}')
|
| Comment by Mikhail Pershin [ 09/Jan/23 ] |
|
Chris, try it with bash:
# bash ./remove_updatelog
The '< <()' syntax is correct in bash (it is process substitution: bash runs the command and redirects input from a named pipe or /dev/fd entry connected to its output), but it may not be supported in other shells. Your proposal is a possible replacement, so it is worth updating the script in that manner, but for now just call bash explicitly. |
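A minimal sketch comparing the two redirection forms discussed above. The sample "id seq" pairs and the helper names sample and to_fids are hypothetical stand-ins for the hexdump output of a real catalog file; the awk expression is the one quoted in the error report. Both forms rely on bash features (arrays, process substitution, here-strings), so the example assumes it is run with bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
# Hypothetical sample: two "id seq" hex pairs, standing in for what
# the hexdump call in the script would emit for a catalog file.
sample() {
    printf '%s\n' '              1a        200000401' \
                  '              2b        240000402'
}

# Same awk expression as in the report: swap the two fields into
# a "[0xSEQ:0xID:0x0]" string.
to_fids() {
    awk '{print "[0x"$2":0x"$1":0x0]"}'
}

# Form 1: process substitution, as in the script.
# read returns non-zero when it reaches EOF before the NUL delimiter,
# so "|| true" keeps this safe under "set -e".
read -r -d '' -a FIDS_PROCSUB < <(sample | to_fids) || true

# Form 2: here-string fed by command substitution, as proposed.
read -r -d '' -a FIDS_HERESTR <<< "$(sample | to_fids)" || true

printf 'process substitution: %s\n' "${FIDS_PROCSUB[@]}"
printf 'here-string:          %s\n' "${FIDS_HERESTR[@]}"
```

Both arrays end up with the same two entries, so for input of this size the choice between the forms is about shell compatibility rather than results. The command substitution is quoted here as a precaution; the proposal in the report leaves it unquoted.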
| Comment by Gerrit Updater [ 06/Apr/23 ] |
|
"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50558 |
| Comment by Mikhail Pershin [ 25/Apr/23 ] |
|
I've just added another script to the ticket that collects update logs on a live server node using debugfs. It helps to gather llogs for further analysis and replaces the multiple commands that otherwise need to be executed for that. |
| Comment by Gerrit Updater [ 06/May/23 ] |
|
"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50876 |
| Comment by Gerrit Updater [ 19/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50558/ |
| Comment by Gerrit Updater [ 31/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50876/ |