[LU-15706] "lctl pool_destroy" can't work correctly due to "SKIP" records Created: 30/Mar/22  Updated: 20/Oct/23  Resolved: 27/Jun/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.2

Type: Bug Priority: Minor
Reporter: Emoly Liu Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15142 LCTL: permanent parameter deletion an... Resolved
is related to LU-16324 conf-sanity test_123ae: old conf_para... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

"lctl pool_destroy" can't work correctly when MGS and MDT0 is from different node.

I ran the following test on two nodes (centos7-2 and centos7-4) and found it failed because the "SKIP" pool llog records were not processed properly.

centos7-2 (MDT+2OST+client):

[root@centos7-2 tests]# MGSDEV=/tmp/lustre-mgs mgs_HOST=centos7-4 mdt_HOST=centos7-2 ost1_HOST=centos7-2 ost2_HOST=centos7-2 PDSH="pdsh -S -Rssh -w" sh llmount.sh

centos7-4(MGS):

[root@centos7-4 tests]# lctl pool_new lustre.testpool
Pool lustre.testpool created
[root@centos7-4 tests]# lctl pool_add lustre.testpool OST0000
OST lustre-OST0000_UUID added to pool lustre.testpool
[root@centos7-4 tests]# lctl pool_add lustre.testpool OST0001
OST lustre-OST0001_UUID added to pool lustre.testpool
[root@centos7-4 tests]# lctl pool_destroy lustre.testpool
Pool lustre.testpool not empty, please remove all members
pool_destroy: Directory not empty
[root@centos7-4 tests]# lctl pool_remove lustre.testpool OST0000
OST lustre-OST0000_UUID removed from pool lustre.testpool
[root@centos7-4 tests]# lctl pool_destroy lustre.testpool
Pool lustre.testpool destroyed

"testpool" was destroyed wrongly even OST0001 was still in the pool.

After checking the llog records, I found that the record "SKIP pool add 0:lustre-clilov  1:lustre  2:testpool  3:lustre-OST0000_UUID" had been skipped.

#43 (112)SKIP pool new 0:lustre-clilov  1:lustre  2:testpool  
#44 (224)SKIP END   marker  29 (flags=0x06, v2.15.0.0) lustre-clilov   'new lustre.testpool' Wed Mar 30 11:02:17 2022-Wed Mar 30 11:02:51 2022
#45 (224)SKIP START marker  31 (flags=0x05, v2.15.0.0) lustre-clilov   'add lustre.testpool.lustre-OST0000_UUID' Wed Mar 30 11:02:28 2022-Wed Mar 30 11:02:49 2022
#46 (136)SKIP pool add 0:lustre-clilov  1:lustre  2:testpool  3:lustre-OST0000_UUID  
#48 (224)SKIP END   marker  31 (flags=0x06, v2.15.0.0) lustre-clilov   'add lustre.testpool.lustre-OST0000_UUID' Wed Mar 30 11:02:28 2022-Wed Mar 30 11:02:49 2022
#49 (224)marker  33 (flags=0x01, v2.15.0.0) lustre-clilov   'add lustre.testpool.lustre-OST0001_UUID' Wed Mar 30 11:02:30 2022-
#50 (136)pool add 0:lustre-clilov  1:lustre  2:testpool  3:lustre-OST0001_UUID  
#51 (224)END   marker  33 (flags=0x02, v2.15.0.0) lustre-clilov   'add lustre.testpool.lustre-OST0001_UUID' Wed Mar 30 11:02:30 2022-
#52 (224)marker  35 (flags=0x01, v2.15.0.0) lustre-clilov   'rem lustre.testpool.lustre-OST0000_UUID' Wed Mar 30 11:02:49 2022-
#53 (136)pool remove 0:lustre-clilov  1:lustre  2:testpool  3:lustre-OST0000_UUID  
#54 (224)END   marker  35 (flags=0x02, v2.15.0.0) lustre-clilov   'rem lustre.testpool.lustre-OST0000_UUID' Wed Mar 30 11:02:49 2022-
#55 (224)marker  37 (flags=0x01, v2.15.0.0) lustre-clilov   'del lustre.testpool' Wed Mar 30 11:02:51 2022-
#56 (112)pool destroy 0:lustre-clilov  1:lustre  2:testpool  
#57 (224)END   marker  37 (flags=0x02, v2.15.0.0) lustre-clilov   'del lustre.testpool' Wed Mar 30 11:02:51 2022-

I will look into this issue.



 Comments   
Comment by Gerrit Updater [ 30/Mar/22 ]

"Emoly Liu <emoly@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46951
Subject: LU-15706 lctl: deal with "SKIP" pool record correctly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 67aca5483a7718c75b82b9fb20c3ad1fccd4b700

Comment by Emoly Liu [ 01/Apr/22 ]

During Maloo testing and my local testing, I also noticed the following issue: if jt_llog_print_iter() needs multiple ioctl requests in its for loop to fetch all the llog records, the records at a request "boundary" are easily processed improperly by llog_print_cb()->class_config_yaml_output(), i.e. skipped or not skipped wrongly.

For example, the failure at https://testing.whamcloud.com/test_logs/ca4871c9-d298-4a2a-99f6-7449aa6fb803/show_text says "trevis-39vm5: OST lustre-OST0006_UUID is not part of the 'lustre' fs.". I can reproduce this issue locally, and the reason is as follows.
The original llog records output by debugfs are:

#23 (128)attach    0:lustre-OST0000-osc  1:osc  2:lustre-clilov_UUID
#24 (144)setup     0:lustre-OST0000-osc  1:lustre-OST0000_UUID  2:192.168.0.118@tcp
#25 (128)lov_modify_tgts add 0:lustre-clilov  1:lustre-OST0000_UUID  2:0  3:1

But the output of class_config_yaml_output() shows that the 24th and 25th records were skipped wrongly, which is why OST0000 could not be found.

- { index: 22, event: add_uuid, nid: 192.168.0.118@tcp(0x20000c0a80076), node: 192.168.0.118@tcp }
- { index: 23, event: attach, device: lustre-OST0000-osc, type: osc, UUID: lustre-clilov_UUID }
======BOUNDARY======
- { index: 31, event: add_uuid, nid: 192.168.0.118@tcp(0x20000c0a80076), node: 192.168.0.118@tcp }
- { index: 32, event: attach, device: lustre-OST0001-osc, type: osc, UUID: lustre-clilov_UUID }

 

Also, in ost-pools.sh, test_23a failed with "Pool lustre.testpool not empty, please remove all members / pool_destroy: Directory not empty", because the 1412th record was not skipped correctly.

The original llog records output by debugfs are:

#1412 (136)SKIP pool add 0:lustre-clilov  1:lustre  2:testpool  3:lustre-OST0003_UUID
#1413 (224)SKIP END   marker 939 (flags=0x06, v2.15.0.0) lustre-clilov   'add lustre.testpool.lustre-OST0003_UUID' Fri Apr  1 08:47:01 2022-Fri Apr  1 08:47:02 2022

But class_config_yaml_output() did not mark this record as SKIP properly, so OST0003 was wrongly considered to still be in testpool.

=====BOUNDARY=====
- { index: 1412, event: add_pool, device: lustre-clilov, fsname: lustre, pool: testpool, ost: lustre-OST0003_UUID }
- { index: 1428, event: remove_pool, device: lustre-clilov, fsname: lustre, pool: testpool, ost: lustre-OST0001_UUID }

 

I will look into the following code in class_config_yaml_output() and make a fix:

        if (lcfg->lcfg_command == LCFG_MARKER) {
                struct cfg_marker *marker = lustre_cfg_buf(lcfg, 1);

                lustre_swab_cfg_marker(marker, swab,
                                       LUSTRE_CFG_BUFLEN(lcfg, 1));
                if (marker->cm_flags & CM_START) {
                        *cfg_flags = CFG_F_MARKER;
                        if (marker->cm_flags & CM_SKIP)
                                *cfg_flags = CFG_F_SKIP;
                } else if (marker->cm_flags & CM_END) {
                        *cfg_flags = 0;
                }
                if (likely(!raw))
                        return 0;
        }

        /* entries outside marker are skipped */
        if (!(*cfg_flags & CFG_F_MARKER) && !raw)
                return 0;

        /* inside skipped marker */
        if (*cfg_flags & CFG_F_SKIP && !raw)
                return 0;
Comment by Emoly Liu [ 06/Apr/22 ]

The root cause of this issue is that if the first several records returned by one ioctl request are ones "protected" by a marker whose START record was returned in a previous request, their "SKIP or not" state is unknown, and they get mislabeled in class_config_yaml_output().

I'm trying to remember (carry over) the flag across requests in llog_print_cb() to fix this issue.

Comment by Gerrit Updater [ 27/Jun/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46951/
Subject: LU-15706 llog: deal with "SKIP" pool llog records correctly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 474f2670d63b66a77ee3acb72b18bc7b5afbec84

Comment by Peter Jones [ 27/Jun/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 24/Nov/22 ]

"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49230
Subject: LU-15706 llog: deal with "SKIP" pool llog records correctly
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 8f3127f0e20992a630838db5f65daf4b008c5dc3

Comment by Gerrit Updater [ 06/Dec/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49230/
Subject: LU-15706 llog: deal with "SKIP" pool llog records correctly
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 70accf10fac5032e185d981b0a0193aad9980371

Comment by Gerrit Updater [ 20/Oct/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52773
Subject: LU-15706 llog: deal with "SKIP" pool llog records correctly
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 30a7ee61140655d2dac1fd43ade538a71b384787

Generated at Sat Feb 10 03:20:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.