[LU-10906] checksums parameter not persistent after reboot Created: 11/Apr/18  Updated: 07/Jan/19  Resolved: 06/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
is related to LU-11011 checksum type can not be selected per... Resolved
is related to LU-11058 sanity test_77k: checksum(1) != 0 Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Attempting to run

mgs$ lctl set_param -P osc.s1lcl*.checksums=0

to disable checksums for their clients does not return any errors, and a

mgs$ lctl get_param osc.s1lcl*.checksums

shows that it is properly set to 0 at that time. However, after a reboot of a client node, this parameter gets reset to 1. This behavior does not appear for any of the other parameters that they are setting - just the checksums. There is something preventing the checksums parameter from being set permanently.



 Comments   
Comment by Andreas Dilger [ 11/Apr/18 ]

It isn't clear why the lctl set_param -P setting is not persistent for this parameter. It might be caused by ordering issues during client mount, since there is a separate "-o checksum" and "-o nochecksum" mount option available (which is enabled by default) that is also trying to enable the checksum during mount.

Mounting the clients with "-o nochecksum" is verified to be working as expected (lctl get_param osc.*.checksums returns 0 in this case, and returns 1 without this parameter).

Some notes:

  • this is likely related to the handling of LL_SBI_CHECKSUM during mount clobbering the osc.*.checksums=0 parameter stored in the configuration. It looks like the OSC cl_checksum is already enabled by default in client_obd_setup() so there is little benefit to set it again from client_common_fill_super() at mount time, so fixing this issue may be as simple as making the following lines conditional on the checksum or nochecksum options actually being passed, rather than always overriding the OSC settings:
            checksum = sbi->ll_flags & LL_SBI_CHECKSUM;
            err = obd_set_info_async(NULL, sbi->ll_dt_exp, sizeof(KEY_CHECKSUM),
                                     KEY_CHECKSUM, sizeof(checksum), &checksum,
                                     NULL);
            if (err) {
                    CERROR("%s: Set checksum failed: rc = %d\n",
                           sbi->ll_dt_exp->exp_obd->obd_name, err);
                    GOTO(out_root, err);
            }
    
  • the llite.*.checksum_pages tunable is misnamed. This was (in 1.8) a tunable to enable per-page checksums at the llite layer, independent of the BRW RPC data checksums. It would be better to rename this llite.*.checksums to match osc.*.checksums, and leave checksum_pages for its original purpose of enabling per-page checksums (which was not implemented in the CLIO development).
Comment by Emoly Liu [ 19/Apr/18 ]

As Andreas said,  after a remount, this parameter is really set to 0 by "process_param2_config()) lctl: invoked upcall /usr/sbin/lctl set_param osc.lustre*.checksums=0", but it is changed later by

client_common_fill_super()
  ->obd_set_info_async()
    ->lov_set_info_async()
      ->osc_set_info_async()

int osc_set_info_async()
{
...
        if (KEY_IS(KEY_CHECKSUM)) {
                if (vallen != sizeof(int))
                        RETURN(-EINVAL);
                exp->exp_obd->u.cli.cl_checksum = (*(int *)val) ? 1 : 0;
                RETURN(0);
        }
...
}

 

Comment by Emoly Liu [ 20/Apr/18 ]

There are three ways to set checksum support in Lustre:

  • configure: --enable-checksum or --disable-checksum
  • mount option: -o checksum/nochecksum
  • lctl set_param: osc.*.checksums

My question is if "--disable-checksum" is specified when configuring, can we enable checksum again with "-o checksum" or osc.*.checksums=1? I think no.

Comment by Andreas Dilger [ 20/Apr/18 ]

It looks like "configure --enable-checksum" (ENABLE_CHECKSUM) only affects the default mount option in client_obd_setup() and ll_sbi_init(), but doesn't actually disable any of the code. That means it should be possible to enable checksums even if --disable-checksum is used. There does look like one minor bug in client_obd_setup() - the initialization of cl_supp_cksum_types is not done if ENABLE_CHECKSUM is unset, but it should probably always do that.

I think the call to obd_set_info_async() could be removed if the current setting matches the ENABLE_CHECKSUM definition and there isn't a checksum or nochecksum mount option. Maybe something like the following, to only change the OSC checksum values if it has been changed from the default value:

        checksum = sbi->ll_flags & LL_SBI_CHECKSUM;
#ifdef ENABLE_CHECKSUM           
        default_checksum = LL_SBI_CHECKSUM;
#endif
        if (checksum ^ default_checksum) {
                err = obd_set_info_async(NULL, sbi->ll_dt_exp, 
                                         sizeof(KEY_CHECKSUM), KEY_CHECKSUM, 
                                         sizeof(checksum), &checksum, NULL);
                if (err) {
                        CERROR("%s: Set checksum failed: rc = %d\n",
                               sbi->ll_dt_exp->exp_obd->obd_name, err);
                        GOTO(out_root, err);
                }
        }       
Comment by Emoly Liu [ 20/Apr/18 ]

Got, thanks!

Comment by Gerrit Updater [ 20/Apr/18 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/32095
Subject: LU-10906 checksum: enable/disable checksum correctly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: afe8d04a2aee007a9df086b7a1d70096d75d3330

Comment by Gerrit Updater [ 24/Apr/18 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/32131
Subject: LU-10906 llite: rename llite checksum_pages to checksums
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0949d2acc3887d4673a617f46b01b6771c8a8695

Comment by Gerrit Updater [ 21/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32095/
Subject: LU-10906 checksum: enable/disable checksum correctly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e9b13cd1daf959b40e9ff5c810368c67e9491e06

Comment by Gerrit Updater [ 22/Sep/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33222
Subject: LU-10906 llite: create checksums to replace checksum_pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 34df45679fd854b903467a259102b7104dc2bd55

Comment by Gerrit Updater [ 05/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33222/
Subject: LU-10906 llite: create checksums to replace checksum_pages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 123ee3cf96dd152c55f7e46042eba85cd23310a1

Comment by Peter Jones [ 06/Oct/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:39:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.