Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15095

lctl: error invoking upcall /usr/sbin/lctl set_param *.*.lbug_on_grant_miscount=1

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.15.0
    • Upstream
    • None
    • 3
    • 9223372036854775807

    Description

      I'm getting many messages like this:
      lctl: error invoking upcall /usr/sbin/lctl set_param ..lbug_on_grant_miscount=1
      in the logs.
      this was introduced in bb5d81ea95 ("LU-14543 target: prevent overflowing of tgd->tgd_tot_granted")
      I don't quite understand why this clearly debugging tunable needs to be persisten in the config logs?
      IMO, the better would be to have node-wide non-persisten tunable like osd's track_declare_assert

      Attachments

        Issue Links

          Activity

            [LU-15095] lctl: error invoking upcall /usr/sbin/lctl set_param *.*.lbug_on_grant_miscount=1

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45521/
            Subject: LU-15095 target: lbug_on_grant_miscount module parameter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2c787065441ee60c6c163dc77851d0964f81a89c

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45521/ Subject: LU-15095 target: lbug_on_grant_miscount module parameter Project: fs/lustre-release Branch: master Current Patch Set: Commit: 2c787065441ee60c6c163dc77851d0964f81a89c

            "Vladimir Saveliev <vlaidimir.saveliev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45521
            Subject: LU-15095 target: lbug_on_grant_miscount module parameter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 91f914fe01c71280523c5fee3bf2d31db593c9e5

            gerrit Gerrit Updater added a comment - "Vladimir Saveliev <vlaidimir.saveliev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45521 Subject: LU-15095 target: lbug_on_grant_miscount module parameter Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 91f914fe01c71280523c5fee3bf2d31db593c9e5

            Do you mean to make it as parameter of module ptlrpc?

            yes

            bzzz Alex Zhuravlev added a comment - Do you mean to make it as parameter of module ptlrpc? yes

            in some cases this parameter is set (written) into the config few times.

            to reproduce it should be enough to run llmount.sh locally:

            [   30.040858] Lustre: Modifying parameter general.*.*.lbug_on_grant_miscount in log params
            [   30.161330] LustreError: 5029:0:(obd_config.c:1326:process_param2_config()) lctl: error invoking upcall /usr/sbin/lctl set_param *.*.lbug_on_grant_miscount=1: rc = -2; time 191us
            
            bzzz Alex Zhuravlev added a comment - in some cases this parameter is set (written) into the config few times. to reproduce it should be enough to run llmount.sh locally: [ 30.040858] Lustre: Modifying parameter general.*.*.lbug_on_grant_miscount in log params [ 30.161330] LustreError: 5029:0:(obd_config.c:1326:process_param2_config()) lctl: error invoking upcall /usr/sbin/lctl set_param *.*.lbug_on_grant_miscount=1: rc = -2; time 191us

            do we really need to save this to the log? why not use a variable like cfs_fail_loc or ldiskfs_track_declares_assert ?

            Do you mean to make it as parameter of module ptlrpc?

            int ldiskfs_track_declares_assert;
            module_param(ldiskfs_track_declares_assert, int, 0644);
            

            It sounds like a good idea, thanks.

            vsaveliev Vladimir Saveliev added a comment - do we really need to save this to the log? why not use a variable like cfs_fail_loc or ldiskfs_track_declares_assert ? Do you mean to make it as parameter of module ptlrpc? int ldiskfs_track_declares_assert; module_param(ldiskfs_track_declares_assert, int , 0644); It sounds like a good idea, thanks.
            vsaveliev Vladimir Saveliev added a comment - - edited

            Do we want lbug_on_grant_miscount set across reboots?

            Without that the parameter lbug_on_grant_miscount would get turned off in tests which include failover.
            That is, it was made permanent intentionally.

            I did not get "lctl: error invoking upcall" in my tests and will debug the issue.

             

            vsaveliev Vladimir Saveliev added a comment - - edited Do we want lbug_on_grant_miscount set across reboots? Without that the parameter lbug_on_grant_miscount would get turned off in tests which include failover. That is, it was made permanent intentionally. I did not get "lctl: error invoking upcall" in my tests and will debug the issue.  

            That is really good point. Do we want lbug_on_grant_miscount set across reboots?

            simmonsja James A Simmons added a comment - That is really good point. Do we want lbug_on_grant_miscount set across reboots?

            do we really need to save this to the log? why not use a variable like cfs_fail_loc or ldiskfs_track_declares_assert ?

            bzzz Alex Zhuravlev added a comment - do we really need to save this to the log? why not use a variable like cfs_fail_loc or ldiskfs_track_declares_assert ?

            I think the simple solution is change ..lbug_on_grant_miscount=1 to [mdt|obdfilter].*.lbug_on_grant_miscount=1

            simmonsja James A Simmons added a comment - I think the simple solution is change . .lbug_on_grant_miscount=1 to [mdt|obdfilter] .*.lbug_on_grant_miscount=1

            Looking at it with fresh eyes and I see the -P now. Looking at the error I see a corner case missed. Normally when setting persistent values an udev event is sent out. In this case it doesn't and ends up with the fall back of using the upcall which does fail. The reason for this is that in process_param2_config() we use kset_find_obj() to find a top level kobj to use to send the event. In this case its '*' which doesn't exist. In this case we should just use the top kset kobject to send the event.

            simmonsja James A Simmons added a comment - Looking at it with fresh eyes and I see the -P now. Looking at the error I see a corner case missed. Normally when setting persistent values an udev event is sent out. In this case it doesn't and ends up with the fall back of using the upcall which does fail. The reason for this is that in process_param2_config() we use kset_find_obj() to find a top level kobj to use to send the event. In this case its '*' which doesn't exist. In this case we should just use the top kset kobject to send the event.

            hmm, I think MGS was specified to make the parameter persistent, i.e. write it to the config logs for MDTs and OSTs (which do grants)

            bzzz Alex Zhuravlev added a comment - hmm, I think MGS was specified to make the parameter persistent, i.e. write it to the config logs for MDTs and OSTs (which do grants)

            People

              vsaveliev Vladimir Saveliev
              bzzz Alex Zhuravlev
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: