[LU-12529] How to remove bad config values from MGS logs

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • lustre 2.10.

    Description

      We have some bad config values.

      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'lov.nbp7-MDT0000-mdtlov.stripecount=1'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98110]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0026-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98110]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST002a-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0046-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST003e-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST003a-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST002e-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0032-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST004e-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0042-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0052-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST004a-osc-MDT0000.active=0'' failed with exit code 2.
      Jul  9 16:13:23 testpbs systemd-udevd[98107]: Process '/usr/sbin/lctl set_param 'osc.nbp7-OST0036-osc-MDT0000.active=0'' failed with exit code 2.
       

      How do we remove these? I tried the following:
      lctl set_param -P -d "osc.nbp7-OST0036-osc-MDT0000.active"
      but it didn't remove the entry from the config file.
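
As a starting point, the failing parameter names can be pulled out of the udev messages above, so that each one can be looked up in the MGS config logs. This is a minimal sketch, assuming the syslog lines keep the exact `lctl set_param '<name>=<value>'` shape shown in the description. The function name `failed_params` is made up for illustration.

```shell
# Hypothetical helper: read syslog text on stdin and print each distinct
# parameter name that a udev-triggered "lctl set_param" failed to apply.
failed_params() {
    # Capture the text between "set_param '" and the first "=", then de-duplicate.
    sed -n "s/.*lctl set_param '\([^=']*\)=[^']*'.*failed with exit code.*/\1/p" | sort -u
}
```

It could be fed from wherever the messages are stored, for example `grep systemd-udevd /var/log/messages | failed_params` (the log path is an assumption and varies by distribution).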

          Activity

            adilger Andreas Dilger added a comment

            The various issues in this ticket have been resolved, or are fixed in a newer release.

            adilger Andreas Dilger added a comment

            It looks like the MGS needs patch https://review.whamcloud.com/34250:

            LU-4939 obdclass: llog_print params file

                Allow llog_print to handle the params file in yaml

            which was landed to b2_10 as v2_10_6-45-gfb77f09ac8, so it would be included in 2.10.7 and later. My home/test server is running only 2.10.5.
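
Since the fix is tied to a release boundary (2.10.7), a quick version comparison on the MGS node can tell whether `llog_print params` is expected to work. This is a rough sketch, assuming the version reported by `modinfo` reflects the installed server build and that GNU `sort -V` is available. `version_at_least` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if dotted version $1 is at least $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use on the MGS node (commented out so the sketch has no side effects):
# if version_at_least "$(modinfo -F version lustre)" 2.10.7; then
#     echo "llog_print should handle the params log"
# else
#     echo "MGS predates 2.10.7; llog_print params may fail"
# fi
```

Version strings with local suffixes may not sort as expected, so treat the result as a hint rather than a guarantee.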

            adilger Andreas Dilger added a comment

            What about the Inappropriate ioctl error?

            nbp2-mds ~ # lctl --device MGS llog_print params
            OBD_IOC_LLOG_PRINT failed: Inappropriate ioctl for device

            I tested this out locally on my 2.10 system and got the same error. I thought that llog_print was working for the params file, and there is even a regression test for this (conf-sanity.sh test_123ab()), so it isn't clear why this isn't working.
            mhanafi Mahmoud Hanafi added a comment - - edited

            What about the Inappropriate ioctl error?

            nbp2-mds ~ # lctl --device MGS llog_print params
            OBD_IOC_LLOG_PRINT failed: Inappropriate ioctl for device

            brk(0x67d000)                           = 0x67d000
            brk(NULL)                               = 0x67d000
            open("/dev/obd", O_RDWR)                = 3
            ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x66, 0x7f, 0x08), 0x7fffffffc9f0) = 0
            ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x66, 0xc0, 0x08), 0x7fffffffc9f0) = -1 ENOTTY (Inappropriate ioctl for device)
            write(2, "OBD_IOC_LLOG_PRINT failed: Inapp"..., 58OBD_IOC_LLOG_PRINT failed: Inappropriate ioctl for device
            ) = 58
            rt_sigaction(SIGINT, {0x405ef0, ~[RTMIN RT_1], SA_RESTORER|SA_RESTART, 0x7fffed6d65d0}, NULL, 8) = 0
            exit_group(1)                           = ?
            +++ exited with 1 +++
            
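
For reference, the `_IOC(...)` tuple that strace prints can be folded back into the raw request number, which makes it easier to match against kernel headers. This sketch assumes the generic Linux encoding used on x86_64 (2 direction bits, 14 size bits, 8 type bits, 8 number bits). `ioc_number` is a made-up name.

```shell
# Hypothetical helper: ioc_number DIR TYPE NR SIZE
# DIR is 1 for _IOC_WRITE, 2 for _IOC_READ, 3 for both (generic Linux encoding).
ioc_number() {
    printf '0x%08x\n' $(( ($1 << 30) | ($4 << 16) | ($2 << 8) | $3 ))
}

# The failing call in the trace: _IOC(_IOC_READ|_IOC_WRITE, 0x66, 0xc0, 0x08)
ioc_number 3 0x66 0xc0 0x08   # prints 0xc00866c0
```

The type byte 0x66 is ASCII 'f'. The ENOTTY return on this request indicates that the running kernel module did not accept it for the given log.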

            adilger Andreas Dilger added a comment

            mhanafi, the bad config records may appear in either the "params" log or in the "fsname-client" config log. The name of the logfile passed to "lctl llog_cancel" needs to match.

            If "llog_print" complains about the log file being too large, you can grab the lctl binary from 2.10.7 or later, or apply the patch https://review.whamcloud.com/3381 "LU-11566 utils: fix lctl llog_print for large configs". This is purely a userspace problem and can be solved with an updated lctl binary; there is no need to update the kernel modules or restart.
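
Because the record may live in either log, it can help to check each candidate log for the parameter before choosing the name to pass to `llog_cancel`. This is a minimal sketch, assuming `lctl --device MGS llog_print <log>` works for each log on the MGS node and prints the parameter name somewhere in the matching record. `find_logs_with_param` is a hypothetical helper.

```shell
# Hypothetical helper: find_logs_with_param NEEDLE LOG...
# Prints the name of every listed config log whose llog_print output
# contains NEEDLE as a fixed string.
find_logs_with_param() {
    needle=$1
    shift
    for log_name in "$@"; do
        if lctl --device MGS llog_print "$log_name" 2>/dev/null | grep -qF -- "$needle"; then
            printf '%s\n' "$log_name"
        fi
    done
}
```

For example, `find_logs_with_param 'OST0036-osc-MDT0000.active' params nbp7-client` would list whichever of the two logs mentions that parameter. If llog_print output is truncated on a large log, a record could be missed, so an empty result is not conclusive with an older lctl binary.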

            mhanafi Mahmoud Hanafi added a comment

            They are all running the same Lustre version.

            nbp7-mds1 ~ # modinfo lustre
            filename:       /lib/modules/3.10.0-693.21.1.el7.20180508.x86_64.lustre2106/extra/lustre/fs/lustre.ko
            license:        GPL
            version:        2.10.6
            description:    Lustre Client File System
            author:         OpenSFS, Inc. <http://www.lustre.org/>
            retpoline:      Y
            rhelversion:    7.4
            srcversion:     E459483EA54C83D0585ECA3
            depends:        obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
            vermagic:       3.10.0-693.21.1.el7.20180508.x86_64.lustre2106 SMP mod_unload modversions

            nbp2-mds ~ # modinfo lustre
            filename:       /lib/modules/3.10.0-693.21.1.el7.20180508.x86_64.lustre2106/extra/lustre/fs/lustre.ko
            license:        GPL
            version:        2.10.6
            description:    Lustre Client File System
            author:         OpenSFS, Inc. <http://www.lustre.org/>
            retpoline:      Y
            rhelversion:    7.4
            srcversion:     E459483EA54C83D0585ECA3
            depends:        obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
            vermagic:       3.10.0-693.21.1.el7.20180508.x86_64.lustre2106 SMP mod_unload modversions

            pfarrell Patrick Farrell (Inactive) added a comment

            What do you mean by "server versions"?  These commands need to be run on the MGS.
            mhanafi Mahmoud Hanafi added a comment - - edited

            On nbp7 I already ran Andreas' command and it worked. But we have 2 other filesystems that also have bad records. On those I get:

            nbp2-mds ~ # lctl --device MGS llog_print params
            OBD_IOC_LLOG_PRINT failed: Inappropriate ioctl for device
            

            They both are running server versions.
            lctl --device MGS llog_print nbp2-client works.
            By the way, with either the params or the client config log, if the number of records is large, the output gets truncated and we get this error:

            [3115652.139607] LustreError: 35937:0:(llog_ioctl.c:254:llog_print_cb()) not enough space for print log records
            

            pfarrell Patrick Farrell (Inactive) added a comment

            Mahmoud,

            Even if Andreas' suggestion works (you should definitely try it), could you please gather those logs I asked for?  They may be helpful in identifying a problem or problems.

            adilger Andreas Dilger added a comment

            On the MGS you could also run "lctl --device MGS llog_print nbp7-client" (or "... params") to dump the client config llog (or the "params" log) while the system is mounted. You can then use "lctl --device MGS llog_cancel params 69" to cancel that record number.
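
The print-then-cancel workflow above can be sketched as a dry run that turns matching llog_print records into `llog_cancel` commands for review. This assumes llog_print emits one record per line with an `index: N` field, which should be checked against the actual output on the MGS first. `cancel_cmds` is a hypothetical helper and it only prints commands; it never runs them.

```shell
# Hypothetical helper: cancel_cmds LOG_NAME NEEDLE
# Reads llog_print output on stdin and prints one llog_cancel command for
# every record line that contains NEEDLE as a fixed string.
cancel_cmds() {
    log_name=$1
    needle=$2
    grep -F -- "$needle" |
        sed -n 's/.*index:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
        while read -r idx; do
            printf 'lctl --device MGS llog_cancel %s %s\n' "$log_name" "$idx"
        done
}

# Example use on the MGS (commented out so the sketch has no side effects):
# lctl --device MGS llog_print params | cancel_cmds params 'OST0036-osc-MDT0000.active'
```

Printing the commands first keeps a bad match from cancelling the wrong record. The log name given to `cancel_cmds` must be the same log that was printed.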

            People

              pfarrell Patrick Farrell (Inactive)
              mhanafi Mahmoud Hanafi