  Lustre / LU-9104

Unknown config param in llog fails mounting target


    Description

      If the config llog contains an unknown config parameter, mounting the target fails.

      Steps to reproduce:

      $>sh llmount.sh
      $>export NOFORMAT=yes
      $>for OST in $(lctl get_param mgs.MGS.live.lustre | grep OST); do echo "max_rpcs_in_flight=50 on $OST" ; lctl conf_param $OST.ost.max_rpcs_in_flight=50 ; done <--- Note: ost.max_rpcs_in_flight is used here instead of osc.max_rpcs_in_flight
      $>sh llmountcleanup.sh
      $>sh llmount.sh <------- this hangs because the OST is not available
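The failure mode in the steps above can be sketched with a small, hypothetical C model (not Lustre source). Each config-llog record is assumed to be a `name=value` string, the target is assumed to accept only the names in a `known` list, and replay stops at the first record that fails. The functions `apply_record()` and `replay_config_log()` are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model of replaying a target's config llog at
 * mount time. NOT Lustre code: names and behaviour are illustrative only.
 * Returns 0 if the name part of "name=value" is in `known`, else -ENOSYS.
 */
static int apply_record(const char *rec, const char *const *known,
			size_t nknown)
{
	size_t klen;

	assert(rec != NULL);
	klen = strcspn(rec, "=");	/* length of the name part */
	for (size_t i = 0; i < nknown; i++)
		if (strlen(known[i]) == klen &&
		    strncmp(rec, known[i], klen) == 0)
			return 0;
	return -ENOSYS;			/* "unknown config parameter" */
}

/*
 * Replay every record in order. The first failure aborts the replay, so in
 * this model the mount fails with that rc (-ENOSYS is -38 on x86 Linux).
 */
static int replay_config_log(const char *const *recs, size_t nrecs,
			     const char *const *known, size_t nknown)
{
	for (size_t i = 0; i < nrecs; i++) {
		int rc = apply_record(recs[i], known, nknown);

		if (rc != 0)
			return rc;
	}
	return 0;
}
```

In this model any record whose name is not in `known` (for example the mistyped `ost.max_rpcs_in_flight=50`) makes `replay_config_log()` return `-ENOSYS`, so a single bad record is enough to block the whole mount.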
      


          Activity

            pjones Peter Jones added a comment -

            great - thanks James

            simmonsja James A Simmons added a comment -

            I can fix this under LU-8066 since we will be moving to sysfs which handles these error codes differently.
            pjones Peter Jones added a comment -

            IMHO, given that this bug has existed in multiple shipped releases, it would be better to track this issue in a new ticket so that we can more easily track getting it into future releases.

            lixi_wc Li Xi added a comment -

            I think we should fix this problem by ignoring ENOSYS in osd_process_config(), so I am reopening the ticket.
            lixi_wc Li Xi added a comment -

            The patch at https://review.whamcloud.com/25368/ might still have a bug in osd_process_config().

            When an MDT llog config record cannot be understood by class_process_proc_param(PARAM_MDT) or any other prefix (e.g. PARAM_HSM), it is passed to osd_process_config(), and class_process_proc_param() is called twice for it.

            The question is why the -ENOSYS failure of class_process_proc_param(PARAM_OST) is not ignored in osd_process_config().

            I am not sure about the llog processing code, but I guess the process is: class_process_proc_param(PARAM_MDT) -> class_process_proc_param(PARAM_HSM) -> class_process_proc_param(PARAM_HSM) -> class_process_proc_param(PARAM_OSD) -> class_process_proc_param(PARAM_OST). So if osd_process_config() doesn't understand the llog record, it should ignore -ENOSYS. Right?
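The dispatch chain guessed at above, together with the idea of ignoring -ENOSYS in the last-resort handler, can be modelled roughly as follows. This is a hypothetical sketch, not the Lustre implementation: `struct layer`, the per-layer tables, `process_param()` and the `ignore_unknown` flag are all invented, and `mdt.job_cleanup_interval` is used only as an example of a name the MDT layer accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one handler in the chain (not Lustre code). */
struct layer {
	const char *name;		/* e.g. "mdt", "osd" */
	const char *const *known;	/* full parameter names it accepts */
	size_t nknown;
};

/* 0 if this layer knows the name part of "name=value", else -ENOSYS. */
static int layer_process(const struct layer *l, const char *param)
{
	size_t klen = strcspn(param, "=");

	for (size_t i = 0; i < l->nknown; i++)
		if (strlen(l->known[i]) == klen &&
		    strncmp(param, l->known[i], klen) == 0)
			return 0;
	return -ENOSYS;
}

static const char *const mdt_known[] = { "mdt.job_cleanup_interval" };

/* Tried in order; the last entry plays the role of osd_process_config(). */
static const struct layer chain[] = {
	{ "mdt", mdt_known, 1 },
	{ "osd", NULL, 0 },
};

/*
 * Return the first result other than -ENOSYS. If no layer understands the
 * record, either fail with -ENOSYS (strict) or log and skip it
 * (ignore_unknown != 0, the behaviour proposed in this ticket).
 */
static int process_param(const char *param, int ignore_unknown)
{
	assert(param != NULL);
	for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
		int rc = layer_process(&chain[i], param);

		if (rc != -ENOSYS)
			return rc;
	}
	if (ignore_unknown) {
		fprintf(stderr, "ignoring unknown config parameter '%s'\n",
			param);
		return 0;
	}
	return -ENOSYS;
}
```

With `ignore_unknown` set to 0, a misspelled name such as `mdt.job_cleanaup_interval=60` fails with `-ENOSYS`; with it set to 1 the record is only logged. Any error code other than `-ENOSYS` still propagates in both modes, so in this model only "nobody understands this record" is tolerated.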

            wutaizeng Taizeng Wu (Inactive) added a comment -

            I am trying to use `mount -t lustre -o nosvc xxx`, which only starts the MGS, and then execute `lctl conf_param -d public4-MDT0000.mdt.job_cleanaup_interval` to remove the wrong config.

            wutaizeng Taizeng Wu (Inactive) added a comment -

            I am using lustre-2.10.5. When I mount the MGS and MDT, it reports an unknown config parameter. How can I remove the wrong parameter?

            ```
            [Thu Nov 29 15:00:14 2018] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
            [Thu Nov 29 15:00:17 2018] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
            [Thu Nov 29 15:00:18 2018] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
            [Thu Nov 29 15:00:19 2018] Lustre: MGS: Connection restored to 0b413a06-995f-b510-6aba-9810cdf71a17 (at 0@lo)
            [Thu Nov 29 15:00:19 2018] Lustre: Found index 0 for public4-MDT0000, updating log
            [Thu Nov 29 15:00:19 2018] LustreError: 11-0: public4-OST0000-osc-MDT0000: operation ost_connect to node 10.10.1.11@o2ib failed: rc = -114
            [Thu Nov 29 15:00:19 2018] LustreError: Skipped 4 previous similar messages
            [Thu Nov 29 15:00:19 2018] LustreError: 19825:0:(obd_config.c:1361:class_process_proc_param()) public4-MDT0000: unknown config parameter 'mdt.job_cleanaup_interval=60'
            [Thu Nov 29 15:00:19 2018] LustreError: 19825:0:(obd_config.c:1682:class_config_llog_handler()) MGC10.10.1.14@o2ib: cfg command failed: rc = -38
            [Thu Nov 29 15:00:19 2018] Lustre: cmd=cf00f 0:public4-MDT0000 1:mdt.job_cleanaup_interval=60
            [Thu Nov 29 15:00:19 2018] LustreError: 15c-8: MGC10.10.1.14@o2ib: The configuration from log 'public4-MDT0000' failed (-38). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            [Thu Nov 29 15:00:19 2018] LustreError: 19761:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server public4-MDT0000: -38
            [Thu Nov 29 15:00:19 2018] LustreError: 19761:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -38
            [Thu Nov 29 15:00:19 2018] Lustre: Failing over public4-MDT0000
            [Thu Nov 29 15:00:25 2018] Lustre: 19761:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1543474819/real 1543474819] req@ffff99bab7d6a100 x1618446900006864/t0(0) o251->MGC10.10.1.14@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl 1543474825 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            [Thu Nov 29 15:00:26 2018] Lustre: server umount public4-MDT0000 complete
            [Thu Nov 29 15:00:26 2018] LustreError: 19761:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-38)
            ```
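The `<target>.<parameter>` string to remove can be pulled out of the `unknown config parameter` line in a console log like the one above. Below is a hypothetical C helper, assuming the message format seen in that log (`<target>: unknown config parameter '<name>=<value>'`); the function name `cfg_param_to_delete()` is invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Lustre or lctl): given one
 * "unknown config parameter" console line, build "<target>.<name>".
 * Returns 0 on success, -1 if the line does not match or `out` is too small.
 */
static int cfg_param_to_delete(const char *line, char *out, size_t outlen)
{
	static const char marker[] = ": unknown config parameter '";
	const char *m, *tgt, *param, *end;
	int n;

	assert(line != NULL && out != NULL);
	m = strstr(line, marker);
	if (m == NULL)
		return -1;

	/* Target is the word just before the marker, e.g. public4-MDT0000. */
	tgt = m;
	while (tgt > line && tgt[-1] != ' ')
		tgt--;
	if (tgt == m)
		return -1;

	/* Name runs from after the quote up to '=' (or the closing quote). */
	param = m + strlen(marker);
	end = param + strcspn(param, "='");
	if (end == param)
		return -1;

	n = snprintf(out, outlen, "%.*s.%.*s", (int)(m - tgt), tgt,
		     (int)(end - param), param);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the `obd_config.c:1361` line in the log above it yields `public4-MDT0000.mdt.job_cleanaup_interval`, the same argument passed to `lctl conf_param -d` in the comment above. The helper only builds the string; it does not run `lctl`.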

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28232/
            Subject: LU-9104 obd: Ignore unknown config param while mounting
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 542815f43c2fbbc29e528fc1de496e54851ca720

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28232
            Subject: LU-9104 obd: Ignore unknown config param while mounting
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 902a5fba10096e3196cfed25a2bfc7db0bfe692a
            pjones Peter Jones added a comment -

            Landed for 2.11


            People

              simmonsja James A Simmons
              520557 Rahul Deshmukh (Inactive)