[LU-1014] MGS with sys.timeout is ignored and if sys.timeout is changed its not synced across the file system. Created: 20/Jan/12  Updated: 24/Aug/12  Resolved: 24/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: Lustre 2.3.0, Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: James A Simmons Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

Any Lustre 2.X setup.


Issue Links:
Related
Severity: 3
Rank (Obsolete): 4478

 Description   

While testing Lustre master branch on our production system we ran into the issue of the timeout being incorrect on the MGS. This was causing the MGS to drop OSS and MDS connections. Also discovered if we changed the obd_timeout parameter anywhere to another value that value was never propagated to the rest of the system.



 Comments   
Comment by James A Simmons [ 20/Jan/12 ]

Here is a easy example of seeing the problem. Building a test file system with --param sys.timeout=20 to every device and then mounted this file system. Doing the following command gives:

pdsh -w spoon02,maid-mds1,barry-mds1,barry-oss[1-4] "lctl get_param timeout"
spoon02: timeout=20
barry-oss3: timeout=20
barry-oss4: timeout=20
maid-mds1: timeout=100
barry-oss2: timeout=20
barry-oss1: timeout=20
barry-mds1: timeout=20

As you can see the timeout on the MGS unit is set to the default 100 meaning the --param-sys.timeout=20 flag is being ignored.

Now for the technical reasons for why I think this is the case. Information such as the timeout is stored in the MGS logs via the function mgs_write_log_target. This information is passed with the struct mgs_target_info. This is all done when a target is registered with the MGS with mgs_handle_target_reg which is called when a RQF_MGS_TARGET_REG rpc is recieved. You will notice that the RQF_MGS_TARGET_REG is sent by mgc_target_register which happens when the key KEY_REGISTER_TARGET is received in mgc_set_info_async. This is all triggered by a mount of a target as you can see in obd_mount.c with the function server_register_target which is used to start the service targets i.e MDTs and OSTs. This is all to register via the MGC to the MGS for the specific targets. Well for the case of the MGS it has to be started BEFORE the MGC. Then the MGC is registered before all the server targets. This is the not the case for the MGS so no data is ever written to the logs of the MGS unit since the MGS is never registered to itself.

Comment by Jason Hill (Inactive) [ 20/Jan/12 ]

Something that also may be of interest is what is stored in the CONFIGS section of the MGT:

[root@maid-mds1 ~]# tunefs.lustre --dryrun /dev/sda
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: MGS
Index: unassigned
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: sys.timeout=20

Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: sys.timeout=20

exiting before disk write.

Comment by Peter Jones [ 20/Jan/12 ]

Oleg

Could you please look into this one?

Thanks

Peter

Comment by Peter Jones [ 06/Feb/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Lai Siyao [ 13/Feb/12 ]

Hi Jason, tunefs.lustre dumps mountdata, not llog config, which is the data written by mkfs, not the one directly used in device startup. The problem is just as James pointed out, MGS doesn't follow normal target device setup process, but just get created by lustre_start_simple().

Comment by Lai Siyao [ 14/Feb/12 ]

Review is on http://review.whamcloud.com/#change,2139

In this patch, MGS will process sys parameters from on-disk mountdata, that means, sys parameters can't be changed via `lctl set_param/conf_param ...` temporarily or permanently on disk, but through tunefs.lustre. IMO this is reasonable, because MGS may hold config for several setups, sys parameters change on one setup should not affect other setups.

James, would you verify the patch works?

Comment by Peter Jones [ 16/Feb/12 ]

James

When do you expect to be able to verify whether this patch works?

Thanks

Peter

Comment by Jason Hill (Inactive) [ 16/Feb/12 ]

Peter/Lai :

Taking away the lctl interface from a client can potentially break some legacy administrative scripts. For example our client startup script mounts the filesystem looks at the values in /proc/sys/lustre/

{timeout,ldlm_timeout,at_min,at_max}

and sets them if they are not correct. Now with this patch the timeout should always be correct, but will it still be queryable via lctl? Is it reasonable to expect that you need to unmount your MGT to change this value in production (I can see a use case there where it is not reasonable)? What affects would it have on the currently mounted MDT's and OST's - would they then need to be unmounted and re-mounted to pick up the change? My concern here is availibility and how we have architected the Spider filesystems here at ORNL (common MGS for all filesystems). Same concern for currently mounted clients - or would recovery handle getting the timeout value set on the client? Are there any precedents for removing a tunable from lustre in terms of compatibility that are of concern?

Thanks,

-Jason

Comment by James A Simmons [ 16/Feb/12 ]

I plan to test it tomorrow and do regression testing over the weekend. Monday we will boot our test cray system in the latest Lustre version to see if it holds up.

Comment by James A Simmons [ 20/Feb/12 ]

Okay I have tested your patch. Test the MGS formated with --param sys.timeout and also tried lctl [s/g]et_param timeout[=XXX] on clients and servers. On the MGS lct get_param timeout reported the right value when compared to the tune.lustre reported values which were passed in by the format command. I also wanted to see if you could change the timeout with lctl and I discovered you can change the value on any client or server including the MGS. That changed value also didn't change back to the old value at any time as long as you didn't reboot the node. The regression I did see is if you did change the obd_timeout on any client or server that new value did not propagate to all the nodes of the file system.

Comment by Lai Siyao [ 21/Feb/12 ]

Ahh, I misunderstood it, I used to think that timeout can only be changed through mountconf, but actually `lctl set_param timeout=xxx` just write to /proc/sys/lustre/timeout, which corresponds to a global variable 'obd_timeout', and it won't be propagated to the file system; while `lctl conf_param <target>.sys.timeout=xxx` will trigger mountconf update on the target.

Could you confirm it is regression? or do I miss anything?

Comment by James A Simmons [ 21/Feb/12 ]

Before I do a lctl conf_param lustre.sys.timeout=150 on the MGS (maid-mds1, barry-mds is the MDS)

pdsh -w maid-mds1,barry-mds1,barry-oss[1-4],spoon01 "lctl get_param timeout"
barry-mds1: timeout=300
barry-oss2: timeout=300
barry-oss4: timeout=300
barry-oss1: timeout=300
barry-oss3: timeout=300
maid-mds1: timeout=300
spoon01: timeout=300 (client)

After a pdsh -w maid-mds1 "lctl conf_param lustre.sys.timeout=150"

pdsh -w maid-mds1,barry-mds1,barry-oss[1-4],spoon01 "lctl get_param timeout"
barry-oss1: timeout=150
barry-oss4: timeout=150
barry-mds1: timeout=150
barry-oss3: timeout=150
barry-oss2: timeout=150
maid-mds1: timeout=300
spoon01: timeout=300

As you can see the client and the MGS are not updated with the new obd_timeout value.

Comment by Lai Siyao [ 22/Feb/12 ]

James, I did the same test, but the result is different: client always updates timeout, but MGS never. And the result is identical on both patched and unpatched code.

The code shows that although MGS has a MGC, but it doesn't enqueue config lock, so when config changes, it's not notified. I'll update the patch to fix it later.

Could you verify that client not update its timeout?

Comment by James A Simmons [ 22/Feb/12 ]

I'm a idiot. The client had the LNET set up but I didn't have the file system mounted. Once I mounted a file system on the client I seen the timeout change on the client. Only the MGS timeout doesn't change which it appears that you know what the solution is. Look forward to your new patch.

Comment by James A Simmons [ 22/Feb/12 ]

Also could you update cfg/local.sh from

MGS_MKFS_OPTS="--mgs --device-size=$MGSSIZE"

to

MGS_MKFS_OPTS="--mgs --device-size=$MGSSIZE --param sys.timeout=$TIMEOUT"

Comment by James A Simmons [ 22/Feb/12 ]

Looking at the the mgs parameter setting I'm seeing mgs_iocontrol -> mgs_setparam -> which does mgs_write_log_param as well as call mgs_revoke_lock. Mgs_revoke_lock does a local "ldlm_cli_enqueue_local" which handle the config changes. It should be notified which in turn calls mgs_completion_ast_config

Comment by Lai Siyao [ 23/Feb/12 ]

James, as I've said, although standalone MGS has a mgc device, because MGS doesn't have config of its own, this mgc doesn't enqueue any config lock, so upon config changes, standalone MGS won't be notified.

In the new patch I'll commit later, the mgc on MGS will enqueue '<profile>-params' config lock, this config is global parameters for the system. But there is one thing strange: even when you issue `lfs conf_param <profile>-MDT0000.sys.timeout=xxx` (or any other target/client device), all the config files will update its timeout value (ie. <profile>-params and <profile>-MDT0000, <profile>-OSTxxxx and <profile>-client), I'm not clear whether this is intended. So with this patch, there is no need to change cfg/local.sh script, because the timeout value will be set by the last device timeout parameter.

Comment by James A Simmons [ 23/Feb/12 ]

Thanks for the patch and info. I'm just attempting to understand the lustre code in greater depth. With the second patch I now understand how config is set up. Is patch 3 a total replacement of patch 2 or do you need to use both? I applied patch number 2 and patch number 3 and tested them with success. The timeout format parameter on the MGS is recognized and when I do a lctl conf_param it updates the timeouts correctly. As for was the syncing of obd_timeout intended, well I know our site is depended on that being available.

Comment by Lai Siyao [ 23/Feb/12 ]

Patch 3 is a total replacement for patch 2. The original issue is that MGS doesn't have llog config (normally for target like MDT/OST this config is created at device setup time, which will connect to MGS and register itself), so that it doesn't enqueue any config lock.

In patch 2 I tried to parse sys parameters through mountdata created by mkfs.lustre in MGS startup, but in this way MGS won't be notified if config changes; so in patch 3 MGC on MGS will enqueue '<profile>-params' config lock, this config contains global sys parameters, therefore MGS can process global sys parameters like other nodes process their config. Because patch 2 is a special way to process parameters, and patch 3 is using a normal method to process config, and can cover the functionality of patch 2, I'd like use patch 3 only.

Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Peter Jones [ 01/Mar/12 ]

Landed for 2.2

Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,client,el5,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,server,el5,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,client,el6,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » i686,server,el6,ofa #494
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el5,inkernel #340
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el6,inkernel #340
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,server,el5,inkernel #340
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el6,inkernel #340
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » i686,client,el5,inkernel #340
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,server,el5,inkernel #340
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/mgc/mgc_request.c
  • lustre/obdclass/obd_mount.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » x86_64,client,el6,inkernel #340
LU-1014 mountconf: MGS should process parameter config (Revision 6cde8692384a1d3a6d0e8114629fa943c951fe40)

Result = SUCCESS
Oleg Drokin : 6cde8692384a1d3a6d0e8114629fa943c951fe40
Files :

  • lustre/obdclass/obd_mount.c
  • lustre/mgc/mgc_request.c
Comment by Bob Glossman (Inactive) [ 07/May/12 ]

http://review.whamcloud.com/#change,2667
back port to b2_1

Comment by Andreas Dilger [ 09/May/12 ]

There is some concern that this patch is causing breakage on the Orion branch, and may be fundamentally incorrect, per comments in ORI-651. Since this problem is longstanding and has never been reported in production, I would prefer that this patch be reverted from b2_1 until there is a more thorough understanding of the issue.

Comment by Brian Behlendorf [ 09/May/12 ]

I believe we need to reopen this bug and revisit this issue. After this patch was applied I am now encountering the following issue which causes the MGS to crash when mounted, ORI-651.

Now leaving aside the bag paging request which is clearly just a a use-after-free bug in the initial patch. The more that I started looking at this the more I'm convinced this patch needs to be reverted and reworked. I say that because the current implementation is at odds with my understanding of the Lustre MGS architecture (correct me if I'm wrong about this).

But this patch expects the MGS service name to include the filesystem name (<fsname>-MGS). However, until the Lustre clients support connecting to multiple MGSs that's impossible since the MGS is a resource which is shared between multiple filesystems. This change needs to be reworked to cleanly handle the multiple filesystem case.

Comment by Andreas Dilger [ 09/May/12 ]

Also note that reverting http://review.whamcloud.com/2139 from orion resolves the crash seen in ORI-651.

Comment by Mikhail Pershin [ 10/May/12 ]

Please refer to ORI-662 instead of ORI-651. The key issue is separate MGS setup, which has no ldd_fsname, so param log has name '-params' and things are not working as expected.

ORI-651 has normal names and the only issue there is memory access which looks like Orion-side issue.

Comment by Mikhail Pershin [ 10/May/12 ]

No, memory issue is not Orion-related, there is clear use-after-free case in original patch, check 'logname' is used being freed:

+                OBD_FREE(logname, MGS_PARAM_MAXLEN);
+                if (rc) {
+                        CERROR("failed to process parameters %s: %d\n",
+                               logname, rc);
+                        GOTO(out_mnt, rc);

That causing issues we've seen, and there is still issue about separate MGS case with no fsname

Comment by Mikhail Pershin [ 10/May/12 ]

Another question about this patch is about fsname-params log processing:

+        } else if (IS_MGS(lsi->lsi_ldd) &&
+                   !(lsi->lsi_lmd->lmd_flags & LMD_FLG_NOMGS)){
+                struct config_llog_instance cfg;
+                char *logname;
+
+                OBD_ALLOC(logname, MGS_PARAM_MAXLEN);
+                if (logname == NULL)
+                        GOTO(out_mnt, rc = -ENOMEM);
+                strcpy(logname, lsi->lsi_ldd->ldd_fsname);
+                strcat(logname, "-params");
+
+                memset(&cfg, 0, sizeof(cfg));
+                rc = lustre_process_log(sb, logname, &cfg);
+                OBD_FREE(logname, MGS_PARAM_MAXLEN);
+                if (rc) {
+                        CERROR("failed to process parameters %s: %d\n",
+                               logname, rc);
+                        GOTO(out_mnt, rc);
+                }

so it is processed only on MGS, second condition should be always true as well, because MGS cannot be with NOMGS flag I suppose, at least that sounds strange. But we don't need params processing on MGS, it should be processed on every node except MGS, right?

I didn't notice that during inspection, sorry.

Comment by James A Simmons [ 10/May/12 ]

I believe the flag is needed as well because IS_MGS(..) could equal IS_MDS(..). I discovered this bug when the MGS and MDS were separate servers thus they didn't share the same llogs. Yes we do need params processing so that when for example the timeout is changed on the system it gets set on the MGS as well.

Comment by Lai Siyao [ 11/May/12 ]

This is actually a MGS design issue: MGS allows to be shared by multiple filesystems, and it will create a <profile>-params for each filesystem, besides, MGS will create a lustre-params by default (the name 'lustre' is written into mountdata by mkfs.lustre by default, and that should be why ORION code can't find this llog config). In my previous patch MGC on MGS will enqueue lustre-params config lock, but sys.timeout won't be synced if filesystem name is not 'lustre'. And I don't know how this can work if MGS holds llog configs of several filesystems with different sys.timeout.

Any comment is welcome!

Comment by Andreas Dilger [ 11/May/12 ]

Some notes:

  • it probably would make sense to remove "lustre" as the default filesystem name. The admin should specify a proper filesystem name for any filesystem, and mkfs.lustre should fail if it is not specified, along with updating man pages and usage() output. We've already had reports of a site having two filesystems with the same name. That should be done in a separate bug.
  • There should not be any harm to have a larger timeout on the MGS than the rest of the filesystem, since at worst the clients will be pinging the MGS more frequently than it expects. In this regard, the MGS could check all of the configuration files, and only increase the timeout to the maximum one found in any config log.
  • A more important question is to understand what the MGS uses timeouts for at all? Is this only to evict stale clients? I suspect the reason we haven't seen this problem very much is that the timeout in the config log is rarely changed to be more than 4x the timeout (the server allows clients to miss 4 pings before evicting). If this is just the timeout, it might make sense to have the MGS just track the average ping interval from the client, and only evict the client if it misses 4x the average ping interval? The benefit is that the MGS doesn't need to parse the different obd_timeouts at all. The drawback is that this is extra code and runtime overhead and it allows stale clients to linger a bit longer, though if there are DLM callback timeouts the client will still be evicted.
Comment by Mikhail Pershin [ 14/May/12 ]

> I believe the flag is needed as well because IS_MGS(..) could equal IS_MDS(..).

James, IS_MGS() checks just LDD_F_SV_TYPE_MGS flag is set, you are right, it is true for combined MGS+MDT but in that case there can't be LMD_FLG_NOMGS, because there is MGS combined with MDT. Another issue with patch is 'else if', as I can see that check will be just skipped if first condition was true.

1828         if (!(lsi->lsi_lmd->lmd_flags & LMD_FLG_NOSVC) &&
1829                 (IS_OST(lsi->lsi_ldd) || IS_MDT(lsi->lsi_ldd))) {
1830                 rc = server_start_targets(sb, mnt);

So with combined MDT+MGS this check is true and MDT service will be started but 'else if' will not be even checked, so that code under 'else if' will work only for separate MGS. But I thought we need params processing on other nodes too?

Also I don't agree we have to change MGS parameters along with filesystem params, just because MGS is shared. Probably MGS params should be set separately and stored somewhere in mgs-params config which contains just MGS params?

Comment by James A Simmons [ 21/May/12 ]

New patch at http://review.whamcloud.com/#change,2843

I just tested the new patch. No regressions, and worked properly as expected on my end. Thank you. Now to see if ORI 651 still crashes for Livermore.

Comment by James A Simmons [ 25/May/12 ]

The patch behaves as expected except for one thing which appears to be a lctl bug. I logged into the MGS to make sure it is not a pdsh issue.

[root@tick-mgs1 ~]# lctl get_param timeout
timeout=150
[root@tick-mgs1 ~]# lctl conf_param lustre.sys.timeout=70
[root@tick-mgs1 ~]# lctl get_param timeout
timeout=150
[root@tick-mgs1 ~]# lctl conf_param lustre.sys.timeout=70
[root@tick-mgs1 ~]# lctl get_param timeout
timeout=150

[root@tick-mgs1 lustre]# cat /proc/sys/lustre/timeout
150
[root@tick-mgs1 lustre]# echo "70" > /proc/sys/lustre/timeout
[root@tick-mgs1 lustre]# cat /proc/sys/lustre/timeout
70
[root@tick-mgs1 lustre]# lctl get_param timeout
timeout=70

As you can see something strange is going on with the lctl command. Also the first time you change the value with lctl it appears to work but after that it doesn't anymore.

Comment by Andreas Dilger [ 25/May/12 ]

That is because the previous timeout parameter to set the value to 150 is still in the config log, and sets the "obd_timeout_set" switch, and later entries in the log with lower values are ignored. Since it is not typical to twiddle with the MGS timeouts, you can still use "lctl set_param timeout=70" on the MGS (instead of "lctl conf_param") and it should still temporarily override the MGS obd_timeout value it got from the config log.

Comment by James A Simmons [ 13/Aug/12 ]

I have been using the patch at http://review.whamcloud.com/#change,2881 with great success for some time.

Comment by Andreas Dilger [ 22/Aug/12 ]

Move to 2.3/2.4, since the patch landed to 2.1/2.2 was later reverted.

Comment by Peter Jones [ 24/Aug/12 ]

Landed for 2.3 and 2.4

Generated at Sat Feb 10 01:12:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.