[LUDOC-247] recovery_time_soft and recovery_time_hard default value descriptions are misleading Created: 07/Jul/14  Updated: 16/Feb/16  Resolved: 16/Feb/16

Status: Resolved
Project: Lustre Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Ryan Haasken
Assignee: Richard Henwood (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 14791

 Description   

The documentation for the OST mount options recovery_time_soft and recovery_time_hard is misleading in the way that it describes the default values for these parameters.

For recovery_time_soft, it says: "The default soft recovery timeout is 300 seconds (5 minutes)."

For recovery_time_hard, it says: "The default hard recovery timeout is set to 900 seconds (15 minutes)."

If the recovery_time_{soft,hard} mount options are not given, the default actually depends on the RPC timeout, which section 32.5.2 of the Lustre manual describes as "The time that a client waits for a server to complete an RPC....".

The recovery_time_{soft,hard} values are set at mount time in the function server_calc_timeout(). If no recovery_time_{soft,hard} options are given to the mount command, the values are set as follows:

        if (soft == 0)
                soft = OBD_RECOVERY_TIME_SOFT;
        if (hard == 0)
                hard = OBD_RECOVERY_TIME_HARD;

Those macros are defined as follows:

#define OBD_RECOVERY_TIME_HARD          (obd_timeout * 9)
#define OBD_RECOVERY_TIME_SOFT          (obd_timeout * 3)

Thus, the default values of recovery_time_soft and recovery_time_hard are actually 3 times the RPC timeout and 9 times the RPC timeout, respectively. If the RPC timeout is set to the default 100 seconds (at OST mount time), then the default values of recovery_time_soft and recovery_time_hard are 300 and 900 seconds, respectively.
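
To make the arithmetic above concrete, here is a minimal standalone sketch of the default selection. It is not the actual server_calc_timeout() code: the two macros are copied from this ticket, but the obd_timeout variable here is only a local stand-in for the real tunable, and effective_soft() and effective_hard() are hypothetical helpers introduced for illustration. A value of 0 is assumed to mean "option not given to the mount command".

```c
/* Local stand-in for the RPC timeout tunable, in seconds (assumption). */
static unsigned int obd_timeout = 100;

/* Macros as quoted in this ticket. */
#define OBD_RECOVERY_TIME_HARD          (obd_timeout * 9)
#define OBD_RECOVERY_TIME_SOFT          (obd_timeout * 3)

/* Hypothetical helper: soft == 0 means recovery_time_soft was not given. */
static unsigned int effective_soft(unsigned int soft)
{
        if (soft == 0)
                soft = OBD_RECOVERY_TIME_SOFT;
        return soft;
}

/* Hypothetical helper: hard == 0 means recovery_time_hard was not given. */
static unsigned int effective_hard(unsigned int hard)
{
        if (hard == 0)
                hard = OBD_RECOVERY_TIME_HARD;
        return hard;
}
```

With obd_timeout at 100, effective_soft(0) and effective_hard(0) give 300 and 900. With obd_timeout at 50 they give 150 and 450, which is why the fixed "300 seconds" and "900 seconds" wording in the manual is misleading.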



 Comments   
Comment by Ryan Haasken [ 07/Jul/14 ]

I would suggest the following changes to section 37.15.3.

For recovery_time_soft: "The default soft recovery timeout is 3 times the RPC timeout value (see section 32.5.2). The default RPC timeout is 100 seconds, which would make the soft recovery timeout default to 300 seconds (5 minutes). The soft recovery timeout is set at mount time and will not change if the RPC timeout is changed after mount time."

For recovery_time_hard: "The default hard recovery timeout is 9 times the RPC timeout value (see section 32.5.2). The default RPC timeout is 100 seconds, which would make the hard recovery timeout default to 900 seconds (15 minutes). The hard recovery timeout is set at mount time and will not change if the RPC timeout is changed after mount time."
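
The "set at mount time" wording in both proposed paragraphs can be illustrated with a small sketch. This is only a model of the behavior described in this ticket, not Lustre code: struct recovery_timeouts and snapshot_at_mount() are hypothetical names, and 0 is assumed to mean the mount option was not given.

```c
/* Hypothetical container for the values fixed at mount time. */
struct recovery_timeouts {
        unsigned int soft;   /* seconds */
        unsigned int hard;   /* seconds */
};

/*
 * Hypothetical helper: compute both timeouts once, from the RPC timeout
 * in effect at mount time. The result is a copy, so later changes to the
 * RPC timeout do not affect it.
 */
static struct recovery_timeouts snapshot_at_mount(unsigned int rpc_timeout,
                                                  unsigned int soft_opt,
                                                  unsigned int hard_opt)
{
        struct recovery_timeouts t;

        t.soft = soft_opt ? soft_opt : rpc_timeout * 3;
        t.hard = hard_opt ? hard_opt : rpc_timeout * 9;
        return t;
}
```

If the RPC timeout is 100 seconds when snapshot_at_mount() is called with no options, the stored values are 300 and 900 and stay that way even if the RPC timeout variable is changed afterwards.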

Comment by Ryan Haasken [ 07/Jul/14 ]

I'm working on a patch, but I don't see anywhere on the Intel HPDD wiki that describes how to make a link to another section in the Lustre manual. I'm looking at examples, but I don't understand how an id for a section is generated.

For example, in the See Also section, 37.15.5, there is this link

          <para>  <xref linkend="dbdoclet.50438219_75432"/></para>

This is a link to section 37.14 "mkfs.lustre". That section has the matching id in the XML:

  <section xml:id="dbdoclet.50438219_75432">

How is the id generated?

It seems like a common enough task that it should be documented here: https://wiki.hpdd.intel.com/display/PUB/Making+changes+to+the+Lustre+Manual+source

Comment by Ryan Haasken [ 11/Sep/14 ]

I guess there was already a unique id assigned to section 32.5.2 of the manual, so I was able to just use that id in the link.

Here is a patch:

http://review.whamcloud.com/#/c/11885/

Comment by Gerrit Updater [ 16/Feb/16 ]

Richard Henwood (richard.henwood@intel.com) merged in patch http://review.whamcloud.com/11885/
Subject: LUDOC-247 mount: Clarify recovery_time_{soft,hard} defaults
Project: doc/manual
Branch: master
Current Patch Set:
Commit: ebee985b34dcbef204bc7fd609da9a525467ce67

Comment by Richard Henwood (Inactive) [ 16/Feb/16 ]

Thanks for the patch Ryan!
