[LUDOC-247] recovery_time_soft and recovery_time_hard default value descriptions are misleading Created: 07/Jul/14 Updated: 16/Feb/16 Resolved: 16/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre Documentation |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ryan Haasken | Assignee: | Richard Henwood (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 14791 |
| Description |
|
The documentation for the OST mount options recovery_time_soft and recovery_time_hard describes the default values of these parameters in a misleading way.

For recovery_time_soft, it says: "The default soft recovery timeout is 300 seconds (5 minutes)."
For recovery_time_hard, it says: "The default hard recovery timeout is set to 900 seconds (15 minutes)."

The default (used when the recovery_time_{soft,hard} mount options are not given) actually depends on the RPC timeout, which is described in section 32.5.2 of the Lustre manual as "The time that a client waits for a server to complete an RPC....".

The recovery_time_{soft,hard} values are set at mount time in the function server_calc_timeout(). If no recovery_time_{soft,hard} options are given to the mount command, the values are set as follows:

    if (soft == 0)
        soft = OBD_RECOVERY_TIME_SOFT;
    if (hard == 0)
        hard = OBD_RECOVERY_TIME_HARD;

Those macros are defined as follows:

    #define OBD_RECOVERY_TIME_HARD (obd_timeout * 9)
    #define OBD_RECOVERY_TIME_SOFT (obd_timeout * 3)

Thus, the default values of recovery_time_soft and recovery_time_hard are actually 3 times and 9 times the RPC timeout, respectively. Only if the RPC timeout is at its default of 100 seconds at OST mount time do recovery_time_soft and recovery_time_hard default to 300 and 900 seconds, respectively. |
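The arithmetic above can be sketched as a small standalone C fragment. This is only an illustration under stated assumptions, not the Lustre source: the two macros are copied from the description, but the `obd_timeout` variable here is a local stand-in for the real RPC timeout, and `struct recovery_timeouts` and `default_recovery_timeouts()` are hypothetical names that mimic the fallback logic quoted from server_calc_timeout().

```c
/* Local stand-in for the RPC timeout in seconds (assumed default: 100). */
static unsigned int obd_timeout = 100;

/* Macros as quoted in the description above. */
#define OBD_RECOVERY_TIME_HARD (obd_timeout * 9)
#define OBD_RECOVERY_TIME_SOFT (obd_timeout * 3)

/* Hypothetical container for the two resulting values. */
struct recovery_timeouts {
    unsigned int soft;
    unsigned int hard;
};

/*
 * Hypothetical helper mirroring the quoted fallback: a value of 0 means
 * the corresponding mount option was not given, so the default derived
 * from the RPC timeout is used instead.
 */
static struct recovery_timeouts
default_recovery_timeouts(unsigned int soft, unsigned int hard)
{
    struct recovery_timeouts t;

    if (soft == 0)
        soft = OBD_RECOVERY_TIME_SOFT;
    if (hard == 0)
        hard = OBD_RECOVERY_TIME_HARD;

    t.soft = soft;
    t.hard = hard;
    return t;
}
```

With `obd_timeout` at 100, `default_recovery_timeouts(0, 0)` yields 300 and 900, which matches the figures in the manual; with `obd_timeout` at 50 it yields 150 and 450, which the manual's wording does not account for.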
| Comments |
| Comment by Ryan Haasken [ 07/Jul/14 ] |
|
I would suggest the following changes to section 37.15.3. For recovery_time_soft: "The default soft recovery timeout is 3 times the RPC timeout value (see section 32.5.2). The default RPC timeout is 100 seconds, which would make the soft recovery timeout default to 300 seconds (5 minutes). The soft recovery timeout is set at mount time and will not change if the RPC timeout is changed after mount time." For recovery_time_hard: "The default hard recovery timeout is 9 times the RPC timeout value (see section 32.5.2). The default RPC timeout is 100 seconds, which would make the hard recovery timeout default to 900 seconds (15 minutes). The hard recovery timeout is set at mount time and will not change if the RPC timeout is changed after mount time." |
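The last sentence of each proposed paragraph (the value is fixed at mount time) can be illustrated with a short C sketch. Everything in it is hypothetical and only models the behaviour described in this ticket: `rpc_timeout` stands in for the RPC timeout, and `struct mounted_target` and `mount_target()` are invented names, not Lustre APIs.

```c
/* Hypothetical stand-in for the RPC timeout in seconds. */
static unsigned int rpc_timeout = 100;

/* Hypothetical record of the values a target captured when it was mounted. */
struct mounted_target {
    unsigned int recovery_time_soft;
    unsigned int recovery_time_hard;
};

/*
 * Hypothetical mount step: the defaults are computed once from the RPC
 * timeout in effect at that moment and then stored with the target.
 */
static void mount_target(struct mounted_target *t)
{
    t->recovery_time_soft = rpc_timeout * 3;
    t->recovery_time_hard = rpc_timeout * 9;
}
```

In this model, a target mounted while `rpc_timeout` is 100 keeps 300 and 900 even if `rpc_timeout` is later raised to 200; only a target mounted after the change picks up 600 and 1800.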
| Comment by Ryan Haasken [ 07/Jul/14 ] |
|
I'm working on a patch, but I don't see anywhere on the Intel HPDD wiki that describes how to make a link to another section in the Lustre manual. I'm looking at examples, but I don't understand how the id for a section is generated. For example, in the See Also section, 37.15.5, there is this link: <para><xref linkend="dbdoclet.50438219_75432"/></para> This is a link to section 37.14, "mkfs.lustre". That section has the matching id in the XML: <section xml:id="dbdoclet.50438219_75432"> How is the id generated? It seems like a common enough task that it should be documented here: https://wiki.hpdd.intel.com/display/PUB/Making+changes+to+the+Lustre+Manual+source |
| Comment by Ryan Haasken [ 11/Sep/14 ] |
|
I guess there was already a unique id assigned to section 32.5.2 of the manual, so I was able to just use that id in the link. Here is a patch: |
| Comment by Gerrit Updater [ 16/Feb/16 ] |
|
Richard Henwood (richard.henwood@intel.com) merged in patch http://review.whamcloud.com/11885/ defaults |
| Comment by Richard Henwood (Inactive) [ 16/Feb/16 ] |
|
Thanks for the patch Ryan! |