[LU-4533] rpc_stats histogram does not support max_rpcs_in_flight greater than 31 Created: 23/Jan/14 Updated: 26/Jan/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ryan Haasken | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | easy, patch |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 12395 |
| Description |
|
The "rpcs in flight" histogram, which is displayed by reading the proc file /proc/fs/lustre/osc/*/rpc_stats, does not show values higher than 31. When max_rpcs_in_flight is set to a value greater than 31, we should see rows for "rpcs in flight" values greater than 31. Instead, every RPC sent while 31 or more RPCs are in flight is counted in the last bucket of the histogram (row 31):

                          read              write
    rpcs in flight    rpcs   % cum % |   rpcs   % cum %
    0:                   0   0     0 |      0   0     0
    1:                 504   5     5 |    621  30    30
    2:                 330   3     8 |    405  20    51
    3:                 337   3    12 |      1   0    51
    4:                 349   3    16 |      1   0    51
    5:                 338   3    19 |      1   0    51
    6:                 325   3    23 |      1   0    51
    7:                 327   3    26 |      1   0    51
    8:                 324   3    30 |      1   0    51
    9:                 307   3    33 |      1   0    51
    10:                306   3    36 |      1   0    51
    11:                306   3    40 |      1   0    51
    12:                301   3    43 |      1   0    51
    13:                291   3    46 |      1   0    51
    14:                283   3    49 |      1   0    51
    15:                278   2    52 |      2   0    51
    16:                276   2    55 |      1   0    51
    17:                270   2    58 |      1   0    51
    18:                270   2    61 |      1   0    52
    19:                266   2    63 |      1   0    52
    20:                265   2    66 |      1   0    52
    21:                262   2    69 |      2   0    52
    22:                263   2    72 |      4   0    52
    23:                262   2    75 |      3   0    52
    24:                263   2    77 |      1   0    52
    25:                262   2    80 |      2   0    52
    26:                261   2    83 |      1   0    52
    27:                260   2    86 |      1   0    52
    28:                259   2    89 |      3   0    52
    29:                256   2    91 |      2   0    53
    30:                256   2    94 |      1   0    53
    31:                512   5   100 |    939  46   100

According to the current version of the Lustre manual, the valid range for max_rpcs_in_flight is 1 to 256, so this histogram should support all of those values. The maximum value for max_rpcs_in_flight is set by this preprocessor macro:

    #define OSC_MAX_RIF_MAX 256

The size of the obd_histogram struct is set by a preprocessor macro as well:

    /* if we find more consumers this could be generalized */
    #define OBD_HIST_MAX 32
    struct obd_histogram {
            spinlock_t      oh_lock;
            unsigned long   oh_buckets[OBD_HIST_MAX];
    };

The histogram that records the number of RPCs in flight appears to have the largest space requirement, so defining OBD_HIST_MAX as OSC_MAX_RIF_MAX would be a sufficient fix. However, this would grow the oh_buckets array of every obd_histogram by a factor of 8 (from 32 to 256 unsigned longs, i.e. from 256 to 2048 bytes on a 64-bit system). I'm not sure yet whether this would be a significant increase. Another option would be to generalize the obd_histogram struct to use a flexible array member for oh_buckets, but this would require a lot more work, and all obd_histogram structures would need to be dynamically allocated. |
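A rough userspace sketch of the flexible-array alternative described above. The names dyn_histogram, dyn_hist_alloc(), dyn_hist_tally() and dyn_hist_free() are hypothetical and are not Lustre functions; the oh_lock spinlock is left out, and out-of-range values are clamped into the last bucket as the current histogram does. Each consumer picks its bucket count when the histogram is allocated.

```c
#include <stdlib.h>

/*
 * Hypothetical sketch: a histogram whose bucket count is chosen at
 * allocation time. Not the Lustre obd_histogram; locking is omitted.
 */
struct dyn_histogram {
	size_t        dh_nbuckets;  /* number of entries in dh_buckets */
	unsigned long dh_buckets[]; /* C99 flexible array member */
};

/* Allocate a zeroed histogram with nbuckets buckets; NULL if nbuckets is 0. */
struct dyn_histogram *dyn_hist_alloc(size_t nbuckets)
{
	struct dyn_histogram *h;

	if (nbuckets == 0)
		return NULL;
	/* sizeof does not evaluate h, so using it before assignment is safe */
	h = calloc(1, sizeof(*h) + nbuckets * sizeof(h->dh_buckets[0]));
	if (h != NULL)
		h->dh_nbuckets = nbuckets;
	return h;
}

/* Count one sample; values past the end are clamped into the last bucket. */
void dyn_hist_tally(struct dyn_histogram *h, size_t value)
{
	if (value >= h->dh_nbuckets)
		value = h->dh_nbuckets - 1;
	h->dh_buckets[value]++;
}

/* Release a histogram returned by dyn_hist_alloc(). */
void dyn_hist_free(struct dyn_histogram *h)
{
	free(h);
}
```

With this layout, only the rpcs-in-flight histograms would need to be sized from OSC_MAX_RIF_MAX, while other histograms could keep 32 buckets. The cost is the one already noted: every histogram has to be allocated and freed explicitly.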
| Comments |
| Comment by Ryan Haasken [ 10/Apr/14 ] |
|
Here is a patch that simply sets OBD_HIST_MAX to 256 to match the value of OSC_MAX_RIF_MAX. This may not be the best solution because of the extra space used by every obd_histogram, but it fixes this bug: http://review.whamcloud.com/#/c/9930/

Here is an example of the output from reading /proc/fs/lustre/osc/*/rpc_stats when max_rpcs_in_flight=64:

                          read              write
    rpcs in flight    rpcs   % cum % |   rpcs   % cum %
    0:                   0   0     0 |      0   0     0
    1:                   2 100   100 |   4592  25    25
    2:                   0   0   100 |   3216  17    43
    3:                   0   0   100 |   2390  13    56
    4:                   0   0   100 |   1966  10    67
    5:                   0   0   100 |   1663   9    76
    6:                   0   0   100 |   1292   7    84
    7:                   0   0   100 |    907   5    89
    8:                   0   0   100 |    592   3    92
    9:                   0   0   100 |    414   2    94
    10:                  0   0   100 |    254   1    96
    11:                  0   0   100 |    155   0    96
    12:                  0   0   100 |    107   0    97
    13:                  0   0   100 |     78   0    97
    14:                  0   0   100 |     56   0    98
    15:                  0   0   100 |     38   0    98
    16:                  0   0   100 |     32   0    98
    17:                  0   0   100 |     28   0    98
    18:                  0   0   100 |     23   0    98
    19:                  0   0   100 |     22   0    99
    20:                  0   0   100 |     22   0    99
    21:                  0   0   100 |     17   0    99
    22:                  0   0   100 |     14   0    99
    23:                  0   0   100 |     11   0    99
    24:                  0   0   100 |     12   0    99
    25:                  0   0   100 |     11   0    99
    26:                  0   0   100 |     10   0    99
    27:                  0   0   100 |     14   0    99
    28:                  0   0   100 |     16   0    99
    29:                  0   0   100 |      9   0    99
    30:                  0   0   100 |      7   0    99
    31:                  0   0   100 |      5   0    99
    32:                  0   0   100 |      7   0    99
    33:                  0   0   100 |      4   0    99
    34:                  0   0   100 |      3   0    99
    35:                  0   0   100 |      2   0    99
    36:                  0   0   100 |      2   0    99
    37:                  0   0   100 |      1   0    99
    38:                  0   0   100 |      1   0    99
    39:                  0   0   100 |      1   0   100

As you can see, the histogram now goes past 31, and every rpcs-in-flight value that occurred is reported in its own row. |
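A hypothetical sketch (not the Lustre code) of tallying into a fixed-size bucket array and printing one column of an rpc_stats-style table. HIST_MAX stands in for OBD_HIST_MAX at the patched value of 256; hist_tally(), hist_print() and pct() are illustrative names. The sketch assumes, based on the output above ending at row 39, that rows are printed only until the cumulative count reaches the total.

```c
#include <stdio.h>

/* Stand-in for OBD_HIST_MAX at the patched value (assumption). */
#define HIST_MAX 256

/* Integer percentage, truncated; 0 when the total is 0. */
unsigned int pct(unsigned long part, unsigned long total)
{
	return total ? (unsigned int)(part * 100 / total) : 0;
}

/* Record one sample, clamping out-of-range values into the last bucket. */
void hist_tally(unsigned long *buckets, size_t value)
{
	if (value >= HIST_MAX)
		value = HIST_MAX - 1;
	buckets[value]++;
}

/*
 * Print "value: count % cum%" rows, stopping once every sample has been
 * accounted for. Returns the number of rows printed.
 */
size_t hist_print(FILE *out, const unsigned long *buckets)
{
	unsigned long total = 0, cum = 0;
	size_t i;

	for (i = 0; i < HIST_MAX; i++)
		total += buckets[i];

	for (i = 0; i < HIST_MAX; i++) {
		cum += buckets[i];
		fprintf(out, "%zu:\t%10lu %3u %3u\n", i, buckets[i],
			pct(buckets[i], total), pct(cum, total));
		if (cum == total)
			break;
	}
	/* The loop always breaks by the last bucket, where cum == total. */
	return i + 1;
}
```

With HIST_MAX set to 32 instead, a sample of 39 would be clamped into bucket 31, which is the pile-up in row 31 reported in the description.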
| Comment by Ryan Haasken [ 14/Apr/14 ] |
|
I have abandoned this patch: http://review.whamcloud.com/#/c/9930/ Andreas Dilger says he has a patch in the works to do dynamic allocation of the obd_histogram buckets. That will be a much better long-term solution. |
| Comment by Gerrit Updater [ 08/Feb/18 ] |
|
Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/31236 |
| Comment by Chris Horn [ 08/Feb/18 ] |
|
I am re-upping this patch since we never saw the dynamic obd_histogram allocation that was the reason we abandoned this change originally. |
| Comment by Colin Faber [X] (Inactive) [ 29/May/19 ] |
|
Has there been any further progress here? |
| Comment by Andreas Dilger [ 08/Dec/20 ] |
|
Attached my old_prototype_dynamic_histogram.patch |