[LU-1937] Fix thread names to be unique within limit Created: 13/Sep/12 Updated: 04/Jul/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | easy |
| Environment: | Single node client, 3 OSTs on one OSS |
| Severity: | 3 |
| Rank (Obsolete): | 8790 |
| Description |
|
It appears that multiple create threads with duplicate names are being started for each OST on an OSS:

 5927 root 20 0 0 0 0 S 13.4 0.0 7:14.03 ll_ost_create00
 5928 root 20 0 0 0 0 S 13.1 0.0 6:56.52 ll_ost_create00
 9212 root 20 0 0 0 0 R 12.1 0.0 7:52.05 ll_ost_create00
 5929 root 20 0 0 0 0 S 6.9 0.0 3:59.30 ll_ost_create01
 9216 root 20 0 0 0 0 S 6.9 0.0 3:45.86 ll_ost_create01
 5930 root 20 0 0 0 0 S 6.2 0.0 3:45.33 ll_ost_create01 |
| Comments |
| Comment by Andreas Dilger [ 07/Dec/16 ] |
|
This also highlights an issue with the thread names that are being used: they are too long for what the kernel allows. current->comm[] is only 16 chars long and needs a trailing NUL, so process names are limited to 15 chars. The Lustre service thread names should be shortened to fit within this constraint, in this case to either ll_ost_cr_NNN or ll_ost_create_NN. |
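The 15-character limit described above can be illustrated with a small sketch. It only simulates the truncation in user space; the `ll_ost_create00_NNN` full names and the helper functions are assumptions made for illustration, not names taken from the Lustre source.

```python
TASK_COMM_LEN = 16  # size of current->comm[] in the Linux kernel, including the trailing NUL


def kernel_comm(name: str) -> str:
    """Return the part of an ASCII name that fits in comm[]: at most 15 characters."""
    return name[:TASK_COMM_LEN - 1]


def fits(name: str) -> bool:
    """True if the name is short enough to be stored without truncation."""
    return len(name) <= TASK_COMM_LEN - 1


# Hypothetical full names: the "_NNN" per-thread suffix is an assumption.
requested = [f"ll_ost_create00_{i:03d}" for i in range(3)]
visible = [kernel_comm(n) for n in requested]
print(visible)  # distinct requested names collapse to one visible name

# Check candidate name templates against the limit.
for template in ("ll_ost_cr_NNN", "ll_ost_create_NN"):
    print(template, len(template), fits(template))
```

By this count `ll_ost_cr_NNN` is 13 characters and fits, while `ll_ost_create_NN` is 16 characters, so that form would need to lose one character (for example a single-digit index) to stay within the limit.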
| Comment by Andreas Dilger [ 25/Jun/19 ] |
|
Currently, it also appears that the ost_create threads are only used for OST_STATFS RPCs, with the OST_CREATE and OST_DESTROY RPCs being sent to the regular RPC handling portal, as can be seen from the RPC stats:

$ lctl get_param ost.OSS.*.stats
ost.OSS.ost.stats=
ldlm_glimpse_enqueue 1319987 samples [reqs] 1 1 1319987 1319987
ldlm_extent_enqueue 250 samples [reqs] 1 1 250 250
ost_setattr 256 samples [usec] 56 22164 46836 493686124
ost_create 19 samples [usec] 5228 10726959 15560849 117269236403285
ost_destroy 1146 samples [usec] 128 30404551 92768007 1605335175432143
ost_connect 26 samples [usec] 38 211 2507 300179
ost_disconnect 5 samples [usec] 19532 72884 233115 13147821185
ost_sync 310632 samples [usec] 19 228476 11203022082 441094477371512
obd_ping 36231 samples [usec] 6 70 637716 14482026
ost.OSS.ost_create.stats=
ost_statfs 1669029 samples [usec] 8 491 39417780 1026270498
ost.OSS.ost_io.stats=
ost_read 1158936 samples [usec] 469 4687463 11670896590 8274105796184486
ost_write 484603 samples [usec] 3365 25593258 69322915882 123449464540168200

In that regard, it would make more sense to rename this thread the ost_statfs thread and potentially move the ost_create and ost_destroy RPCs to a new portal, so that they are serviced more promptly instead of being blocked behind other RPCs and/or blocking other RPCs (though since there are so few concurrent create/destroy RPCs, they are unlikely to block other RPCs being processed). |
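To compare the services in the stats above, a minimal sketch like the following could summarize each counter. It assumes each line has the layout `name count samples [unit] min max sum sum-of-squares`; that reading of the columns, and the `parse_stats` helper itself, are assumptions for illustration rather than part of any Lustre tool.

```python
from typing import Dict, NamedTuple


class Stat(NamedTuple):
    count: int    # number of samples
    unit: str     # unit reported in brackets, e.g. "usec"
    minimum: int
    maximum: int
    total: int    # sum over all samples


def parse_stats(text: str) -> Dict[str, Stat]:
    """Parse counter lines, skipping headers such as 'ost.OSS.ost.stats='."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[2] != "samples":
            continue
        name, count, _, unit, lo, hi, total = fields[:7]
        stats[name] = Stat(int(count), unit.strip("[]"), int(lo), int(hi), int(total))
    return stats


# Three counters copied from the output above.
SAMPLE = """\
ost_create 19 samples [usec] 5228 10726959 15560849 117269236403285
ost_destroy 1146 samples [usec] 128 30404551 92768007 1605335175432143
ost_statfs 1669029 samples [usec] 8 491 39417780 1026270498
"""

stats = parse_stats(SAMPLE)
for name, s in stats.items():
    mean = s.total / s.count
    print(f"{name}: {s.count} samples, mean {mean:.1f} {s.unit}, max {s.maximum} {s.unit}")
```

Under that column interpretation, ost_statfs has by far the most samples and a maximum of 491 usec, while ost_create and ost_destroy have few samples but maxima of roughly 10.7 and 30.4 seconds.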