[LU-1937] Fix thread names to be unique within limit Created: 13/Sep/12  Updated: 04/Jul/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: easy
Environment:

Single node client, 3 OSTs on one OSS


Issue Links:
Duplicate
is duplicated by LU-7458 Thread names truncated in sysrq-t output Resolved
Severity: 3
Rank (Obsolete): 8790

 Description   

It appears that multiple create threads with duplicate names are being started for each OST on an OSS:

 5927 root      20   0     0    0    0 S 13.4  0.0   7:14.03 ll_ost_create00    
 5928 root      20   0     0    0    0 S 13.1  0.0   6:56.52 ll_ost_create00    
 9212 root      20   0     0    0    0 R 12.1  0.0   7:52.05 ll_ost_create00    
 5929 root      20   0     0    0    0 S  6.9  0.0   3:59.30 ll_ost_create01    
 9216 root      20   0     0    0    0 S  6.9  0.0   3:45.86 ll_ost_create01    
 5930 root      20   0     0    0    0 S  6.2  0.0   3:45.33 ll_ost_create01    


 Comments   
Comment by Andreas Dilger [ 07/Dec/16 ]

This also highlights an issue with the thread names themselves: they are too long for what the kernel allows. current->comm[] is only 16 bytes long and needs a trailing NUL, so a process name can hold at most 15 characters. The Lustre service thread names should be shortened to fit within this constraint, in this case to either ll_ost_cr_NNN or ll_ost_create_NN.
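The truncation described above can be sketched in a few lines of Python. This is only an illustration: the full name format "ll_ost_create%02d_%03d" is an assumption made for the example (the ticket only shows the names as they appear in top), and visible_name() and find_collisions() are hypothetical helpers, not Lustre or kernel functions.

```python
TASK_COMM_LEN = 16               # size of the kernel's comm[] buffer, including the NUL
MAX_VISIBLE = TASK_COMM_LEN - 1  # 15 characters of the name survive


def visible_name(full_name):
    """Return the part of a thread name that fits in comm[]."""
    return full_name[:MAX_VISIBLE]


def find_collisions(full_names):
    """Group full names by their truncated form; keep groups with more than one member."""
    groups = {}
    for name in full_names:
        groups.setdefault(visible_name(name), []).append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # ASSUMPTION: hypothetical full names with a per-thread suffix after the
    # two-digit index; everything past character 15 is cut off.
    long_names = ["ll_ost_create%02d_%03d" % (idx, tid)
                  for idx in range(2) for tid in range(3)]
    for short, names in find_collisions(long_names).items():
        print(short, "<-", names)

    # Check the two shortened patterns proposed in the comment above.
    for pattern in ("ll_ost_cr_%03d", "ll_ost_create_%02d"):
        name = pattern % 0
        print(name, len(name), "fits" if len(name) <= MAX_VISIBLE else "too long")
```

Under that assumed format, every thread sharing a two-digit index collapses to the same 15-character name, which matches the duplicates in the top output. Counting characters also shows that ll_ost_cr_NNN is 13 characters and fits, while ll_ost_create_NN as written is 16 characters and would lose its last digit, so that variant would need to drop one character (for example the underscore) to stay within the limit.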

Comment by Andreas Dilger [ 25/Jun/19 ]

Currently, the ost_create threads also appear to be used only for OST_STATFS RPCs, while the OST_CREATE and OST_DESTROY RPCs are sent to the regular RPC handling portal, as the RPC stats show:

$ lctl get_param ost.OSS.*.stats
ost.OSS.ost.stats=
ldlm_glimpse_enqueue      1319987 samples [reqs] 1 1 1319987 1319987
ldlm_extent_enqueue       250 samples [reqs] 1 1 250 250
ost_setattr               256 samples [usec] 56 22164 46836 493686124
ost_create                19 samples [usec] 5228 10726959 15560849 117269236403285
ost_destroy               1146 samples [usec] 128 30404551 92768007 1605335175432143
ost_connect               26 samples [usec] 38 211 2507 300179
ost_disconnect            5 samples [usec] 19532 72884 233115 13147821185
ost_sync                  310632 samples [usec] 19 228476 11203022082 441094477371512
obd_ping                  36231 samples [usec] 6 70 637716 14482026
ost.OSS.ost_create.stats=
ost_statfs                1669029 samples [usec] 8 491 39417780 1026270498
ost.OSS.ost_io.stats=
ost_read                  1158936 samples [usec] 469 4687463 11670896590 8274105796184486
ost_write                 484603 samples [usec] 3365 25593258 69322915882 123449464540168200

Given that, it would make more sense to rename this thread the ost_statfs thread, and potentially to move the ost_create and ost_destroy RPCs to a new portal, so that they are serviced promptly instead of being blocked behind other RPCs or blocking them. That said, since there are so few concurrent create/destroy RPCs, they are unlikely to block other RPCs being processed.
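To compare how the services are loaded, the stats lines above can be reduced to a sample count and a mean latency per RPC type. The sketch below assumes each line has the layout "name count samples [unit] min max sum sumsq"; that reading of the columns is an assumption made for this example, and RpcStat and parse_stat_line() are hypothetical names, not part of lctl.

```python
from typing import NamedTuple


class RpcStat(NamedTuple):
    name: str
    samples: int
    unit: str
    minimum: int
    maximum: int
    total: int

    @property
    def mean(self):
        """Average value per sample, in the line's own unit."""
        return self.total / self.samples


def parse_stat_line(line):
    """Parse one stats line; ASSUMED layout: name count 'samples' [unit] min max sum sumsq."""
    fields = line.split()
    return RpcStat(
        name=fields[0],
        samples=int(fields[1]),
        unit=fields[3].strip("[]"),
        minimum=int(fields[4]),
        maximum=int(fields[5]),
        total=int(fields[6]),
    )


if __name__ == "__main__":
    # Lines copied from the lctl output above.
    lines = [
        "ost_create                19 samples [usec] 5228 10726959 15560849 117269236403285",
        "ost_destroy               1146 samples [usec] 128 30404551 92768007 1605335175432143",
        "ost_statfs                1669029 samples [usec] 8 491 39417780 1026270498",
    ]
    for stat in map(parse_stat_line, lines):
        print("%-12s %8d samples, mean %.1f %s" % (stat.name, stat.samples, stat.mean, stat.unit))
```

With that column interpretation, ost_statfs averages roughly 23.6 usec over about 1.67 million samples, while the 19 ost_create samples average about 0.82 seconds each. That is consistent with the point above that create and destroy RPCs are rare but slow.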
