LU-1937: Fix thread names to be unique within limit

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.4.0
    • Single node client, 3 OSTs on one OSS
    • 3
    • 8790

    Description

      It appears that multiple create threads are being started for each OST on an OSS, and that several of them have the same name:

       5927 root      20   0     0    0    0 S 13.4  0.0   7:14.03 ll_ost_create00    
       5928 root      20   0     0    0    0 S 13.1  0.0   6:56.52 ll_ost_create00    
       9212 root      20   0     0    0    0 R 12.1  0.0   7:52.05 ll_ost_create00    
       5929 root      20   0     0    0    0 S  6.9  0.0   3:59.30 ll_ost_create01    
       9216 root      20   0     0    0    0 S  6.9  0.0   3:45.86 ll_ost_create01    
       5930 root      20   0     0    0    0 S  6.2  0.0   3:45.33 ll_ost_create01    
      

          Activity

            Currently, it also appears that the ost_create threads are used only for OST_STATFS RPCs, while the OST_CREATE and OST_DESTROY RPCs are sent to the regular RPC handling portal, as the RPC stats show:

            $ lctl get_param ost.OSS.*.stats
            ost.OSS.ost.stats=
            ldlm_glimpse_enqueue      1319987 samples [reqs] 1 1 1319987 1319987
            ldlm_extent_enqueue       250 samples [reqs] 1 1 250 250
            ost_setattr               256 samples [usec] 56 22164 46836 493686124
            ost_create                19 samples [usec] 5228 10726959 15560849 117269236403285
            ost_destroy               1146 samples [usec] 128 30404551 92768007 1605335175432143
            ost_connect               26 samples [usec] 38 211 2507 300179
            ost_disconnect            5 samples [usec] 19532 72884 233115 13147821185
            ost_sync                  310632 samples [usec] 19 228476 11203022082 441094477371512
            obd_ping                  36231 samples [usec] 6 70 637716 14482026
            ost.OSS.ost_create.stats=
            ost_statfs                1669029 samples [usec] 8 491 39417780 1026270498
            ost.OSS.ost_io.stats=
            ost_read                  1158936 samples [usec] 469 4687463 11670896590 8274105796184486
            ost_write                 484603 samples [usec] 3365 25593258 69322915882 123449464540168200
            

            In that regard, it would make more sense to rename this thread to the ost_statfs thread, and potentially to move the ost_create and ost_destroy RPCs to a new portal, so that they are serviced promptly instead of being blocked behind other RPCs or blocking them (though with so few concurrent create/destroy RPCs, they are unlikely to block other RPCs being processed).
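
            The per-service breakdown in the stats above can be tabulated with a small script. This is a minimal sketch, not part of Lustre: it assumes each counter line has the shape "name count samples [unit] min max sum sumsq", and the names `parse_stats` and `SAMPLE` are hypothetical. The sample is an excerpt of the output quoted above.

```python
import re

# Excerpt of the `lctl get_param ost.OSS.*.stats` output quoted above.
SAMPLE = """\
ost.OSS.ost.stats=
ost_create                19 samples [usec] 5228 10726959 15560849 117269236403285
ost_destroy               1146 samples [usec] 128 30404551 92768007 1605335175432143
ost.OSS.ost_create.stats=
ost_statfs                1669029 samples [usec] 8 491 39417780 1026270498
ost.OSS.ost_io.stats=
ost_read                  1158936 samples [usec] 469 4687463 11670896590 8274105796184486
"""

# A block header looks like "ost.OSS.ost_create.stats=".
HEADER = re.compile(r"^(?P<service>\S+)\.stats=$")
# Assumed counter layout: name, count, "samples", [unit], min, max, sum, ...
COUNTER = re.compile(
    r"^(?P<name>\S+)\s+(?P<count>\d+) samples \[(?P<unit>\w+)\]"
    r"\s+(?P<min>\d+)\s+(?P<max>\d+)\s+(?P<sum>\d+)"
)


def parse_stats(text):
    """Return {service: {rpc_name: (count, unit, mean)}} for each stats block."""
    services = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = services.setdefault(header.group("service"), {})
            continue
        counter = COUNTER.match(line)
        if counter and current is not None:
            count = int(counter.group("count"))
            mean = int(counter.group("sum")) / count if count else 0.0
            current[counter.group("name")] = (count, counter.group("unit"), mean)
    return services


if __name__ == "__main__":
    for service, counters in parse_stats(SAMPLE).items():
        for name, (count, unit, mean) in counters.items():
            print(f"{service:22} {name:12} {count:>9} samples, mean {mean:.1f} {unit}")
```

            Grouping by service makes the mismatch visible: in this excerpt the ost_create service holds only ost_statfs, while the ost_create and ost_destroy counters appear under the general ost service.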

            (comment added by adilger, Andreas Dilger)

            This also highlights an issue with the thread names that are being used - they are too long for what the kernel allows (current->comm[] is only 16 chars long but needs a trailing NUL, so only allows 15 chars for the process names). The Lustre service thread names should be shortened to fit within this constraint - in this case either ll_ost_cr_NNN or ll_ost_create_NN.
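
            The effect of the 15-character limit can be illustrated as follows. This is a sketch under stated assumptions: the long-form name pattern (a two-digit group followed by an underscore and a three-digit index) is hypothetical and chosen only to show how truncation can produce the duplicates seen in the description, and `visible_name` and `duplicates` are illustrative helpers, not kernel or Lustre functions.

```python
TASK_COMM_LEN = 16  # size of current->comm[], including the trailing NUL


def visible_name(name):
    """Return the part of a thread name that fits in comm[] (15 chars + NUL)."""
    return name[:TASK_COMM_LEN - 1]


def duplicates(names):
    """Group full names by their truncated form; keep only groups that collide."""
    groups = {}
    for name in names:
        groups.setdefault(visible_name(name), []).append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


if __name__ == "__main__":
    # Hypothetical long-form names: prefix + 2-digit group + "_" + 3-digit index.
    long_names = [f"ll_ost_create{g:02d}_{i:03d}" for g in range(2) for i in range(3)]
    for short, full in duplicates(long_names).items():
        print(short, "<-", full)

    # Candidate formats from the comment above, with digits filled in.
    for candidate in ("ll_ost_cr_000", "ll_ost_create_00"):
        fits = visible_name(candidate) == candidate
        print(candidate, len(candidate), "fits" if fits else "truncated")
```

            By this count, ll_ost_cr_NNN is 13 characters and fits within the limit, whereas ll_ost_create_NN written out with two digits is 16 characters, one more than comm[] can hold, so it would need one further character trimmed.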

            (comment added by adilger, Andreas Dilger)

            People

              wc-triage WC Triage
              adilger Andreas Dilger