Lustre / LU-10514

all metadata operations take 1+ minutes thanks to libtool's l_getidentity


Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.11.0
    • Lustre 2.11.0
    • None

    Description

      n:lustre-release# bash $LUSTRE/tests/llmount.sh
      ...
      n:lustre-release# time touch /mnt/lustre/f0
      
      real    1m30.060s
      user    0m0.001s
      sys    0m0.004s
      n:lustre-release# time touch /mnt/lustre/f1
      
      real	2m0.058s
      user	0m0.000s
      sys	0m0.004s
      

      libtoolizing lustre/utils creates a wrapper script for l_getidentity that doesn't work.

      When invoked, the wrapper script prints the following to stderr:

      /root/lustre-release/lustre/utils/l_getidentity: line 151: ls: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 198: rm: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 212: rm: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 213: mv: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 214: rm: command not found
      /root/lustre-release/lustre/utils/l_getidentity: error: `/root/lustre-release/lustre/utils/.libs/lt-l_getidentity' does not exist
      This script is just a wrapper for lt-l_getidentity.
      See the libtool documentation for more information.
      

      But this output doesn't go anywhere, because stderr is not connected to anything when l_getidentity is run.
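Since the helper's stderr is otherwise lost, one way to see these messages while debugging is to launch the helper by hand with stderr redirected to a file. The following is a minimal sketch of that idea, not code from Lustre: the function name `run_with_stderr_log` and the log path are hypothetical, and it does not reproduce the environment the kernel upcall actually runs in.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical debugging aid (not part of Lustre): run argv[0] with its
 * stderr redirected to log_path, so that diagnostics which would otherwise
 * be lost can be read afterwards.
 *
 * Returns the child's exit status, or -1 if it could not be run or did
 * not exit normally.
 */
int run_with_stderr_log(char *const argv[], const char *log_path)
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL && log_path != NULL);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: make file descriptor 2 refer to the log file. */
		int fd = open(log_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

		if (fd < 0)
			_exit(126);
		if (dup2(fd, STDERR_FILENO) < 0)
			_exit(126);
		if (fd != STDERR_FILENO)
			close(fd);

		execv(argv[0], argv);
		perror("execv");	/* only reached on failure; goes to the log */
		_exit(127);
	}

	/* Parent: wait for the child, retrying if interrupted by a signal. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling this with the path to the wrapper script as `argv[0]` would leave any "command not found" style messages in the log file instead of discarding them.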

      l_getidentity should not depend on liblustreapi. We should factor out whatever it needs in separate .c files and add them to l_getidentity dependencies.

      Also, why does it take 2 minutes for the operation to complete? It seems like we're not handling failure from the identity downcall very well. It gets stuck at:

      n:~# stack1 mdt
      8856 mdt00_003
      [<ffffffffc0921e80>] upcall_cache_get_entry+0x1d0/0x8f0 [obdclass]
      [<ffffffffc10120c7>] mdt_identity_get+0x17/0x50 [mdt]
      [<ffffffffc0ff36eb>] old_init_ucred_common+0xdb/0x290 [mdt]
      [<ffffffffc0ff39c7>] old_init_ucred+0x127/0x240 [mdt]
      [<ffffffffc0ff5405>] mdt_init_ucred_intent_getattr+0x85/0xa0 [mdt]
      [<ffffffffc0ff04f5>] mdt_intent_getattr+0xc5/0x470 [mdt]
      [<ffffffffc0fe60b2>] mdt_intent_opc+0x442/0xad0 [mdt]
      [<ffffffffc0fedc73>] mdt_intent_policy+0x1a3/0x360 [mdt]
      [<ffffffffc0d042fa>] ldlm_lock_enqueue+0x38a/0x970 [ptlrpc]
      [<ffffffffc0d2da33>] ldlm_handle_enqueue0+0x8f3/0x1400 [ptlrpc]
      [<ffffffffc0db3752>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [<ffffffffc0dbb965>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
      [<ffffffffc0d5fc7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
      [<ffffffffc0d63422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [<ffffffff810b252f>] kthread+0xcf/0xe0
      [<ffffffff816b8798>] ret_from_fork+0x58/0x90
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      It's sleeping at left = schedule_timeout(expiry) in upcall_cache_get_entry().

      BTW, libtool is terrible.

            People

              jhammond John Hammond