Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10514

all metadata operations take 1+ minutes thanks to libtool's l_getidentity

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.11.0
    • Lustre 2.11.0
    • None
    • 3
    • 9223372036854775807

    Description

      n:lustre-release# bash $LUSTRE/tests/llmount.sh
      ...
      n:lustre-release# time touch /mnt/lustre/f0
      
      real    1m30.060s
      user    0m0.001s
      sys    0m0.004s
      n:lustre-release# time touch /mnt/lustre/f1
      
      real	2m0.058s
      user	0m0.000s
      sys	0m0.004s
      

      libtoolizing lustre/utils creates a wrapper script for l_getidentity that doesn't work.

      When invoked the wrapped script prints the following to stderr

      /root/lustre-release/lustre/utils/l_getidentity: line 151: ls: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 198: rm: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 212: rm: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 213: mv: command not found
      /root/lustre-release/lustre/utils/l_getidentity: line 214: rm: command not found
      /root/lustre-release/lustre/utils/l_getidentity: error: `/root/lustre-release/lustre/utils/.libs/lt-l_getidentity' does not exist
      This script is just a wrapper for lt-l_getidentity.
      See the libtool documentation for more information.
      

      But this doesn't go anywhere because stderr is to connected to anything when l_getidentity is run.

      l_getidentity should not depend on liblustreapi. We should factor out whatever it needs in separate .c files and add them to l_getidentity dependencies.

      Also why do it take 2 minutes to for the operation to complete. It seems like we're not handling failure from the identity downcall very well. It sets stuck at:

      n:~# stack1 mdt
      8856 mdt00_003
      [<ffffffffc0921e80>] upcall_cache_get_entry+0x1d0/0x8f0 [obdclass]
      [<ffffffffc10120c7>] mdt_identity_get+0x17/0x50 [mdt]
      [<ffffffffc0ff36eb>] old_init_ucred_common+0xdb/0x290 [mdt]
      [<ffffffffc0ff39c7>] old_init_ucred+0x127/0x240 [mdt]
      [<ffffffffc0ff5405>] mdt_init_ucred_intent_getattr+0x85/0xa0 [mdt]
      [<ffffffffc0ff04f5>] mdt_intent_getattr+0xc5/0x470 [mdt]
      [<ffffffffc0fe60b2>] mdt_intent_opc+0x442/0xad0 [mdt]
      [<ffffffffc0fedc73>] mdt_intent_policy+0x1a3/0x360 [mdt]
      [<ffffffffc0d042fa>] ldlm_lock_enqueue+0x38a/0x970 [ptlrpc]
      [<ffffffffc0d2da33>] ldlm_handle_enqueue0+0x8f3/0x1400 [ptlrpc]
      [<ffffffffc0db3752>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [<ffffffffc0dbb965>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
      [<ffffffffc0d5fc7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
      [<ffffffffc0d63422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [<ffffffff810b252f>] kthread+0xcf/0xe0
      [<ffffffff816b8798>] ret_from_fork+0x58/0x90
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      It's sleeping at left = schedule_timeout(expiry) in upcall_cache_get_entry().

      BTW, libtool is terrible.

      Attachments

        Issue Links

          Activity

            [LU-10514] all metadata operations take 1+ minutes thanks to libtool's l_getidentity
            pjones Peter Jones added a comment -

            Landed for 2.11

            pjones Peter Jones added a comment - Landed for 2.11

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30872/
            Subject: LU-10514 utils: statically link l_getidentity with libcfs.a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 14e8e8e8a6b1bfe370e3e1f9fef6e37ccfc23290

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30872/ Subject: LU-10514 utils: statically link l_getidentity with libcfs.a Project: fs/lustre-release Branch: master Current Patch Set: Commit: 14e8e8e8a6b1bfe370e3e1f9fef6e37ccfc23290

            James, getting rid of /etc/perm.conf would be great. As you write, we have nodemap already. That was part of the goal of the LU-7533 cleanup.

            adilger Andreas Dilger added a comment - James, getting rid of /etc/perm.conf would be great. As you write, we have nodemap already. That was part of the goal of the LU-7533 cleanup.

            You could also do l_getidentity_LDFLAGS := -static. We should look at making l_getidentity independent of libcfs.a as well. Currently we have to have libcfs.a due to need for libcfs_nid2str() used by /etc/perm.conf support. Do we need /etc/perm.conf anymore since we have nodemap which is way better. Second libcfs.a use is cfs_get_param() but we could just test for /proc and /sys indentity_info file and write to it directly.

            simmonsja James A Simmons added a comment - You could also do l_getidentity_LDFLAGS := -static. We should look at making l_getidentity independent of libcfs.a as well. Currently we have to have libcfs.a due to need for libcfs_nid2str() used by /etc/perm.conf support. Do we need /etc/perm.conf anymore since we have nodemap which is way better. Second libcfs.a use is cfs_get_param() but we could just test for /proc and /sys indentity_info file and write to it directly.

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30872
            Subject: LU-10514 utils: statically link l_getidentity with libcfs.a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e53f0067d9f8b4ab2d7bf8df75b5fb02aae03727

            gerrit Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30872 Subject: LU-10514 utils: statically link l_getidentity with libcfs.a Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: e53f0067d9f8b4ab2d7bf8df75b5fb02aae03727

            James,

            Anything that breaks when ./lustre/tests/llmount.sh is run from the lustre-release directory should be a blocker. Many developers depend on this working to reduce turn around time for testing locally.

            jhammond John Hammond added a comment - James, Anything that breaks when ./lustre/tests/llmount.sh is run from the lustre-release directory should be a blocker. Many developers depend on this working to reduce turn around time for testing locally.
            simmonsja James A Simmons added a comment - - edited

            None of the tests are showing this regression. This looks like a corner case. What are the exact steps you did to reproduce this problem?

            /usr/src/lustre-2.10.56/lustre/utils$ ./l_getidentity

            usage: l_getidentity {-d|mdtname} {uid}
            Normally invoked as an upcall from Lustre, set via:
            lctl set_param mdt.${mdtname}.identity_upcall={path to upcall}
            -d: debug, print values to stdout instead of Lustre

            /usr/src/lustre-2.10.56/lustre/utils$ ldd ./l_getidentity
            not a dynamic executable

            Well the other option besides libtool is cmake

            I wouldn't consider this a blocker since under normal conditions things do work.

            simmonsja James A Simmons added a comment - - edited None of the tests are showing this regression. This looks like a corner case. What are the exact steps you did to reproduce this problem? /usr/src/lustre-2.10.56/lustre/utils$ ./l_getidentity usage: l_getidentity {-d|mdtname} {uid} Normally invoked as an upcall from Lustre, set via: lctl set_param mdt.${mdtname}.identity_upcall={path to upcall} -d: debug, print values to stdout instead of Lustre /usr/src/lustre-2.10.56/lustre/utils$ ldd ./l_getidentity not a dynamic executable Well the other option besides libtool is cmake I wouldn't consider this a blocker since under normal conditions things do work.
            jhammond John Hammond added a comment -

            Sorry, forgot to mention that the libtoolizing of lustre/utils/ was done by https://review.whamcloud.com/30562 LU-5541 build: move libcfs and liblustreapi over to libtool.

            jhammond John Hammond added a comment - Sorry, forgot to mention that the libtoolizing of lustre/utils/ was done by https://review.whamcloud.com/30562 LU-5541 build: move libcfs and liblustreapi over to libtool.

            People

              jhammond John Hammond
              jhammond John Hammond
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: