[LU-10514] all metadata operations take 1+ minutes thanks to libtool's l_getidentity Created: 15/Jan/18 Updated: 28/Dec/19 Resolved: 25/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
n:lustre-release# bash $LUSTRE/tests/llmount.sh ... n:lustre-release# time touch /mnt/lustre/f0 real 1m30.060s user 0m0.001s sys 0m0.004s n:lustre-release# time touch /mnt/lustre/f1 real 2m0.058s user 0m0.000s sys 0m0.004s libtoolizing lustre/utils creates a wrapper script for l_getidentity that doesn't work. When invoked the wrapped script prints the following to stderr /root/lustre-release/lustre/utils/l_getidentity: line 151: ls: command not found /root/lustre-release/lustre/utils/l_getidentity: line 198: rm: command not found /root/lustre-release/lustre/utils/l_getidentity: line 212: rm: command not found /root/lustre-release/lustre/utils/l_getidentity: line 213: mv: command not found /root/lustre-release/lustre/utils/l_getidentity: line 214: rm: command not found /root/lustre-release/lustre/utils/l_getidentity: error: `/root/lustre-release/lustre/utils/.libs/lt-l_getidentity' does not exist This script is just a wrapper for lt-l_getidentity. See the libtool documentation for more information. But this doesn't go anywhere because stderr is to connected to anything when l_getidentity is run. l_getidentity should not depend on liblustreapi. We should factor out whatever it needs in separate .c files and add them to l_getidentity dependencies. Also why do it take 2 minutes to for the operation to complete. It seems like we're not handling failure from the identity downcall very well. It sets stuck at: n:~# stack1 mdt 8856 mdt00_003 [<ffffffffc0921e80>] upcall_cache_get_entry+0x1d0/0x8f0 [obdclass] [<ffffffffc10120c7>] mdt_identity_get+0x17/0x50 [mdt] [<ffffffffc0ff36eb>] old_init_ucred_common+0xdb/0x290 [mdt] [<ffffffffc0ff39c7>] old_init_ucred+0x127/0x240 [mdt] [<ffffffffc0ff5405>] mdt_init_ucred_intent_getattr+0x85/0xa0 [mdt] [<ffffffffc0ff04f5>] mdt_intent_getattr+0xc5/0x470 [mdt] [<ffffffffc0fe60b2>] mdt_intent_opc+0x442/0xad0 [mdt] [<ffffffffc0fedc73>] mdt_intent_policy+0x1a3/0x360 [mdt] [<ffffffffc0d042fa>] ldlm_lock_enqueue+0x38a/0x970 [ptlrpc] [<ffffffffc0d2da33>] ldlm_handle_enqueue0+0x8f3/0x1400 [ptlrpc] [<ffffffffc0db3752>] tgt_enqueue+0x62/0x210 [ptlrpc] [<ffffffffc0dbb965>] tgt_request_handle+0x925/0x13b0 [ptlrpc] [<ffffffffc0d5fc7e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc] [<ffffffffc0d63422>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [<ffffffff810b252f>] kthread+0xcf/0xe0 [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [<ffffffffffffffff>] 0xffffffffffffffff It's sleeping at left = schedule_timeout(expiry) in upcall_cache_get_entry(). BTW, libtool is terrible. |
| Comments |
| Comment by John Hammond [ 15/Jan/18 ] |
|
Sorry, forgot to mention that the libtoolizing of lustre/utils/ was done by https://review.whamcloud.com/30562 |
| Comment by James A Simmons [ 15/Jan/18 ] |
|
None of the tests are showing this regression. This looks like a corner case. What are the exact steps you did to reproduce this problem? /usr/src/lustre-2.10.56/lustre/utils$ ./l_getidentity usage: l_getidentity {-d|mdtname} {uid} /usr/src/lustre-2.10.56/lustre/utils$ ldd ./l_getidentity Well the other option besides libtool is cmake I wouldn't consider this a blocker since under normal conditions things do work. |
| Comment by John Hammond [ 15/Jan/18 ] |
|
James, Anything that breaks when ./lustre/tests/llmount.sh is run from the lustre-release directory should be a blocker. Many developers depend on this working to reduce turn around time for testing locally. |
| Comment by Gerrit Updater [ 16/Jan/18 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30872 |
| Comment by James A Simmons [ 16/Jan/18 ] |
|
You could also do l_getidentity_LDFLAGS := -static. We should look at making l_getidentity independent of libcfs.a as well. Currently we have to have libcfs.a due to need for libcfs_nid2str() used by /etc/perm.conf support. Do we need /etc/perm.conf anymore since we have nodemap which is way better. Second libcfs.a use is cfs_get_param() but we could just test for /proc and /sys indentity_info file and write to it directly. |
| Comment by Andreas Dilger [ 22/Jan/18 ] |
|
James, getting rid of /etc/perm.conf would be great. As you write, we have nodemap already. That was part of the goal of the |
| Comment by Gerrit Updater [ 25/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30872/ |
| Comment by Peter Jones [ 25/Jan/18 ] |
|
Landed for 2.11 |