Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0

    Description

      Batched metadata processing can provide a significant performance boost.
      Batched statahead can also improve the performance of directory listing operations such as ls.
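
      A minimal sketch of how the feature might be exercised on a client, assuming the llite tunables used later in this ticket (statahead_max and statahead_batch_max) are available in the running build:

      # widen the statahead window and allow up to 64 sub-requests per batch RPC
      lctl set_param llite.*.statahead_max=1024 llite.*.statahead_batch_max=64
      # a large directory listing then benefits from batched statahead
      time ls -l /path/to/large/directory > /dev/null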

    Attachments

    Issue Links

    Activity

            [LU-14139] batched statahead processing

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/43707/
            Subject: LU-14139 ptlrpc: grow PtlRPC properly when prepare sub request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5a2dfd36f9c2b6c10ab7ba44b0e9e86372623fde


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40945/
            Subject: LU-14139 ptlrpc: grow reply buffer properly for batch request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5edb883b44ac707528ce2c0bc812d65b9ffb4a50


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40943/
            Subject: LU-14139 statahead: add stats for batch RPC requests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a20f25d24b5f0ce7b5e77f7c596bffd0450cbdae


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40720/
            Subject: LU-14139 statahead: batched statahead processing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4435d0121f72aac3ad01c98a33b265a496359890


            adilger Andreas Dilger added a comment - Is there a patch that converts ldlm to use rhashtable?
            qian_wc Qian Yingjin added a comment - - edited

            From the attached FG (flame graph), much of the time is spent in:
            __ldlm_handle2lock() -> class_handle2object()
            ldlm_resource_get() -> cfs_hash_bd_lookup_intent() -> ldlm_res_hop_keycmp()

            Each of these costs about 5% for data and metadata DLM lock matching on the client side, and together they account for roughly 20% of the total samples.
            It seems the hash lookups for the lock handle and the resource take a lot of time.
            ldlm_res_hop_keycmp() alone reaching 5% means there are many elements in the same bucket of the hash table; we should enlarge the hash table and use a resizable hash table.

            For class_handle2object() on locks, we should maintain the lock handles per target (osc/mdc) lock namespace rather than sharing a single global handle hash table on the client, or even on the server.

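
            The ticket does not show how the flame graph was captured; a typical way to collect such a client-side FG (an assumption, using perf plus the stackcollapse-perf.pl/flamegraph.pl scripts from the FlameGraph project) would be:

            # sample on-CPU stacks on the client while the workload runs
            perf record -F 99 -a -g -- sleep 30
            # fold the stacks and render an SVG flame graph
            perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > ls-l.svg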

            sihara Shuichi Ihara added a comment -

            There are no LNET messages going out during the 2nd 'ls -l'.
            I've opened a separate ticket for further investigation: https://jira.whamcloud.com/browse/LU-16365

            sihara Shuichi Ihara added a comment -

            It might not be related to LU-14139 directly, but I am still interested in why Lustre takes 25 sec for the second 'ls -l'.

            It looks like the client is still sending 125k x 8=1M glimpse RPCs to the OSTs to fetch the file size. I guess this is because the client is not caching all 1M of the OST DLM locks on the objects between "ls -l" calls? Did you check OSC LRU size? I suspect this would have a sudden jump in speed once the number of files is small enough that all the OST DLM locks are cached. 

            As far as I observed, all locks seem to be kept as well; nothing shrinks or re-grants locks between the two 'ls -l' runs.

            [root@ec01 ~]# lctl get_param ldlm.namespaces.*.lock_count ldlm.namespaces.*.lru_size
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lock_count=4
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lru_size=3200
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lru_size=0
            
            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m27.249s
            user	0m9.111s
            sys	0m13.544s
            
            [root@ec01 ~]# lctl get_param ldlm.namespaces.*.lock_count ldlm.namespaces.*.lru_size
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lock_count=4
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lock_count=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lock_count=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lock_count=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lock_count=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lock_count=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lock_count=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lock_count=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lock_count=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lock_count=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lock_count=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lock_count=125099
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lru_size=3200
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lru_size=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lru_size=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lru_size=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lru_size=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lru_size=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lru_size=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lru_size=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lru_size=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lru_size=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lru_size=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lru_size=125099
            
            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m25.615s
            user	0m8.982s
            sys	0m16.588s
            
            [root@ec01 ~]# lctl get_param ldlm.namespaces.*.lock_count ldlm.namespaces.*.lru_size
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lock_count=4
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lock_count=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lock_count=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lock_count=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lock_count=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lock_count=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lock_count=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lock_count=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lock_count=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lock_count=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lock_count=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lock_count=125099
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lru_size=3200
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lru_size=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lru_size=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lru_size=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lru_size=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lru_size=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lru_size=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lru_size=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lru_size=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lru_size=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lru_size=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lru_size=125099
            

            Here is the same 1M-file test for ext4 on a local disk of the client.

            [root@ec01 ~]# echo 3 > /proc/sys/vm/drop_caches
            [sihara@ec01 ~]$ time ls -l /tmp/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/  > /dev/null
            
            real	0m16.999s
            user	0m8.956s
            sys	0m5.855s
            [sihara@ec01 ~]$ time ls -l /tmp/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/  > /dev/null
            
            real	0m11.832s
            user	0m8.765s
            sys	0m3.051s
            

            I'm also attaching the FG captured during the second 'ls -l' on Lustre.

            sihara Shuichi Ihara added a comment - - edited

            So, somehow, "ls -l" for 1M files takes 25 sec even when everything comes from caches.

            'ls -l' does other things after receiving the results: sorting, writing output, etc. Sorting has some impact, but the majority of the time is still spent on the 1M statx() and getxattr() calls, even with all data in the cache.

            1670051344.758188 getxattr("/exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/file.mdtest.0.945973", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available) <0.000008>
            1670051344.758214 statx(AT_FDCWD, "/exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/file.mdtest.0.877046", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_MTIME|STATX_SIZE, {stx_mask=STATX_MODE|STATX_NLINK|STATX_MTIME|STATX_SIZE, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=47001, ...}) = 0 <0.000013>
            
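
            The per-call timestamps and durations above are strace-style output; a sketch of how such a trace could be collected (an assumption, the exact command is not shown in the ticket):

            # -ttt: absolute microsecond timestamps, -T: time spent in each syscall
            strace -f -ttt -T -e trace=statx,getxattr -o ls.strace \
                ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null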

            Just in case, here are test results with/without caches, where ls does not sort the result list.

            [root@ec01 ~]# clush -w ai400x2-1-vm[1-4],ec01 " echo 3 > /proc/sys/vm/drop_caches "
            [sihara@ec01 ~]$ time ls -fl /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m20.023s
            user	0m1.903s
            sys	0m13.363s
            [sihara@ec01 ~]$ time ls -fl /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m18.164s
            user	0m1.703s
            sys	0m16.417s
            

            There is still the same ~2 sec overhead for client/server round trips. So batched statahead reduced that overhead from 8 sec to 2 sec, which is a 4x speedup.


            adilger Andreas Dilger added a comment -

            It looks like the client is still sending 125k x 8=1M glimpse RPCs to the OSTs to fetch the file size. I guess this is because the client is not caching all 1M of the OST DLM locks on the objects between "ls -l" calls? Did you check OSC LRU size? I suspect this would have a sudden jump in speed once the number of files is small enough that all the OST DLM locks are cached.
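
            As a sketch of the check suggested above (the ldlm lru_size tunables are standard; the value below is only illustrative, assuming ~125k locks per OSC namespace):

            # current per-namespace lock counts and LRU limits on the client
            lctl get_param ldlm.namespaces.*-osc-*.lock_count ldlm.namespaces.*-osc-*.lru_size
            # pin a larger static LRU limit so the OST DLM locks survive between "ls -l" runs
            # (a non-zero lru_size sets a fixed limit; 0 re-enables dynamic LRU resizing)
            lctl set_param ldlm.namespaces.*-osc-*.lru_size=200000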

            sihara Shuichi Ihara added a comment -

            I re-ran the "ls -l" test (1M x 47001-byte files in a single directory) with the latest patch series https://review.whamcloud.com/#/c/fs/lustre-release/+/49291/
            master (commit: c74c630)

            [root@ec01 ~]# clush -w ec01,ai400x2-1-vm[1-4] "echo 3 > /proc/sys/vm/drop_caches"
            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            real	0m33.757s
            user	0m9.363s
            sys	0m14.982s
            

            patched

            [root@ec01 ~]# lctl set_param llite.*.statahead_max=1024 llite.*.statahead_batch_max=64
            [root@ec01 ~]# clush -w ec01,ai400x2-1-vm[1-4] "echo 3 > /proc/sys/vm/drop_caches"
            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m27.296s
            user	0m9.089s
            sys	0m13.051s
            

            I see the elapsed time dropped from 33 sec to 27 sec with the patch.

            subreqs per batch   batchs   % cum %
            1:		     92322  76  76
            2:		      4742   3  80
            4:		       115   0  80
            8:		       657   0  80
            16:		     10736   8  89
            32:		       107   0  89
            64:		     12168  10 100
            

            As far as I can see from batch_stats, more than 950K (64 * 12168 + 32 * 107 + 16 * 10736) of the 1M requests were batched with an aggregation of 16 or more sub-requests. Maybe this is not so bad? However, we are still spending more than 27 sec here.
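
            A quick arithmetic check of that figure (counts taken from the batch histogram above):

            echo $(( 16*10736 + 32*107 + 64*12168 ))
            # -> 953952, i.e. ~95% of the ~1M stat requests went out in batches of 16 or more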

            Here are additional tests, something like "drop cache; ls -l; ls -l".

            [root@ec01 ~]# clush -w ec01,ai400x2-1-vm[1-4] "echo 3 > /proc/sys/vm/drop_caches"
            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m27.385s
            user	0m8.994s
            sys	0m13.131s
            
            
            llite.exafs-ffff9b96b8157800.stats=
            snapshot_time             482063.475018642 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.475018642 secs.nsecs
            open                      1 samples [usecs] 154 154 154 23716
            close                     1 samples [usecs] 131 131 131 17161
            readdir                   1223 samples [usecs] 0 38879 258067 1731301375
            getattr                   1000002 samples [usecs] 2 9770 5776755 3873882865
            getxattr                  1 samples [usecs] 70 70 70 4900
            inode_permission          10000021 samples [usecs] 0 393 1848868 2164824
            opencount                 1 samples [reqs] 1 1 1 1
            
            
            mdc.exafs-MDT0002-mdc-ffff9b96b8157800.stats=
            snapshot_time             482063.474876615 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474876615 secs.nsecs
            req_waittime              181839 samples [usecs] 57 2043896 22268808747 10620189011411911
            req_active                181839 samples [reqs] 1 23615 644666200 9255887581600
            ldlm_ibits_enqueue        6 samples [reqs] 1 1 6 6
            mds_close                 1 samples [usecs] 110 110 110 12100
            mds_readpage              56 samples [usecs] 250 38375 464045 5002627263
            mds_batch                 119274 samples [usecs] 61 84527 353967107 2052370493893
            ldlm_cancel               62501 samples [usecs] 182 725561 21912324500 10613954056275484
            obd_ping                  1 samples [usecs] 90 90 90 8100
            
            osc.exafs-OST0000-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474252382 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474252382 secs.nsecs
            req_waittime              132963 samples [usecs] 29 408417 1867410218 500106174718090
            req_active                132963 samples [reqs] 1 3612 17237320 44241112018
            ldlm_glimpse_enqueue      125139 samples [reqs] 1 1 125139 125139
            ost_connect               1 samples [usecs] 97 97 97 9409
            ost_disconnect            1 samples [usecs] 253 253 253 64009
            ldlm_cancel               7822 samples [usecs] 263 408417 1857609504 500103314555706
            osc.exafs-OST0001-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474322305 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474322305 secs.nsecs
            req_waittime              132958 samples [usecs] 31 400200 1868293807 500417445772075
            req_active                132958 samples [reqs] 1 3620 17260697 44413040365
            ldlm_glimpse_enqueue      125134 samples [reqs] 1 1 125134 125134
            ost_connect               1 samples [usecs] 134 134 134 17956
            ost_disconnect            1 samples [usecs] 255 255 255 65025
            ldlm_cancel               7822 samples [usecs] 249 400200 1858535688 500414552572698
            osc.exafs-OST0002-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474385742 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474385742 secs.nsecs
            req_waittime              132922 samples [usecs] 30 403354 1585074597 373963525463297
            req_active                132922 samples [reqs] 1 3139 14993568 33140340186
            ldlm_glimpse_enqueue      125100 samples [reqs] 1 1 125100 125100
            ost_connect               1 samples [usecs] 104 104 104 10816
            ost_disconnect            1 samples [usecs] 229 229 229 52441
            ldlm_cancel               7820 samples [usecs] 137 403354 1576064547 373959847577499
            osc.exafs-OST0003-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474439284 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474439284 secs.nsecs
            req_waittime              132923 samples [usecs] 29 426088 1586055175 373991604658201
            req_active                132923 samples [reqs] 1 3131 15010819 33188857463
            ldlm_glimpse_enqueue      125100 samples [reqs] 1 1 125100 125100
            ost_connect               1 samples [usecs] 140 140 140 19600
            ost_disconnect            1 samples [usecs] 230 230 230 52900
            ldlm_cancel               7821 samples [usecs] 188 426088 1576987697 373987536028513
            osc.exafs-OST0004-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474493870 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474493870 secs.nsecs
            req_waittime              133220 samples [usecs] 30 402683 1938227551 496692342133501
            req_active                133220 samples [reqs] 1 3741 17164793 41132712891
            ldlm_glimpse_enqueue      125381 samples [reqs] 1 1 125381 125381
            ost_connect               1 samples [usecs] 106 106 106 11236
            ost_disconnect            1 samples [usecs] 236 236 236 55696
            ldlm_cancel               7837 samples [usecs] 327 402683 1791475030 496280422422484
            osc.exafs-OST0005-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474549629 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474549629 secs.nsecs
            req_waittime              133213 samples [usecs] 29 393951 1936411387 496417875801293
            req_active                133213 samples [reqs] 1 3740 17146480 41076362838
            ldlm_glimpse_enqueue      125376 samples [reqs] 1 1 125376 125376
            ost_connect               1 samples [usecs] 118 118 118 13924
            ost_disconnect            1 samples [usecs] 199 199 199 39601
            ldlm_cancel               7835 samples [usecs] 315 393951 1790421213 496012032268869
            osc.exafs-OST0006-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474603969 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474603969 secs.nsecs
            req_waittime              132941 samples [usecs] 27 394743 1649085257 398814721042191
            req_active                132941 samples [reqs] 1 3382 15661909 36181334593
            ldlm_glimpse_enqueue      125116 samples [reqs] 1 1 125116 125116
            ost_connect               1 samples [usecs] 99 99 99 9801
            ost_disconnect            1 samples [usecs] 187 187 187 34969
            ldlm_cancel               7823 samples [usecs] 347 394743 1639760732 398808893873090
            osc.exafs-OST0007-osc-ffff9b96b8157800.stats=
            snapshot_time             482063.474659084 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482063.474659084 secs.nsecs
            req_waittime              132954 samples [usecs] 31 392261 1644646149 397155067488985
            req_active                132954 samples [reqs] 1 3364 15624468 36037631194
            ldlm_glimpse_enqueue      125131 samples [reqs] 1 1 125131 125131
            ost_connect               1 samples [usecs] 121 121 121 14641
            ost_disconnect            1 samples [usecs] 169 169 169 28561
            
            

            In fact, we still spent 25 sec even without dropping caches. There were no requests from the client to the servers, because everything came from caches on the local client.

            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m25.309s
            user	0m8.937s
            sys	0m16.327s
            
            llite.exafs-ffff9b96b8157800.stats=
            snapshot_time             482212.579971327 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579971327 secs.nsecs
            open                      1 samples [usecs] 189 189 189 35721
            close                     1 samples [usecs] 159 159 159 25281
            readdir                   1223 samples [usecs] 0 149 171544 24104442
            getattr                   1000002 samples [usecs] 3 34 9690974 103987264
            getxattr                  1 samples [usecs] 2 2 2 4
            getxattr_hits             1 samples [reqs]
            inode_permission          10000021 samples [usecs] 0 17 2009221 2041697
            opencount                 1 samples [reqs] 1 1 1 1
            openclosetime             1 samples [usecs] 102201708 102201708 102201708 10445189118117264
            
            mdc.exafs-MDT0002-mdc-ffff9b96b8157800.stats=
            snapshot_time             482212.579840759 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579840759 secs.nsecs
            req_waittime              5 samples [usecs] 67 132 484 50542
            req_active                5 samples [reqs] 1 1 5 5
            ldlm_ibits_enqueue        1 samples [reqs] 1 1 1 1
            mds_close                 1 samples [usecs] 126 126 126 15876
            obd_ping                  3 samples [usecs] 67 87 226 17242
            
            osc.exafs-OST0000-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579286532 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579286532 secs.nsecs
            req_waittime              3 samples [usecs] 76 83 235 18441
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 76 83 235 18441
            osc.exafs-OST0001-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579346652 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579346652 secs.nsecs
            req_waittime              3 samples [usecs] 69 78 222 16470
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 69 78 222 16470
            osc.exafs-OST0002-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579396346 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579396346 secs.nsecs
            req_waittime              3 samples [usecs] 50 61 164 9030
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 50 61 164 9030
            osc.exafs-OST0003-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579440245 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579440245 secs.nsecs
            req_waittime              3 samples [usecs] 57 79 196 13090
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 57 79 196 13090
            osc.exafs-OST0004-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579489618 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579489618 secs.nsecs
            req_waittime              3 samples [usecs] 76 84 240 19232
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 76 84 240 19232
            osc.exafs-OST0005-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579536314 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579536314 secs.nsecs
            req_waittime              3 samples [usecs] 76 82 236 18584
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 76 82 236 18584
            osc.exafs-OST0006-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579579660 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579579660 secs.nsecs
            req_waittime              3 samples [usecs] 46 56 158 8388
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 46 56 158 8388
            osc.exafs-OST0007-osc-ffff9b96b8157800.stats=
            snapshot_time             482212.579628342 secs.nsecs
            start_time                0.000000000 secs.nsecs
            elapsed_time              482212.579628342 secs.nsecs
            req_waittime              3 samples [usecs] 60 71 198 13130
            req_active                3 samples [reqs] 1 1 3 3
            obd_ping                  3 samples [usecs] 60 71 198 13130
            

            So, somehow, "ls -l" for 1M files takes 25 sec even when everything comes from caches.
            Without the patch, we needed another 8 sec for round trips between the client and servers, but the batched statahead patch brings that down to 2 sec.
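
            For reference, those round-trip overheads can be read off the timings in this comment as cold-cache minus warm-cache elapsed time (approximate, since the unpatched number comes from a separate master build):

            # unpatched cold cache: 33.757s, patched cold cache: 27.296s, warm cache: 25.309s
            echo "unpatched: $(echo '33.757 - 25.309' | bc)s  patched: $(echo '27.296 - 25.309' | bc)s"
            # -> unpatched: 8.448s  patched: 1.987s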


            People

              Assignee: qian_wc Qian Yingjin
              Reporter: qian_wc Qian Yingjin
              Votes: 0
              Watchers: 13