Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.16.0

    Description

      Batched metadata processing can provide a significant performance boost.
      Batched statahead can also improve the performance of directory listing operations such as ls.
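
      As a rough illustration (not from this ticket's text), the relevant client-side statahead tunables could be inspected and adjusted as sketched below; the statahead_batch_max name is assumed from this patch series and may differ by release:

      # Inspect the statahead tunables and stats on a client (statahead_batch_max assumed)
      lctl get_param llite.*.statahead_max llite.*.statahead_stats
      lctl get_param llite.*.statahead_batch_max

      # Illustrative values only: widen the statahead window and the RPC batch size
      lctl set_param llite.*.statahead_max=2048
      lctl set_param llite.*.statahead_batch_max=64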

    Attachments

    Issue Links

    Activity

            [LU-14139] batched statahead processing
            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/41220/
            Subject: LU-14139 statahead: add test for batch statahead processing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: cd1e7bdd903ada4467064eaf8613ec62a358fa1c


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/43707/
            Subject: LU-14139 ptlrpc: grow PtlRPC properly when prepare sub request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5a2dfd36f9c2b6c10ab7ba44b0e9e86372623fde


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40945/
            Subject: LU-14139 ptlrpc: grow reply buffer properly for batch request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5edb883b44ac707528ce2c0bc812d65b9ffb4a50


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40943/
            Subject: LU-14139 statahead: add stats for batch RPC requests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a20f25d24b5f0ce7b5e77f7c596bffd0450cbdae


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40720/
            Subject: LU-14139 statahead: batched statahead processing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4435d0121f72aac3ad01c98a33b265a496359890


            adilger Andreas Dilger added a comment -

            Is there a patch that converts ldlm to use rhashtable?
            qian_wc Qian Yingjin added a comment - - edited

            From the attached FG (flame graph), much of the time is spent in:
            __ldlm_handle2lock() -> class_handle2object()
            ldlm_resource_get() -> cfs_hash_bd_lookup_intent() -> ldlm_res_hop_keycmp()

            Each of these accounts for about 5% of the samples for data and metadata DLM lock matching on the client side, roughly 20% of the total sample in all.
            It seems that the hash lookups for the lock handle and the resource take a lot of time.
            ldlm_res_hop_keycmp() reaching 5% means there are many elements in the same bucket of the hash table; we should enlarge the hash table and use a resizable hash table.

            For class_handle2object() on locks, we should maintain the lock handles per target (osc/mdc) lock namespace rather than sharing a single global handle hash table on the client, or even on the server.
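
            A minimal way to gauge how crowded those per-namespace tables are on the client (lock_count is shown elsewhere in this ticket; resource_count is assumed to be available as a companion counter):

            # Per-namespace counts of cached locks and resources on the client
            lctl get_param ldlm.namespaces.*.lock_count
            lctl get_param ldlm.namespaces.*.resource_count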


            sihara Shuichi Ihara added a comment -

            There are no LNet messages going out in the 2nd 'ls -l'.
            I've opened a separate ticket for further investigation: https://jira.whamcloud.com/browse/LU-16365

            sihara Shuichi Ihara added a comment -

            It might not be related to LU-14139 directly, but I am still interested in why Lustre still takes 25 sec for the second 'ls -l'.

            It looks like the client is still sending 125k x 8 = 1M glimpse RPCs to the OSTs to fetch the file size. I guess this is because the client is not caching all 1M of the OST DLM locks on the objects between "ls -l" calls? Did you check the OSC LRU size? I suspect this would show a sudden jump in speed once the number of files is small enough that all the OST DLM locks are cached.
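
            If lock cancellation were the cause, the LRU limits could be pinned with the standard ldlm tunables; a minimal sketch with an illustrative value (not taken from this ticket):

            # Pin a static LRU large enough to keep ~1M object locks per OSC namespace
            lctl set_param ldlm.namespaces.*osc*.lru_size=1200000
            # Setting lru_size=0 restores the default dynamic LRU sizing
            lctl get_param ldlm.namespaces.*osc*.lru_max_age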

            As far as I observed, all locks seem to be kept; nothing shrinks or re-grants locks between the two 'ls -l' runs.

            [root@ec01 ~]# lctl get_param ldlm.namespaces.*.lock_count ldlm.namespaces.*.lru_size
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lock_count=4
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lru_size=3200
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lru_size=0
            
            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m27.249s
            user	0m9.111s
            sys	0m13.544s
            
            [root@ec01 ~]# lctl get_param ldlm.namespaces.*.lock_count ldlm.namespaces.*.lru_size
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lock_count=4
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lock_count=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lock_count=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lock_count=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lock_count=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lock_count=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lock_count=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lock_count=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lock_count=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lock_count=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lock_count=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lock_count=125099
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lru_size=3200
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lru_size=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lru_size=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lru_size=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lru_size=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lru_size=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lru_size=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lru_size=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lru_size=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lru_size=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lru_size=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lru_size=125099
            
            [sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m25.615s
            user	0m8.982s
            sys	0m16.588s
            
            [root@ec01 ~]# lctl get_param ldlm.namespaces.*.lock_count ldlm.namespaces.*.lru_size
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lock_count=4
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lock_count=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lock_count=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lock_count=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lock_count=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lock_count=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lock_count=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lock_count=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lock_count=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lock_count=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lock_count=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lock_count=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lock_count=125099
            ldlm.namespaces.MGC10.0.11.208@o2ib12.lru_size=3200
            ldlm.namespaces.exafs-MDT0000-mdc-ffff9b96b8157800.lru_size=2
            ldlm.namespaces.exafs-MDT0001-mdc-ffff9b96b8157800.lru_size=4
            ldlm.namespaces.exafs-MDT0002-mdc-ffff9b96b8157800.lru_size=1000004
            ldlm.namespaces.exafs-MDT0003-mdc-ffff9b96b8157800.lru_size=0
            ldlm.namespaces.exafs-OST0000-osc-ffff9b96b8157800.lru_size=125089
            ldlm.namespaces.exafs-OST0001-osc-ffff9b96b8157800.lru_size=125073
            ldlm.namespaces.exafs-OST0002-osc-ffff9b96b8157800.lru_size=125091
            ldlm.namespaces.exafs-OST0003-osc-ffff9b96b8157800.lru_size=125085
            ldlm.namespaces.exafs-OST0004-osc-ffff9b96b8157800.lru_size=125295
            ldlm.namespaces.exafs-OST0005-osc-ffff9b96b8157800.lru_size=125285
            ldlm.namespaces.exafs-OST0006-osc-ffff9b96b8157800.lru_size=125103
            ldlm.namespaces.exafs-OST0007-osc-ffff9b96b8157800.lru_size=125099
            

            Here is the same 1M-file test on ext4 on a local disk of the client.

            [root@ec01 ~]# echo 3 > /proc/sys/vm/drop_caches
            [sihara@ec01 ~]$ time ls -l /tmp/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/  > /dev/null
            
            real	0m16.999s
            user	0m8.956s
            sys	0m5.855s
            [sihara@ec01 ~]$ time ls -l /tmp/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/  > /dev/null
            
            real	0m11.832s
            user	0m8.765s
            sys	0m3.051s
            

            I'm also attaching the FG (flame graph) captured during the second 'ls -l' on Lustre.

            sihara Shuichi Ihara added a comment - - edited

            So, somehow, "ls -l" for 1M files takes 25 sec even when everything comes from caches.

            'ls -l' does other things after receiving the results: sorting, writing output, etc. Sorting has some impact, but the majority of the time is still spent on the 1M statx() and getxattr() calls, even with all data in the cache.

            1670051344.758188 getxattr("/exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/file.mdtest.0.945973", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available) <0.000008>
            1670051344.758214 statx(AT_FDCWD, "/exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/file.mdtest.0.877046", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_MTIME|STATX_SIZE, {stx_mask=STATX_MODE|STATX_NLINK|STATX_MTIME|STATX_SIZE, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=47001, ...}) = 0 <0.000013>
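
            For reference, per-call timings like the above can be captured with strace (epoch timestamps plus time spent in each syscall); a sketch of the likely invocation:

            strace -f -ttt -T -e trace=statx,getxattr \
                ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null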
            

            Just in case, here are the test results with and without caches, but with ls not sorting the result list ('ls -fl').

            [root@ec01 ~]# clush -w ai400x2-1-vm[1-4],ec01 " echo 3 > /proc/sys/vm/drop_caches "
            [sihara@ec01 ~]$ time ls -fl /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m20.023s
            user	0m1.903s
            sys	0m13.363s
            [sihara@ec01 ~]$ time ls -fl /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ > /dev/null
            
            real	0m18.164s
            user	0m1.703s
            sys	0m16.417s
            

            There is still the same ~2 sec of overhead for client/server round trips. So batched statahead reduced that overhead from 8 sec to 2 sec, which is a 4x speedup.


            People

              Assignee: qian_wc Qian Yingjin
              Reporter: qian_wc Qian Yingjin
              Votes: 0
              Watchers: 13
