[LU-1236] Large directory feature isnot enabled on this filesystem Created: 19/Mar/12  Updated: 20/Mar/12  Resolved: 20/Mar/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Richard Henwood (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6429

 Description   

I'm benchmarking an SSD device using mds-survey. If I run with:

$ dir_count=1 thrlo=8 thrhi=10000 file_count=100000 sh mds-survey

The following is seen in dmesg immediately before the machine hangs:

...
1332181376 206.76.192.63 6665 Directory (ino: 130547715) index full, reach max htree level :2
1332181376 206.76.192.63 6665 LDISKFS-fs warning (device rsXX0): ldiskfs_dx_add_entry:
1332181376 206.76.192.63 6665 Large directory feature isnot enabled on this filesystem
1332181376 206.76.192.63 6665 LDISKFS-fs warning (device rsXX0): ldiskfs_dx_add_entry: 
1332181376 206.76.192.63 6665 Directory (ino: 130547715) index full, reach max htree level :2
1332181376 206.76.192.63 6665 LDISKFS-fs warning (device rsXX0): ldiskfs_dx_add_entry:
1332181376 206.76.192.63 6665 Large directory feature isnot enabled on this filesystem
1332181398 206.76.192.63 6665 LustreError: 5111:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 436236872: rc = -2
1332181398 206.76.192.63 6665 LustreError: 5111:0:(echo_client.c:1768:echo_lookup_object()) Skipped 1 previous similar message
1332181399 206.76.192.63 6665 LustreError: 5144:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 989887001: rc = -2
1332181400 206.76.192.63 6665 LustreError: 5087:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 33586437: rc = -2
1332181400 206.76.192.63 6665 LustreError: 5087:0:(echo_client.c:1768:echo_lookup_object()) Skipped 9 previous similar messages
1332181402 206.76.192.63 6665 LustreError: 5197:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 1879083690: rc = -2
1332181402 206.76.192.63 6665 LustreError: 5197:0:(echo_client.c:1768:echo_lookup_object()) Skipped 57 previous similar messages
1332181970 206.76.192.63 6665 LustreError: 5486:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744071562098618: rc = -2
1332181970 206.76.192.63 6665 LustreError: 5486:0:(echo_client.c:1592:echo_md_lookup()) Skipped 127 previous similar messages
1332181970 206.76.192.63 6665 LustreError: 5486:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 18446744071562098618: rc = -2
1332181970 206.76.192.63 6665 LustreError: 5486:0:(echo_client.c:1706:echo_getattr_object()) Skipped 4 previous similar messages
1332181974 206.76.192.63 6665 LustreError: 5571:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744072988161445: rc = -2
1332181974 206.76.192.63 6665 LustreError: 5571:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 18446744072988161445: rc = -2
1332181988 206.76.192.63 6665 LustreError: 5538:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744072434514081: rc = -2
1332181988 206.76.192.63 6665 LustreError: 5538:0:(echo_client.c:1592:echo_md_lookup()) Skipped 1 previous similar message
1332181988 206.76.192.63 6665 LustreError: 5538:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 18446744072434514081: rc = -2
1332181988 206.76.192.63 6665 LustreError: 5538:0:(echo_client.c:1706:echo_getattr_object()) Skipped 1 previous similar message
1332182004 206.76.192.63 6665 LustreError: 5531:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744072317071604: rc = -2
1332182004 206.76.192.63 6665 LustreError: 5531:0:(echo_client.c:1592:echo_md_lookup()) Skipped 1 previous similar message
1332182004 206.76.192.63 6665 LustreError: 5531:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 18446744072317071604: rc = -2
1332182004 206.76.192.63 6665 LustreError: 5531:0:(echo_client.c:1706:echo_getattr_object()) Skipped 1 previous similar message
1332182023 206.76.192.63 6665 LustreError: 5554:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 18446744072702952620: rc = -2
1332182023 206.76.192.63 6665 LustreError: 5554:0:(echo_client.c:1706:echo_getattr_object()) Skipped 3 previous similar messages
1332182038 206.76.192.63 6665 LustreError: 5589:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744073290153182: rc = -2
1332182038 206.76.192.63 6665 LustreError: 5589:0:(echo_client.c:1592:echo_md_lookup()) Skipped 20 previous similar messages
1332182049 206.76.192.63 6665 LustreError: 5451:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 1560309174: rc = -2
1332182049 206.76.192.63 6665 LustreError: 5451:0:(echo_client.c:1706:echo_getattr_object()) Skipped 18 previous similar messages
1332182081 206.76.192.63 6665 LustreError: 5436:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 1308658240: rc = -2
1332182081 206.76.192.63 6665 LustreError: 5436:0:(echo_client.c:1706:echo_getattr_object()) Skipped 110 previous similar messages
1332182105 206.76.192.63 6665 LustreError: 5500:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744071796981746: rc = -2 
1332182105 206.76.192.63 6665 LustreError: 5500:0:(echo_client.c:1592:echo_md_lookup()) Skipped 226 previous similar messages


This log message originates from:
./ldiskfs/kernel_patches/patches/ext4_pdirop-rhel6.patch



 Comments   
Comment by Richard Henwood (Inactive) [ 19/Mar/12 ]

The mds-survey command creates a total of 100000 files in a single directory. It steps through thread counts from 8 to 10000 and, at each thread count, times how long the create, lookup, getattr and setxattr operations take.

The problem seems to occur at the 128-thread mark:

~# dir_count=1 thrlo=8 thrhi=10000 file_count=100000 sh /usr/bin/mds-survey
Mon Mar 19 14:24:18 CDT 2012 /usr/bin/mds-survey from mds51.ls4.tacc.utexas.edu
mdt 1 file  100000 dir    1 thr    8 create 49207.37 [34994.58,55993.90] lookup 560559.72 [560559.72,560559.72] md_getattr 589089.04 [589089.04,589089.04] setxattr 341067.03 [359985.60,359985.60] destroy 132062.28 [124984.88,158982.04] 
mdt 1 file  100000 dir    1 thr   16 create 90094.03 [57996.58,165989.21] lookup 445579.31 [442979.62,442979.62] md_getattr 437745.60 [436985.14,436985.14] setxattr 158132.21 [123991.94,169978.24] destroy 72703.35 [53996.65,97988.54] 
mdt 1 file  100000 dir    1 thr   32 create 82619.38 [62995.15,106993.37] lookup 429201.01 [419986.14,435964.25] md_getattr 405683.99 [401974.68,406984.13] setxattr 102304.06 [82989.54,153975.52] destroy 49015.01 [35995.10,64992.07] 
mdt 1 file  100000 dir    1 thr   64 create 57911.35 [43997.01,74995.13] lookup 411225.64 [404976.92,410975.75] md_getattr 127483.38 [   0.00,414142.72] setxattr 110405.23 [90977.98,136990.41] destroy 43722.84 [31996.03,59969.24] 
mdt 1 file  100000 dir    1 thr  128 create             ERROR lookup             ERROR md_getattr 

The host was reset during the crash, but the kernel messages again looked like:

1332185881 206.76.192.63 6665 LustreError: 3209:0:(echo_client.c:1451:echo_md_create_internal()) Can not create child [0x2000004c6:0x123f7:0x0]: rc = -28
1332185881 206.76.192.63 6665 LustreError: 3209:0:(echo_client.c:1562:echo_create_md_object()) Can not create child 18446744072199676837: rc = -28
1332185881 206.76.192.63 6665 LustreError: 3218:0:(echo_client.c:1451:echo_md_create_internal()) Can not create child [0x20000047a:0x120a9:0x0]: rc = -28
1332185881 206.76.192.63 6665 LustreError: 3218:0:(echo_client.c:1562:echo_create_md_object()) Can not create child 18446744072501665870: rc = -28
1332185881 206.76.192.63 6665 LustreError: 3218:0:(echo_client.c:1562:echo_create_md_object()) Skipped 1 previous similar message
1332185882 206.76.192.63 6665 LustreError: 3236:0:(echo_client.c:1451:echo_md_create_internal()) Can not create child [0x200000498:0x1237f:0x0]: rc = -28
1332185882 206.76.192.63 6665 LustreError: 3236:0:(echo_client.c:1451:echo_md_create_internal()) Skipped 6 previous similar messages
1332185882 206.76.192.63 6665 LustreError: 3236:0:(echo_client.c:1562:echo_create_md_object()) Can not create child 18446744073105646354: rc = -28
1332185882 206.76.192.63 6665 LustreError: 3236:0:(echo_client.c:1562:echo_create_md_object()) Skipped 5 previous similar messages
1332185883 206.76.192.63 6665 LustreError: 3213:0:(echo_client.c:1451:echo_md_create_internal()) Can not create child [0x2000004d4:0x124d0:0x0]: rc = -28
1332185883 206.76.192.63 6665 LustreError: 3234:0:(echo_client.c:1562:echo_create_md_object()) Can not create child 18446744073038537898: rc = -28
1332185883 206.76.192.63 6665 LustreError: 3234:0:(echo_client.c:1562:echo_create_md_object()) Skipped 15 previous similar messages
1332185883 206.76.192.63 6665 LustreError: 3213:0:(echo_client.c:1451:echo_md_create_internal()) Skipped 20 previous similar messages
1332185885 206.76.192.63 6665 LustreError: 3153:0:(echo_client.c:1451:echo_md_create_internal()) Can not create child [0x200000491:0x128c2:0x0]: rc = -28
1332185885 206.76.192.63 6665 LustreError: 3153:0:(echo_client.c:1451:echo_md_create_internal()) Skipped 63 previous similar messages
1332185885 206.76.192.63 6665 LustreError: 3230:0:(echo_client.c:1562:echo_create_md_object()) Can not create child 18446744072904321222: rc = -28
1332185885 206.76.192.63 6665 LustreError: 3230:0:(echo_client.c:1562:echo_create_md_object()) Skipped 69 previous similar messages
1332185909 206.76.192.63 6665 LustreError: 3505:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 18446744072669436954: rc = -2
1332185909 206.76.192.63 6665 LustreError: 3487:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 18446744072065458648: rc = -2
1332185910 206.76.192.63 6665 LustreError: 3485:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 18446744071998350893: rc = -2
1332185910 206.76.192.63 6665 LustreError: 3485:0:(echo_client.c:1768:echo_lookup_object()) Skipped 5 previous similar messages
1332185911 206.76.192.63 6665 LustreError: 3413:0:(echo_client.c:1768:echo_lookup_object()) Can not lookup child 167846909: rc = -2
1332185911 206.76.192.63 6665 LustreError: 3413:0:(echo_client.c:1768:echo_lookup_object()) Skipped 17 previous similar messages
1332186150 206.76.192.63 6665 LustreError: 3585:0:(echo_client.c:1592:echo_md_lookup()) lookup 1140923015: rc = -2
1332186150 206.76.192.63 6665 LustreError: 3585:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 1140923015: rc = -2
1332186150 206.76.192.63 6665 LustreError: 3599:0:(echo_client.c:1592:echo_md_lookup()) lookup 1610686777: rc = -2
1332186150 206.76.192.63 6665 LustreError: 3599:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 1610686777: rc = -2
1332186151 206.76.192.63 6665 LustreError: 3556:0:(echo_client.c:1592:echo_md_lookup()) lookup 167846909: rc = -2
1332186151 206.76.192.63 6665 LustreError: 3556:0:(echo_client.c:1592:echo_md_lookup()) Skipped 13 previous similar messages
1332186151 206.76.192.63 6665 LustreError: 3556:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 167846909: rc = -2
1332186151 206.76.192.63 6665 LustreError: 3556:0:(echo_client.c:1706:echo_getattr_object()) Skipped 13 previous similar messages
1332186152 206.76.192.63 6665 LustreError: 3664:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744073206311328: rc = -2
1332186152 206.76.192.63 6665 LustreError: 3664:0:(echo_client.c:1592:echo_md_lookup()) Skipped 33 previous similar messages
1332186152 206.76.192.63 6665 LustreError: 3583:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 1073819696: rc = -2
1332186152 206.76.192.63 6665 LustreError: 3583:0:(echo_client.c:1706:echo_getattr_object()) Skipped 33 previous similar messages
1332186154 206.76.192.63 6665 LustreError: 3643:0:(echo_client.c:1592:echo_md_lookup()) lookup 18446744072501665870: rc = -2
1332186154 206.76.192.63 6665 LustreError: 3582:0:(echo_client.c:1706:echo_getattr_object()) Can't find child 1040263450: rc = -2
1332186154 206.76.192.63 6665 LustreError: 3582:0:(echo_client.c:1706:echo_getattr_object()) Skipped 65 previous similar messages
1332186154 206.76.192.63 6665 LustreError: 3643:0:(echo_client.c:1592:echo_md_lookup()) Skipped 66 previous similar messages
Comment by Andreas Dilger [ 19/Mar/12 ]

Are you sure it isn't creating 10k files per thread? That would make this a 1B file test, which is definitely beyond the default limits for htree directories.

The reason that we implemented the check/limit for directories over 2 levels is because the e2fsck support for >2 htree levels and >2GB size is not implemented yet (LU-896).

That functionality needs to be implemented before this can be enabled in production, but it isn't a huge problem in most cases since the current limit of ~10-15M files is enough for most users.
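
As a rough illustration of where a limit of this order can come from, the sketch below estimates how many entries an htree directory can index at a given depth. It is a back-of-the-envelope estimate under stated assumptions, not the actual ldiskfs calculation: 4 KiB blocks, 8-byte index entries, and per-block header overhead ignored. The number of entries per leaf block depends on filename length and how full the leaves are; the value 50 used here is purely illustrative. The function name htree_capacity is hypothetical.

```shell
#!/bin/sh
# Rough estimate of htree directory capacity.
# Assumptions (not taken from the ldiskfs source): every index block holds
# block_size / 8 index entries, and header overhead is ignored.
htree_capacity() {
    block_size=$1        # filesystem block size in bytes
    levels=$2            # number of htree index levels
    entries_per_leaf=$3  # assumed average directory entries per leaf block

    fanout=$((block_size / 8))
    leaf_blocks=1
    i=0
    while [ "$i" -lt "$levels" ]; do
        leaf_blocks=$((leaf_blocks * fanout))
        i=$((i + 1))
    done
    echo $((leaf_blocks * entries_per_leaf))
}

# Two index levels, 4 KiB blocks, 50 entries per leaf (illustrative value)
htree_capacity 4096 2 50    # prints 13107200
```

With these assumptions a 2-level htree tops out at roughly 13M entries; a smaller or larger entries-per-leaf value moves the estimate proportionally.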

Comment by Liang Zhen (Inactive) [ 20/Mar/12 ]

As Andreas said, it will be enabled in LU-896, so we can just close it.

Comment by Minh Diep [ 20/Mar/12 ]

The file_count is per thread. Make sure you have enough inodes for $thr * $file_count.
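
To make the scaling concrete, here is a minimal sketch that prints how many files a single-directory run would create at each thread count. It assumes the thread count doubles from thrlo, as the 8, 16, 32, 64, 128 steps in the output above suggest, and that the total is thr * file_count as described here. The helper name files_per_step is hypothetical and is not part of mds-survey.

```shell
#!/bin/sh
# Print the total number of files created at each thread count,
# assuming file_count is per thread and the thread count doubles each step.
files_per_step() {
    thr=$1         # starting thread count (thrlo)
    thrhi=$2       # highest thread count to report
    file_count=$3  # files created by each thread

    while [ "$thr" -le "$thrhi" ]; do
        echo "thr=$thr files=$((thr * file_count))"
        thr=$((thr * 2))
    done
}

# Parameters from the failing run, stopping at the 128-thread step
files_per_step 8 128 100000
```

At 128 threads this gives 12800000 files in one directory, which falls within the ~10-15M limit mentioned above. Free inodes on the target device can be checked with df -i before a run.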

Generated at Sat Feb 10 01:14:49 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.