Failure on test suite conf-sanity test_32b: list verification failed

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.6.0
    • None
    • 3
    • 12417

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/055b1aba-842f-11e3-a854-52540035b04c.

      The sub-test test_32b failed with the following error:

      --- /tmp/t32/list.orig	2014-01-22 21:23:34.000000000 -0800
      +++ /tmp/t32/list	2014-01-22 21:23:35.000000000 -0800
      @@ -5,7 +5,6 @@
       
       
       
      -
       .:
       ./init.d:
       ./rc0.d:
      @@ -14,10 +13,8 @@
       ./rc3.d:
       ./rc4.d:
       ./rc5.d:
      -./rc6.d:
      -total 100
       total 100
      -total 102893
      +total 102883
       total 7582
       total 98
       total 98
      @@ -31,7 +28,6 @@
       144115205255725157 drwxr-xr-x 2     0     0     10752 1381265574 rc3.d
       144115205255725320 drwxr-xr-x 2     0     0     10752 1381265574 rc4.d
       144115205255725057 drwxr-xr-x 2     0     0     10752 1381265574 rc5.d
      -144115205255725422 drwxr-xr-x 2     0     0     10752 1381265574 rc6.d
       144115205255725565 lrwxrwxrwx 1 0 0 11 1381266338 S99local -> ../rc.local
       144115205255725615 lrwxrwxrwx 1 0 0 11 1381266339 S99local -> ../rc.local
       144115205255725714 lrwxrwxrwx 1 0 0 11 1381266340 S99local -> ../rc.local
      @@ -41,10 +37,7 @@
       144115205255725242 -rwxr-xr-x 1 0 0  1228 1340359954 sysstat
       144115205255725265 -rwxr-xr-x 1 0 0 12412 1361491346 rdma
       144115205255725266 -rwxr-xr-x 1 0 0  1288 1361594386 abrt-ccpp
      -144115205255725487 lrwxrwxrwx 1 0 0 13 1381266337 K05atd -> ../init.d/atd
       144115205255725535 lrwxrwxrwx 1 0 0 13 1381266338 K05atd -> ../init.d/atd
      -144115205255725492 lrwxrwxrwx 1 0 0 13 1381266338 K60nfs -> ../init.d/nfs
      -144115205255725518 lrwxrwxrwx 1 0 0 13 1381266338 K99zfs -> ../init.d/zfs
       144115205255725536 lrwxrwxrwx 1 0 0 13 1381266338 S01zfs -> ../init.d/zfs
       144115205255725587 lrwxrwxrwx 1 0 0 13 1381266338 S01zfs -> ../init.d/zfs
       144115205255725549 lrwxrwxrwx 1 0 0 13 1381266338 S30nfs -> ../init.d/nfs
      @@ -63,13 +56,7 @@
       144115205255725789 lrwxrwxrwx 1 0 0 13 1381266340 S95atd -> ../init.d/atd
       144115205255725797 lrwxrwxrwx 1 0 0 13 1381266341 S30nfs -> ../init.d/nfs
       144115205255725262 -r-xr-xr-x 1 0 0  1340 1361514075 blk-availability
      -144115205255725483 lrwxrwxrwx 1 0 0 14 1381266337 K74ntpd -> ../init.d/ntpd
      -144115205255725485 lrwxrwxrwx 1 0 0 14 1381266337 K88sssd -> ../init.d/sssd
      -144115205255725481 lrwxrwxrwx 1 0 0 14 1381266337 K95rdma -> ../init.d/rdma
      -144115205255725517 lrwxrwxrwx 1 0 0 14 1381266338 K25sshd -> ../init.d/sshd
      -144115205255725501 lrwxrwxrwx 1 0 0 14 1381266338 K99rngd -> ../init.d/rngd
       144115205255725548 lrwxrwxrwx 1 0 0 14 1381266338 K99rngd -> ../init.d/rngd
      -144115205255725512 lrwxrwxrwx 1 0 0 14 1381266338 S01reboot -> ../init.d/halt
       144115205255725552 lrwxrwxrwx 1 0 0 14 1381266338 S05rdma -> ../init.d/rdma
       144115205255725527 lrwxrwxrwx 1 0 0 14 1381266338 S13sssd -> ../init.d/sssd
       144115205255725576 lrwxrwxrwx 1 0 0 14 1381266338 S13sssd -> ../init.d/sssd
      @@ -103,17 +90,9 @@
       144115205255725255 -rwxr-xr-x 1 0 0 14175 1378328753 powerman
       144115205255725237 -rwxr-xr-x 1 0 0 14578 1298801889 munge
       144115205255725218 -rwxr-xr-x 1 0 0  1513 1369892875 rdisc
      -144115205255725478 lrwxrwxrwx 1 0 0 15 1381266337 K75ibacm -> ../init.d/ibacm
      -144115205255725474 lrwxrwxrwx 1 0 0 15 1381266337 K75netfs -> ../init.d/netfs
      -144115205255725511 lrwxrwxrwx 1 0 0 15 1381266338 K16abrtd -> ../init.d/abrtd
       144115205255725560 lrwxrwxrwx 1 0 0 15 1381266338 K16abrtd -> ../init.d/abrtd
      -144115205255725508 lrwxrwxrwx 1 0 0 15 1381266338 K60crond -> ../init.d/crond
      -144115205255725499 lrwxrwxrwx 1 0 0 15 1381266338 K60munge -> ../init.d/munge
      -144115205255725513 lrwxrwxrwx 1 0 0 15 1381266338 K74acpid -> ../init.d/acpid
       144115205255725525 lrwxrwxrwx 1 0 0 15 1381266338 K75netfs -> ../init.d/netfs
      -144115205255725507 lrwxrwxrwx 1 0 0 15 1381266338 K80kdump -> ../init.d/kdump
       144115205255725557 lrwxrwxrwx 1 0 0 15 1381266338 K80kdump -> ../init.d/kdump
      -144115205255725502 lrwxrwxrwx 1 0 0 15 1381266338 K89rdisc -> ../init.d/rdisc
       144115205255725550 lrwxrwxrwx 1 0 0 15 1381266338 K89rdisc -> ../init.d/rdisc
       144115205255725526 lrwxrwxrwx 1 0 0 15 1381266338 S25ibacm -> ../init.d/ibacm
       144115205255725574 lrwxrwxrwx 1 0 0 15 1381266338 S25ibacm -> ../init.d/ibacm
      @@ -161,17 +140,11 @@
       144115205255725814 lrwxrwxrwx 1 0 0 15 1381266341 S90crond -> ../init.d/crond
       144115205255725227 -rwxr-xr-x 1 0 0  1556 1342512140 psacct
       144115205255725219 -rwxr-xr-x 1 0 0 16034 1370426815 kdump
      -144115205255725488 lrwxrwxrwx 1 0 0 16 1381266337 K72autofs -> ../init.d/autofs
      -144115205255725519 lrwxrwxrwx 1 0 0 16 1381266338 K01smartd -> ../init.d/smartd
       144115205255725568 lrwxrwxrwx 1 0 0 16 1381266338 K01smartd -> ../init.d/smartd
      -144115205255725495 lrwxrwxrwx 1 0 0 16 1381266338 K10psacct -> ../init.d/psacct
       144115205255725545 lrwxrwxrwx 1 0 0 16 1381266338 K10psacct -> ../init.d/psacct
      -144115205255725503 lrwxrwxrwx 1 0 0 16 1381266338 K50xinetd -> ../init.d/xinetd
       144115205255725555 lrwxrwxrwx 1 0 0 16 1381266338 K50xinetd -> ../init.d/xinetd
       144115205255725538 lrwxrwxrwx 1 0 0 16 1381266338 K72autofs -> ../init.d/autofs
      -144115205255725510 lrwxrwxrwx 1 0 0 16 1381266338 K85opensm -> ../init.d/opensm
       144115205255725559 lrwxrwxrwx 1 0 0 16 1381266338 K85opensm -> ../init.d/opensm
      -144115205255725494 lrwxrwxrwx 1 0 0 16 1381266338 K88auditd -> ../init.d/auditd
       144115205255725531 lrwxrwxrwx 1 0 0 16 1381266338 S11auditd -> ../init.d/auditd
       144115205255725583 lrwxrwxrwx 1 0 0 16 1381266338 S11auditd -> ../init.d/auditd
       144115205255725581 lrwxrwxrwx 1 0 0 16 1381266338 S28autofs -> ../init.d/autofs
      @@ -208,17 +181,9 @@
       144115205255725211 -rwxr-xr-x 1 0 0  1642 1361594386 abrt-oops
       144115205255725228 -rwxr-xr-x 1 0 0 16501 1380941397 lustre
       144115205255725267 -rwxr-xr-x 1 0 0  1698 1361511534 sandbox
      -144115205255725514 lrwxrwxrwx 1 0 0 17 1381266338 K75ntpdate -> ../init.d/ntpdate
       144115205255725562 lrwxrwxrwx 1 0 0 17 1381266338 K75ntpdate -> ../init.d/ntpdate
      -144115205255725516 lrwxrwxrwx 1 0 0 17 1381266338 K85rpcgssd -> ../init.d/rpcgssd
       144115205255725564 lrwxrwxrwx 1 0 0 17 1381266338 K85rpcgssd -> ../init.d/rpcgssd
      -144115205255725500 lrwxrwxrwx 1 0 0 17 1381266338 K86nfslock -> ../init.d/nfslock
       144115205255725547 lrwxrwxrwx 1 0 0 17 1381266338 K86nfslock -> ../init.d/nfslock
      -144115205255725496 lrwxrwxrwx 1 0 0 17 1381266338 K87rpcbind -> ../init.d/rpcbind
      -144115205255725498 lrwxrwxrwx 1 0 0 17 1381266338 K88rsyslog -> ../init.d/rsyslog
      -144115205255725506 lrwxrwxrwx 1 0 0 17 1381266338 K90network -> ../init.d/network
      -144115205255725522 lrwxrwxrwx 1 0 0 17 1381266338 K99sysstat -> ../init.d/sysstat
      -144115205255725497 lrwxrwxrwx 1 0 0 17 1381266338 S00killall -> ../init.d/killall
       144115205255725539 lrwxrwxrwx 1 0 0 17 1381266338 S01sysstat -> ../init.d/sysstat
       144115205255725590 lrwxrwxrwx 1 0 0 17 1381266338 S01sysstat -> ../init.d/sysstat
       144115205255725561 lrwxrwxrwx 1 0 0 17 1381266338 S10network -> ../init.d/network
      @@ -263,10 +228,6 @@
       144115205255725244 -rwxr-xr-x 1 0 0  1791 1361493846 ibacm
       144115205255725217 -rwxr-xr-x 1 0 0  1801 1311100147 haldaemon
       144115205255725225 -rwxr-xr-x 1 0 0  1808 1324140347 rngd
      -144115205255725484 lrwxrwxrwx 1 0 0 18 1381266337 K05powerman -> ../init.d/powerman
      -144115205255725490 lrwxrwxrwx 1 0 0 18 1381266337 K30sendmail -> ../init.d/sendmail
      -144115205255725480 lrwxrwxrwx 1 0 0 18 1381266337 K92iptables -> ../init.d/iptables
      -144115205255725477 lrwxrwxrwx 1 0 0 18 1381266337 K99cpuspeed -> ../init.d/cpuspeed
       144115205255725537 lrwxrwxrwx 1 0 0 18 1381266338 S08iptables -> ../init.d/iptables
       144115205255725588 lrwxrwxrwx 1 0 0 18 1381266338 S08iptables -> ../init.d/iptables
       144115205255725541 lrwxrwxrwx 1 0 0 18 1381266338 S13cpuspeed -> ../init.d/cpuspeed
      @@ -295,20 +256,12 @@
       144115205255725209 -rwxr-xr-x 1 0 0  1822 1361511534 restorecond
       144115205255725234 -rwxr-xr-x 1 0 0  1866 1357830871 ntpdate
       144115205255725221 -rwxr-xr-x 1 0 0  1908 1373367869 irqbalance
      -144115205255725486 lrwxrwxrwx 1 0 0 19 1381266337 K16abrt-ccpp -> ../init.d/abrt-ccpp
      -144115205255725473 lrwxrwxrwx 1 0 0 19 1381266337 K75udev-post -> ../init.d/udev-post
      -144115205255725491 lrwxrwxrwx 1 0 0 19 1381266337 K85rpcidmapd -> ../init.d/rpcidmapd
      -144115205255725515 lrwxrwxrwx 1 0 0 19 1381266338 K10saslauthd -> ../init.d/saslauthd
       144115205255725563 lrwxrwxrwx 1 0 0 19 1381266338 K10saslauthd -> ../init.d/saslauthd
       144115205255725533 lrwxrwxrwx 1 0 0 19 1381266338 K16abrt-ccpp -> ../init.d/abrt-ccpp
       144115205255725585 lrwxrwxrwx 1 0 0 19 1381266338 K16abrt-ccpp -> ../init.d/abrt-ccpp
      -144115205255725520 lrwxrwxrwx 1 0 0 19 1381266338 K74haldaemon -> ../init.d/haldaemon
       144115205255725569 lrwxrwxrwx 1 0 0 19 1381266338 K74haldaemon -> ../init.d/haldaemon
      -144115205255725505 lrwxrwxrwx 1 0 0 19 1381266338 K75quota_nld -> ../init.d/quota_nld
       144115205255725556 lrwxrwxrwx 1 0 0 19 1381266338 K75quota_nld -> ../init.d/quota_nld
      -144115205255725521 lrwxrwxrwx 1 0 0 19 1381266338 K85mdmonitor -> ../init.d/mdmonitor
       144115205255725542 lrwxrwxrwx 1 0 0 19 1381266338 K85rpcidmapd -> ../init.d/rpcidmapd
      -144115205255725493 lrwxrwxrwx 1 0 0 19 1381266338 K92ip6tables -> ../init.d/ip6tables
       144115205255725553 lrwxrwxrwx 1 0 0 19 1381266338 S08ip6tables -> ../init.d/ip6tables
       144115205255725571 lrwxrwxrwx 1 0 0 19 1381266338 S15mdmonitor -> ../init.d/mdmonitor
       144115205255725579 lrwxrwxrwx 1 0 0 19 1381266338 S18rpcidmapd -> ../init.d/rpcidmapd
      @@ -354,10 +307,6 @@
       144115205255725226 -rwxr-xr-x 1 0 0  1923 1357830871 ntpd
       144115205255725319 -rwxr-xr-x 1     0     0     19472 1375708884 rc.sysinit
       144115205255725214 -rwxr-xr-x 1 0 0  2011 1357749811 rsyslog
      -144115205255725479 lrwxrwxrwx 1 0 0 20 1381266337 K50netconsole -> ../init.d/netconsole
      -144115205255725476 lrwxrwxrwx 1 0 0 20 1381266337 K69rpcsvcgssd -> ../init.d/rpcsvcgssd
      -144115205255725475 lrwxrwxrwx 1 0 0 20 1381266337 K85messagebus -> ../init.d/messagebus
      -144115205255725489 lrwxrwxrwx 1 0 0 20 1381266337 K87irqbalance -> ../init.d/irqbalance
       144115205255725530 lrwxrwxrwx 1 0 0 20 1381266338 K50netconsole -> ../init.d/netconsole
       144115205255725582 lrwxrwxrwx 1 0 0 20 1381266338 K50netconsole -> ../init.d/netconsole
       144115205255725528 lrwxrwxrwx 1 0 0 20 1381266338 K69rpcsvcgssd -> ../init.d/rpcsvcgssd
      @@ -386,7 +335,6 @@
       144115205255725212 -rwxr-xr-x 1 0 0  2056 1353412378 saslauthd
       144115205255725253 -rwxr-xr-x 1 0 0  2062 1327931794 atd
       144115205255725247 -rwxr-xr-x 1 0 0  2073 1361495981 rpcbind
      -144115205255725509 lrwxrwxrwx 1 0 0 21 1381266338 K87restorecond -> ../init.d/restorecond
       144115205255725558 lrwxrwxrwx 1 0 0 21 1381266338 K87restorecond -> ../init.d/restorecond
       144115205255725609 lrwxrwxrwx 1 0 0 21 1381266339 K87restorecond -> ../init.d/restorecond
       144115205255725658 lrwxrwxrwx 1 0 0 21 1381266339 K87restorecond -> ../init.d/restorecond
      @@ -396,7 +344,6 @@
       144115205255725261 -r-xr-xr-x 1 0 0  2134 1361514075 lvm2-lvmetad
       144115205255725216 -rwxr-xr-x 1 0 0  2200 1347555868 messagebus
       144115205255725421 -rwxr-xr-x 1     0     0       220 1375708884 rc.local
      -144115205255725482 lrwxrwxrwx 1 0 0 22 1381266337 K99lvm2-monitor -> ../init.d/lvm2-monitor
       144115205255725543 lrwxrwxrwx 1 0 0 22 1381266338 S02lvm2-monitor -> ../init.d/lvm2-monitor
       144115205255725630 lrwxrwxrwx 1 0 0 22 1381266339 K99lvm2-monitor -> ../init.d/lvm2-monitor
       144115205255725594 lrwxrwxrwx 1 0 0 22 1381266339 S02lvm2-monitor -> ../init.d/lvm2-monitor
      @@ -408,7 +355,6 @@
       144115205255725233 -rwxr-xr-x 1 0 0  2464 1361508966 rpcsvcgssd
       144115205255725243 -rwxr-xr-x 1 0 0  2518 1361508966 rpcgssd
       144115205255725220 -rwxr-xr-x 1 0 0  2571 1367936965 mdmonitor
      -144115205255725504 lrwxrwxrwx 1 0 0 26 1381266338 K75blk-availability -> ../init.d/blk-availability
       144115205255725551 lrwxrwxrwx 1 0 0 26 1381266338 S25blk-availability -> ../init.d/blk-availability
       144115205255725653 lrwxrwxrwx 1 0 0 26 1381266339 K75blk-availability -> ../init.d/blk-availability
       144115205255725602 lrwxrwxrwx 1 0 0 26 1381266339 S25blk-availability -> ../init.d/blk-availability
       conf-sanity test_32b: @@@@@@ FAIL: list verification failed 
      
      test_32b failed with 1
      

          Activity

            [LU-4543] Failure on test suite conf-sanity test_32b: list verification failed

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. Please reopen ticket if more work is needed under this ticket.

            bzzz Alex Zhuravlev added a comment - Thanks Sarah.
            sarah Sarah Liu added a comment - Hello Alex, The patch works: https://maloo.whamcloud.com/test_sessions/4cc368a8-9352-11e3-9f1b-52540035b04c
            bzzz Alex Zhuravlev added a comment - please, try with http://review.whamcloud.com/9218
            di.wang Di Wang added a comment -

            Ah, I just realized this is an upgrade test, i.e. the image was created on b2_4, which probably means we have to fix it on the client side. Andreas, could we still add the fix to b2_4, which would simplify things?

            di.wang Di Wang added a comment -

            I just checked the debug log. The failure does start after LU-3531, but the real reason is that recent changes seem to have messed up the hash order when building the dir entry page. According to my investigation, the first entry is ".", whose hash value is 1, followed by an entry with a real name, whose hash value is 0; the remaining entries do follow the hash order. The new readdir code assumes all entries are sorted in the page returned from the MDT, so these unsorted entries make readdir (mdc_read_entry) skip some entries, which causes this problem. So we have two options:

            1. Fix the server side so that the entries in every page are sorted.
            2. Or have mdc_read_entry stop assuming the page is sorted and search the whole page for the smallest entry bigger than the given hash.

            I prefer option 1.
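
As a rough illustration of the two options above, the following Python sketch models a directory page as a list of (hash, name) pairs. It is not the Lustre code. The page contents, the hash values other than 1 and 0, the function names, and the cursor rule (resume at the returned hash plus one, reading "bigger" as "greater than or equal to the cursor") are all assumptions made for the example, and hash collisions are ignored.

```python
from typing import Callable, List, Optional, Tuple

Entry = Tuple[int, str]  # (hash, name); a simplified stand-in for a dir entry

# Hypothetical page mirroring the comment: "." (hash 1) is stored first,
# then a named entry with hash 0, then entries in hash order.
PAGE: List[Entry] = [(1, "."), (0, "K05atd"), (5, "K60nfs"), (9, "K99zfs")]


def lookup_assuming_sorted(page: List[Entry], cursor: int) -> Optional[Entry]:
    """Return the first entry in storage order whose hash is >= cursor."""
    for entry in page:
        if entry[0] >= cursor:
            return entry
    return None


def lookup_full_scan(page: List[Entry], cursor: int) -> Optional[Entry]:
    """Option 2: scan the whole page for the smallest hash >= cursor."""
    candidates = [entry for entry in page if entry[0] >= cursor]
    return min(candidates) if candidates else None


def read_all(page: List[Entry],
             lookup: Callable[[List[Entry], int], Optional[Entry]]) -> List[str]:
    """Walk the page with a hash cursor, collecting names until lookup fails."""
    names: List[str] = []
    cursor = 0
    while True:
        entry = lookup(page, cursor)
        if entry is None:
            return names
        names.append(entry[1])
        cursor = entry[0] + 1  # assumed resume rule for this sketch


print(read_all(PAGE, lookup_assuming_sorted))          # the hash-0 entry is skipped
print(read_all(PAGE, lookup_full_scan))                # option 2: all four names
print(read_all(sorted(PAGE), lookup_assuming_sorted))  # option 1: all four names
```

In this model the sorted-page assumption returns "." first and then never revisits the hash-0 entry, which is the kind of skipped entry the comment describes. Sorting the page first (option 1) lets the simple lookup stay as it is.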

            sarah Sarah Liu added a comment -

            It turns out this issue is caused by LU-3531: I tried both the build before it landed (test passed) and the build that includes LU-3531 (test failed).

            sarah Sarah Liu added a comment -

            I finally found that this test fails at tag 2.5.54, so the problem was introduced between tag-2.5.53 and 2.5.54. I am now trying to revert the suspicious patch to see what happens.
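
The narrowing-down reported in this ticket can be viewed as a search for the first failing build in an ordered list. The Python sketch below is only a model of that idea: the function name is hypothetical, the pass/fail table restates outcomes from the comments (2.5.52 passed, 2.5.54 failed, and 2.5.53 is inferred to pass from "introduced between tag-2.5.53 and 2.5.54"), and it assumes that once a build fails, every later build fails too.

```python
from typing import Callable, Optional, Sequence


def first_failing(builds: Sequence[str],
                  passes: Callable[[str], bool]) -> Optional[str]:
    """Binary-search an oldest-to-newest build list for the first failure.

    Assumes all builds before the culprit pass and all builds from the
    culprit onward fail. Returns None when every build passes.
    """
    lo, hi = 0, len(builds)
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(builds[mid]):
            lo = mid + 1   # culprit is strictly after mid
        else:
            hi = mid       # mid fails, so culprit is mid or earlier
    return builds[lo] if lo < len(builds) else None


# Outcomes as reported in the comments (2.5.53 is an inference, see above).
reported = {"2.5.52": True, "2.5.53": True, "2.5.54": False}
print(first_failing(list(reported), lambda tag: reported[tag]))
```

With real builds, the `passes` callback would stand for running test_32b against the same 2.4-zfs image. The same search applies to the individual patches between the last passing tag and the first failing one.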

            sarah Sarah Liu added a comment -

            To isolate this problem, I did the following:

            1. Reverted the patch from LU-3489: reset master to the version before LU-3489 landed and ran the same test with the same 2.4-zfs image; it passes. Patch link: http://review.whamcloud.com/#/c/9167

            2. Reset master to tag-2.5.52 (which includes the patch from LU-3489) with the fix for LU-4406; the test passes too.

            3. Now trying a reset to tag-2.5.53. Patch link: http://review.whamcloud.com/#/c/9169

            sarah Sarah Liu added a comment -

            The duplicate entries appear because several files (symbolic links) in different directories use the same names. The original image was made from /etc/rc.d, and there are a lot of files with the same name under that directory.

            test_32newtarball() {
                    local version
                    local dst=.
                    local src=/etc/rc.d
                    local tmp=$TMP/t32_image_create
                    .......
            }
            

            The list from the tarball:

            ...
            ./rc0.d:
            total 100
            144115205255725667 lrwxrwxrwx 1 0 0 16 1381266339 K01smartd -> ../init.d/smartd
            144115205255725635 lrwxrwxrwx 1 0 0 13 1381266339 K05atd -> ../init.d/atd
            ....
            ./rc1.d:
            total 98
            144115205255725766 lrwxrwxrwx 1 0 0 16 1381266340 K01smartd -> ../init.d/smartd
            144115205255725731 lrwxrwxrwx 1 0 0 13 1381266340 K05atd -> ../init.d/atd
            .....
            

            I will check the patches that landed during this period to see whether any of them may affect this problem.
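
To show why a flat listing of an rc.d-style tree contains repeated names, here is a small Python sketch that reports file names occurring in more than one directory. The helper names and the two-directory demo tree are hypothetical and exist only for this example; the real image was built from /etc/rc.d as described above.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def names_in_multiple_dirs(root: str) -> Dict[str, List[str]]:
    """Map each file name found in more than one directory to those directories."""
    seen = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:  # symlinks to files are listed here too
            seen[name].append(rel)
    return {name: sorted(dirs) for name, dirs in seen.items() if len(dirs) > 1}


def build_demo_tree(root: str) -> None:
    """Create a tiny rc.d-like tree (hypothetical layout, demo only)."""
    os.makedirs(os.path.join(root, "init.d"))
    with open(os.path.join(root, "init.d", "atd"), "w"):
        pass  # empty placeholder for the init script
    for rc in ("rc0.d", "rc1.d"):
        os.makedirs(os.path.join(root, rc))
        os.symlink("../init.d/atd", os.path.join(root, rc, "K05atd"))


with tempfile.TemporaryDirectory() as demo_root:
    build_demo_tree(demo_root)
    print(names_in_multiple_dirs(demo_root))
```

Each same-named link is still a distinct object, which matches the listing above, where the two K05atd lines carry different inode numbers.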


            People

              sarah Sarah Liu
              maloo Maloo