
fc18: sanity test_183: @@@@@@ FAIL: test_183 failed with 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0, Lustre 2.4.1
    • 3
    • 7889

    Description

      == sanity test 183: No crash or request leak in case of strange dispositions ========================= 20:23:20 (1366860200)
      CMD: client-24vm3 /usr/sbin/lctl set_param fail_loc=0x148
      fail_loc=0x148
      ls: cannot open directory /mnt/lustre/d0.sanity/d183: No such file or directory
      cat: /mnt/lustre/d0.sanity/d183/f.sanity.183: No such file or directory
      CMD: client-24vm3 /usr/sbin/lctl set_param fail_loc=0
      fail_loc=0
      touch: cannot touch ‘/mnt/lustre/d0.sanity/d183/f.sanity.183’: No such file or directory
      rm: cannot remove ‘/mnt/lustre/d0.sanity/d183’: Directory not empty
      sanity test_183: @@@@@@ FAIL: test_183 failed with 1
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:4024:error_noexit()
      = /usr/lib64/lustre/tests/test-framework.sh:4047:error()
      = /usr/lib64/lustre/tests/test-framework.sh:4301:run_one()
      = /usr/lib64/lustre/tests/test-framework.sh:4334:run_one_logged()
      = /usr/lib64/lustre/tests/test-framework.sh:4189:run_test()
      = /usr/lib64/lustre/tests/sanity.sh:9626:main()
      Dumping lctl log to /logdir/test_logs/2013-04-24/lustre-reviews-el6-x86_64-vs-lustre-reviews-fc18-x86_64-review-2_4_1_15074_-70194495514420-185046/sanity.test_183.*.1366860201.log
      CMD: client-24vm1,client-24vm2.lab.whamcloud.com,client-24vm3,client-24vm4 /usr/sbin/lctl dk > /logdir/test_logs/2013-04-24/lustre-reviews-el6-x86_64-vs-lustre-reviews-fc18-x86_64-review-2_4_1_15074_-70194495514420-185046/sanity.test_183.debug_log.$(hostname -s).1366860201.log;
      dmesg > /logdir/test_logs/2013-04-24/lustre-reviews-el6-x86_64-vs-lustre-reviews-fc18-x86_64-review-2_4_1_15074_-70194495514420-185046/sanity.test_183.dmesg.$(hostname -s).1366860201.log

      https://maloo.whamcloud.com/test_sets/83b66856-ad8e-11e2-bbea-52540035b04c

      Activity

            [LU-3228] fc18: sanity test_183: @@@@@@ FAIL: test_183 failed with 1
            green Oleg Drokin added a comment -

            I verified that the patch in http://review.whamcloud.com/8110 fixes the issue.

            green Oleg Drokin added a comment -

            Apparently this was fixed by http://review.whamcloud.com/8110

            green Oleg Drokin added a comment -

            Apparently this bug was introduced by commit 784cd144103871bd421c139c09bfbf4d5d29ca08 coming from patch http://review.whamcloud.com/4387 to support atomic_open.

            The problematic hunk is this:

            @@ -438,13 +443,21 @@ int ll_lookup_it_finish(struct ptlrpc_request *request,
                                Also see bug 7198. */
                    }
             
            -       *de = ll_splice_alias(inode, *de);
            +       /* Only hash *de if it is unhashed (new dentry).
            +        * Atoimc_open may passin hashed dentries for open.
            +        */
            +       if (d_unhashed(*de))
            +               *de = ll_splice_alias(inode, *de);
            

            When this hunk hits (that is, when the dentry IS hashed), two things happen:
            1. we leak the inode (and, obviously, never assign it to the dentry);
            2. we return NULL from ll_lookup_it(), so the caller uses the passed-in dentry; but that dentry is negative, which leads to the no_open branch, where we pass in de, which is NULL (we should probably pass in dentry at all times?).

            green Oleg Drokin added a comment -

            After failing like this, the next unmount, which happens to be in test 223, crashes with a reference count assertion.
            That crash is LU-1480.

            This bug is also observed with mainline kernels.

            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
            Distro/Arch: RHEL6.4/x86_64 + FC18/x86_64 (Server + Client)

            sanity test 183 hit the same failure:
            https://maloo.whamcloud.com/test_sets/0cbde1d0-14ee-11e3-ac48-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Branch: b2_4
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/27/
            Distro/Arch: RHEL6.4/x86_64 + FC18/x86_64 (Server + Client)

            The failure occurred regularly on Lustre b2_4 branch on FC18 client:
            https://maloo.whamcloud.com/test_sets/996b2278-fd79-11e2-9fdb-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Branch: master (tag 2.4.50)
            Lustre Build: http://build.whamcloud.com/job/lustre-master/1502/

            Sanity test 183 was added for LU-2275 in http://review.whamcloud.com/4458.

            It passed on RHEL6.4 client:
            https://maloo.whamcloud.com/test_sets/456be8de-c2db-11e2-b2eb-52540035b04c

            but failed on FC18 client:
            https://maloo.whamcloud.com/test_sets/a6a8e2d4-c2d4-11e2-b2eb-52540035b04c

            After fail_loc was set back to 0 on the MDS, the touch operation failed on the client node:

            fail_loc=0
            touch: cannot touch '/mnt/lustre/d0.sanity/d183/f.sanity.183': No such file or directory
            

            The debug log on the client node showed:

            00000001:00000001:0.0:1369222356.453337:0:8789:0:(debug.c:444:libcfs_debug_mark_buffer()) ***************************************************
            00000001:02000400:0.0:1369222356.453339:0:8789:0:(debug.c:445:libcfs_debug_mark_buffer()) DEBUG MARKER: fail_loc=0
            00000001:00000001:0.0:1369222356.453345:0:8789:0:(debug.c:446:libcfs_debug_mark_buffer()) ***************************************************
            ......
            00000080:00000001:0.0:1369222356.699798:0:8803:0:(dcache.c:96:ll_dcompare()) Process entered
            00000080:00002000:0.0:1369222356.699799:0:8803:0:(dcache.c:119:ll_dcompare()) found name f.sanity.183(ffff88007a14f540) flags 0xce refc 0
            00000080:00000001:0.0:1369222356.699800:0:8803:0:(dcache.c:128:ll_dcompare()) Process leaving (rc=0 : 0 : 0)
            00000080:00000001:0.0:1369222356.699801:0:8803:0:(dcache.c:669:ll_revalidate_nd()) Process entered
            00000080:00200000:0.0:1369222356.699801:0:8803:0:(dcache.c:671:ll_revalidate_nd()) VFS Op:name=f.sanity.183,flags=769
            00000080:00000001:0.0:1369222356.699802:0:8803:0:(dcache.c:684:ll_revalidate_nd()) Process leaving (rc=1 : 1 : 1)
            00000080:00000001:0.0:1369222356.699803:0:8803:0:(file.c:2978:ll_inode_permission()) Process entered
            00000080:00200000:0.0:1369222356.699803:0:8803:0:(file.c:3001:ll_inode_permission()) VFS Op:inode=144115205255725060/33554436(ffff88007b35c9f8), inode mode 41ed mask 3
            00000080:00000001:0.0:1369222356.699804:0:8803:0:(file.c:3009:ll_inode_permission()) Process leaving (rc=0 : 0 : 0)
            00000080:00000001:0.0:1369222356.699806:0:8803:0:(namei.c:622:ll_atomic_open()) Process entered
            00000080:00200000:0.0:1369222356.699807:0:8803:0:(namei.c:627:ll_atomic_open()) VFS Op:name=f.sanity.183,dir=144115205255725060/33554436(ffff88007b35c9f8),file ffff88007adf5400,open_flags 8941,mode 81b6 opened 0
            00000080:00000010:0.0:1369222356.699809:0:8803:0:(namei.c:629:ll_atomic_open()) kmalloced 'it': 72 at ffff88007a8bb420.
            00000080:00000001:0.0:1369222356.699811:0:8803:0:(namei.c:496:ll_lookup_it()) Process entered
            00000080:00200000:0.0:1369222356.699811:0:8803:0:(namei.c:503:ll_lookup_it()) VFS Op:name=f.sanity.183,dir=144115205255725060/33554436(ffff88007b35c9f8),intent=open|creat
            ......
            00000080:00000001:0.0:1369222356.702127:0:8803:0:(namei.c:568:ll_lookup_it()) Process leaving via out (rc=0 : 0 : 0x0)
            00000100:00000001:0.0:1369222356.702128:0:8803:0:(client.c:2271:__ptlrpc_req_finished()) Process entered
            00000100:00000040:0.0:1369222356.702130:0:8803:0:(client.c:2283:__ptlrpc_req_finished()) @@@ refcount now 2 req@ffff88007b35b400 x1435733615182304/t4294967313(4294967313) o101->lustre-MDT0000-mdc-ffff88007c342000@10.10.4.208@tcp:12/10 lens 568/600 e 0 to 0 dl 1369222363 ref 3 fl Complete:RP/4/0 rc 0/0
            00000100:00000001:0.0:1369222356.702133:0:8803:0:(client.c:2290:__ptlrpc_req_finished()) Process leaving (rc=0 : 0 : 0)
            00000080:00000001:0.0:1369222356.702135:0:8803:0:(dcache.c:289:ll_intent_release()) Process entered
            ......
            00000080:00000001:0.0:1369222356.702158:0:8803:0:(dcache.c:304:ll_intent_release()) Process leaving
            00000080:00000010:0.0:1369222356.702158:0:8803:0:(namei.c:688:ll_atomic_open()) kfreed 'it': 72 at ffff88007a8bb420.
            00000080:00000001:0.0:1369222356.702168:0:8803:0:(namei.c:690:ll_atomic_open()) Process leaving (rc=1 : 1 : 1)

            ll_atomic_open() failed with rc = 1, which was returned from finish_no_open(). I'm still digging.

            pjones Peter Jones added a comment -

            Yu, Jian

            Could you please look into this one?

            Thanks

            Peter


            People

              Assignee: yujian Jian Yu
              Reporter: mdiep Minh Diep
              Votes: 0
              Watchers: 5
