lmv_locate_mds() must check return of lmv_find_target()

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 7821

    Description

      lmv_locate_mds() calls lmv_find_target() and dereferences the result but does not check for an ERR_PTR().

        Activity

          [LU-3201] lmv_locate_mds() must check return of lmv_find_target()
          jhammond John Hammond added a comment -

          Patch landed to master.

          jhammond John Hammond added a comment -

          I should have mentioned this before, but this causes an easily reproducible Oops when running racer with DNE:

          # MDSCOUNT=2 MOUNT_2=y llmount.sh
          # sh ./lustre/tests/racer.sh
          
          Lustre: DEBUG MARKER: == racer test 1: racer on clients: m DURATION=300 == 09:04:48 (1366725888)
          Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000280000400-0x00000002c0000400):0:mdt
          Lustre: cli-ctl-lustre-MDT0000-osp-MDT0001: Allocated super-sequence [0x00000002c0000400-0x0000000300000400):1:mdt]
          LustreError: 18097:0:(fld_handler.c:158:fld_server_lookup()) srv-lustre-MDT0000: FLD cache range [0x0000000280000400-0x00000002c0000400):0:mdt does not matchrequested flag ffff8801: rc = -5
          LustreError: 19895:0:(lmv_fld.c:78:lmv_fld_lookup()) Error while looking for mds number. Seq 0x280000400, err = -5
          BUG: unable to handle kernel NULL pointer dereference at 000000000000002b
          IP: [<ffffffffa0578c72>] lmv_locate_mds+0x92/0xb0 [lmv]
          PGD 16eb72067 PUD 17abf1067 PMD 0 
          Oops: 0000 [#1] SMP 
          last sysfs file: /sys/devices/system/cpu/possible
          CPU 1 
          Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) mdd(U) mgs(U) lquota(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) exportfs jbd sha512_generic sha256_generic autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
          
          Pid: 19895, comm: mv Tainted: P           ---------------    2.6.32-279.19.1.el6_lustre_gcov.x86_64 #1 Bochs Bochs
          RIP: 0010:[<ffffffffa0578c72>]  [<ffffffffa0578c72>] lmv_locate_mds+0x92/0xb0 [lmv]
          RSP: 0018:ffff88017ea459d8  EFLAGS: 00010282
          RAX: fffffffffffffffb RBX: ffff88019931e800 RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018ce1b940
          RBP: ffff88017ea459f8 R08: ffffffff81c01b00 R09: 0000000000000000
          R10: 0000000000000001 R11: 0000000000000000 R12: ffff88016babe600
          R13: ffff88017d5a6c00 R14: 0000000000000001 R15: ffff88014e83a4c0
          FS:  00007f2a796d07a0(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 000000000000002b CR3: 000000016eb6f000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process mv (pid: 19895, threadinfo ffff88017ea44000, task ffff88017ea43500)
          Stack:
           0000000000000095 0000000000000080 ffff88016babe600 ffff88019931e2f8
          <d> ffff88017ea45a78 ffffffffa058cdc9 ffffffffffffffff ffffffffa0aa727b
          <d> 0000000051769500 ffff88018ce1b8c0 ffff880100000030 ffff88017ea45b28
          Call Trace:
           [<ffffffffa058cdc9>] lmv_intent_lookup+0x59/0x770 [lmv]
           [<ffffffffa0aa727b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
           [<ffffffffa058e0ba>] lmv_intent_lock+0x31a/0x370 [lmv]
           [<ffffffffa0ee88d0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
           [<ffffffffa0ee799e>] ? ll_i2gids+0x2e/0xd0 [lustre]
           [<ffffffffa0ece3ca>] ? ll_prep_md_op_data+0xfa/0x3a0 [lustre]
           [<ffffffffa0eecf21>] ll_lookup_it+0x3a1/0xbf0 [lustre]
           [<ffffffffa0ee88d0>] ? ll_md_blocking_ast+0x0/0x750 [lustre]
           [<ffffffffa0eed7fc>] ll_lookup_nd+0x8c/0x430 [lustre]
           [<ffffffff81190087>] ? d_alloc+0x137/0x1b0
           [<ffffffff81185d45>] do_lookup+0x1a5/0x230
           [<ffffffff81186604>] __link_path_walk+0x734/0x1030
           [<ffffffff8113c307>] ? handle_pte_fault+0xf7/0xb50
           [<ffffffff8118718a>] path_walk+0x6a/0xe0
           [<ffffffff8118735b>] do_path_lookup+0x5b/0xa0
           [<ffffffff81187fc7>] user_path_at+0x57/0xa0
           [<ffffffff8126a2e1>] ? cpumask_any_but+0x31/0x50
           [<ffffffff8117cccc>] vfs_fstatat+0x3c/0x80
           [<ffffffff8117ce3b>] vfs_stat+0x1b/0x20
           [<ffffffff8117ce64>] sys_newstat+0x24/0x50
           [<ffffffff810d3d27>] ? audit_syscall_entry+0x1d7/0x200
           [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
          Code: 01 48 83 c2 08 39 cf 7f e8 8b 34 25 30 00 00 00 31 c0 41 89 74 24 40 48 83 c4 10 5b 41 5c 
          LustreError: 18096:0:(fld_handler.c:158:fld_server_lookup()) srv-lustre-MDT0000: FLD cache range [0x0000000280000400-0x00000002c0000400):0:mdt does not matchrequested flag ffff8801: rc = -5
          c9 c3 66 0f 1f 84 00 00 00 00 00 48 98 <8b> 70 30 41 89 74 24 40 48 83 c4 10 5b 41 5c c9 c3 0f 1f 44 00 
          RIP  [<ffffffffa0578c72>] lmv_locate_mds+0x92/0xb0 [lmv]
           RSP <ffff88017ea459d8>
          CR2: 000000000000002b
          
          jhammond John Hammond added a comment - Please see http://review.whamcloud.com/6116 .

          People

            jhammond John Hammond
            jhammond John Hammond