Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1539

NULL-pointer dereference in lnet_match_md()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.3.0
    • None
    • A single-node cluster (i.e., an x86 VM with four logical CPUs and 512 MB memory, running 2.6.32-220.4.2.el6.78b37c2 and master at the 2.3.56 tag).
    • 3
    • 4268

    Description

      I had been running sanity 103 in a loop for a hour or so before hitting this:

      CentOS Linux release 6.0 (Final)
      Kernel 2.6.32-220.4.2.el6.78b37c2 on an i686
      
      h221g login: LustreError: 11-0: an error occurred while communicating with 0@lo. The mds_connect operation failed with -11
      LustreError: 3013:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114LustreError: 3606:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 6704:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 9790:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 12877:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 15965:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 2809:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 2422: rc -2
      LustreError: 19052:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 22138:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 2808:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 3311: rc -2
      LustreError: 2807:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 4053: rc -2
      LustreError: 28312:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 28312:0:(mdc_request.c:1429:mdc_quotactl()) Skipped 1 previous similar message
      LustreError: 2808:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 4255: rc -2
      LustreError: 8444:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 8444:0:(mdc_request.c:1429:mdc_quotactl()) Skipped 3 previous similar messages
      LustreError: 3017:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 10087: rc -2
      LustreError: 671:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
      LustreError: 671:0:(mdc_request.c:1429:mdc_quotactl()) Skipped 7 previous similar messages
      LustreError: 2807:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 11031: rc -2
      LustreError: 2809:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 11568: rc -2
      LustreError: 3017:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 13501: rc -2
      LustreError: 3017:0:(ldlm_resource.c:1104:ldlm_resource_get()) Skipped 1 previous similar message
      LustreError: 3017:0:(ldlm_resource.c:1104:ldlm_resource_get()) lvbo_init failed for resource 15694: rc -2
      BUG: unable to handle kernel NULL pointer dereference at 00000002
      IP: [<e089cc8e>] lnet_match_md+0x25e/0x410 [lnet]
      *pdpt = 000000001258c001 *pde = 0000000000000000 
      Oops: 0000 [#1] SMP 
      last sysfs file: /sys/devices/system/cpu/possible
      Modules linked in: lustre(U) obdfilter(U) ost(U) osd_ldiskfs(U) cmm(U) fsfilt_ldiskfs(U) mdt(U) mdd(U) mds(U) mgs(U) lquota(U) ldiskfs(U) exportfs jbd mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) sunrpc ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate dm_mirror dm_region_hash dm_log ppdev parport_pc parport sg i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix ahci dm_mod [last unloaded: mperf]
      
      Pid: 6682, comm: rm Tainted: P        W  ----------------   2.6.32-220.4.2.el6.78b37c2 #1 innotek GmbH VirtualBox
      EIP: 0060:[<e089cc8e>] EFLAGS: 00010246 CPU: 2
      EIP is at lnet_match_md+0x25e/0x410 [lnet]
      EAX: dce57948 EBX: 0000000c ECX: 00000002 EDX: c9638700
      ESI: ca675f40 EDI: 00000001 EBP: c9dc36f0 ESP: c9dc36a8
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      
      Process rm (pid: 6682, ti=c9dc2000 task=cb486ab0 task.ti=c9dc2000)
      Stack:
       00000000 00090000 00003039 00000240 00000000 f3b57d0e 0004fdfe ca675f40
      <0> ca86e200 c9dc377c c9dc3778 dce57948 00000002 00090000 dce57940 ca86e200
      <0> 00003039 e08c5b38 c9dc378c e08a7977 00000000 00090000 00003039 00000240
      Call Trace:
       [<e08a7977>] ? lnet_parse+0xdd7/0x1f40 [lnet]
       [<c05fe095>] ? string+0x35/0xd0
       [<c05ff38a>] ? vsnprintf+0x2ea/0x3f0
       [<e08a92c6>] ? lolnd_send+0x36/0xb0 [lnet]
       [<e089f087>] ? lnet_ni_send+0x37/0xf0 [lnet]
       [<e08a300d>] ? lnet_send+0x3bd/0xe20 [lnet]
       [<e08a6052>] ? LNetPut+0x332/0xe80 [lnet]
       [<e18c32b0>] ? ptl_send_buf+0x2a0/0x940 [ptlrpc]
       [<e18c77a6>] ? ptl_send_rpc+0x686/0xfa0 [ptlrpc]
       [<e18b7a0e>] ? ptlrpc_send_new_req+0x50e/0xa70 [ptlrpc]
       [<e18be38e>] ? ptlrpc_set_wait+0x5ee/0xac0 [ptlrpc]
       [<e18b7fd9>] ? ptlrpc_set_add_req+0x69/0xe0 [ptlrpc]
       [<e18be8e8>] ? ptlrpc_queue_wait+0x88/0x330 [ptlrpc]
       [<e188fcd2>] ? ldlm_cli_enqueue+0x402/0x960 [ptlrpc]
       [<e0a8e712>] ? mdc_enter_request+0x22/0x190 [mdc]
       [<e18924c0>] ? ldlm_completion_ast+0x0/0xa10 [ptlrpc]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e0a93917>] ? mdc_enqueue+0x327/0x1330 [mdc]
       [<e0a94b6c>] ? mdc_intent_lock+0x24c/0x7cc [mdc]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e18924c0>] ? ldlm_completion_ast+0x0/0xa10 [ptlrpc]
       [<e0a94920>] ? mdc_intent_lock+0x0/0x7cc [mdc]
       [<e0b94625>] ? md_intent_lock+0x115/0x5c0 [lmv]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e0b982f4>] ? lmv_intent_open+0x364/0x1330 [lmv]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e0b995bb>] ? lmv_intent_lock+0x2fb/0x4a0 [lmv]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e0b992c0>] ? lmv_intent_lock+0x0/0x4a0 [lmv]
       [<e0e1f64f>] ? md_intent_lock.clone.0+0xff/0x5b0 [lustre]
       [<e0e72750>] ? ll_md_blocking_ast+0x0/0x9a0 [lustre]
       [<e0e21119>] ? ll_revalidate_it+0x189/0x1140 [lustre]
       [<e0e7889c>] ? ll_convert_intent+0xdc/0x310 [lustre]
       [<e0e2220b>] ? ll_revalidate_nd+0x13b/0x510 [lustre]
       [<c0536db0>] ? do_lookup+0x40/0x180
       [<c05375a3>] ? __link_path_walk+0x5e3/0xd60
       [<c051cc9d>] ? kmem_cache_alloc+0xfd/0x110
       [<c0537f31>] ? path_walk+0x51/0xc0
       [<c05380b9>] ? do_path_lookup+0x59/0x90
       [<c0538e0c>] ? do_filp_open+0xdc/0xb00
       [<c05057d1>] ? handle_mm_fault+0x131/0x1d0
       [<c05286a8>] ? do_sys_open+0x58/0x130
       [<c04afebc>] ? audit_syscall_entry+0x21c/0x240
       [<c05287b1>] ? sys_openat+0x31/0x50
       [<c0409a9f>] ? sysenter_do_call+0x12/0x28
      Code: 01 74 22 b8 0c 28 8c e0 c7 05 1c 28 8c e0 00 00 04 00 e8 76 b3 f7 ff 8d b6 00 00 00 00 83 c1 08 e9 4f ff ff ff 8b 4d e8 8b 45 e4 <8b> 11 39 c1 74 0c 89 c8 89 55 e8 e9 53 ff ff ff 66 90 83 ff 02 
      EIP: [<e089cc8e>] lnet_match_md+0x25e/0x410 [lnet] SS:ESP 0068:c9dc36a8
      CR2: 0000000000000002
      

      The test was at

      [...]
      performing setfacl...
      [3] $ mkdir d -- ok
      [4] $ chown bin:bin d -- ok
      [5] $ cd d -- ok
      [7] $ su bin -- ok
      [8] $ sg bin -- [(1,0)(1 1 2 3,1 1 2 3)]ok
      [9] $ umask 027 -- ok
      [10] $ touch g -- ok
      [11] $ ls -dl g | awk '{print $1}' -- ok
      [14] $ setfacl -m m:- g -- ok
      [15] $ ls -dl g | awk '{print $1}' -- ok
      [18] $ getfacl g -- ok
      [28] $ setfacl -x m g -- ok
      [29] $ getfacl g -- ok
      [38] $ setfacl -m u:daemon:rw g -- ok
      [39] $ getfacl g -- ok
      [50] $ setfacl -m u::rwx,g::r-x,o:- g -- ok
      [51] $ getfacl g -- ok
      [62] $ setfacl -m u::rwx,g::r-x,o:-,m:- g -- ok
      [63] $ getfacl g -- ok
      [74] $ setfacl -m u::rwx,g::r-x,o:-,u:root:-,m:- g -- ok
      [75] $ getfacl g -- ok
      [87] $ setfacl -m u::rwx,g::r-x,o:-,u:root:-,m:- g -- ok
      [88] $ getfacl g -- ok
      [100] $ setfacl -m u::rwx,g::r-x,o:-,u:root:- g -- ok
      [101] $ getfacl g -- ok
      [113] $ setfacl --test -x u: g -- ok
      [116] $ setfacl --test -x u:x -- ok
      [119] $ setfacl -m d:u:root:rwx g -- ok
      [122] $ setfacl -x m g -- ok
      [129] $ mkdir d -- ok
      [130] $ setfacl --test -m u::rwx,u:bin:rwx,g::r-x,o::--- d -- ok
      [133] $ setfacl --test -m u::rwx,u:bin:rwx,g::r-x,m::---,o::--- d -- ok
      [136] $ setfacl --test -d -m u::rwx,u:bin:rwx,g::r-x,o::--- d -- ok
      [139] $ setfacl --test -d -m u::rwx,u:bin:rwx,g::r-x,m::---,o::--- d -- ok
      [142] $ su -- ok
      [143] $ cd .. -- ok
      [144] $ rm -r d -- 
      

      Attachments

        Activity

          People

            liwei Li Wei (Inactive)
            liwei Li Wei (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: