Lustre / LU-5747

NULL pointer dereference in task_rq_lock when running mds-survey


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Trivial
    • None
    • Lustre 2.7.0
    • None
    • 3
    • 16140

    Description

      I can reliably reproduce this crash by running mds-survey (master at commit de24d3e0fe4e77654358ed7d5d672fa94e957ef5, kernel 2.6.32-358.18.1.el6_lustre.x86_64):

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: [<ffffffff81055d52>] task_rq_lock+0x42/0xa0
      PGD 341879067 PUD 3571d1067 PMD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/system/cpu/online
      CPU 6
      Modules linked in: obdecho(U) osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) nodemap(U) mgc(U) osd_zfs(U) lquota(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) netconsole configfs ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support r8169 mii sg snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i7core_edac edac_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_jmicron ahci nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi video output wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      
      Pid: 2157, comm: lctl Tainted: P           ---------------    2.6.32-358.18.1.el6_lustre.x86_64 #1 OEM OEM/132-BL-E758
      RIP: 0010:[<ffffffff81055d52>]  [<ffffffff81055d52>] task_rq_lock+0x42/0xa0
      RSP: 0018:ffff88033d3f56d8  EFLAGS: 00010086
      RAX: 0000000000000286 RBX: 0000000000016740 RCX: ffff880351405378
      RDX: 0000000000000286 RSI: ffff88033d3f5730 RDI: 0000000000000000
      RBP: ffff88033d3f56f8 R08: 0000000000000002 R09: 5a5a5a5a5a5a5a5a
      R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: 0000000000000000
      R13: ffff88033d3f5730 R14: 0000000000000006 R15: 000000000000000f
      FS:  00007f7e0f644700(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000008 CR3: 000000033cc78000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process lctl (pid: 2157, threadinfo ffff88033d3f4000, task ffff880357118aa0)
      Stack:
       0000000000000000 ffff8803447c74a0 0000000000000000 0000000000000006
      <d> ffff88033d3f5768 ffffffff8106306c ffff88033d3f5728 ffffffffa0f0e219
      <d> ffff88033ce5aa70 ffff88033c2bc1c8 ffff88033d3f57a8 0000000000000286
      Call Trace:
       [<ffffffff8106306c>] try_to_wake_up+0x3c/0x3e0
       [<ffffffffa0f0e219>] ? echo_object_free+0x159/0x2f0 [obdecho]
       [<ffffffff81063465>] wake_up_process+0x15/0x20
       [<ffffffff8150f7e4>] __mutex_unlock_slowpath+0x44/0x60
       [<ffffffff8150f79b>] mutex_unlock+0x1b/0x20
       [<ffffffffa07a4907>] lu_site_purge+0x3f7/0x4e0 [obdclass]
       [<ffffffffa07a4e31>] lu_object_limit+0x71/0x80 [obdclass]
       [<ffffffffa07a4f93>] lu_object_find_try+0x153/0x2b0 [obdclass]
       [<ffffffffa07a51a3>] lu_object_find_at+0xb3/0x100 [obdclass]
       [<ffffffffa0b5d6ca>] ? mdd_lookup+0x12a/0x170 [mdd]
       [<ffffffffa0f10013>] echo_md_create_internal+0x153/0x640 [obdecho]
       [<ffffffffa0f18af3>] echo_md_handler+0x1383/0x1930 [obdecho]
       [<ffffffffa0f1c84e>] echo_client_iocontrol+0x1bae/0x30f0 [obdecho]
       [<ffffffff81281826>] ? vsnprintf+0x336/0x5e0
       [<ffffffffa063d27b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
       [<ffffffffa0753ed5>] ? obd_ioctl_getdata+0x145/0x1150 [obdclass]
       [<ffffffffa076c77c>] class_handle_ioctl+0x163c/0x21c0 [obdclass]
       [<ffffffffa07532ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
       [<ffffffff81195352>] vfs_ioctl+0x22/0xa0
       [<ffffffff81511365>] ? page_fault+0x25/0x30
       [<ffffffff811954f4>] do_vfs_ioctl+0x84/0x580
       [<ffffffff81195a71>] sys_ioctl+0x81/0xa0
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      This is likely a bug in the Linux kernel:
      https://bugzilla.kernel.org/show_bug.cgi?id=27142

      The mutex being released when the crash occurs (mutex_unlock() called from lu_site_purge() in the call trace above) was introduced by http://review.whamcloud.com/#/c/11099/

          People

            Assignee: Isaac Huang (Inactive)
            Reporter: Isaac Huang (Inactive)
            Votes: 0
            Watchers: 6
