Lustre / LU-4728

NULL pointer dereference in ldlm_cli_enqueue_local when enabling hsm_control after LU-4727 happens

Details

    Description

      After LU-4727 happens, I tried to restart HSM. However, the kernel crashes every time I do so.

      1. ... The hang happens
        [root@server1 x86_64]# lhsmtool_posix --hsm-root /tmp/archive --archive=5 /mnt/lustre
        lhsmtool_posix[29899]: action=0 src=(null) dst=(null) mount_point=/mnt/lustre
        cannot start copytool on '/mnt/lustre': No such device or address (6)
        lhsmtool_posix[29899]: cannot start copytool interface: No such device or address (6)
        lhsmtool_posix[29899]: process finished, errs: 0 major, 0 minor, rc=-6 (No such device or address)
        [root@server1 x86_64]# lctl set_param mdt.server1-MDT0000.hsm_control=enabled
        mdt.server1-MDT0000.hsm_control=enabled
      2. ... The crash happens
        Lustre: HSM coordinator thread is not running - denying agent registration.
        LustreError: 29899:0:(lmv_obd.c:977:lmv_hsm_ct_register()) server1-clilmv-ffff880867a3ec00: iocontrol MDC server1-MDT0000_UUID on MDT idx 0 cmd 401866d5: err = -6
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
        IP: [<ffffffffa141ffc9>] ldlm_cli_enqueue_local+0x159/0x5e0 [ptlrpc]
        PGD 77c707067 PUD 86737b067 PMD 0
        Oops: 0000 [#1] SMP
        last sysfs file: /sys/kernel/mm/ksm/run
        CPU 7
        Modules linked in: osc ofd ost osp mdd lod mdt lfsck mgs mgc nodemap osd_ldiskfs lquota ldiskfs lustre lov mdc fid lmv fld ksocklnd ptlrpc obdclass lnet libcfs sha512_generic sha256_generic crc32c_intel zlib_deflate ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc acpi_cpufreq freq_table mperf bridge stp llc ipv6 ext4 jbd2 vhost_net macvtap macvlan tun kvm_intel kvm acpi_pad igb microcode serio_raw sb_edac edac_core i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca shpchp ext3 jbd mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]

      Pid: 29909, comm: lctl Tainted: G W --------------- 2.6.32 #13 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
      RIP: 0010:[<ffffffffa141ffc9>] [<ffffffffa141ffc9>] ldlm_cli_enqueue_local+0x159/0x5e0 [ptlrpc]
      RSP: 0018:ffff8808326ed818 EFLAGS: 00010202
      RAX: 0000000000000010 RBX: ffff88081cec5b38 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880856f4f780
      RBP: ffff8808326ed888 R08: 00000000fffffffb R09: 00000000fffffffe
      R10: 0000000000000000 R11: 000000000000000f R12: ffff8808326ed900
      R13: ffff880830d7ca00 R14: ffff88081cec5b68 R15: 0000000000000001
      FS: 00007f2439fae700(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000010 CR3: 000000076e864000 CR4: 00000000000406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process lctl (pid: 29909, threadinfo ffff8808326ec000, task ffff880853a58aa0)
      Stack:
      0000000000000000 ffff880800000000 ffff8808326ed888 0000000da11ba731
      <d> ffffffffa1420450 ffffffffa1a1ac20 0000000000000000 ffff8808527281c0
      <d> ffff880855ab21b0 ffff88077c646780 ffff88081cec5800 ffff880855ab21b0
      Call Trace:
      [<ffffffffa1420450>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
      [<ffffffffa1a1ac20>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [<ffffffffa1a22109>] mdt_object_lock0+0x339/0xaf0 [mdt]
      [<ffffffffa1a1ac20>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [<ffffffffa1420450>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
      [<ffffffffa1a22984>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa1a22b51>] mdt_object_find_lock+0x61/0x170 [mdt]
      [<ffffffffa1a70f43>] hsm_restore_cb+0x1e3/0x662 [mdt]
      [<ffffffffa1225dee>] llog_process_thread+0x8ce/0xe50 [obdclass]
      [<ffffffffa1a70d60>] ? hsm_restore_cb+0x0/0x662 [mdt]
      [<ffffffffa1227c35>] llog_process_or_fork+0x145/0x660 [obdclass]
      [<ffffffffa122a14a>] llog_cat_process_cb+0x39a/0x4b0 [obdclass]
      [<ffffffffa1225dee>] llog_process_thread+0x8ce/0xe50 [obdclass]
      [<ffffffffa1229db0>] ? llog_cat_process_cb+0x0/0x4b0 [obdclass]
      [<ffffffffa1227c35>] llog_process_or_fork+0x145/0x660 [obdclass]
      [<ffffffffa1a70d60>] ? hsm_restore_cb+0x0/0x662 [mdt]
      [<ffffffffa1228f7d>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
      [<ffffffffa1a70d60>] ? hsm_restore_cb+0x0/0x662 [mdt]
      [<ffffffffa1a70d60>] ? hsm_restore_cb+0x0/0x662 [mdt]
      [<ffffffffa1a70d60>] ? hsm_restore_cb+0x0/0x662 [mdt]
      [<ffffffffa11ba731>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      [<ffffffffa1a70d60>] ? hsm_restore_cb+0x0/0x662 [mdt]
      [<ffffffffa12290e9>] llog_cat_process+0x19/0x20 [obdclass]
      [<ffffffffa1a604da>] cdt_llog_process+0xba/0x360 [mdt]
      [<ffffffff811e0420>] ? proc_reg_open+0x0/0x1d0
      [<ffffffffa1a69443>] mdt_hsm_cdt_start+0x133/0x4f0 [mdt]
      [<ffffffffa1a6acc6>] lprocfs_wr_hsm_cdt_control+0x66/0x9c0 [mdt]
      [<ffffffff8104452c>] ? __do_page_fault+0x1ec/0x480
      [<ffffffffa124badb>] lprocfs_fops_write+0x7b/0xa0 [obdclass]
      [<ffffffff811dfece>] proc_reg_write+0x7e/0xc0
      [<ffffffff8117a408>] vfs_write+0xb8/0x1a0
      [<ffffffff810d5d82>] ? audit_syscall_entry+0x272/0x2a0
      [<ffffffff8117ae21>] sys_write+0x51/0x90
      [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
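
      To make the call chain in the trace above easier to read, here is a minimal sketch that drops the frames marked with "?" (which the kernel prints when it is not sure an address on the stack is a real return address) and lists the remaining frames from outermost caller to the faulting function. The names FRAME_RE, reliable_frames and SAMPLE are hypothetical helpers for illustration only, not part of Lustre or any kernel tool; SAMPLE is a shortened copy of the trace above.

```python
import re

# Matches frames such as
#   [<ffffffffa1a22109>] mdt_object_lock0+0x339/0xaf0 [mdt]
#   [<ffffffff8117a408>] vfs_write+0xb8/0x1a0          (no module: core kernel)
# The optional "? " prefix marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            # Frames without a "[module]" suffix belong to the core kernel.
            frames.append((m.group("sym"), m.group("mod") or "kernel"))
    return frames


# Shortened excerpt of the call trace from this report (assumption: the
# full trace can be pasted here unchanged).
SAMPLE = """\
[<ffffffffa1420450>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
[<ffffffffa1a22109>] mdt_object_lock0+0x339/0xaf0 [mdt]
[<ffffffffa1a22984>] mdt_object_lock+0x14/0x20 [mdt]
[<ffffffffa1a22b51>] mdt_object_find_lock+0x61/0x170 [mdt]
[<ffffffffa1a70f43>] hsm_restore_cb+0x1e3/0x662 [mdt]
[<ffffffffa1a604da>] cdt_llog_process+0xba/0x360 [mdt]
[<ffffffffa1a69443>] mdt_hsm_cdt_start+0x133/0x4f0 [mdt]
[<ffffffff8117a408>] vfs_write+0xb8/0x1a0
"""

if __name__ == "__main__":
    # The trace is printed innermost first, so reverse it to read the
    # chain from the write() syscall down to the MDT locking code.
    for sym, mod in reversed(reliable_frames(SAMPLE)):
        print(f"{sym} [{mod}]")
```

      Read this way, the trace shows the write to hsm_control starting the coordinator (mdt_hsm_cdt_start), which replays the HSM llog (cdt_llog_process, hsm_restore_cb) and takes an MDT object lock. The fault itself is reported in ldlm_cli_enqueue_local (the RIP line), called from mdt_object_lock0.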

            People

              bfaccini Bruno Faccini (Inactive)
              lixi Li Xi (Inactive)