Lustre / LU-440

system auto restart after running runracer


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • None
    • None
    • None
    • Environment: latest master build #176 RHEL6/x86_64
    • 3
    • 6598

    Description

      The Lustre client restarted automatically after running runracer and reported the following error:

      BUG: unable to handle kernel NULL pointer dereference at (null)

      client syslog:
      ----------------
      Lustre: DEBUG MARKER: ----============= acceptance-small: runracer ============---- Mon Jun 20 22:23:56 PDT 2011
      Lustre: DEBUG MARKER: excepting tests:
      Lustre: DEBUG MARKER: Using TIMEOUT=20
      LustreError: 5132:0:(quota_ctl.c:328:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -114
      Lustre: 5213:0:(debug.c:323:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      Lustre: 5213:0:(debug.c:323:libcfs_debug_str2mask()) Skipped 3 previous similar messages
      Lustre: DEBUG MARKER: == runracer test 1: racer on clients: client-15-ib,client-18-ib DURATION=120 == 22:24:02 (1308633842)
      Lustre: DEBUG MARKER: == runracer runracer test complete, duration 320 sec == 22:29:16 (1308634156)
      runracer returned 0
      Stopping clients: client-15.lab.whamcloud.com,client-18-ib /mnt/lustre (opts
      Stopping client client-15.lab.whamcloud.com /mnt/lustre opts:
      Stopping client client-18.lab.whamcloud.com /mnt/lustre opts:
      LustreError: 14900:0:(ldlm_request.c:1169:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
      LustreError: 14900:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
      LustreError: 14900:0:(ldlm_request.c:1169:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
      LustreError: 14900:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
      Lustre: client ffff88031779e400 umount complete
      Stopping clients: client-15.lab.whamcloud.com,client-18-ib /mnt/lustre2 (opts
      Stopping /mnt/mds1 (opts:-f)
      Stopping /mnt/ost1 (opts:-f)
      Stopping /mnt/ost2 (opts:-f)
      Stopping /mnt/ost3 (opts:-f)
      Stopping /mnt/ost4 (opts:-f)
      Stopping /mnt/ost5 (opts:-f)
      Stopping /mnt/ost6 (opts:-f)
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
      PGD 3111e1067 PUD 309070067 PMD 0
      Oops: 0002 [#1] SMP
      last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
      CPU 2
      Modules linked in: llite_lloop(U) lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mod [last unloaded: microcode]

      Modules linked in: llite_lloop(U) lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mod [last unloaded: microcode]
      Pid: 15189, comm: rmmod Not tainted 2.6.32-131.2.1.el6.x86_64 #1 X8DTT
      RIP: 0010:[<ffffffff814dcf35>] [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
      RSP: 0018:ffff88031b845da8 EFLAGS: 00010092
      RAX: 0000000000010000 RBX: ffff880325d82000 RCX: 000000000000f6e0
      RDX: 0000000000000000 RSI: ffff880319142290 RDI: 0000000000000000
      RBP: ffff88031b845da8 R08: 000000000000000b R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffff880319142000
      R13: ffff880325d82000 R14: ffff88031918d480 R15: 0000000000000001
      FS: 00007f17989a4700(0000) GS:ffff880032e40000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 0000000317293000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process rmmod (pid: 15189, threadinfo ffff88031b844000, task ffff88031b9eab00)
      Stack:
      ffff88031b845dd8 ffffffff8125689c ffff880325d82000 ffff880325d82328
      <0> ffff880325d82328 ffff88031918d480 ffff88031b845df8 ffffffff8124ba66
      <0> ffffffff81a8a820 ffff880325d82360 ffff88031b845e28 ffffffff81264a2d
      Call Trace:
      [<ffffffff8125689c>] blk_throtl_exit+0x3c/0xd0
      [<ffffffff8124ba66>] blk_release_queue+0x26/0x80
      [<ffffffff81264a2d>] kobject_release+0x8d/0x240
      [<ffffffff812649a0>] ? kobject_release+0x0/0x240
      [<ffffffff81265fd7>] kref_put+0x37/0x70
      [<ffffffff812648a7>] kobject_put+0x27/0x60
      [<ffffffff81247687>] blk_cleanup_queue+0x57/0x70
      [<ffffffffa00410b1>] lloop_exit+0x61/0x2f0 [llite_lloop]
      [<ffffffff81069012>] ? put_online_cpus+0x52/0x70
      [<ffffffff810a8ef8>] ? module_refcount+0x58/0x70
      [<ffffffff810a9a74>] sys_delete_module+0x194/0x260
      [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
      Code: c1 74 0e f3 90 0f b7 0f eb f5 83 3f 00 75 f4 eb df 48 89 d0 c9 c3 55 48 89 e5 0f 1f 44 00 00 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 eb f5
      RIP [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
      RSP <ffff88031b845da8>
      CR2: 0000000000000000
      Initializing cgroup subsys cpuset
      Initializing cgroup subsys cpu
      Linux version 2.6.32-131.2.1.el6.x86_64 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed May 18 07:07:37 EDT 2011
      Command line: ro root=UUID=f742492f-3be2-4ff7-89fd-5d28a2a02a2a rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS0,115200 irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=131436K@33408K elfcorehdr=164844K memmap=104K$920K memmap=8K$3136952K memmap=56K#3136960K memmap=328K#3137016K memmap=64K$3137344K memmap=8272K$3137456K memmap=262144K$3670016K memmap=4K$4175872K memmap=4096K$4190208K
      KERNEL supported cpus:
      Intel GenuineIntel
      AMD AuthenticAMD
      Centaur CentaurHauls
      BIOS-provided physical RAM map:
      BIOS-e820: 0000000000000100 - 000000000009cc00 (usable)
      BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
      BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
      BIOS-e820: 0000000000100000 - 00000000bf760000 (usable)
      BIOS-e820: 00000000bf76e000 - 00000000bf770000 (reserved)
      BIOS-e820: 00000000bf770000 - 00000000bf77e000 (ACPI data)
      BIOS-e820: 00000000bf77e000 - 00000000bf7d0000 (ACPI NVS)
      BIOS-e820: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
      BIOS-e820: 00000000bf7ec000 - 00000000c0000000 (reserved)
      BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
      BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
      BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
      BIOS-e820: 0000000100000000 - 0000000340000000 (usable)
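
      The numeric codes in the log above can be decoded mechanically. The following is a minimal triage sketch, not part of the original report. The helper names (decode_rc, decode_oops_code) are illustrative, the errno table assumes the generic Linux errno numbering, and the bit meanings assume the standard x86 page-fault error code layout. Under those assumptions, rc -114 maps to EALREADY, rc -108 maps to ESHUTDOWN, and Oops code 0002 describes a kernel-mode write to a page that is not present, which matches the locked xadd through RDI = 0 in _spin_lock_irq.

```python
import re

# Assumption: generic Linux errno numbering (asm-generic/errno.h).
# Only the two values that appear in this log are listed.
LINUX_ERRNO = {108: "ESHUTDOWN", 114: "EALREADY"}


def decode_rc(line):
    """Return (rc, errno_name) pairs for negative return codes in a log line."""
    results = []
    # Match "-<digits>" only when preceded by whitespace, so host names
    # such as "client-15-ib" are not mistaken for return codes.
    for match in re.finditer(r"(?<=\s)-(\d+)\b", line):
        value = int(match.group(1))
        results.append((-value, LINUX_ERRNO.get(value, "unknown")))
    return results


def decode_oops_code(line):
    """Decode the x86 page-fault error code from an 'Oops: XXXX' line."""
    match = re.search(r"Oops:\s+([0-9a-fA-F]{4})", line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return {
        "page_present": bool(code & 0x1),       # 0 means no page was mapped
        "write": bool(code & 0x2),              # 1 means write, 0 means read
        "user_mode": bool(code & 0x4),          # 0 means fault in kernel mode
        "reserved_bit": bool(code & 0x8),       # reserved bit set in page table
        "instruction_fetch": bool(code & 0x10), # fault on instruction fetch
    }


if __name__ == "__main__":
    print(decode_rc("ptlrpc_queue_wait failed, rc: -114"))
    print(decode_rc("Got rc -108 from cancel RPC: canceling anyway"))
    print(decode_oops_code("Oops: 0002 [#1] SMP"))
```

      Feeding each syslog line through both helpers is enough to annotate this report. The call trace itself (blk_throtl_exit called from blk_release_queue during lloop_exit) still has to be read against the kernel source.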

          People

            rread Robert Read (Inactive)
            sarah Sarah Liu