  1. Lustre
  2. LU-7966

LNetError: 4231:0:(linux-cpu.c:1081:cfs_cpu_init()) LBUG

Details

    • Bug
    • Resolution: Done
    • Blocker
    • None
    • Lustre 2.8.0
    • lola
      build: 2.8 GA + patches
    • 3
    • 9223372036854775807

    Description

      The error happened during soak testing of build '20160324' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160324).
      LNet runs over IB (all nodes are equipped with Mellanox 4x QDR HCAs).

      Sequence of events

      • The error happened after an MDS node panicked (see LU-7935) during MDT failback at 2016-03-29 14:53 (umount of the MDT). The MDT node (lola-9)
        was unusable (i.e. no primary or secondary resources mounted) when the error occurred on the Lustre client described below. This event may be unrelated, however.
      • The Lustre client crashed with the following error message:
        <0>LNetError: 4231:0:(linux-cpu.c:1081:cfs_cpu_init()) ASSERTION( !(((current_thread_info()->preempt_count) & ((((1UL << (10))-1) << ((0 + 8) + 8)) | (((1UL << (8))-1) << (0 + 8)) | (((1UL << (1))-1) << (((0 + 8) + 8) + 10))))) || (((cpumask_size())) <= (2 << 12) && ((((((gfp_t)0x10u) | ((gfp_t)0x40u)))) & (((gfp_t)0x20u)))) != 0 ) failed: 
        <0>LNetError: 4231:0:(linux-cpu.c:1081:cfs_cpu_init()) LBUG
        <0>Kernel panic - not syncing: LBUG in interrupt.
        <0>
        <4>Pid: 4231, comm: modprobe Not tainted 2.6.32-504.30.3.el6.x86_64 #1
        <4>Call Trace:
        <4> [<ffffffff815293fc>] ? panic+0xa7/0x16f
        <4> [<ffffffffa0478ebd>] ? lbug_with_loc+0x8d/0xb0 [libcfs]
        <4> [<ffffffffa047dcfc>] ? cfs_cpu_init+0xc7c/0xcb0 [libcfs]
        <4> [<ffffffff810a5525>] ? atomic_notifier_chain_register+0x55/0x60
        <4> [<ffffffffa047875c>] ? libcfs_register_panic_notifier+0x1c/0x20 [libcfs]
        <4> [<ffffffffa0482b70>] ? init_libcfs_module+0x0/0x340 [libcfs]
        <4> [<ffffffffa0482b97>] ? init_libcfs_module+0x27/0x340 [libcfs]
        <4> [<ffffffff8100204c>] ? do_one_initcall+0x3c/0x1d0
        <4> [<ffffffff810c0181>] ? sys_init_module+0xe1/0x250
        <4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
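
The macro-expanded assertion in the first LNetError line is hard to read, so the following is a minimal sketch that evaluates it from the literals printed in the log. It assumes the three shifted bit groups are the softirq, hardirq and NMI fields of preempt_count, and that the constants 0x10, 0x40 and 0x20 are gfp flag bits (on a 2.6.32 kernel these would likely be __GFP_WAIT, __GFP_IO and __GFP_HIGH). The names `INTERRUPT_MASK`, `ALLOC_FLAGS`, `ATOMIC_BIT` and `assertion_holds` are hypothetical and exist only for this illustration.

```python
# Bit fields of preempt_count, copied from the shift/width literals in the log.
SOFTIRQ_MASK = ((1 << 8) - 1) << (0 + 8)
HARDIRQ_MASK = ((1 << 10) - 1) << ((0 + 8) + 8)
NMI_MASK = ((1 << 1) - 1) << (((0 + 8) + 8) + 10)
INTERRUPT_MASK = HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK

# gfp literals from the log; the flag names are an assumption (see lead-in).
ALLOC_FLAGS = 0x10 | 0x40   # flags passed by the allocation
ATOMIC_BIT = 0x20           # bit the assertion tests for
SIZE_LIMIT = 2 << 12        # upper bound compared against cpumask_size()


def assertion_holds(preempt_count: int, cpumask_size: int) -> bool:
    """Evaluate the logged condition for a given preempt_count and size."""
    in_interrupt = (preempt_count & INTERRUPT_MASK) != 0
    atomic_ok = cpumask_size <= SIZE_LIMIT and (ALLOC_FLAGS & ATOMIC_BIT) != 0
    return (not in_interrupt) or atomic_ok


if __name__ == "__main__":
    print(hex(INTERRUPT_MASK))
    print(ALLOC_FLAGS & ATOMIC_BIT)
    print(assertion_holds(0, 128))
    print(assertion_holds(1 << 8, 128))
```

With these literals the second operand of the `||` is always false, because `(0x10 | 0x40) & 0x20` is zero. Under the stated assumptions the assertion can therefore only fail when one of the interrupt-context bits of preempt_count is set, which is consistent with the "LBUG in interrupt" panic line.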
        

      Attached files:
      console, messages, vmcore-dmsg.txt of affected node lola-33.
      Crash dump file is available.

          People

            heckes Frank Heckes (Inactive)