  1. Lustre
  2. LU-6407

acceptor_000 runs at 100% all the time

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.8.0
    • Severity: 3

    Description

      Using 2.7.51, after running llmount.sh I see acceptor_000 at 100% CPU all the time.

      top - 11:29:59 up 1 min,  2 users,  load average: 0.71, 0.19, 0.06
      Tasks: 298 total,   2 running, 296 sleeping,   0 stopped,   0 zombie
      Cpu(s):  0.1%us, 25.1%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
      Mem:   3901240k total,   596948k used,  3304292k free,    25188k buffers
      Swap:        0k total,        0k used,        0k free,   229524k cached
      
        PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
       2335 root      20   0     0    0    0 R 100.0  0.0   0:33.71 acceptor_000
       2278 root      20   0 15164 1352  908 R  0.7  0.0   0:00.28 top
          1 root      20   0 19352 1500 1188 S  0.0  0.0   0:00.85 init
          2 root      20   0     0    0    0 S  0.0  0.0   0:00.03 kthreadd
          3 root      RT   0     0    0    0 S  0.0  0.0   0:00.08 migration/0
          4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
      ...
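
For anyone trying to reproduce this, a small script can flag the spinning thread from saved `top` output instead of reading the table by eye. This is only a sketch: the function name `busy_tasks`, the 90% threshold, and the assumption that the text comes from something like `top -b -n 1` are all illustrative and not part of the report.

```python
# Sketch: list tasks whose %CPU is at or above a threshold in saved top output.
# busy_tasks and the 90.0 default are hypothetical, chosen for illustration.

SAMPLE_TOP = """\
top - 11:29:59 up 1 min,  2 users,  load average: 0.71, 0.19, 0.06
Tasks: 298 total,   2 running, 296 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2335 root      20   0     0    0    0 R 100.0  0.0   0:33.71 acceptor_000
 2278 root      20   0 15164 1352  908 R  0.7  0.0   0:00.28 top
    1 root      20   0 19352 1500 1188 S  0.0  0.0   0:00.85 init
...
"""


def busy_tasks(top_text, threshold=90.0):
    """Return (pid, command, cpu_percent) for rows at or above threshold."""
    header = None
    busy = []
    for line in top_text.splitlines():
        if not line.strip():
            continue
        if line.split()[0] == "PID":
            # Column names of the process table, e.g. PID, USER, ..., COMMAND.
            header = line.split()
            continue
        if header is None:
            # Still in the summary area above the process table.
            continue
        # Split into at most len(header) fields so COMMAND may contain spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue
        row = dict(zip(header, fields))
        try:
            pid = int(row["PID"])
            cpu = float(row["%CPU"])
        except (KeyError, ValueError):
            continue
        if cpu >= threshold:
            busy.append((pid, row["COMMAND"], cpu))
    return busy


if __name__ == "__main__":
    for pid, command, cpu in busy_tasks(SAMPLE_TOP):
        print(pid, command, cpu)
```

On the output above this reports only PID 2335 (acceptor_000). The zero VIRT/RES/SHR columns on that row are consistent with it being a kernel thread.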
      

      I crashed the machine and got a backtrace:

      crash> bt
      PID: 27520  TASK: ffff8800c0fa0580  CPU: 2   COMMAND: "acceptor_000"
       #0 [ffff88002c407e30] crash_nmi_callback at ffffffff8103054d
       #1 [ffff88002c407e50] notifier_call_chain at ffffffff81559e45
       #2 [ffff88002c407e90] __atomic_notifier_call_chain at ffffffff81559edc
       #3 [ffff88002c407ee0] atomic_notifier_call_chain at ffffffff81559f26
       #4 [ffff88002c407ef0] notify_die at ffffffff810a57be
       #5 [ffff88002c407f20] do_nmi at ffffffff815576a3
       #6 [ffff88002c407f50] nmi at ffffffff815571f0
          [exception RIP: check_poison_obj+80]
          RIP: ffffffff811840a0  RSP: ffff880012479bf0  RFLAGS: 00000293
          RAX: 000000000000006b  RBX: 0000000000000124  RCX: ffffffff8146c68f
          RDX: 000000000000006b  RSI: ffff8800aa5d4568  RDI: ffff88011dd81500
          RBP: ffff880012479c40   R8: 0000000000000000   R9: 0000000000000001
          R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
          R13: 0000000000000510  R14: ffff8800aa5d4570  R15: 000000000000050f
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #7 [ffff880012479bf0] check_poison_obj at ffffffff811840a0
       #8 [ffff880012479c48] cache_alloc_debugcheck_after at ffffffff8118439c
       #9 [ffff880012479c88] kmem_cache_alloc at ffffffff81187806
      #10 [ffff880012479cd8] sock_alloc_inode at ffffffff8146c68f
      #11 [ffff880012479cf8] alloc_inode at ffffffff811c0cf7
      #12 [ffff880012479d18] new_inode at ffffffff811c19fb
      #13 [ffff880012479d48] sock_alloc at ffffffff8146d389
      #14 [ffff880012479d58] sock_create_lite at ffffffff8146dca5
      #15 [ffff880012479da8] lnet_sock_accept at ffffffffa0b07e86 [lnet]
      #16 [ffff880012479e08] lnet_acceptor at ffffffffa0b1a9b7 [lnet]
      #17 [ffff880012479eb8] kthread at ffffffff8109e856
      #18 [ffff880012479f48] kernel_thread at ffffffff8100c30a
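
To read the backtrace as a call chain, the frames below the `--- <NMI exception stack> ---` marker are the ones that belong to the interrupted acceptor thread; the frames above it are the NMI handler used to force the crash dump. The sketch below extracts that chain from `crash` `bt` text. The helper names (`parse_frames`, `interrupted_call_chain`) are hypothetical, and the sample is an abridged copy of the frames above. Note that one backtrace is a single snapshot of the loop, so it shows where the thread happened to be, not necessarily where the spin originates.

```python
import re

# Matches crash "bt" frame lines such as:
#   "#15 [ffff880012479da8] lnet_sock_accept at ffffffffa0b07e86 [lnet]"
FRAME_RE = re.compile(
    r"^\s*#(?P<idx>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)\s+at\s+"
    r"(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)

NMI_MARKER = "--- <NMI exception stack> ---"

# Abridged copy of the backtrace in this report (frames 8-13 omitted).
SAMPLE_BT = """\
 #6 [ffff88002c407f50] nmi at ffffffff815571f0
    [exception RIP: check_poison_obj+80]
--- <NMI exception stack> ---
 #7 [ffff880012479bf0] check_poison_obj at ffffffff811840a0
#14 [ffff880012479d58] sock_create_lite at ffffffff8146dca5
#15 [ffff880012479da8] lnet_sock_accept at ffffffffa0b07e86 [lnet]
#16 [ffff880012479e08] lnet_acceptor at ffffffffa0b1a9b7 [lnet]
#17 [ffff880012479eb8] kthread at ffffffff8109e856
#18 [ffff880012479f48] kernel_thread at ffffffff8100c30a
"""


def parse_frames(bt_text):
    """Return one dict per frame line; register dumps and markers are skipped."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(
                {"idx": int(m.group("idx")),
                 "func": m.group("func"),
                 "module": m.group("module")}  # None for core kernel symbols
            )
    return frames


def interrupted_call_chain(bt_text):
    """Function names of the interrupted task, outermost caller first.

    If the NMI marker is absent, the whole backtrace is used.
    """
    _, found, tail = bt_text.partition(NMI_MARKER)
    frames = parse_frames(tail if found else bt_text)
    return [f["func"] for f in reversed(frames)]


if __name__ == "__main__":
    print(" -> ".join(interrupted_call_chain(SAMPLE_BT)))
```

Applied to the full backtrace, the chain runs from kernel_thread and kthread through lnet_acceptor and lnet_sock_accept into sock_create_lite, ending in the slab debug check (check_poison_obj) where the NMI landed.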
      

          People

            Assignee: ashehata Amir Shehata (Inactive)
            Reporter: jhammond John Hammond