Details
Bug
Resolution: Fixed
Minor
Lustre 2.8.0
3
Description
Using 2.7.51, after I run llmount.sh I see acceptor_000 running at 100% CPU all the time.
top - 11:29:59 up 1 min,  2 users,  load average: 0.71, 0.19, 0.06
Tasks: 298 total,   2 running, 296 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us, 25.1%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3901240k total,   596948k used,  3304292k free,    25188k buffers
Swap:        0k total,        0k used,        0k free,   229524k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 2335 root      20   0     0    0    0 R 100.0  0.0   0:33.71 acceptor_000
 2278 root      20   0 15164 1352  908 R   0.7  0.0   0:00.28 top
    1 root      20   0 19352 1500 1188 S   0.0  0.0   0:00.85 init
    2 root      20   0     0    0    0 S   0.0  0.0   0:00.03 kthreadd
    3 root      RT   0     0    0    0 S   0.0  0.0   0:00.08 migration/0
    4 root      20   0     0    0    0 S   0.0  0.0   0:00.00 ksoftirqd/0
...
I crashed the machine and got a backtrace:
crash> bt
PID: 27520  TASK: ffff8800c0fa0580  CPU: 2  COMMAND: "acceptor_000"
 #0 [ffff88002c407e30] crash_nmi_callback at ffffffff8103054d
 #1 [ffff88002c407e50] notifier_call_chain at ffffffff81559e45
 #2 [ffff88002c407e90] __atomic_notifier_call_chain at ffffffff81559edc
 #3 [ffff88002c407ee0] atomic_notifier_call_chain at ffffffff81559f26
 #4 [ffff88002c407ef0] notify_die at ffffffff810a57be
 #5 [ffff88002c407f20] do_nmi at ffffffff815576a3
 #6 [ffff88002c407f50] nmi at ffffffff815571f0
    [exception RIP: check_poison_obj+80]
    RIP: ffffffff811840a0  RSP: ffff880012479bf0  RFLAGS: 00000293
    RAX: 000000000000006b  RBX: 0000000000000124  RCX: ffffffff8146c68f
    RDX: 000000000000006b  RSI: ffff8800aa5d4568  RDI: ffff88011dd81500
    RBP: ffff880012479c40   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000510  R14: ffff8800aa5d4570  R15: 000000000000050f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #7 [ffff880012479bf0] check_poison_obj at ffffffff811840a0
 #8 [ffff880012479c48] cache_alloc_debugcheck_after at ffffffff8118439c
 #9 [ffff880012479c88] kmem_cache_alloc at ffffffff81187806
#10 [ffff880012479cd8] sock_alloc_inode at ffffffff8146c68f
#11 [ffff880012479cf8] alloc_inode at ffffffff811c0cf7
#12 [ffff880012479d18] new_inode at ffffffff811c19fb
#13 [ffff880012479d48] sock_alloc at ffffffff8146d389
#14 [ffff880012479d58] sock_create_lite at ffffffff8146dca5
#15 [ffff880012479da8] lnet_sock_accept at ffffffffa0b07e86 [lnet]
#16 [ffff880012479e08] lnet_acceptor at ffffffffa0b1a9b7 [lnet]
#17 [ffff880012479eb8] kthread at ffffffff8109e856
#18 [ffff880012479f48] kernel_thread at ffffffff8100c30a