Details
Bug
Resolution: Fixed
Minor
Lustre 2.8.0
3
Description
Using 2.7.51, after I run llmount.sh I see acceptor_000 running at 100% CPU all the time.
top - 11:29:59 up 1 min, 2 users, load average: 0.71, 0.19, 0.06
Tasks: 298 total, 2 running, 296 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 25.1%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3901240k total, 596948k used, 3304292k free, 25188k buffers
Swap: 0k total, 0k used, 0k free, 229524k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2335 root 20 0 0 0 0 R 100.0 0.0 0:33.71 acceptor_000
2278 root 20 0 15164 1352 908 R 0.7 0.0 0:00.28 top
1 root 20 0 19352 1500 1188 S 0.0 0.0 0:00.85 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.03 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.08 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
...
I crashed the machine and got a backtrace:
crash> bt
PID: 27520 TASK: ffff8800c0fa0580 CPU: 2 COMMAND: "acceptor_000"
#0 [ffff88002c407e30] crash_nmi_callback at ffffffff8103054d
#1 [ffff88002c407e50] notifier_call_chain at ffffffff81559e45
#2 [ffff88002c407e90] __atomic_notifier_call_chain at ffffffff81559edc
#3 [ffff88002c407ee0] atomic_notifier_call_chain at ffffffff81559f26
#4 [ffff88002c407ef0] notify_die at ffffffff810a57be
#5 [ffff88002c407f20] do_nmi at ffffffff815576a3
#6 [ffff88002c407f50] nmi at ffffffff815571f0
[exception RIP: check_poison_obj+80]
RIP: ffffffff811840a0 RSP: ffff880012479bf0 RFLAGS: 00000293
RAX: 000000000000006b RBX: 0000000000000124 RCX: ffffffff8146c68f
RDX: 000000000000006b RSI: ffff8800aa5d4568 RDI: ffff88011dd81500
RBP: ffff880012479c40 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000510 R14: ffff8800aa5d4570 R15: 000000000000050f
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#7 [ffff880012479bf0] check_poison_obj at ffffffff811840a0
#8 [ffff880012479c48] cache_alloc_debugcheck_after at ffffffff8118439c
#9 [ffff880012479c88] kmem_cache_alloc at ffffffff81187806
#10 [ffff880012479cd8] sock_alloc_inode at ffffffff8146c68f
#11 [ffff880012479cf8] alloc_inode at ffffffff811c0cf7
#12 [ffff880012479d18] new_inode at ffffffff811c19fb
#13 [ffff880012479d48] sock_alloc at ffffffff8146d389
#14 [ffff880012479d58] sock_create_lite at ffffffff8146dca5
#15 [ffff880012479da8] lnet_sock_accept at ffffffffa0b07e86 [lnet]
#16 [ffff880012479e08] lnet_acceptor at ffffffffa0b1a9b7 [lnet]
#17 [ffff880012479eb8] kthread at ffffffff8109e856
#18 [ffff880012479f48] kernel_thread at ffffffff8100c30a
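When comparing several dumps of a spinning thread, it can help to pull out just the frames that belong to one module. Below is a minimal sketch, not part of the crash utility itself, that parses `bt` frame lines of the form shown above and lists the functions tagged with a given module. The names `FRAME_RE`, `parse_frames`, `module_frames`, and `SAMPLE` are hypothetical helpers introduced for illustration, and the regex assumes the frame layout seen in this backtrace.

```python
import re

# Matches crash "bt" frame lines such as:
#   #15 [ffff880012479da8] lnet_sock_accept at ffffffffa0b07e86 [lnet]
# The trailing "[module]" tag is optional (core kernel frames have none).
FRAME_RE = re.compile(
    r"^\s*#(?P<idx>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)"
    r"\s+at\s+(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)


def parse_frames(bt_text):
    """Return (frame index, function, module or None) for each frame line."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:  # register dumps and separator lines are skipped
            frames.append((int(m.group("idx")), m.group("func"), m.group("module")))
    return frames


def module_frames(bt_text, module):
    """Return the function names of frames tagged with the given module."""
    return [func for _, func, mod in parse_frames(bt_text) if mod == module]


# A few frames copied from the backtrace in this report.
SAMPLE = """\
#13 [ffff880012479d48] sock_alloc at ffffffff8146d389
#14 [ffff880012479d58] sock_create_lite at ffffffff8146dca5
#15 [ffff880012479da8] lnet_sock_accept at ffffffffa0b07e86 [lnet]
#16 [ffff880012479e08] lnet_acceptor at ffffffffa0b1a9b7 [lnet]
#17 [ffff880012479eb8] kthread at ffffffff8109e856
"""

if __name__ == "__main__":
    print(module_frames(SAMPLE, "lnet"))
```

For the frames above, this reports `lnet_sock_accept` and `lnet_acceptor`, which are the only [lnet] frames in the stack.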