Details
Bug
Resolution: Cannot Reproduce
Major
None
Lustre 2.4.0
Lustre 2.4.0-RC1_3chaos, PPC64 lustre client
3
9346
Description
A few weeks ago we had a login node (Lustre client) die in the following assertion:
(llite_internal.h:1064:vvp_env_session()) ASSERTION( ses != ((void *)0) )
The node was running Lustre 2.4.0-RC1_3chaos.
The backtrace from crash looks like:
crash> bt
PID: 10792  TASK: c000000ef9de7da0  CPU: 28  COMMAND: "slurm_prolog"
 #0 [c000000bc3706720] .crash_kexec at c0000000000e5aa4
 #1 [c000000bc3706920] .panic at c0000000005c4f40
 #2 [c000000bc37069b0] .lbug_with_loc at d00000000aa714e0 [libcfs]
 #3 [c000000bc3706a40] .vvp_io_init at d00000000c6cc03c [lustre]
 #4 [c000000bc3706b20] .cl_io_init0 at d00000000b808024 [obdclass]
 #5 [c000000bc3706bd0] .cl_pages_prune at d00000000b7fbc18 [obdclass]
 #6 [c000000bc3706c80] .cl_object_prune at d00000000b7f1f00 [obdclass]
 #7 [c000000bc3706d30] .lov_delete_raid0 at d00000000c1fa8a4 [lov]
 #8 [c000000bc3706e50] .lov_object_delete at d00000000c1f9240 [lov]
 #9 [c000000bc3706f00] .lu_object_free at d00000000b7e3520 [obdclass]
#10 [c000000bc3706fe0] .lu_object_put at d00000000b7e7360 [obdclass]
#11 [c000000bc37070b0] .cl_object_put at d00000000b7f2c90 [obdclass]
#12 [c000000bc3707120] .cl_inode_fini at d00000000c6bfd68 [lustre]
#13 [c000000bc3707230] .ll_clear_inode at d00000000c677264 [lustre]
#14 [c000000bc3707310] .clear_inode at c0000000001e1cc8
#15 [c000000bc37073a0] .dispose_list at c0000000001e2068
#16 [c000000bc3707450] .shrink_icache_memory at c0000000001e24c4
#17 [c000000bc3707540] .shrink_slab at c00000000016ecbc
#18 [c000000bc3707600] .do_try_to_free_pages at c0000000001716b0
#19 [c000000bc3707720] .try_to_free_pages at c000000000171a88
#20 [c000000bc3707820] .__alloc_pages_nodemask at c0000000001668c0
#21 [c000000bc37079c0] .alloc_pages_vma at c0000000001a2694
#22 [c000000bc3707a70] .handle_pte_fault at c00000000017fec4
#23 [c000000bc3707b80] .do_page_fault at c0000000005c14b0
#24 [c000000bc3707e30] handle_page_fault at c00000000000520c
 Data Access error  [301] exception frame:
 R0:  0000000000000000    R1:  00000fffffffd9c0    R2:  0000040000323268
 R3:  0000040000320878    R4:  000000000000001d    R5:  00000fffffffdebe
 R6:  0000000000000000    R7:  00000400003210c8    R8:  0000000000000218
 R9:  0000000010320000    R10: 0000000000000031    R11: 0000000000020001
 R12: 0000000028002482    R13: 000004000004f040
 NIP: 00000400001f3ad4    MSR: 800000000000d032    OR3: 0000000010340000
 CTR: 0000000000000000    LR:  00000400001f494c    XER: 0000000000000010
 CCR: 0000000028002482    MQ:  0000000000000001    DAR: 0000000010320008
 DSISR: 0000000042000000     Syscall Result: 0000000000000000
This was a PPC64 login node.