[LU-3629] vvp_env_session() ASSERTION( ses != ((void *)0) ) Created: 24/Jul/13 Updated: 12/Jan/19 Resolved: 12/Jan/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Christopher Morrone | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
Lustre 2.4.0-RC1_3chaos, PPC64 lustre client |
||
| Severity: | 3 |
| Rank (Obsolete): | 9346 |
| Description |
|
A few weeks ago we had a login node (Lustre client) die in the following assertion:
  (llite_internal.h:1064:vvp_env_session()) ASSERTION( ses != ((void *)0) )
It was running Lustre 2.4.0-RC1_3chaos
The backtrace from crash looks like:
  crash> bt
  PID: 10792 TASK: c000000ef9de7da0 CPU: 28 COMMAND: "slurm_prolog"
   #0 [c000000bc3706720] .crash_kexec at c0000000000e5aa4
   #1 [c000000bc3706920] .panic at c0000000005c4f40
   #2 [c000000bc37069b0] .lbug_with_loc at d00000000aa714e0 [libcfs]
   #3 [c000000bc3706a40] .vvp_io_init at d00000000c6cc03c [lustre]
   #4 [c000000bc3706b20] .cl_io_init0 at d00000000b808024 [obdclass]
   #5 [c000000bc3706bd0] .cl_pages_prune at d00000000b7fbc18 [obdclass]
   #6 [c000000bc3706c80] .cl_object_prune at d00000000b7f1f00 [obdclass]
   #7 [c000000bc3706d30] .lov_delete_raid0 at d00000000c1fa8a4 [lov]
   #8 [c000000bc3706e50] .lov_object_delete at d00000000c1f9240 [lov]
   #9 [c000000bc3706f00] .lu_object_free at d00000000b7e3520 [obdclass]
  #10 [c000000bc3706fe0] .lu_object_put at d00000000b7e7360 [obdclass]
  #11 [c000000bc37070b0] .cl_object_put at d00000000b7f2c90 [obdclass]
  #12 [c000000bc3707120] .cl_inode_fini at d00000000c6bfd68 [lustre]
  #13 [c000000bc3707230] .ll_clear_inode at d00000000c677264 [lustre]
  #14 [c000000bc3707310] .clear_inode at c0000000001e1cc8
  #15 [c000000bc37073a0] .dispose_list at c0000000001e2068
  #16 [c000000bc3707450] .shrink_icache_memory at c0000000001e24c4
  #17 [c000000bc3707540] .shrink_slab at c00000000016ecbc
  #18 [c000000bc3707600] .do_try_to_free_pages at c0000000001716b0
  #19 [c000000bc3707720] .try_to_free_pages at c000000000171a88
  #20 [c000000bc3707820] .__alloc_pages_nodemask at c0000000001668c0
  #21 [c000000bc37079c0] .alloc_pages_vma at c0000000001a2694
  #22 [c000000bc3707a70] .handle_pte_fault at c00000000017fec4
  #23 [c000000bc3707b80] .do_page_fault at c0000000005c14b0
  #24 [c000000bc3707e30] handle_page_fault at c00000000000520c
  Data Access error [301] exception frame:
  R0: 0000000000000000 R1: 00000fffffffd9c0 R2: 0000040000323268
  R3: 0000040000320878 R4: 000000000000001d R5: 00000fffffffdebe
  R6: 0000000000000000 R7: 00000400003210c8 R8: 0000000000000218
  R9: 0000000010320000 R10: 0000000000000031 R11: 0000000000020001
  R12: 0000000028002482 R13: 000004000004f040
  NIP: 00000400001f3ad4 MSR: 800000000000d032 OR3: 0000000010340000
  CTR: 0000000000000000 LR: 00000400001f494c XER: 0000000000000010
  CCR: 0000000028002482 MQ: 0000000000000001 DAR: 0000000010320008
  DSISR: 0000000042000000 Syscall Result: 0000000000000000
This was a PPC64 login node. |
| Comments |
| Comment by Peter Jones [ 25/Jul/13 ] |
|
Niu, could you please comment on this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 29/Jul/13 ] |
|
I don't see how this can happen in the master code. Chris, how do I check out 2.4.0-RC1_3chaos? I'd like to see if there is any difference in your branch. Thanks. |
| Comment by Peter Jones [ 29/Jul/13 ] |
|
Niu, try looking at https://github.com/chaos/lustre Peter |
| Comment by Niu Yawei (Inactive) [ 05/Aug/13 ] |
|
I still don't see how this can happen after checking the chaos code. I suppose it's rare, isn't it? Do you know if this happened while memory was under pressure? And are there any other abnormal logs from the client? Thanks. |
| Comment by Christopher Morrone [ 07/Aug/13 ] |
|
It is fairly rare, yes. Memory pressure is certainly a possibility. The login nodes are heavily used. |
| Comment by Peter Jones [ 20/Jul/17 ] |
|
Ned, is this issue still seen on 2.8.x releases? Peter |
| Comment by Peter Jones [ 12/Jan/19 ] |
|
closing ancient ticket |