[LU-3417] ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed: - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: Lustre 2.4.0
Affects Version/s: None
Labels: llnl
Environment: Lustre 2.3.63-6chaos (https://github.com/chaos/lustre/tree/2.3.63-6chaos)
Severity: 3
Rank (Obsolete): 8472

Description

A server running Lustre 2.3.63-6chaos over ZFS hit the following assertion. Most clients were BG/Q, along with some 2.1 x86_64 clients.

<ConMan> Console [vesta56] log at 2013-05-24 17:00:00 PDT.
2013-05-24 17:51:58 LustreError: 33230:0:(sec_null.c:320:null_alloc_rs()) vmalloc of 'rs' (-1073741304 bytes) failed
2013-05-24 17:51:58 LustreError: 33230:0:(sec_null.c:320:null_alloc_rs()) 369821362 total bytes allocated by Lustre, 593003541 by LNET
2013-05-24 17:51:58 LustreError: 33230:0:(pack_generic.c:428:lustre_msg_buf_v2()) msg ffff880da3718108 buffer[1] size -1073741792 too small (required 0, opc=0)
2013-05-24 17:51:58 LNetError: 33230:0:(o2iblnd_cb.c:1601:kiblnd_send()) ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed:
2013-05-24 17:51:58 LNetError: 33230:0:(o2iblnd_cb.c:1601:kiblnd_send()) LBUG

When the failover partner started up, it hit the same assertion after recovery completed:

2013-05-24 17:56:42 Lustre: fsv-OST0037: Client e985e75f-abf3-b986-3da7-3cd1c4f29af0 (at 172.20.17.95@o2ib500) refused reconnection, still busy with 1 active RPCs
2013-05-24 17:56:42 Lustre: Skipped 128 previous similar messages
2013-05-24 17:57:25 Lustre: fsv-OST0037: Recovery over after 3:45, of 405 clients 405 recovered and 0 were evicted.
2013-05-24 17:57:25 LustreError: 11574:0:(sec_null.c:320:null_alloc_rs()) vmalloc of 'rs' (-1073741304 bytes) failed
2013-05-24 17:57:25 LustreError: 11574:0:(sec_null.c:320:null_alloc_rs()) 408246602 total bytes allocated by Lustre, 1140997813 by LNET
2013-05-24 17:57:25 LustreError: 11574:0:(pack_generic.c:428:lustre_msg_buf_v2()) msg ffff880e344b8108 buffer[1] size -1073741792 too small (required 0, opc=0)
2013-05-24 17:57:25 LNetError: 11574:0:(o2iblnd_cb.c:1601:kiblnd_send()) ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed:

The assertion recurred repeatedly until a sysadmin intervened and manually aborted recovery, after which everything appeared to start up normally.
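
One way to read the negative sizes in the logs above, offered as a sketch rather than a diagnosis: assuming they are printed as signed 32-bit integers, reinterpreting them as unsigned shows values far beyond the (4<<10) limit named in the assertion. The helper `as_u32` and the constant `MSG_LIMIT` are illustrative names, not taken from the Lustre source.

```python
def as_u32(n: int) -> int:
    """Reinterpret a signed 32-bit value as an unsigned 32-bit value."""
    return n & 0xFFFFFFFF


# The (4<<10) bound from the assertion text; the name is hypothetical.
MSG_LIMIT = 4 << 10  # 4096 bytes

# Sizes copied verbatim from the console log.
logged_sizes = {
    "rs (null_alloc_rs)": -1073741304,
    "buffer[1] (lustre_msg_buf_v2)": -1073741792,
}

for name, size in logged_sizes.items():
    unsigned = as_u32(size)
    # Show the raw bit pattern and whether it exceeds the assertion's limit.
    print(f"{name}: {size} -> 0x{unsigned:08X}, exceeds limit: {unsigned > MSG_LIMIT}")
```

Under that assumption both sizes map to bit patterns of the form 0xC00000xx or 0xC00002xx, so any payload length derived from them would be far larger than 4096 bytes. Whether that is what actually tripped the assertion is not established by the logs alone.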

People

Assignee: Bruno Faccini (Inactive)

Reporter: Christopher Morrone (Inactive)

Votes: 0

Watchers: 4

Dates

Created: 30/May/13 12:21 AM

Updated: 13/Oct/21 3:03 AM

Resolved: 13/Oct/21 3:03 AM