Lustre / LU-3417

ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed:

    Details

    • Severity:
      3
    • Rank (Obsolete):
      8472

      Description

      We had a server running Lustre 2.3.63-6chaos over ZFS that hit the following assertion. The clients were mostly BG/Q, with some 2.1 x86_64 clients as well.

      <ConMan> Console [vesta56] log at 2013-05-24 17:00:00 PDT.
      2013-05-24 17:51:58 LustreError: 33230:0:(sec_null.c:320:null_alloc_rs()) vmalloc of 'rs' (-1073741304 bytes) failed
      2013-05-24 17:51:58 LustreError: 33230:0:(sec_null.c:320:null_alloc_rs()) 369821362 total bytes allocated by Lustre, 593003541 by LNET
      2013-05-24 17:51:58 LustreError: 33230:0:(pack_generic.c:428:lustre_msg_buf_v2()) msg ffff880da3718108 buffer[1] size -1073741792 too small (required 0, opc=0)
      2013-05-24 17:51:58 LNetError: 33230:0:(o2iblnd_cb.c:1601:kiblnd_send()) ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed:
      2013-05-24 17:51:58 LNetError: 33230:0:(o2iblnd_cb.c:1601:kiblnd_send()) LBUG
      

      When the failover partner started up, it hit the same assertion after recovery completed:

      2013-05-24 17:56:42 Lustre: fsv-OST0037: Client e985e75f-abf3-b986-3da7-3cd1c4f29af0 (at 172.20.17.95@o2ib500) refused reconnection, still busy with 1 active RPCs
      2013-05-24 17:56:42 Lustre: Skipped 128 previous similar messages
      2013-05-24 17:57:25 Lustre: fsv-OST0037: Recovery over after 3:45, of 405 clients 405 recovered and 0 were evicted.
      2013-05-24 17:57:25 LustreError: 11574:0:(sec_null.c:320:null_alloc_rs()) vmalloc of 'rs' (-1073741304 bytes) failed
      2013-05-24 17:57:25 LustreError: 11574:0:(sec_null.c:320:null_alloc_rs()) 408246602 total bytes allocated by Lustre, 1140997813 by LNET
      2013-05-24 17:57:25 LustreError: 11574:0:(pack_generic.c:428:lustre_msg_buf_v2()) msg ffff880e344b8108 buffer[1] size -1073741792 too small (required 0, opc=0)
      2013-05-24 17:57:25 LNetError: 11574:0:(o2iblnd_cb.c:1601:kiblnd_send()) ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed: 
      

      This happened repeatedly for some time until a sysadmin intervened and manually aborted recovery, after which everything appeared to start up normally.

            People

            • Assignee:
              bfaccini Bruno Faccini (Inactive)
              Reporter:
              morrone Christopher Morrone
            • Votes:
              0
              Watchers:
              4