Lustre / LU-3417

ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed:


    Description

      A server running Lustre 2.3.63-6chaos over ZFS hit the following assertion. The clients were mostly BG/Q, with some 2.1 x86_64 clients as well.

      <ConMan> Console [vesta56] log at 2013-05-24 17:00:00 PDT.
      2013-05-24 17:51:58 LustreError: 33230:0:(sec_null.c:320:null_alloc_rs()) vmalloc of 'rs' (-1073741304 bytes) failed
      2013-05-24 17:51:58 LustreError: 33230:0:(sec_null.c:320:null_alloc_rs()) 369821362 total bytes allocated by Lustre, 593003541 by LNET
      2013-05-24 17:51:58 LustreError: 33230:0:(pack_generic.c:428:lustre_msg_buf_v2()) msg ffff880da3718108 buffer[1] size -1073741792 too small (required 0, opc=0)
      2013-05-24 17:51:58 LNetError: 33230:0:(o2iblnd_cb.c:1601:kiblnd_send()) ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed:
      2013-05-24 17:51:58 LNetError: 33230:0:(o2iblnd_cb.c:1601:kiblnd_send()) LBUG
      

      When the failover partner started up, it hit the same assertion after recovery completed:

      2013-05-24 17:56:42 Lustre: fsv-OST0037: Client e985e75f-abf3-b986-3da7-3cd1c4f29af0 (at 172.20.17.95@o2ib500) refused reconnection, still busy with 1 active RPCs
      2013-05-24 17:56:42 Lustre: Skipped 128 previous similar messages
      2013-05-24 17:57:25 Lustre: fsv-OST0037: Recovery over after 3:45, of 405 clients 405 recovered and 0 were evicted.
      2013-05-24 17:57:25 LustreError: 11574:0:(sec_null.c:320:null_alloc_rs()) vmalloc of 'rs' (-1073741304 bytes) failed
      2013-05-24 17:57:25 LustreError: 11574:0:(sec_null.c:320:null_alloc_rs()) 408246602 total bytes allocated by Lustre, 1140997813 by LNET
      2013-05-24 17:57:25 LustreError: 11574:0:(pack_generic.c:428:lustre_msg_buf_v2()) msg ffff880e344b8108 buffer[1] size -1073741792 too small (required 0, opc=0)
      2013-05-24 17:57:25 LNetError: 11574:0:(o2iblnd_cb.c:1601:kiblnd_send()) ASSERTION( __builtin_offsetof(kib_msg_t,ibm_u.immediate.ibim_payload[payload_nob]) <= (4<<10) ) failed: 
      

      This repeated for some time until a sysadmin intervened and manually aborted recovery. That seemed to allow everything to start up normally.

          People

            bfaccini Bruno Faccini (Inactive)
            morrone Christopher Morrone (Inactive)