Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8573

IOR: niobuf.c:319:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.9.0
    • Lustre 2.9.0, Lustre 2.10.3
    • None
    • Lustre-master 2.8.56_68_gd4a4c07 build 3430, RHEL7, Spirit performance cluster Revision: d4a4c0795b3befb87d47a5bf441adeba3b1c36f8
    • 3
    • 9223372036854775807

    Description

      Attempted IOR test on both sets of OSTs.
      No progress made on test.
      Client and OST logs are full of:
      OST

      LustreError: 5245:0:(events.c:449:server_bulk_callback()) event type 3, status -113, desc ffff88104dc53200
      LustreError: 9071:0:(events.c:449:server_bulk_callback()) event type 3, status -103, desc ffff880b0c956200
      LustreError: 5245:0:(events.c:449:server_bulk_callback()) event type 5, status -113, desc ffff880b13e95200
      LustreError: 5245:0:(events.c:449:server_bulk_callback()) event type 3, status -113, desc ffff880b13e95200
      LustreError: 5245:0:(events.c:449:server_bulk_callback()) event type 5, status -113, desc ffff881032b5a800
      LustreError: 5245:0:(events.c:449:server_bulk_callback()) event type 3, status -113, desc ffff881032b5a800
      

      Client

      [12378.313197] LustreError: 3138:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff8807442f9c00
      [12378.313223] LustreError: 3136:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff8807442f9800
      [12378.337649] LustreError: 3136:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff8810102c4200
      [12378.337671] LustreError: 3136:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff880f55547400
      [12378.337677] LustreError: 3138:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff880729970000
      [12378.362167] LustreError: 3137:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff880804b78800
      [12378.374324] LustreError: 3137:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff8807299e5200
      [12378.410886] LustreError: 3137:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff880f53ce1c00
      

      Eventually, one client LBUGS

      [12377.942289] LustreError: 3153:0:(niobuf.c:319:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed: 
      [12378.564210] LustreError: 3153:0:(niobuf.c:319:ptlrpc_register_bulk()) LBUG
      [12378.564210] LustreError: 3153:0:(niobuf.c:319:ptlrpc_register_bulk()) LBUG
      [12378.571890] Pid: 3153, comm: ptlrpcd_01_02
      [12378.576468]
      Call Trace:
      [12378.580868]  [<ffffffffa08557d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [12378.588652]  [<ffffffffa0855d75>] lbug_with_loc+0x45/0xc0 [libcfs]
      [12378.595597]  [<ffffffffa0c3e661>] ptlrpc_register_bulk+0x831/0x9c0 [ptlrpc]
      [12378.603389]  [<ffffffffa08cace2>] ? LNetMDUnlink+0xe2/0x180 [lnet]
      [12378.610322]  [<ffffffffa0c6be76>] ? sptlrpc_import_sec_ref+0x36/0x40 [ptlrpc]
      [12378.618321]  [<ffffffffa0c3f1af>] ptl_send_rpc+0x1ff/0xda0 [ptlrpc]
      [12378.625361]  [<ffffffffa0c39256>] ptlrpc_check_set.part.23+0x1896/0x1dd0 [ptlrpc]
      [12378.633743]  [<ffffffffa0c397eb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      [12378.640976]  [<ffffffffa0c643fb>] ptlrpcd_check+0x4eb/0x5e0 [ptlrpc]
      [12378.648095]  [<ffffffffa0c647ab>] ptlrpcd+0x2bb/0x560 [ptlrpc]
      [12378.654613]  [<ffffffff810b8910>] ? default_wake_function+0x0/0x20
      [12378.661550]  [<ffffffffa0c644f0>] ? ptlrpcd+0x0/0x560 [ptlrpc]
      [12378.668067]  [<ffffffff810a5b2f>] kthread+0xcf/0xe0
      [12378.673515]  [<ffffffff810a5a60>] ? kthread+0x0/0xe0
      [12378.679058]  [<ffffffff81646a98>] ret_from_fork+0x58/0x90
      [12378.685086]  [<ffffffff810a5a60>] ? kthread+0x0/0xe0
      [12378.690626]
      [12378.692396] Kernel panic - not syncing: LBUG
      [12378.697164] CPU: 9 PID: 3153 Comm: ptlrpcd_01_02 Tainted: G           OE  ------------   3.10.0-327.28.3.el7.x86_64 #1
      [12378.709098] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
      [12378.720548]  ffffffffa0872def 000000003210ab53 ffff881000c1ba80 ffffffff81636453
      [12378.728852]  ffff881000c1bb00 ffffffff8162fce7 ffffffff00000008 ffff881000c1bb10
      [12378.737165]  ffff881000c1bab0 000000003210ab53 ffffffffa0c90d30 0000000000000246
      [12378.745463] Call Trace:
      [12378.748204]  [<ffffffff81636453>] dump_stack+0x19/0x1b
      [12378.753939]  [<ffffffff8162fce7>] panic+0xd8/0x1e7
      [12378.759295]  [<ffffffffa0855ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
      [12378.766235]  [<ffffffffa0c3e661>] ptlrpc_register_bulk+0x831/0x9c0 [ptlrpc]
      [12378.774009]  [<ffffffffa08cace2>] ? LNetMDUnlink+0xe2/0x180 [lnet]
      [12378.780949]  [<ffffffffa0c6be76>] ? sptlrpc_import_sec_ref+0x36/0x40 [ptlrpc]
      [12378.788950]  [<ffffffffa0c3f1af>] ptl_send_rpc+0x1ff/0xda0 [ptlrpc]
      [12378.795981]  [<ffffffffa0c39256>] ptlrpc_check_set.part.23+0x1896/0x1dd0 [ptlrpc]
      [12378.804368]  [<ffffffffa0c397eb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      [12378.811596]  [<ffffffffa0c643fb>] ptlrpcd_check+0x4eb/0x5e0 [ptlrpc]
      [12378.818725]  [<ffffffffa0c647ab>] ptlrpcd+0x2bb/0x560 [ptlrpc]
      [12378.825237]  [<ffffffff810b8910>] ? wake_up_state+0x20/0x20
      [12378.831492]  [<ffffffffa0c644f0>] ? ptlrpcd_check+0x5e0/0x5e0 [ptlrpc]
      [12378.838778]  [<ffffffff810a5b2f>] kthread+0xcf/0xe0
      [12378.844224]  [<ffffffff810a5a60>] ? kthread_create_on_node+0x140/0x140
      [12378.851510]  [<ffffffff81646a98>] ret_from_fork+0x58/0x90
      [12378.857534]  [<ffffffff810a5a60>] ? kthread_create_on_node+0x140/0x140
      

      vmcore is available on Spirit for analysis

      Attachments

        1. client.spirit-28.log.gz
          2.98 MB
        2. lu-8573.console.tar
          0.2 kB
        3. lu-8573-syslog.tar
          0.2 kB
        4. mds.sprit-3.log.gz
          0.2 kB
        5. oss.aeon-1.log.gz
          0.2 kB
        6. oss.aeon-2.log.gz
          2.29 MB
        7. spirit-11.log.gz
          1.24 MB
        8. spirit-14-lustre-log.20160928_2029-4-lu-8573.bz2
          1.01 MB
        9. spirit-15-lustre-log.20160928_2029-4-lu-8573.bz2
          954 kB
        10. spirit-16-lustre-log.20160928_2029-4-lu-8573.bz2
          1.04 MB
        11. spirit-17.log.gz
          1.05 MB
        12. spirit-20.log.gz
          1.23 MB
        13. spirit-33.log-20160904
          15 kB
        14. spirit-3-lustre-log.20160928_2029-4-lu-8573.bz2
          0.3 kB
        15. spirit-4.log.gz
          2.20 MB
        16. spirit-5.callback.txt.gz
          0.2 kB
        17. spirit-5.log.gz
          1.28 MB
        18. spirit-6.callback.lustre.log.gz
          0.3 kB
        19. spirit-6.log.gz
          1.42 MB
        20. spirit-aeon-1-lustre-log.20160928_2029-4-lu-8573.bz2
          0.3 kB
        21. spirit-aeon-2-lustre-log.20160928_2029-4-lu-8573.bz2
          0.3 kB
        22. sprit-11.callbacks.lustre.log.gz
          1.98 MB

        Issue Links

          Activity

            People

              hongchao.zhang Hongchao Zhang
              cliffw Cliff White (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: