Lustre / LU-10395

ASSERTION( osd->od_oi_table != NULL && osd->od_oi_count >= 1 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.6
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
    • None
    • 3
    • 9223372036854775807

    Description

      Just had this crash in my master-next testing:

      [271899.484182] Lustre: DEBUG MARKER: == replay-single test 26: |X| open(O_CREAT), unlink two, close one, replay, close one (test mds_cleanup_orphans) ====================================================================================================== 05:31:19 (1513333879)
      [271900.114927] Turning device loop0 (0x700000) read-only
      [271900.159562] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      [271900.197289] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
      [271900.868045] LustreError: 29112:0:(osd_internal.h:899:osd_fid2oi()) ASSERTION( osd->od_oi_table != NULL && osd->od_oi_count >= 1 ) failed: [0xa:0x8:0x0]
      [271900.870230] LustreError: 29112:0:(osd_internal.h:899:osd_fid2oi()) LBUG
      [271900.870897] Pid: 29112, comm: ll_mgs_0002
      [271900.871499] 
      Call Trace:
      [271900.874098]  [<ffffffffa02927ce>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [271900.874904]  [<ffffffffa029285c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [271900.875989]  [<ffffffffa0bd5390>] __osd_oi_lookup+0x2e0/0x390 [osd_ldiskfs]
      [271900.876716]  [<ffffffffa0bd715a>] osd_oi_lookup+0xca/0x190 [osd_ldiskfs]
      [271900.877452]  [<ffffffffa0bd3112>] osd_fid_lookup+0x4a2/0x1b50 [osd_ldiskfs]
      [271900.878132]  [<ffffffff810e3224>] ? lockdep_init_map+0xc4/0x600
      [271900.902777]  [<ffffffffa0bd4821>] osd_object_init+0x61/0x180 [osd_ldiskfs]
      [271900.903535]  [<ffffffffa03d352f>] lu_object_alloc+0xdf/0x310 [obdclass]
      [271900.904230]  [<ffffffffa03d38cc>] lu_object_find_at+0x16c/0x290 [obdclass]
      [271900.904930]  [<ffffffffa03d4d88>] dt_locate_at+0x18/0xb0 [obdclass]
      [271900.905594]  [<ffffffffa0399140>] llog_osd_open+0x4f0/0xf80 [obdclass]
      [271900.906616]  [<ffffffffa038814a>] llog_open+0x13a/0x3b0 [obdclass]
      [271900.907360]  [<ffffffffa0647953>] llog_origin_handle_read_header+0x1b3/0x630 [ptlrpc]
      [271900.908617]  [<ffffffffa068da13>] tgt_llog_read_header+0x33/0xe0 [ptlrpc]
      [271900.909364]  [<ffffffffa069716b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [271900.910077]  [<ffffffffa063c091>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [271900.911298]  [<ffffffffa063fe48>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [271900.911970]  [<ffffffff81706467>] ? _raw_spin_unlock_irq+0x27/0x50
      [271900.912646]  [<ffffffffa063f3f0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [271900.914668]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [271900.915359]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [271900.915996]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [271900.916627]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [271900.917273] 
      [271900.917861] Kernel panic - not syncing: LBUG
      

      I have a crashdump.

        Activity

          gerrit Gerrit Updater added a comment:

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38153/
          Subject: LU-10395 osd: stop OI at device shutdown
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: b27a323147d992b510fddcfbef8aaef508be7c87

          bzzz Alex Zhuravlev added a comment:

          Can we close the ticket now?

          gerrit Gerrit Updater added a comment:

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37635/
          Subject: LU-10395 tests: add test_280 sanity
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: f4eeadee5ba5d4ab9d04918d8d81d18907daa831

          gerrit Gerrit Updater added a comment:

          Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38153
          Subject: LU-10395 osd: stop OI at device shutdown
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: a7e345f49985b29d1d6f45a6065af56340102470

          gerrit Gerrit Updater added a comment:

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37615/
          Subject: LU-10395 osd: stop OI at device shutdown
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 2789978e1192dbf6d90399c96b5594e0dc049cd9

          adilger Andreas Dilger added a comment (edited):

          +1 on master sanity test_208:
          https://testing.whamcloud.com/test_sets/52f42644-0100-4d2d-bb11-794a4b7a1bf0

          [ 8058.688742] LustreError: 12938:0:(osd_internal.h:1010:osd_fid2oi()) ASSERTION( osd->od_oi_table != NULL && osd->od_oi_count >= 1 ) failed: [0xa:0x3:0x0]
          [ 8058.691039] LustreError: 12938:0:(osd_internal.h:1010:osd_fid2oi()) LBUG
          [ 8058.692144] Pid: 12938, comm: ll_mgs_0002 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Wed Feb 12 09:50:45 UTC 2020
          [ 8058.693915] Call Trace:
          [ 8058.694388]  [<ffffffffc09b8f7c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
          [ 8058.695655]  [<ffffffffc09b902c>] lbug_with_loc+0x4c/0xa0 [libcfs]
          [ 8058.696793]  [<ffffffffc11159d0>] __osd_oi_lookup+0x310/0x3c0 [osd_ldiskfs]
          [ 8058.698145]  [<ffffffffc1117925>] osd_oi_lookup+0x95/0x1e0 [osd_ldiskfs]
          [ 8058.699407]  [<ffffffffc1112ff5>] osd_fid_lookup+0x455/0x1d60 [osd_ldiskfs]
          [ 8058.700594]  [<ffffffffc1114961>] osd_object_init+0x61/0x110 [osd_ldiskfs]
          [ 8058.701927]  [<ffffffffc0bdbafb>] lu_object_start.isra.31+0x8b/0x120 [obdclass]
          [ 8058.703596]  [<ffffffffc0bdfba2>] lu_object_find_at+0x1b2/0x980 [obdclass]
          [ 8058.704808]  [<ffffffffc0be0fcd>] dt_locate_at+0x1d/0xb0 [obdclass]
          [ 8058.705983]  [<ffffffffc0ba2c4e>] llog_osd_open+0x50e/0xf30 [obdclass]
          [ 8058.707231]  [<ffffffffc0b8f08f>] llog_open+0x25f/0x400 [obdclass]
          [ 8058.708380]  [<ffffffffc0edb5b6>] llog_origin_handle_read_header+0x1b6/0x630 [ptlrpc]
          [ 8058.710027]  [<ffffffffc0f25ca3>] tgt_llog_read_header+0x33/0xe0 [ptlrpc]
          [ 8058.711367]  [<ffffffffc0f2f68a>] tgt_request_handle+0x95a/0x1610 [ptlrpc]
          [ 8058.712594]  [<ffffffffc0ed1066>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
          [ 8058.714032]  [<ffffffffc0ed5464>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
          [ 8058.715207]  [<ffffffffa32c61f1>] kthread+0xd1/0xe0
          

          aboyko Alexander Boyko added a comment:

          Alex, I don't see how qsd_op_begin() pins the qsd in memory while it is in use. It only does small checks for != NULL and qsd_started; even a qsd_stopping check is missing.
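
For readers following the pinning concern in the comment above, here is a minimal user-space sketch of the general pattern being asked about: an operation checks the started and stopping state and takes a reference in one step, so shutdown cannot release the instance while the operation is still inside it. This is not Lustre code and not the patch that landed. The names `svc_instance`, `svc_op_begin()`, `svc_op_end()` and `svc_stop()` are hypothetical, and a pthread mutex stands in for whatever locking the real code would use.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-device service instance.
 * This is NOT the real Lustre qsd_instance. */
struct svc_instance {
	pthread_mutex_t lock;
	bool started;	/* setup has completed */
	bool stopping;	/* shutdown has begun */
	bool freed;	/* models the resources having been released */
	int users;	/* operations currently inside the instance */
};

static void svc_init(struct svc_instance *svc)
{
	pthread_mutex_init(&svc->lock, NULL);
	svc->started = true;
	svc->stopping = false;
	svc->freed = false;
	svc->users = 0;
}

/* Enter the instance. The state check and the pin happen under the
 * same lock, so shutdown cannot slip in between them. */
static int svc_op_begin(struct svc_instance *svc)
{
	int rc = 0;

	pthread_mutex_lock(&svc->lock);
	if (!svc->started || svc->stopping)
		rc = -ENODEV;
	else
		svc->users++;
	pthread_mutex_unlock(&svc->lock);
	return rc;
}

/* Leave the instance. The last user out completes a pending shutdown. */
static void svc_op_end(struct svc_instance *svc)
{
	pthread_mutex_lock(&svc->lock);
	assert(svc->users > 0);
	svc->users--;
	if (svc->stopping && svc->users == 0)
		svc->freed = true;
	pthread_mutex_unlock(&svc->lock);
}

/* Begin shutdown. Resources are released immediately only if no
 * operation holds a pin; otherwise release is deferred to svc_op_end(). */
static void svc_stop(struct svc_instance *svc)
{
	pthread_mutex_lock(&svc->lock);
	svc->stopping = true;
	if (svc->users == 0)
		svc->freed = true;
	pthread_mutex_unlock(&svc->lock);
}
```

In this model, a caller that got 0 back from `svc_op_begin()` can rely on the instance staying valid until its matching `svc_op_end()`, and any caller arriving after `svc_stop()` gets an error instead of touching state that is being torn down.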

          bzzz Alex Zhuravlev added a comment:

          Thanks!

          aboyko Alexander Boyko added a comment:

          I've pushed a regression test for the issue. If it is OK, I will rebase it onto the fix.

          gerrit Gerrit Updater added a comment:

          Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/37635
          Subject: LU-10395 tests: add test_280 sanity
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 997da4924fc37125bbf5fbae2c12c186bb05a284

          People

            Assignee: bzzz Alex Zhuravlev
            Reporter: green Oleg Drokin
            Votes: 0
            Watchers: 6
