Lustre / LU-2983

ASSERTION in osd_bufs_get_read()

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.4.0
    • Lustre 2.4.0
    • ZFS OSDs
    • 1
    • 3
    • 7270

    Description

      Despite our parity-checking hardware RAID on Grove, we appear to have run into a case where ZFS is getting bad block data from disk. The root cause still isn't clear and we're looking into it.

      However, it clearly exposed that the ZFS OSD currently doesn't even try to handle I/O errors on reads from the DMU. Lustre hit the following assertion when ZFS returned the I/O error. We need to update osd_bufs_get_read() to handle the error and return it up the stack.

      <ConMan> Console [grove250] log at 2013-03-17 23:00:00 PDT.
      2013-03-17 23:50:10 LustreError: 7462:0:(osd_io.c:276:osd_bufs_get_read()) ASSERTION( rc == 0 ) failed: 
      2013-03-17 23:50:10 LustreError: 7462:0:(osd_io.c:276:osd_bufs_get_read()) LBUG
      2013-03-17 23:50:10 Pid: 7462, comm: ll_ost_io00_060
      2013-03-17 23:50:10
      2013-03-17 23:50:10 Call Trace:
      2013-03-17 23:50:10  [<ffffffffa0346965>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2013-03-17 23:50:10  [<ffffffffa0346f77>] lbug_with_loc+0x47/0xb0 [libcfs]
      2013-03-17 23:50:10  [<ffffffffa0d36796>] osd_bufs_get+0x996/0xa10 [osd_zfs]
      2013-03-17 23:50:10  [<ffffffffa06cc386>] ? lu_object_find+0x16/0x20 [obdclass]
      2013-03-17 23:50:10  [<ffffffffa0dd540f>] ofd_preprw_read+0x13f/0x850 [ofd]
      2013-03-17 23:50:10  [<ffffffffa0dd6073>] ofd_preprw+0x553/0x12b0 [ofd]
      2013-03-17 23:50:10  [<ffffffffa0d9030c>] obd_preprw+0x12c/0x3d0 [ost]
      2013-03-17 23:50:10  [<ffffffffa0d95af4>] ost_brw_read+0xd14/0x12f0 [ost]
      2013-03-17 23:50:10  [<ffffffff8126c489>] ? cpumask_next_and+0x29/0x50
      2013-03-17 23:50:10  [<ffffffff810551d4>] ? find_busiest_group+0x244/0x9f0
      2013-03-17 23:50:10  [<ffffffffa085d52c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffffa085d688>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffffa0d9c658>] ost_handle+0x2a68/0x46a0 [ost]
      2013-03-17 23:50:10  [<ffffffffa0864c2b>] ? ptlrpc_update_export_timer+0x4b/0x470 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffffa086d08c>] ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffffa03476be>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      2013-03-17 23:50:10  [<ffffffffa035914f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
      2013-03-17 23:50:10  [<ffffffffa0864459>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffff81051ba3>] ? __wake_up+0x53/0x70
      2013-03-17 23:50:10  [<ffffffffa086e625>] ptlrpc_main+0xbb5/0x1970 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffffa086da70>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffff8100c14a>] child_rip+0xa/0x20
      2013-03-17 23:50:10  [<ffffffffa086da70>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffffa086da70>] ? ptlrpc_main+0x0/0x1970 [ptlrpc]
      2013-03-17 23:50:10  [<ffffffff8100c140>] ? child_rip+0x0/0x20
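
      A minimal sketch of the requested behavior, not the actual patch: the read loop checks the return code, releases any buffers it already holds, and returns the negative errno to its caller instead of asserting. Every name here (mock_buf, mock_dmu_read, mock_buf_release, bufs_get_read_sketch, fail_at) is hypothetical and stands in for the real osd_zfs and DMU code, which is not shown in this ticket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer held from the DMU. */
struct mock_buf {
	int held;
};

/*
 * Hypothetical stand-in for a DMU read. Returns 0 on success or a
 * negative errno; the read of buffer number fail_at fails with -EIO.
 */
static int mock_dmu_read(struct mock_buf *buf, int idx, int fail_at)
{
	if (idx == fail_at)
		return -EIO;
	buf->held = 1;
	return 0;
}

static void mock_buf_release(struct mock_buf *buf)
{
	buf->held = 0;
}

/*
 * Returns the number of buffers held on success, or a negative errno
 * if any read fails. On failure no buffers remain held.
 */
static int bufs_get_read_sketch(struct mock_buf *bufs, int nbufs, int fail_at)
{
	int i, rc;

	/* Assertions remain appropriate for caller bugs, not for I/O errors. */
	assert(bufs != NULL && nbufs >= 0);

	for (i = 0; i < nbufs; i++) {
		rc = mock_dmu_read(&bufs[i], i, fail_at);
		if (rc != 0) {
			/* Undo partial work, then pass the error up the stack. */
			while (--i >= 0)
				mock_buf_release(&bufs[i]);
			return rc;
		}
	}
	return nbufs;
}
```

      The design point is the split between invariants and runtime failures: a bad block on disk is an expected runtime condition, so the caller gets an error code it can fail the request with, rather than an LBUG on the server thread.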
      

          People

            bzzz Alex Zhuravlev
            behlendorf Brian Behlendorf