  Lustre / LU-4290

osp_sync_threads encounters EIO on mount

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.6.0
    • Affects Version/s: Lustre 2.4.1
    • Labels: None
    • Environment: RHEL 6.4/distro IB
    • Severity: 2
    • Rank (Obsolete): 11773

    Description

      We hit this assertion in production. libcfs_panic_on_lbug was set to 1, so the server rebooted. On each subsequent mount, the same assertion and LBUG occurred. The filesystem does mount with panic_on_lbug set to 0. We captured a crash dump and Lustre log messages with the following debug flags:

      [root@atlas-mds3 ~]# cat /proc/sys/lnet/debug
      trace ioctl neterror warning other error emerg ha config console

      We ran e2fsck:
      e2fsck -f -j /dev/mapper/atlas2-mdt1-journal /dev/mapper/atlas2-mdt1

      It fixed only the quota inconsistencies it found.

      At the moment we are back in production, despite the osp_sync_threads LBUGs on mount. As expected, there are hung task messages about osp_sync_threads. We want to fix the root issue that is causing the assertions.

      Kernel messages during one of the failed mounts:
      Nov 21 21:16:44 atlas-mds3 kernel: [ 911.319839] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
      Nov 21 21:16:44 atlas-mds3 kernel: [ 911.986208] Lustre: mdt_num_threads module parameter is deprecated, use mds_num_threads instead or unset both for dynamic thread startup
      Nov 21 21:16:46 atlas-mds3 kernel: [ 913.069371] Lustre: atlas2-MDT0000: used disk, loading
      Nov 21 21:16:47 atlas-mds3 kernel: [ 914.261572] LustreError: 18945:0:(osp_sync.c:862:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5
      Nov 21 21:16:47 atlas-mds3 kernel: [ 914.278318] LustreError: 18945:0:(osp_sync.c:862:osp_sync_thread()) LBUG
      Nov 21 21:16:47 atlas-mds3 kernel: [ 914.286036] Pid: 18945, comm: osp-syn-256
      Nov 21 21:16:47 atlas-mds3 kernel: [ 914.290841]
      Nov 21 21:16:47 atlas-mds3 kernel: [ 914.290844] Call Trace:
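
      The trailing "-5" on the ASSERTION line is the return code that tripped the check, and it corresponds to the EIO in the issue title. As a rough illustration only, the sketch below pulls the asserted expression and return code out of such a line and maps the code to an errno name. The helper name decode_assertion and the regular expression are hypothetical, written for this one message format, and the mapping assumes the script runs on Linux, where Python's errno table matches the kernel's numbering.

```python
import errno
import re

# Hypothetical pattern for the LustreError ASSERTION line shown above:
# "ASSERTION( <expr> ) failed: <message>: <rc>"
ASSERT_RE = re.compile(
    r"ASSERTION\( (?P<expr>.*?) \) failed: (?P<msg>.*): (?P<rc>-?\d+)\s*$"
)

# The assertion line from the failed mount, copied from the kernel messages.
SAMPLE = (
    "Nov 21 21:16:47 atlas-mds3 kernel: [ 914.261572] LustreError: "
    "18945:0:(osp_sync.c:862:osp_sync_thread()) "
    "ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: "
    "0 changes, 0 in progress, 0 in flight: -5"
)


def decode_assertion(line):
    """Return the asserted expression, message, rc and errno name, or None."""
    m = ASSERT_RE.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    # Kernel-style return codes are negative errno values.
    name = errno.errorcode.get(-rc, "UNKNOWN") if rc < 0 else "OK"
    return {
        "expr": m.group("expr"),
        "msg": m.group("msg"),
        "rc": rc,
        "errno": name,
    }


if __name__ == "__main__":
    info = decode_assertion(SAMPLE)
    print(info["expr"], "failed with rc", info["rc"], "=", info["errno"])
```

      This only decodes the message text; it says nothing about which llog record or device returned the I/O error.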

      We also see this message:
      Nov 21 23:01:01 atlas-mds3 kernel: [ 1512.633528] ERST: NVRAM ERST Log Address Range is not implemented yet

      Attachments

        1. 6305.llog_out
          73 kB
        2. 6306.llog_out
          73 kB
        3. 6307.llog_out
          73 kB
        4. 6308.llog_out
          73 kB
        5. lustre-log.1385095225.19969.gz
          38 kB
        6. lustre-log.1385095225.19971.gz
          7 kB
        7. lustre-log.1385095225.19973.gz
          2.88 MB
        8. lustre-log.1385095225.19975.gz
          5 kB

          People

            Assignee: Alex Zhuravlev (bzzz)
            Reporter: Blake Caldwell (blakecaldwell)