Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3063

osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 29 changes, 26 in progress, 7 in flight: -5

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.4.1, Lustre 2.5.0
    • Lustre 2.4.0
    • 2
    • 7462

    Description

      LBUG happens quite often when running racer under a memory pressure environment. For example, I'm using 2G memory vmware, and set up lustre with:

      OSTSIZE=$((512*1024)) REFORMAT=1 sh racer.sh

      It can hit LBUG as follows:

      ll_ost_io01_005: page allocation failure. order:4, mode:0x50
      Pid: 22344, comm: ll_ost_io01_005 Not tainted 2.6.32-279.19.1.el6.x86_64.debug #1
      Call Trace:
       [<ffffffff81139b2a>] ? __alloc_pages_nodemask+0x6aa/0xa20
       [<ffffffff81175dfe>] ? kmem_getpages+0x6e/0x170
       [<ffffffff8117884b>] ? fallback_alloc+0x1cb/0x2b0
       [<ffffffff811780a9>] ? cache_grow+0x4c9/0x530
       [<ffffffff8117852b>] ? ____cache_alloc_node+0xab/0x200
       [<ffffffff81179d08>] ? __kmalloc+0x288/0x330
       [<ffffffffa048ebb0>] ? cfs_alloc+0x30/0x60 [libcfs]
       [<ffffffffa048ebb0>] ? cfs_alloc+0x30/0x60 [libcfs]
       [<ffffffffa0e35ad8>] ? ost_io_thread_init+0x48/0x300 [ost]
       [<ffffffffa0888703>] ? ptlrpc_main+0xa3/0x1810 [ptlrpc]
       [<ffffffff810aebad>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
       [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
       [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
       [<ffffffff810097dc>] ? __switch_to+0x1ac/0x320
       [<ffffffffa0888660>] ? ptlrpc_main+0x0/0x1810 [ptlrpc]
       [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
       [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
       [<ffffffff8100bb10>] ? restore_args+0x0/0x30
       [<ffffffffa0888660>] ? ptlrpc_main+0x0/0x1810 [ptlrpc]
       [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
      Mem-Info:
      Node 0 DMA per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      CPU    1: hi:    0, btch:   1 usd:   0
      CPU    2: hi:    0, btch:   1 usd:   0
      CPU    3: hi:    0, btch:   1 usd:   0
      Node 0 DMA32 per-cpu:
      CPU    0: hi:  186, btch:  31 usd:  40
      CPU    1: hi:  186, btch:  31 usd:   0
      CPU    2: hi:  186, btch:  31 usd:   0
      CPU    3: hi:  186, btch:  31 usd:  38
      active_anon:74087 inactive_anon:150981 isolated_anon:0
       active_file:14723 inactive_file:15191 isolated_file:5
       unevictable:0 dirty:1405 writeback:0 unstable:0
       free:57272 slab_reclaimable:6236 slab_unreclaimable:76201
       mapped:1475 shmem:214401 pagetables:2281 bounce:0
      Node 0 DMA free:8756kB min:332kB low:412kB high:496kB active_anon:656kB inactive_anon:1672kB active_file:68kB inactive_file:1488kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:2328kB slab_reclaimable:8kB slab_unreclaimable:3028kB kernel_stack:16kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 2004 2004 2004
      Node 0 DMA32 free:231280kB min:44720kB low:55900kB high:67080kB active_anon:298048kB inactive_anon:602252kB active_file:58824kB inactive_file:59020kB unevictable:0kB isolated(anon):0kB isolated(file):20kB present:2052192kB mlocked:0kB dirty:5804kB writeback:0kB mapped:5900kB shmem:855276kB slab_reclaimable:24936kB slab_unreclaimable:287616kB kernel_stack:4200kB pagetables:9860kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 DMA: 203*4kB 104*8kB 24*16kB 6*32kB 5*64kB 3*128kB 3*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8812kB
      Node 0 DMA32: 35286*4kB 9007*8kB 1989*16kB 87*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 251840kB
      244228 total pagecache pages
      0 pages in swap cache
      Swap cache stats: add 0, delete 0, find 0/0
      Free swap  = 0kB
      Total swap = 0kB
      524272 pages RAM
      77789 pages reserved
      65618 pages shared
      325369 pages non-shared
      LustreError: 47322:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 17112 of inode ffff880038147728 failed -28
      LustreError: 39799:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 1022 of inode ffff8800598402a8 failed -28
      LustreError: 44035:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 255 of inode ffff880046224268 failed -28
      LustreError: 46598:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 304 of inode ffff8800203b02a8 failed -28
      Buffer I/O error on device loop0, logical block 17163
      lost page write due to I/O error on loop0
      LustreError: 48203:0:(file.c:2707:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000401:0x6f4d:0x0] error: rc = -116
      Buffer I/O error on device loop0, logical block 29955
      lost page write due to I/O error on loop0
      Buffer I/O error on device loop0, logical block 29956
      lost page write due to I/O error on loop0
      Buffer I/O error on device loop0, logical block 29960
      lost page write due to I/O error on loop0
      Buffer I/O error on device loop0, logical block 12699
      lost page write due to I/O error on loop0
      Buffer I/O error on device loop0, logical block 12723
      lost page write due to I/O error on loop0
      Buffer I/O error on device loop0, logical block 12783
      lost page write due to I/O error on loop0
      Buffer I/O error on device loop0, logical block 15429
      lost page write due to I/O error on loop0
      Buffer I/O error on device loop0, logical block 15430
      lost page write due to I/O error on loop0
      LDISKFS-fs error (device loop0): ldiskfs_find_entry: reading directory #50271 offset 0
      Aborting journal on device loop0-8.
      LustreError: 4993:0:(file.c:158:ll_close_inode_openhandle()) inode 144115205272531443 mdc close failed: rc = -30
      LustreError: 4972:0:(llite_lib.c:1292:ll_md_setattr()) md_setattr fails: rc = -30
      LustreError: 3150:0:(osd_handler.c:635:osd_trans_commit_cb()) transaction @0xffff88005eef12a0 commit error: 2
      LDISKFS-fs error (device loop0): ldiskfs_journal_start_sb: Detected aborted journal
      LDISKFS-fs (loop0): Remounting filesystem read-only
      LustreError: 3150:0:(osd_handler.c:635:osd_trans_commit_cb()) transaction @0xffff880058789818 commit error: 2
      LustreError: 3498:0:(osd_io.c:997:osd_ldiskfs_read()) lustre-MDT0000: can't read 4096@172032 on ino 110: rc = -5
      LustreError: 3498:0:(llog_osd.c:562:llog_osd_next_block()) lustre-MDT0000-osd: can't read llog block from log [0x1:0xc:0x0] offset 172032: rc = -5
      LustreError: 3498:0:(osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 29 changes, 26 in progress, 7 in flight: -5
      LustreError: 3498:0:(osp_sync.c:866:osp_sync_thread()) LBUG
      Pid: 3498, comm: osp-syn-1
      
      Call Trace:
       [<ffffffffa048d8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa048dec7>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0ea4a73>] osp_sync_thread+0x783/0x800 [osp]
       [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
       [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
       [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
       [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
       [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
      LustreError: 3496:0:(osd_io.c:997:osd_ldiskfs_read()) lustre-MDT0000: can't read 4096@180224 on ino 108: rc = -5
      LustreError: 3496:0:(llog_osd.c:562:llog_osd_next_block()) lustre-MDT0000-osd: can't read llog block from log [0x1:0xa:0x0] offset 180224: rc = -5
      LustreError: 3496:0:(osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 14 changes, 30 in progress, 6 in flight: -5
      LustreError: 3496:0:(osp_sync.c:866:osp_sync_thread()) LBUG
      Pid: 3496, comm: osp-syn-0
      
      Call Trace:
       [<ffffffffa048d8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa048dec7>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0ea4a73>] osp_sync_thread+0x783/0x800 [osp]
       [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
       [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
       [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
       [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
       [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
       [<ffffffff8100c1ca>] child_rip+0xa/0x20
       [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
       [<ffffffff8100bb10>] ? restore_args+0x0/0x30
       [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
       [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
      
      LustreError: 50716:0:(file.c:158:ll_close_inode_openhandle()) inode 144115205255753869 mdc close failed: rc = -30
      LustreError: 50716:0:(file.c:158:ll_close_inode_openhandle()) Skipped 1 previous similar message
      LDISKFS-fs (loop0): Remounting filesystem read-only
      Kernel panic - not syncing: LBUG
      Pid: 3496, comm: osp-syn-0 Not tainted 2.6.32-279.19.1.el6.x86_64.debug #1
      Call Trace:
       [<ffffffff8151baba>] ? panic+0xa0/0x16d
       [<ffffffffa048df1b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa0ea4a73>] ? osp_sync_thread+0x783/0x800 [osp]
       [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
       [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
       [<ffffffff8105960d>] ? finish_task_switch+0x7d/0x110
       [<ffffffff810595d8>] ? finish_task_switch+0x48/0x110
       [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
       [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
       [<ffffffff8151f690>] ? _spin_unlock_irq+0x30/0x40
       [<ffffffff8100bb10>] ? restore_args+0x0/0x30
       [<ffffffffa0ea42f0>] ? osp_sync_thread+0x0/0x800 [osp]
       [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
      

      Attachments

        1. 3063-out-v3
          2.06 MB
          Keith Mannthey
        2. odd-hang-messages
          2.72 MB
          Keith Mannthey
        3. single-3063
          249 kB
          Keith Mannthey

        Issue Links

          Activity

            People

              keith Keith Mannthey (Inactive)
              jay Jinshan Xiong (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: