Lustre / LU-2818

Failure on test suite parallel-scale-nfsv4 test_compilebench: (mdt_lvb.c:126:mdt_lvbo_fill()) ASSERTION( rc == 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0
    • Labels: None
    • Severity: 3
    • 6822

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/25d39028-755e-11e2-bf59-52540035b04c.

      The sub-test test_compilebench failed with the following error:

      compilebench failed: 1

      MDS console:

      12:44:26:Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test compilebench: compilebench ============================================== 12:44:25 (1360701865)
      12:44:27:Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.compilebench -i 2         -r 2 --makej
      12:44:27:Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench -i 2 -r 2 --makej
      13:06:42:mdt00_003: page allocation failure. order:5, mode:0x50
      13:06:42:Pid: 22716, comm: mdt00_003 Not tainted 2.6.32-279.19.1.el6_lustre.x86_64 #1
      13:06:42:Call Trace:
      13:06:42: [<ffffffff811231ff>] ? __alloc_pages_nodemask+0x77f/0x940
      13:06:42: [<ffffffff8115d1a2>] ? kmem_getpages+0x62/0x170
      13:06:42: [<ffffffff8115ddba>] ? fallback_alloc+0x1ba/0x270
      13:06:42: [<ffffffff8115d80f>] ? cache_grow+0x2cf/0x320
      13:06:42: [<ffffffff8115db39>] ? ____cache_alloc_node+0x99/0x160
      13:06:42: [<ffffffffa04d8b60>] ? cfs_alloc+0x30/0x60 [libcfs]
      13:06:42: [<ffffffff8115e909>] ? __kmalloc+0x189/0x220
      13:06:42: [<ffffffffa04d8b60>] ? cfs_alloc+0x30/0x60 [libcfs]
      13:06:42: [<ffffffffa0d3b69e>] ? osd_key_init+0x1e/0x670 [osd_ldiskfs]
      13:06:42: [<ffffffffa066622f>] ? keys_fill+0x6f/0x190 [obdclass]
      13:06:42: [<ffffffffa0669e6b>] ? lu_context_init+0xab/0x260 [obdclass]
      13:06:42: [<ffffffffa066a03e>] ? lu_env_init+0x1e/0x30 [obdclass]
      13:06:42: [<ffffffffa0e775ff>] ? mdt_lvbo_fill+0x1af/0x800 [mdt]
      13:06:42: [<ffffffffa0e77450>] ? mdt_lvbo_fill+0x0/0x800 [mdt]
      13:06:42: [<ffffffffa07c6591>] ? ldlm_handle_enqueue0+0x641/0x1080 [ptlrpc]
      13:06:42: [<ffffffffa0e4e376>] ? mdt_enqueue+0x46/0x110 [mdt]
      13:06:42: [<ffffffffa0e42fb8>] ? mdt_handle_common+0x628/0x1620 [mdt]
      13:06:42: [<ffffffffa0e7a5c5>] ? mds_regular_handle+0x15/0x20 [mdt]
      13:06:42: [<ffffffffa07f800c>] ? ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
      13:06:42: [<ffffffffa04d85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      13:06:42: [<ffffffffa07ef739>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      13:06:42: [<ffffffff81052223>] ? __wake_up+0x53/0x70
      13:06:42: [<ffffffffa07f9556>] ? ptlrpc_main+0xb76/0x1870 [ptlrpc]
      13:06:42: [<ffffffffa07f89e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      13:06:42: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      13:06:42: [<ffffffffa07f89e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      13:06:42: [<ffffffffa07f89e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      13:06:42: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      13:06:42:Mem-Info:
      13:06:42:Node 0 DMA per-cpu:
      13:06:42:CPU    0: hi:    0, btch:   1 usd:   0
      13:06:42:Node 0 DMA32 per-cpu:
      13:06:42:CPU    0: hi:  186, btch:  31 usd: 152
      13:06:42:active_anon:4078 inactive_anon:1377 isolated_anon:0
      13:06:42: active_file:109040 inactive_file:162532 isolated_file:0
      13:06:42: unevictable:0 dirty:14064 writeback:512 unstable:0
      13:06:42: free:14243 slab_reclaimable:32585 slab_unreclaimable:116609
      13:06:42: mapped:3106 shmem:41 pagetables:752 bounce:0
      13:06:42:Node 0 DMA free:8348kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:184kB inactive_file:3376kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:556kB slab_unreclaimable:3268kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      13:06:42:lowmem_reserve[]: 0 2003 2003 2003
      13:06:42:Node 0 DMA32 free:48624kB min:44720kB low:55900kB high:67080kB active_anon:16312kB inactive_anon:5508kB active_file:435976kB inactive_file:646752kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:56256kB writeback:2048kB mapped:12424kB shmem:164kB slab_reclaimable:129784kB slab_unreclaimable:463168kB kernel_stack:1960kB pagetables:3008kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      13:06:42:lowmem_reserve[]: 0 0 0 0
      13:06:42:Node 0 DMA: 33*4kB 5*8kB 3*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 3*2048kB 0*4096kB = 8348kB
      13:06:42:Node 0 DMA32: 5780*4kB 2228*8kB 4*16kB 28*32kB 3*64kB 3*128kB 2*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 48624kB
      13:06:42:195649 total pagecache pages
      13:06:42:0 pages in swap cache
      13:06:42:Swap cache stats: add 0, delete 0, find 0/0
      13:06:42:Free swap  = 4128760kB
      13:06:42:Total swap = 4128760kB
      13:06:42:524284 pages RAM
      13:06:42:43608 pages reserved
      13:06:42:207968 pages shared
      13:06:42:267301 pages non-shared
      13:06:42:LustreError: 22716:0:(mdt_lvb.c:126:mdt_lvbo_fill()) ASSERTION( rc == 0 ) failed: 
      13:06:42:LustreError: 22716:0:(mdt_lvb.c:126:mdt_lvbo_fill()) LBUG
      13:06:42:Pid: 22716, comm: mdt00_003
      13:06:42:
      13:06:42:Call Trace:
      13:06:42: [<ffffffffa04d7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      13:06:42: [<ffffffffa04d7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      13:06:42: [<ffffffffa0e77bd5>] mdt_lvbo_fill+0x785/0x800 [mdt]
      13:06:42: [<ffffffffa0e77450>] ? mdt_lvbo_fill+0x0/0x800 [mdt]
      13:06:42: [<ffffffffa07c6591>] ldlm_handle_enqueue0+0x641/0x1080 [ptlrpc]
      13:06:42: [<ffffffffa0e4e376>] mdt_enqueue+0x46/0x110 [mdt]
      13:06:43: [<ffffffffa0e42fb8>] mdt_handle_common+0x628/0x1620 [mdt]
      13:06:43: [<ffffffffa0e7a5c5>] mds_regular_handle+0x15/0x20 [mdt]
      13:06:43: [<ffffffffa07f800c>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
      13:06:43: [<ffffffffa04d85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      13:06:43: [<ffffffffa07ef739>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      13:06:43: [<ffffffff81052223>] ? __wake_up+0x53/0x70
      13:06:43: [<ffffffffa07f9556>] ptlrpc_main+0xb76/0x1870 [ptlrpc]
      13:06:43: [<ffffffffa07f89e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      13:06:43: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      13:06:43: [<ffffffffa07f89e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      13:06:43: [<ffffffffa07f89e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      13:06:43: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      13:06:43:
      13:06:43:Kernel panic - not syncing: LBUG
      


          Activity

            pjones Peter Jones added a comment -

            Landed for 2.6

            green Oleg Drokin added a comment -

            Because I am really tired of hitting this issue, here's a patch to at least make the MDT not crash when it happens:
            http://review.whamcloud.com/8947


            bzzz Alex Zhuravlev added a comment -

            I think this should stop happening once we start fetching the layouts with an explicit getxattr.
            green Oleg Drokin added a comment -

            I am now frequently hitting this again while running sanity in a loop. This did not happen before in this sort of test on the same config node, so we probably need to take another look at this.

            <4>[13497.371686] mdt00_004: page allocation failure. order:1, mode:0x40
            <4>[13497.372009] Pid: 11831, comm: mdt00_004 Not tainted 2.6.32-rhe6.4-debug #2
            <4>[13497.372016] Call Trace:
            <4>[13497.372016]  [<ffffffff8112a666>] ? __alloc_pages_nodemask+0x7c6/0x980
            <4>[13497.372016]  [<ffffffff814fe0ee>] ? _spin_unlock+0xe/0x10
            <4>[13497.372016]  [<ffffffff811658f2>] ? kmem_getpages+0x62/0x170
            <4>[13497.372016]  [<ffffffff8116834a>] ? fallback_alloc+0x1ba/0x270
            <4>[13497.372016]  [<ffffffff81167bf7>] ? cache_grow+0x4d7/0x520
            <4>[13497.372016]  [<ffffffff81168038>] ? ____cache_alloc_node+0xa8/0x200
            <4>[13497.372016]  [<ffffffff81168943>] ? kmem_cache_alloc_trace+0x1c3/0x250
            <4>[13497.372016]  [<ffffffffa0b4f2a5>] ? osd_key_init+0x25/0x4e0 [osd_ldiskfs]
            <4>[13497.372016]  [<ffffffffa0b4f2a5>] ? osd_key_init+0x25/0x4e0 [osd_ldiskfs]
            <4>[13497.372016]  [<ffffffffa04f8f8f>] ? keys_fill+0x6f/0x190 [obdclass]
            <4>[13497.372016]  [<ffffffffa04fd00e>] ? lu_context_init+0x4e/0x240 [obdclass]
            <4>[13497.372016]  [<ffffffffa04fd063>] ? lu_context_init+0xa3/0x240 [obdclass]
            <4>[13497.372016]  [<ffffffffa04fd21e>] ? lu_env_init+0x1e/0x30 [obdclass]
            <4>[13497.372016]  [<ffffffffa0bf131b>] ? mdt_lvbo_fill+0x1ab/0x840 [mdt]
            <4>[13497.372016]  [<ffffffffa0bf1170>] ? mdt_lvbo_fill+0x0/0x840 [mdt]
            <4>[13497.372016]  [<ffffffffa0664033>] ? ldlm_handle_enqueue0+0x623/0x10c0 [ptlrpc]
            <4>[13497.372016]  [<ffffffffa0bb6a96>] ? mdt_enqueue+0x46/0xe0 [mdt]
            <4>[13497.372016]  [<ffffffffa0bbcc57>] ? mdt_handle_common+0x647/0x16d0 [mdt]
            <4>[13497.372016]  [<ffffffffa0bf54a5>] ? mds_regular_handle+0x15/0x20 [mdt]
            <4>[13497.372016]  [<ffffffffa0693898>] ? ptlrpc_server_handle_request+0x3a8/0xc70 [ptlrpc]
            <4>[13497.372016]  [<ffffffffa0a2155e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
            <4>[13497.372016]  [<ffffffffa0a32b9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
            <4>[13497.372016]  [<ffffffffa068afb1>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
            <4>[13497.372016]  [<ffffffff81054613>] ? __wake_up+0x53/0x70
            <4>[13497.372016]  [<ffffffffa0694b82>] ? ptlrpc_main+0xa22/0x1650 [ptlrpc]
            <4>[13497.372016]  [<ffffffffa0694160>] ? ptlrpc_main+0x0/0x1650 [ptlrpc]
            <4>[13497.372016]  [<ffffffff81094606>] ? kthread+0x96/0xa0
            <4>[13497.372016]  [<ffffffff8100c10a>] ? child_rip+0xa/0x20
            <4>[13497.372016]  [<ffffffff81094570>] ? kthread+0x0/0xa0
            <4>[13497.372016]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
            <6>[13497.372016] Mem-Info:
            <4>[13497.372016] Node 0 DMA per-cpu:
            <4>[13497.372016] CPU    0: hi:    0, btch:   1 usd:   0
            <4>[13497.385289] CPU    1: hi:    0, btch:   1 usd:   0
            <4>[13497.385289] CPU    2: hi:    0, btch:   1 usd:   0
            <4>[13497.385289] CPU    3: hi:    0, btch:   1 usd:   0
            <4>[13497.385289] Node 0 DMA32 per-cpu:
            <4>[13497.385289] CPU    0: hi:  186, btch:  31 usd:  26
            <4>[13497.385289] CPU    1: hi:  186, btch:  31 usd:  75
            <4>[13497.385289] CPU    2: hi:  186, btch:  31 usd: 177
            <4>[13497.385289] CPU    3: hi:  186, btch:  31 usd: 170
            <4>[13497.385289] active_anon:83027 inactive_anon:83006 isolated_anon:0
            <4>[13497.385289]  active_file:31104 inactive_file:29174 isolated_file:0
            <4>[13497.385289]  unevictable:0 dirty:59 writeback:55 unstable:0
            <4>[13497.385289]  free:13805 slab_reclaimable:79792 slab_unreclaimable:363470
            <4>[13497.385289]  mapped:669 shmem:163871 pagetables:341 bounce:0
            <4>[13497.388538] Node 0 DMA free:9700kB min:136kB low:168kB high:204kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:9312kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
            <4>[13497.389526] lowmem_reserve[]: 0 2967 2967 2967
            <4>[13497.389526] Node 0 DMA32 free:45520kB min:44916kB low:56144kB high:67372kB active_anon:332000kB inactive_anon:332388kB active_file:124252kB inactive_file:116696kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3039080kB mlocked:0kB dirty:236kB writeback:220kB mapped:2676kB shmem:655484kB slab_reclaimable:319168kB slab_unreclaimable:1453880kB kernel_stack:2640kB pagetables:1364kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:861 all_unreclaimable? no
            <4>[13497.389526] lowmem_reserve[]: 0 0 0 0
            <4>[13497.389526] Node 0 DMA: 3*4kB 1*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 9700kB
            <4>[13497.389526] Node 0 DMA32: 10258*4kB 18*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 45272kB
            <4>[13497.389526] 194553 total pagecache pages
            <4>[13497.389526] 0 pages in swap cache
            <4>[13497.389526] Swap cache stats: add 1550, delete 1550, find 434/448
            <4>[13497.389526] Free swap  = 2091392kB
            <4>[13497.389526] Total swap = 2097144kB
            <6>[13497.389526] 774397 pages RAM
            <6>[13497.389526] 38457 pages reserved
            <6>[13497.389526] 19663 pages shared
            <6>[13497.389526] 697781 pages non-shared
            <0>[13497.402539] LustreError: 11831:0:(mdt_lvb.c:125:mdt_lvbo_fill()) ASSERTION( rc == 0 ) failed: 
            <0>[13497.403365] LustreError: 11831:0:(mdt_lvb.c:125:mdt_lvbo_fill()) LBUG
            <4>[13497.403820] Pid: 11831, comm: mdt00_004
            <4>[13497.404356] 
            <4>[13497.404356] Call Trace:
            <4>[13497.405032]  [<ffffffffa0a208a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4>[13497.405510]  [<ffffffffa0a20ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4>[13497.405979]  [<ffffffffa0bf18d4>] mdt_lvbo_fill+0x764/0x840 [mdt]
            <4>[13497.406426]  [<ffffffffa0bf1170>] ? mdt_lvbo_fill+0x0/0x840 [mdt]
            <4>[13497.406735]  [<ffffffffa0664033>] ldlm_handle_enqueue0+0x623/0x10c0 [ptlrpc]
            <4>[13497.407043]  [<ffffffffa0bb6a96>] mdt_enqueue+0x46/0xe0 [mdt]
            <4>[13497.407317]  [<ffffffffa0bbcc57>] mdt_handle_common+0x647/0x16d0 [mdt]
            <4>[13497.407604]  [<ffffffffa0bf54a5>] mds_regular_handle+0x15/0x20 [mdt]
            <4>[13497.407908]  [<ffffffffa0693898>] ptlrpc_server_handle_request+0x3a8/0xc70 [ptlrpc]
            <4>[13497.408502]  [<ffffffffa0a2155e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
            <4>[13497.408814]  [<ffffffffa0a32b9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
            <4>[13497.409138]  [<ffffffffa068afb1>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
            <4>[13497.409425]  [<ffffffff81054613>] ? __wake_up+0x53/0x70
            <4>[13497.409706]  [<ffffffffa0694b82>] ptlrpc_main+0xa22/0x1650 [ptlrpc]
            <4>[13497.410010]  [<ffffffffa0694160>] ? ptlrpc_main+0x0/0x1650 [ptlrpc]
            <4>[13497.410288]  [<ffffffff81094606>] kthread+0x96/0xa0
            <4>[13497.410537]  [<ffffffff8100c10a>] child_rip+0xa/0x20
            <4>[13497.410790]  [<ffffffff81094570>] ? kthread+0x0/0xa0
            <4>[13497.411045]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
            <4>[13497.411299] 
            <0>[13497.681303] Kernel panic - not syncing: LBUG
            

            hongchao.zhang Hongchao Zhang added a comment -

            The patch passing the "env" into the LDLM module has been pushed to Gerrit.

            bzzz Alex Zhuravlev added a comment -

            We are no longer allocating large chunks for osd_thread_info, so this is no longer a blocker.

            hongchao.zhang Hongchao Zhang added a comment -

            Status update:

            The patch that adds "lu_env" to the LDLM module is not ready yet due to its complexity; it affects a lot of functions.
            It would be better to test and merge the patches in LU-2748, after which this ticket can be dropped as a blocker.

            hongchao.zhang Hongchao Zhang added a comment -

            The additional patch is tracked at http://review.whamcloud.com/#change,5497

            hongchao.zhang Hongchao Zhang added a comment -

            This bug can be mitigated greatly by the patches in LU-2748 (http://review.whamcloud.com/#change,5323, http://review.whamcloud.com/#change,5444).
            I am also creating a patch to let "mdt_lvbo_fill" find the corresponding "lu_env" in the current ptlrpc_thread, eliminating the need to allocate a new "lu_env".

            People

              hongchao.zhang Hongchao Zhang
              maloo Maloo
              Votes: 0
              Watchers: 9
