LU-5670: replay-vbr test_4c: MDS OOM

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.7.0
    • None
    • Environment: client and server: lustre-master build #2659 RHEL6
    • 3
    • 15887

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2529a74a-44b8-11e4-bb5a-5254006e85c2.

      The sub-test test_4c failed with the following error:

      test failed to respond and timed out
      
      16:39:32:Lustre: DEBUG MARKER: == replay-vbr test 4c: setattr of UID checks versions == 22:38:34 (1411598314)
      16:39:32:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdd.lustre-MDT0000.sync_permission=0
      16:39:32:Lustre: DEBUG MARKER: /usr/sbin/lctl set_param mdt.lustre-MDT0000.commit_on_sharing=0
      16:39:32:Lustre: DEBUG MARKER: sync; sync; sync
      16:39:32:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
      16:39:32:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
      16:39:32:Turning device dm-0 (0xfd00000) read-only
      16:39:32:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
      16:39:32:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
      16:39:32:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
      16:39:32:Lustre: DEBUG MARKER: umount -d /mnt/mds1
      16:39:32:Removing read-only on unknown block (0xfd00000)
      16:39:32:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
      16:39:32:Lustre: DEBUG MARKER: hostname
      16:39:32:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
      16:39:32:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/mds1
      16:39:32:LDISKFS-fs (dm-0): recovery complete
      16:39:32:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
      16:39:32:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      16:39:32:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
      16:39:32:Lustre: 25029:0:(qsd_reint.c:237:qsd_reint_index()) lustre-MDT0000: II_FL_NONUNQ is set on index transfer for fid [0x200000005:0x1011:0x0], it shouldn't be
      16:39:32:mdt_out00_001 invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
      16:39:32:mdt_out00_001 cpuset=/ mems_allowed=0
      16:39:32:Pid: 24999, comm: mdt_out00_001 Not tainted 2.6.32-431.29.2.el6_lustre.g5d1aa14.x86_64 #1
      16:39:32:Call Trace:
      16:39:32: [<ffffffff810d07b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      16:39:32: [<ffffffff81122b80>] ? dump_header+0x90/0x1b0
      16:39:32: [<ffffffff8122894c>] ? security_real_capable_noaudit+0x3c/0x70
      16:39:32: [<ffffffff81123002>] ? oom_kill_process+0x82/0x2a0
      16:39:32: [<ffffffff81122efe>] ? select_bad_process+0x9e/0x120
      16:39:32: [<ffffffff81123440>] ? out_of_memory+0x220/0x3c0
      16:39:32: [<ffffffff8112fd5f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      16:39:32: [<ffffffff8116e6d2>] ? kmem_getpages+0x62/0x170
      16:39:32: [<ffffffff8116f2ea>] ? fallback_alloc+0x1ba/0x270
      16:39:32: [<ffffffff8116ed3f>] ? cache_grow+0x2cf/0x320
      16:39:32: [<ffffffff8116f069>] ? ____cache_alloc_node+0x99/0x160
      16:39:32: [<ffffffff8124ce61>] ? __crypto_alloc_tfm+0x41/0x130
      16:39:32: [<ffffffff8116fe39>] ? __kmalloc+0x189/0x220
      16:39:32: [<ffffffff8124ce61>] ? __crypto_alloc_tfm+0x41/0x130
      16:39:32: [<ffffffff8124d6fa>] ? crypto_alloc_base+0x5a/0xb0
      16:39:32: [<ffffffffa048d107>] ? cfs_crypto_hash_alloc+0x77/0x290 [libcfs]
      16:39:32: [<ffffffff8116f069>] ? ____cache_alloc_node+0x99/0x160
      16:39:32: [<ffffffffa048d7e6>] ? cfs_crypto_hash_digest+0x66/0xf0 [libcfs]
      16:39:32: [<ffffffff8116febc>] ? __kmalloc+0x20c/0x220
      16:39:32: [<ffffffffa081f3f3>] ? lustre_msg_calc_cksum+0xd3/0x140 [ptlrpc]
      16:39:32: [<ffffffffa0858e11>] ? null_authorize+0xa1/0x100 [ptlrpc]
      16:39:32: [<ffffffffa0847ea6>] ? sptlrpc_svc_wrap_reply+0x56/0x1c0 [ptlrpc]
      16:39:32: [<ffffffffa08177ec>] ? ptlrpc_send_reply+0x1fc/0x7f0 [ptlrpc]
      16:39:32: [<ffffffffa082ee95>] ? ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
      16:39:32: [<ffffffffa0826309>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      16:39:32: [<ffffffffa0830818>] ? ptlrpc_main+0x12e8/0x1990 [ptlrpc]
      16:39:32: [<ffffffffa082f530>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
      16:39:32: [<ffffffff8109abf6>] ? kthread+0x96/0xa0
      16:39:32: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      16:39:32: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      16:39:32: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      16:39:32:Mem-Info:
      16:39:32:Node 0 DMA per-cpu:
      16:39:32:CPU    0: hi:    0, btch:   1 usd:   0
      16:39:32:CPU    1: hi:    0, btch:   1 usd:   0
      16:39:32:Node 0 DMA32 per-cpu:
      16:39:32:CPU    0: hi:  186, btch:  31 usd:  92
      16:39:32:CPU    1: hi:  186, btch:  31 usd: 184
      16:39:32:active_anon:13 inactive_anon:24 isolated_anon:0
      16:39:32: active_file:960 inactive_file:960 isolated_file:0
      16:39:32: unevictable:0 dirty:0 writeback:0 unstable:0
      16:39:32: free:13239 slab_reclaimable:1888 slab_unreclaimable:437400
      16:39:32: mapped:9 shmem:0 pagetables:461 bounce:0
      16:39:32:Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7408kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      16:39:32:lowmem_reserve[]: 0 2004 2004 2004
      16:39:32:Node 0 DMA32 free:44620kB min:44720kB low:55900kB high:67080kB active_anon:52kB inactive_anon:96kB active_file:3840kB inactive_file:3840kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:36kB shmem:0kB slab_reclaimable:7552kB slab_unreclaimable:1742192kB kernel_stack:1616kB pagetables:1844kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32136 all_unreclaimable? yes
      16:39:32:lowmem_reserve[]: 0 0 0 0
      16:39:32:Node 0 DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8336kB
      16:39:32:Node 0 DMA32: 769*4kB 675*8kB 441*16kB 271*32kB 119*64kB 26*128kB 13*256kB 2*512kB 1*1024kB 0*2048kB 1*4096kB = 44620kB
      16:39:32:161 total pagecache pages
      16:39:32:44 pages in swap cache
      16:39:32:Swap cache stats: add 6008, delete 5964, find 5102/5453
      16:39:32:Free swap  = 4117152kB
      16:39:32:Total swap = 4128764kB
      16:39:32:524284 pages RAM
      16:39:32:43695 pages reserved
      16:39:32:152 pages shared
      16:39:32:462130 pages non-shared
      16:39:32:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      16:39:32:[  356]     0   356     2672        0   1     -17         -1000 udevd
      16:39:32:[  983]     0   983    23298        1   1     -17         -1000 auditd
      16:39:32:[ 1174]    81  1174     6411        1   1       0             0 dbus-daemon
      16:39:32:[ 1189]     0  1189    53901        1   1       0             0 ypbind
      16:39:32:[ 1253]     0  1253     1020        0   0       0             0 acpid
      16:39:32:[ 1262]    68  1262    10467        1   0       0             0 hald
      16:39:32:[ 1263]     0  1263     5081        1   1       0             0 hald-runner
      16:39:32:[ 1295]     0  1295     5611        1   1       0             0 hald-addon-inpu
      16:39:32:[ 1302]    68  1302     4483        1   0       0             0 hald-addon-acpi
      16:39:32:[ 1341]     0  1341    26827        0   0       0             0 rpc.rquotad
      16:39:32:[ 1345]     0  1345     5414        0   0       0             0 rpc.mountd
      16:39:32:[ 1380]     0  1380     6291        1   0       0             0 rpc.idmapd
      16:39:32:[ 1411]   498  1411    57322        1   1       0             0 munged
      16:39:32:[ 1426]     0  1426    16653        0   0     -17         -1000 sshd
      16:39:32:[ 1434]     0  1434     5545        1   0       0             0 xinetd
      16:39:32:[ 1458]     0  1458    22317        0   1       0             0 sendmail
      16:39:32:[ 1466]    51  1466    20180        0   0       0             0 sendmail
      16:39:32:[ 1488]     0  1488    29325        1   1       0             0 crond
      16:39:32:[ 1499]     0  1499     5385        0   0       0             0 atd
      16:39:32:[ 1524]     0  1524     1020        1   1       0             0 agetty
      16:39:32:[ 1526]     0  1526     1016        1   1       0             0 mingetty
      16:39:32:[ 1528]     0  1528     1016        1   1       0             0 mingetty
      16:39:32:[ 1530]     0  1530     1016        1   1       0             0 mingetty
      16:39:32:[ 1532]     0  1532     1016        1   1       0             0 mingetty
      16:39:32:[ 1533]     0  1533     2671        0   1     -17         -1000 udevd
      16:39:32:[ 1535]     0  1535     2671        0   0     -17         -1000 udevd
      16:39:32:[ 1536]     0  1536     1016        1   0       0             0 mingetty
      16:39:32:[ 1538]     0  1538     1016        1   0       0             0 mingetty
      16:39:32:[ 2064]    38  2064     7686       14   1       0             0 ntpd
      16:39:32:Out of memory: Kill process 1174 (dbus-daemon) score 1 or sacrifice child
      16:39:32:Killed process 1174, UID 81, (dbus-daemon) total-vm:25644kB, anon-rss:0kB, file-rss:4kB
      16:39:32:init invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
      16:39:32:init cpuset=/ mems_allowed=0
      16:39:32:Pid: 1, comm: init Not tainted 2.6.32-431.29.2.el6_lustre.g5d1aa14.x86_64 #1
      16:39:32:Call Trace:
      16:39:32: [<ffffffff810d07b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      16:39:32: [<ffffffff81122b80>] ? dump_header+0x90/0x1b0
      16:39:32: [<ffffffff8122894c>] ? security_real_capable_noaudit+0x3c/0x70
      16:39:32: [<ffffffff81123002>] ? oom_kill_process+0x82/0x2a0
      16:39:32: [<ffffffff81122efe>] ? select_bad_process+0x9e/0x120
      16:39:32: [<ffffffff81123440>] ? out_of_memory+0x220/0x3c0
      16:39:32: [<ffffffff8112fd5f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      16:39:32: [<ffffffff81167cea>] ? alloc_pages_current+0xaa/0x110
      16:39:32: [<ffffffff8111ff77>] ? __page_cache_alloc+0x87/0x90
      16:39:32: [<ffffffff8111f95e>] ? find_get_page+0x1e/0xa0
      16:39:32: [<ffffffff81120f17>] ? filemap_fault+0x1a7/0x500
      16:39:32: [<ffffffff8152a98e>] ? __wait_on_bit+0x7e/0x90
      16:39:32: [<ffffffff8114a254>] ? __do_fault+0x54/0x530
      16:39:32: [<ffffffff811793e7>] ? mem_cgroup_uncharge_swap+0x27/0x90
      16:39:32: [<ffffffff8114a827>] ? handle_pte_fault+0xf7/0xb00
      16:39:32: [<ffffffff8114b45a>] ? handle_mm_fault+0x22a/0x300
      16:39:32: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
      16:39:32: [<ffffffff81016c71>] ? fpu_finit+0x21/0x40
      16:39:32: [<ffffffff81016ce9>] ? init_fpu+0x59/0xc0
      16:39:32: [<ffffffff81017b38>] ? restore_i387_xstate+0x138/0x1c0
      16:39:32: [<ffffffff81227386>] ? security_file_permission+0x16/0x20
      16:39:32: [<ffffffff8152f25e>] ? do_page_fault+0x3e/0xa0
      16:39:32: [<ffffffff8152c615>] ? page_fault+0x25/0x30
      

      Info required for matching: replay-vbr 4c
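
A rough reading of the Mem-Info dump above suggests that unreclaimable slab, not user processes, holds most of the MDS memory at the time the oom-killer fires. The sketch below only redoes that arithmetic from three counters copied out of the log. The 4 kB page size and the helper name `parse_oom_counters` are assumptions made for illustration; nothing here is part of the Lustre test framework.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as is usual on x86_64

# Counters copied from the first Mem-Info dump in the console log
# (timestamps stripped).
LOG = """\
free:13239 slab_reclaimable:1888 slab_unreclaimable:437400
524284 pages RAM
43695 pages reserved
"""


def parse_oom_counters(text):
    """Hypothetical helper: pull the three page counters out of the dump."""
    slab = int(re.search(r"slab_unreclaimable:(\d+)", text).group(1))
    total = int(re.search(r"(\d+) pages RAM", text).group(1))
    reserved = int(re.search(r"(\d+) pages reserved", text).group(1))
    return slab, total, reserved


slab_pages, total_pages, reserved_pages = parse_oom_counters(LOG)
usable_pages = total_pages - reserved_pages   # pages the kernel can hand out
slab_kb = slab_pages * PAGE_KB                # unreclaimable slab in kB
slab_share = slab_pages / usable_pages        # fraction of usable RAM

print(f"unreclaimable slab: {slab_kb} kB ({slab_share:.0%} of usable RAM)")
```

The 1749600 kB result matches the sum of the per-zone `slab_unreclaimable` values reported for Node 0 DMA (7408 kB) and DMA32 (1742192 kB). That is consistent with a kernel-side allocation problem on the MDS, although the dump alone does not identify which slab cache is responsible.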

            People

              wc-triage WC Triage
              maloo Maloo