[LU-6295] sanity-lfsck test_4: oom on MDT0 Created: 26/Feb/15  Updated: 18/Oct/18  Resolved: 27/Mar/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Di Wang
Resolution: Duplicate Votes: 0
Labels: dne2

Issue Links:
Related
is related to LU-3534 async update cross-MDTs Resolved
Severity: 3
Bugzilla ID: 6,380
Rank (Obsolete): 17628

 Description   

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/61eb856e-bd39-11e4-8d85-5254006e85c2.

The sub-test test_4 failed with the following error:

test failed to respond and timed out

Please provide additional information about the failure here.

Info required for matching: sanity-lfsck 4



 Comments   
Comment by Di Wang [ 26/Feb/15 ]

16:30:47:Mem-Info:
16:30:47:Node 0 DMA per-cpu:
16:30:47:CPU 0: hi: 0, btch: 1 usd: 0
16:30:47:CPU 1: hi: 0, btch: 1 usd: 0
16:30:47:Node 0 DMA32 per-cpu:
16:30:47:CPU 0: hi: 186, btch: 31 usd: 30
16:30:47:CPU 1: hi: 186, btch: 31 usd: 0
16:30:47:active_anon:1002 inactive_anon:1020 isolated_anon:0
16:30:47: active_file:30 inactive_file:93 isolated_file:0
16:30:47: unevictable:0 dirty:0 writeback:1054 unstable:0
16:30:47: free:13256 slab_reclaimable:1615 slab_unreclaimable:10739
16:30:47: mapped:26 shmem:13 pagetables:539 bounce:0
16:30:47:Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
16:30:47:lowmem_reserve[]: 0 2004 2004 2004
16:30:47:Node 0 DMA32 free:44688kB min:44720kB low:55900kB high:67080kB active_anon:4008kB inactive_anon:4080kB active_file:120kB inactive_file:372kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:4216kB mapped:104kB shmem:52kB slab_reclaimable:6460kB slab_unreclaimable:42932kB kernel_stack:1672kB pagetables:2156kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2304 all_unreclaimable? no
16:30:47:lowmem_reserve[]: 0 0 0 0
16:30:47:Node 0 DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8336kB
16:30:47:Node 0 DMA32: 2304*4kB 1172*8kB 445*16kB 167*32kB 77*64kB 18*128kB 3*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44688kB
16:30:47:1189 total pagecache pages
16:30:47:1052 pages in swap cache
16:30:47:Swap cache stats: add 2969, delete 1917, find 29/40
16:30:47:Free swap = 4117292kB
16:30:47:Total swap = 4128764kB
16:30:47:524284 pages RAM
16:30:47:43706 pages reserved
16:30:47:316 pages shared
16:30:47:462410 pages non-shared
16:30:47:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
16:30:47:[ 412] 0 412 2692 33 0 -17 -1000 udevd
16:30:47:[ 1049] 0 1049 6899 30 0 -17 -1000 auditd
16:30:47:[ 1069] 0 1069 62273 87 1 0 0 rsyslogd
16:30:47:[ 1099] 0 1099 4560 30 1 0 0 irqbalance
16:30:47:[ 1115] 32 1115 4744 19 1 0 0 rpcbind
16:30:47:[ 1135] 29 1135 5837 6 1 0 0 rpc.statd
16:30:47:[ 1252] 81 1252 6418 4 1 0 0 dbus-daemon
16:30:47:[ 1269] 0 1269 53919 20 1 0 0 ypbind
16:30:47:[ 1338] 0 1338 1020 9 1 0 0 acpid
16:30:47:[ 1348] 68 1348 10507 95 1 0 0 hald
16:30:47:[ 1349] 0 1349 5099 4 0 0 0 hald-runner
16:30:47:[ 1381] 0 1381 5629 4 1 0 0 hald-addon-inpu
16:30:47:[ 1391] 68 1391 4501 3 1 0 0 hald-addon-acpi
16:30:47:[ 1429] 0 1429 26827 2 0 0 0 rpc.rquotad
16:30:47:[ 1434] 0 1434 5417 0 0 0 0 rpc.mountd
16:30:47:[ 1474] 0 1474 6291 3 0 0 0 rpc.idmapd
16:30:47:[ 1507] 498 1507 57325 150 1 0 0 munged
16:30:47:[ 1525] 0 1525 16553 7 0 -17 -1000 sshd
16:30:47:[ 1534] 0 1534 5429 21 0 0 0 xinetd
16:30:47:[ 1562] 0 1562 22208 2 1 0 0 sendmail
16:30:47:[ 1571] 51 1571 20071 0 0 0 0 sendmail
16:30:47:[ 1595] 0 1595 29215 130 1 0 0 crond
16:30:47:[ 1608] 0 1608 5276 51 0 0 0 atd
16:30:47:[ 1622] 0 1622 1020 25 1 0 0 agetty
16:30:47:[ 1623] 0 1623 1016 23 1 0 0 mingetty
16:30:47:[ 1625] 0 1625 1016 24 1 0 0 mingetty
16:30:47:[ 1627] 0 1627 1016 23 1 0 0 mingetty
16:30:47:[ 1629] 0 1629 1016 23 1 0 0 mingetty
16:30:47:[ 1631] 0 1631 1016 24 1 0 0 mingetty
16:30:47:[ 1633] 0 1633 2692 36 1 -17 -1000 udevd
16:30:47:[ 1634] 0 1634 2691 24 0 -17 -1000 udevd
16:30:47:[ 1636] 0 1636 1016 24 0 0 0 mingetty
16:30:47:[ 2278] 38 2278 7689 171 0 0 0 ntpd
16:30:47:[12785] 0 12785 4763 60 0 0 0 anacron
16:30:47:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
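
The dump above can be sanity-checked with a small script. The following is a minimal sketch, not part of the original report: it re-adds the buddy-allocator free lists and compares each zone's free memory against its min watermark. The zone lines are trimmed to the fields used, the helper names are hypothetical, and a 4 kB page size is assumed.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as on a typical x86_64 node

# Copied from the dump above (timestamps stripped, zone lines trimmed).
ZONE_LINES = [
    "Node 0 DMA free:8336kB min:332kB low:412kB high:496kB",
    "Node 0 DMA32 free:44688kB min:44720kB low:55900kB high:67080kB",
]
BUDDY_LINES = [
    "Node 0 DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB "
    "0*512kB 0*1024kB 2*2048kB 1*4096kB = 8336kB",
    "Node 0 DMA32: 2304*4kB 1172*8kB 445*16kB 167*32kB 77*64kB 18*128kB "
    "3*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44688kB",
]
FREE_PAGES = 13256  # the global "free:" counter, in pages


def parse_zone(line):
    """Return (zone_name, {field: kB}) for a per-zone summary line."""
    name = re.match(r"Node \d+ (\S+) ", line).group(1)
    fields = {k: int(v) for k, v in re.findall(r"(\w+):(\d+)kB", line)}
    return name, fields


def buddy_total_kb(line):
    """Return (computed_kB, reported_kB) for a buddy free-list line."""
    computed = sum(int(n) * int(size)
                   for n, size in re.findall(r"(\d+)\*(\d+)kB", line))
    reported = int(re.search(r"= (\d+)kB$", line).group(1))
    return computed, reported


def zones_below_min(lines):
    """Names of zones whose free memory is under the min watermark."""
    return [name for name, f in map(parse_zone, lines)
            if f["free"] < f["min"]]


if __name__ == "__main__":
    for line in BUDDY_LINES:
        print(buddy_total_kb(line))
    print(zones_below_min(ZONE_LINES))
    print(FREE_PAGES * PAGE_KB,
          sum(parse_zone(z)[1]["free"] for z in ZONE_LINES))
```

Under these assumptions the per-order free lists add up to the reported totals, the global free page count matches the sum of the two zones, and DMA32 is the only zone with free memory below its min watermark.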

Comment by Di Wang [ 04/Mar/15 ]

This happens because the injected failure makes recovery hang, so it cannot finish.

20:26:51:Lustre: lustre-MDT0000-osd: the OI mapping for the FID [0x200000009:0x0:0x0] become inconsistent, the given ID 111/4224599767, the ID in OI mapping 111/111
20:26:51:LustreError: 11956:0:(lod_sub_object.c:903:lod_sub_prep_llog()) lustre-MDT0000-mdtlov: can't get id from catalogs: rc = -78
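
For readers triaging similar logs, here is a minimal sketch of how the LustreError line above can be split into its parts and the return code mapped to a symbolic name. The regular expression and helper name are illustrative assumptions, not a Lustre tool. The errno table is a hand-copied excerpt of the Linux generic errno values, where 78 is EREMCHG.

```python
import re

# Excerpt of Linux generic errno values; only the entry needed here.
LINUX_ERRNO = {78: ("EREMCHG", "Remote address changed")}

# Assumed layout: "LustreError: pid:x:(file:line:func()) device: message: rc = N"
LOG_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<dev>\S+): (?P<msg>.*): rc = (?P<rc>-?\d+)$"
)


def parse_lustre_error(text):
    """Split one LustreError console line into named fields."""
    m = LOG_RE.search(text)
    if m is None:
        return None
    rec = m.groupdict()
    rec["line"] = int(rec["line"])
    rec["rc"] = int(rec["rc"])
    # Kernel code returns negative errno values, so look up the magnitude.
    rec["errno"] = LINUX_ERRNO.get(-rec["rc"], ("UNKNOWN", ""))[0]
    return rec


if __name__ == "__main__":
    sample = ("20:26:51:LustreError: 11956:0:(lod_sub_object.c:903:"
              "lod_sub_prep_llog()) lustre-MDT0000-mdtlov: "
              "can't get id from catalogs: rc = -78")
    print(parse_lustre_error(sample))
```

On the line quoted above this yields function lod_sub_prep_llog, source line 903, and rc -78 (EREMCHG), which is consistent with the OI mapping inconsistency reported on the preceding log line.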

Comment by nasf (Inactive) [ 05/Mar/15 ]

But what will happen if such a FID mapping is actually corrupted in the real world?

Comment by Di Wang [ 27/Mar/15 ]

Duplicate of LU-6380.

Generated at Sat Feb 10 07:16:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.