Lustre / LU-4875

We have 2 OSS servers in HA and two MDS in HA; 12 OSTs are mounted per OSS with failover. OSS servers reboot while working

Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • None
    • Affects Version/s: Lustre 2.2.0
    • None
    • Severity: 3
    • 13484

    Description

      We have 2 OSS servers in HA with Corosync. Each OSS has 12 OSTs mounted with failover. While in operation, the OSS servers intermittently and frequently reboot, which is badly affecting the availability of the file system.

      Attachments

        Activity

          [LU-4875] We have 2 OSS servers in HA and two MDS in HA; 12 OSTs are mounted per OSS with failover. OSS servers reboot while working

          We did not specify a journal size when formatting; it is the default. We have shared the OST info above, from which you may be able to work it out.

          psharma Pankaj Sharma (Inactive) added a comment
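
          (For reference, a minimal sketch of how the size of an internal ldiskfs journal can be read, assuming the journal is the internal one at inode 8 as the tune2fs output below reports; the device path is simply the one from that output:)

              # Size of the journal inode (the internal journal is inode <8>)
              debugfs -R "stat <8>" /dev/mapper/mpathg 2>/dev/null | grep -i size
              # Newer e2fsprogs may also print a "Journal size:" line directly
              dumpe2fs -h /dev/mapper/mpathg 2>/dev/null | grep -i journal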

          Please find OST detail below

          [root@homeoss1 ~]# tune2fs -l /dev/mapper/mpathg
          tune2fs 1.42.7.wc2 (07-Nov-2013)
          device /dev/dm-6 mounted by lustre per /proc/fs/lustre/obdfilter/home-OST0006/mntdev
          Filesystem volume name: home-OST0006
          Last mounted on: /
          Filesystem UUID: 5a3ea3b2-568e-4062-a13b-ec5f121c0bd1
          Filesystem magic number: 0xEF53
          Filesystem revision #: 1 (dynamic)
          Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink
          Filesystem flags: signed_directory_hash
          Default mount options: user_xattr acl
          Filesystem state: clean
          Errors behavior: Continue
          Filesystem OS type: Linux
          Inode count: 11435008
          Block count: 731811520
          Reserved block count: 36590576
          Free blocks: 583168103
          Free inodes: 11013734
          First block: 0
          Block size: 4096
          Fragment size: 4096
          Reserved GDT blocks: 848
          Blocks per group: 32768
          Fragments per group: 32768
          Inodes per group: 512
          Inode blocks per group: 32
          RAID stripe width: 256
          Flex block group size: 256
          Filesystem created: Fri Mar 28 01:16:44 2014
          Last mount time: Wed Apr 9 16:22:44 2014
          Last write time: Wed Apr 9 16:22:44 2014
          Mount count: 51
          Maximum mount count: -1
          Last checked: Fri Mar 28 01:16:44 2014
          Check interval: 0 (<none>)
          Lifetime writes: 593 GB
          Reserved blocks uid: 0 (user root)
          Reserved blocks gid: 0 (group root)
          First inode: 11
          Inode size: 256
          Required extra isize: 28
          Desired extra isize: 28
          Journal inode: 8
          Default directory hash: half_md4
          Directory Hash Seed: 5cd731f7-67c3-4db6-9e7c-21db7e829749
          Journal backup: inode blocks
          MMP block number: 9734
          MMP update interval: 5

          psharma Pankaj Sharma (Inactive) added a comment

          Uploaded the last 2 days' sar files from the OSS2 server.

          psharma Pankaj Sharma (Inactive) added a comment

          You should collect the messages from the console, which is best done by connecting via serial port to the servers. That will hopefully tell you exactly what is going wrong at the time of failure.

          What is the size of the journal on each OST?

          adilger Andreas Dilger added a comment
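
          (As a sketch of how console messages could be captured when no physical serial connection is available: netconsole can forward kernel messages to another host over UDP. The addresses, interface name, and MAC below are placeholders:)

              # On the rebooting OSS: forward kernel/console messages to a log host
              modprobe netconsole netconsole=6665@192.168.1.5/eth0,6666@192.168.1.10/00:11:22:33:44:55
              # On the log host: record them (netcat option syntax varies by flavor)
              nc -u -l 6666 > oss-console.log
              # Alternatively, append "console=ttyS0,115200 console=tty0" to the kernel
              # line in grub.conf and attach a serial console or IPMI SOL session.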

          The HA configuration files corosync.conf.txt and oos1-cibxml.txt are uploaded.

          psharma Pankaj Sharma (Inactive) added a comment

          We have noticed the following in /var/log/messages: "max_child_count reached, postponing execution of operation monitor on ocf::Filesystem".

          Does this have any relation to the reboots? If yes, what exactly does it mean?

          psharma Pankaj Sharma (Inactive) added a comment
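
          (Background on that message: the cluster's local resource manager only runs a limited number of resource-agent operations at once (max_child_count); with 12 Filesystem resources per node, monitor operations can be queued and "postponed". By itself that only delays the monitor, but if a postponed operation later exceeds its timeout the cluster may treat the resource as failed and fence, i.e. reboot, the node. A hedged example of giving each OST resource a more generous monitor operation in the crm shell; the resource name, device, and mount point are placeholders, not this system's actual configuration:)

              # Example only - widen the monitor interval/timeout on each OST resource
              crm configure primitive home-OST0006 ocf:heartbeat:Filesystem \
                  params device="/dev/mapper/mpathg" directory="/lustre/home-OST0006" fstype="lustre" \
                  op monitor interval="120s" timeout="120s"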

          sar files from OSS1 for the last 3 days are uploaded, which can give us some idea of CPU utilization, I/O wait, etc.

          psharma Pankaj Sharma (Inactive) added a comment
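
          (For anyone looking at the uploaded data, the binary sar files can be read with sysstat; the file name here is a placeholder:)

              sar -u -f sa09    # CPU utilization, including %iowait
              sar -r -f sa09    # memory and swap usage
              sar -q -f sa09    # run queue length and load average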

          HA failover configuration file

          psharma Pankaj Sharma (Inactive) added a comment

          Are there any parameters in Lustre through which we can keep it from running out of memory? We have already reduced ost_io.threads_max to 256.

          psharma Pankaj Sharma (Inactive) added a comment
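
          (A sketch of OSS-side parameters that are commonly inspected when memory is tight on a Lustre 2.x ldiskfs OSS; the set_param value is only an example, not a recommendation for this system:)

              # Service thread counts for the ost_io service
              lctl get_param ost.OSS.ost_io.threads_min ost.OSS.ost_io.threads_max ost.OSS.ost_io.threads_started
              # OSS read cache behaviour on the OSTs
              lctl get_param obdfilter.*.read_cache_enable obdfilter.*.writethrough_cache_enable
              lctl get_param obdfilter.*.readcache_max_filesize
              # Example: only cache reads of files up to 32 MB
              lctl set_param obdfilter.*.readcache_max_filesize=32M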

          Thanks, Andreas, for the prompt reply.
          Each OSS server has 32 GB RAM.
          We are using hardware RAID 5. Each OST consists of 11 x 300 GB SAS disks in RAID 5. We have 12 such OSTs on each OSS.
          You may well be right that they are running out of memory, but how can we make sure of that? Is there anything in the logs, or some debug option in Lustre we can monitor, that shows it is running out of memory? If you need any other logs, do let me know.
          I will extract the .xz files and upload the HA configuration files as .tar/.zip.

          psharma Pankaj Sharma (Inactive) added a comment
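
          (A few generic checks that would show whether the node is actually running out of memory before the reboots; log paths follow RHEL defaults. A crash dump captured via kdump would also distinguish a kernel panic from an OOM-triggered reboot.)

              # Did the kernel OOM killer run before any of the reboots?
              grep -iE "out of memory|oom-killer" /var/log/messages*
              # Current memory and slab usage while the OSS is busy
              cat /proc/meminfo
              slabtop -o | head -30
              # Memory pressure over time from sysstat, if enabled
              sar -r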

          I don't know what .xz files are, so I cannot look at them. The dmesg and messages files do not list how much RAM is on these nodes, nor what type of RAID you are using. Is it MD software RAID?

          My first guess would be that with 12 very large OSTs (I see 180 disks) on the node, it is just running out of memory.

          adilger Andreas Dilger added a comment
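
          (For reference, .xz is LZMA2 compression from xz-utils; the attachments can be unpacked or repackaged as below, with placeholder file names:)

              xz -d messages.xz                          # unpack in place, leaving "messages"
              xz -dc sar-oss2.xz | gzip > sar-oss2.gz    # repack as gzip for upload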

          People

            Assignee:
            jfc John Fuchs-Chesney (Inactive)
            Reporter:
            psharma Pankaj Sharma (Inactive)
            Votes:
            0
            Watchers:
            7
