Lustre / LU-12265

LustreError: 141027:0:(osd_iam_lfix.c:188:iam_lfix_init()) Bad magic in node 1861726 #34: 0xcc != 0x1976 or bad cnt: 0 170: rc = -5

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.12.0
    • Severity: 3

    Description

      Hello,
      I have recently been running the IO500 benchmark suite across our all-flash NVMe-based filesystem, and I have now twice come across the following errors, which cause client I/O errors and run failure. I was hoping to find out more about what they indicate.

      The following errors are from one of the servers, a combined OSS and MDS, and one of 24 such servers:

      May 06 09:29:43 dac-e-3 kernel: LustreError: 141015:0:(osd_iam_lfix.c:188:iam_lfix_init()) Skipped 11 previous similar messages
      May 06 09:29:43 dac-e-3 kernel: LustreError: 141015:0:(osd_iam_lfix.c:188:iam_lfix_init()) Bad magic in node 1861726 #34: 0xcc != 0x1976 or bad cnt: 0 170: rc = -5
      May 06 08:49:09 dac-e-3 kernel: LustreError: 140855:0:(osd_iam_lfix.c:188:iam_lfix_init()) Skipped 9 previous similar messages
      May 06 08:49:09 dac-e-3 kernel: LustreError: 140855:0:(osd_iam_lfix.c:188:iam_lfix_init()) Bad magic in node 1861726 #34: 0xcc != 0x1976 or bad cnt: 0 170: rc = -5
      May 06 08:47:25 dac-e-3 kernel: LustreError: 141027:0:(osd_iam_lfix.c:188:iam_lfix_init()) Bad magic in node 1861726 #34: 0xcc != 0x1976 or bad cnt: 0 170: rc = -5
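
      For reference, rc = -5 is -EIO, which matches the "errno 5, Input/output error" seen by the client application below; e.g., assuming the errno(1) tool from moreutils is installed:

      $ errno 5
      EIO 5 Input/output error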
      

      I see no other Lustre errors on any other servers or on any of the clients, but the client application sees an error.

      These errors are also only rarely seen, so I'm not sure I can easily reproduce them. I have been running this benchmark suite very intensively over the past few days, and we fairly frequently reformat and rebuild the filesystems on this hardware, as it is a pool that we use in a filesystem-on-demand style.

      At the time of the errors I was running an mdtest benchmark from the 'md easy' portion of the suite, with 2048 ranks across 128 clients, so a very large number of files were being created at the time:

      mdtest-1.9.3 was launched with 2048 total task(s) on 128 node(s)
      Command line used: /home/mjr208/projects/benchmarking/io-500-src-stonewall-fix/bin/mdtest "-C" "-n" "140000" "-u" "-L" "-F" "-d" "/dac/fs1/mjr208/job11312297-2019-05-05-2356/mdt_easy"
      Path: /dac/fs1/mjr208/job11312297-2019-05-05-2356
      FS: 412.6 TiB   Used FS: 24.2%   Inodes: 960.0 Mi   Used Inodes: 0.0%
      
      2048 tasks, 286720000 files
      ior ERROR: open64() failed, errno 5, Input/output error (aiori-POSIX.c:376)
      Abort(-1) on node 480 (rank 480 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 480
      ior ERROR: open64() failed, errno 5, Input/output error (aiori-POSIX.c:376)
      ior ERROR: open64() failed, errno 5, Input/output error (aiori-POSIX.c:376)
      Abort(-1) on node 486 (rank 486 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 486
      ior ERROR: open64() failed, errno 5, Input/output error (aiori-POSIX.c:376)
      Abort(-1) on node 488 (rank 488 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 488
      ior ERROR: open64() failed, errno 5, Input/output error (aiori-POSIX.c:376)
      Abort(-1) on node 491 (rank 491 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 491
      ior ERROR: open64() failed, errno 5, Input/output error (aiori-POSIX.c:376)
      Abort(-1) on node 492 (rank 492 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 492
      ior ERROR: open64() failed, errno 5, Input/output error (aiori-POSIX.c:376)
      Abort(-1) on node 493 (rank 493 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 493
      Abort(-1) on node 482 (rank 482 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 482
      

      The filesystem itself is configured using DNE; specifically, we use DNE2 striped directories for all mdtest runs. We are using a large number of MDTs, currently 24, one per server (which, other than this problem, is working excellently), and the directory stripe count is -1, so all directories are striped over all 24 MDTs. Each server contains 12 NVMe drives, and we partition one of the drives so that it holds both an OST and an MDT partition.
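
      For reference, a minimal sketch of how such a fully striped default layout is set up (the same setdirstripe command is shown in the comments below):

      lfs setdirstripe -c -1 -D mdt_easy    # default layout: new subdirectories are striped over all 24 MDTs
      lfs getstripe -D mdt_easy             # verify the default directory layout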

      Lustre and kernel versions are as follows:

      Server: kernel-3.10.0-957.el7_lustre.x86_64
      Server: lustre-2.12.0-1.el7.x86_64
      
      Clients: kernel-3.10.0-957.10.1.el7.x86_64
      Clients: lustre-client-2.10.7-1.el7.x86_64
      

      Could I get some advice on what this error indicates?

          Activity

            [LU-12265] LustreError: 141027:0:(osd_iam_lfix.c:188:iam_lfix_init()) Bad magic in node 1861726 #34: 0xcc != 0x1976 or bad cnt: 0 170: rc = -5

            Matt has been hitting it regularly in his large-scale IO-500 runs in CAM-79.

            adilger Andreas Dilger added a comment

            Hi Andreas,
            Where did you encounter this problem? Is it reproducible? Thanks!

            hongchao.zhang Hongchao Zhang added a comment

            This looks like it is still being hit.

            adilger Andreas Dilger added a comment

            Hi Hongchao,
            This might be possible to do, but this was a relatively rare bug for us, so I'm tempted to leave this as a 'Can't Reproduce' until we are in a position to run into it again. It only showed up during the most strenuous IO500 runs, and I didn't investigate other factors first; I wanted to raise the ticket in case the errors indicated anything obvious.

            So perhaps we close this and my other ticket for now, and if I run into this issue again I can reopen it and get you onto the platform to investigate?

            Thanks,
            Matt

            mrb Matt Rásó-Barnett (Inactive) added a comment

            Hi, I have tried to reproduce this issue in local VMs, but it can't be reproduced.
            Is it possible to log in to your test site to check it? Thanks!

            hongchao.zhang Hongchao Zhang added a comment

            Hi Hongchao, unfortunately I've started seeing this error again over the past couple of days. As mentioned in my last comment, this happens where I am using a striped directory only on the parent directory, to ensure that all the subdirectories are placed on different MDTs.

            The application sees an I/O error failing to make the directory:

            error could not create directory "/dac/fs1/mjr208/job11535231-2019-05-17-1421/mdt_easy/#test-dir.0-0/mdtest_tree.408.0/"
            ior ERROR: open64() failed, errno 2, No such file or directory (aiori-POSIX.c:407)
            Abort(-1) on node 408 (rank 408 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 408
            

            and I get the error on the MDS as mentioned before.

            I'm not sure what else would be useful to look for here. I don't see any other syslog messages indicating hardware or driver issues on the device, and I've seen this error on different MDSes/MDTs.
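
            For reference, a sketch of the kind of checks that would surface such hardware or driver issues (the device name is illustrative):

            dmesg -T | grep -iE 'nvme|i/o error'        # kernel-level device/driver errors
            smartctl -a /dev/nvme0n1 | grep -i error    # hypothetical device name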

            Thanks,
            Matt

            mrb Matt Rásó-Barnett (Inactive) added a comment

            I have tried to run mdtest with 24 MDTs in one VM, and I don't encounter this problem.

            [root@zhanghc tests]# ../utils/lfs getstripe -D /mnt/lustre/mdt_easy
            (Default)/mnt/lustre/mdt_easy
            lmv_stripe_count: 4294967295 lmv_stripe_offset: -1 lmv_hash_type: fnv_1a_64
            
            [root@zhanghc mdtest]# ./mdtest -C -n 70000 -u -L -F -d /mnt/lustre/mdt_easy
            -- started at 05/15/2019 10:29:42 --
            
            mdtest-1.9.3 was launched with 1 total task(s) on 1 node(s)
            Command line used: ./mdtest -C -n 70000 -u -L -F -d /mnt/lustre/mdt_easy
            Path: /mnt/lustre
            FS: 0.6 GiB   Used FS: 5.1%   Inodes: 0.2 Mi   Used Inodes: 3.4%
            
            1 tasks, 70000 files
            
            SUMMARY: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :       2066.418       2066.418       2066.418          0.000
               File stat         :          0.000          0.000          0.000          0.000
               File read         :          0.000          0.000          0.000          0.000
               File removal      :          0.000          0.000          0.000          0.000
               Tree creation     :         11.047         11.047         11.047          0.000
               Tree removal      :          0.000          0.000          0.000          0.000
            
            -- finished at 05/15/2019 10:30:15 --
            
            hongchao.zhang Hongchao Zhang added a comment

            Hi Hongchao, I have another ticket open, https://jira.whamcloud.com/browse/LU-12268, where I do see 'LDISKFS-fs error' messages; however, these do not usually occur at the same time as the errors above.

            Both are seen when I'm running the same benchmark, though, so perhaps they are related. My normal mode of operation is to re-create the entire filesystem after I get these errors, but I have seen them relatively frequently when running these large-scale mdtest benchmarks.

            I also only get these errors when I'm striping the mdtest directories over all 24 MDTs in the filesystem, e.g.:

            lfs setdirstripe -c -1 -D mdt_easy

            I don't see any problems when I'm either not striping every directory (just using DNE remote directories), e.g.:

            lfs setdirstripe -c -1 mdt_easy (so all the subdirectories are on a different MDT, but are not themselves striped)

            or when the directory stripe_count is <= 12, e.g.:

            lfs setdirstripe -c 12 -D mdt_easy

            I haven't probed stripe counts in the range (12, 24] yet, so perhaps this is DNE-related?
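
            If useful, here is a sketch of how the remaining counts could be probed (paths and counts are illustrative; same mdtest workload as above):

            for c in 13 16 20 24; do
                d=/dac/fs1/mjr208/stripe-probe-$c       # illustrative path
                mkdir -p $d
                lfs setdirstripe -c $c -D $d            # default: stripe new subdirectories over $c MDTs
                # re-run the mdtest workload in $d and watch the MDS logs for iam_lfix errors
            done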

            I can't rule out a hardware or driver issue; however, I've seen this error move around and occur on different MDTs in the filesystem, and as mentioned above it appears to go away if I don't use a large stripe_count.

            mrb Matt Rásó-Barnett (Inactive) added a comment

            This error shows that the index files (OI, scrub, etc.) are corrupted; it could be caused by a hardware issue or a disk driver issue.
            Is there any log related to this kind of failure (such as errors containing "LDISKFS-fs error" or "JBD2")?
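
            For example, something along these lines on the affected server (a sketch; log locations vary by distribution):

            journalctl -k | grep -E 'LDISKFS-fs error|JBD2'
            grep -E 'LDISKFS-fs error|JBD2' /var/log/messages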

            Thanks!

            hongchao.zhang Hongchao Zhang added a comment
            pjones Peter Jones added a comment

            Hongchao

            Could you please investigate?

            Thanks

            Peter


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: mrb Matt Rásó-Barnett (Inactive)
              Votes: 0
              Watchers: 14
