[LU-11418] hung threads on MDT and MDT won't umount Created: 22/Sep/18  Updated: 02/May/22  Resolved: 23/Jan/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.4
Fix Version/s: Lustre 2.13.0, Lustre 2.12.1

Type: Bug Priority: Minor
Reporter: SC Admin (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

x86_64, zfs, 3 MDTs, all on 1 MDS, 2.10.4 + many patches ~= 2.10.5 to 2.12


Attachments: File lustre-log.1537573059.458124.gz     File messages-grep-vslurm.txt.gz     File messages-john99.warble2.Lustre.20181014.txt.gz     File messages-warble.20180924.next.txt.gz     File messages-warble.20180924.txt.gz     File warble2.conman.20181024.log    
Issue Links:
Duplicate
is duplicated by LU-12209 cannot create stripe dir: Stale file ... Resolved
Related
is related to LU-11336 replay-single test 80d hangs on MDT u... Open
is related to LU-11516 ASSERTION( ((o)->lo_header->loh_attr ... Resolved
is related to LU-15761 cannot finish MDS recovery Resolved
is related to LU-13070 mdd_orphan_destroy loop caused by com... Resolved
is related to LU-11681 sanity test 65i fails with 'find /mn... Resolved
is related to LU-11857 repeated "could not delete orphan [0x... Resolved
is related to LU-12747 sanity: test 811 fail with "MDD orpha... Resolved
Epic/Theme: dne
Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

Hi,

unfortunately, once again the same or similar symptoms as in LU-11082 and LU-11301.

chgrp/chmod sweep across files and directories results in eventual total hang of the filesystem. hung MDT threads. one MDT won't umount. MDS has to be powered off to fix the fs.

processes that are stuck on the client doing the sweep are

root     142716  0.0  0.0 108252   116 pts/1    S    01:33   0:34 xargs -0 -n5 chgrp -h oz044
root     236217  0.0  0.0 108252   116 pts/1    S    01:15   0:25 xargs -0 -n5 chgrp -h oz065
root     385816  0.0  0.0 108252   116 pts/1    S    05:34   0:15 xargs -0 -n5 chgrp -h oz100
root     418923  0.0  0.0 120512   136 pts/1    S    09:34   0:00 chgrp -h oz100 oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/catalogs/candidates.cat oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/catalogs/candidates_ranked.cat oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/catalogs/candidates_full.cat oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/images oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd46
root     418944  0.0  0.0 120512   136 pts/1    S    09:34   0:01 chgrp -h oz044 oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys_msdos/msdriv.f oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys_msdos/grexec.f oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys_msdos/grdos.f oz044/mbernet/c_cpp/dust_prc/src/pgplot/makemake oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys
root     418947  0.0  0.0 120512   136 pts/1    S    09:34   0:00 chgrp -h oz065 oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/uniform/functionObjects oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/uniform/functionObjects/functionObjectProperties oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/alpha.water oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/Ur oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/p...
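
(for reference, the sweep itself is just a find piped into xargs, roughly of this shape when run from /fred; the path and group below are only an example)

find oz100 -print0 | xargs -0 -n5 chgrp -h oz100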

I can't see any rc=-116 in the logs this time.

first hung thread is

Sep 22 09:37:39 warble2 kernel: LNet: Service thread pid 458124 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Sep 22 09:37:39 warble2 kernel: Pid: 458124, comm: mdt01_095 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
Sep 22 09:37:39 warble2 kernel: Call Trace:
Sep 22 09:37:39 warble2 kernel: [<ffffffffc159c047>] top_trans_wait_result+0xa6/0x155 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc157d91b>] top_trans_stop+0x42b/0x930 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc16d65f9>] lod_trans_stop+0x259/0x340 [lod]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc177423a>] mdd_trans_stop+0x2a/0x46 [mdd]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc1769bcb>] mdd_attr_set+0x5eb/0xce0 [mdd]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0ff65f5>] mdt_reint_setattr+0xba5/0x1060 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0ff6b33>] mdt_reint_rec+0x83/0x210 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0fd836b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0fe3f07>] mdt_reint+0x67/0x140 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc156a38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc1512e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc1516592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffb64bb621>] kthread+0xd1/0xe0
Sep 22 09:37:39 warble2 kernel: [<ffffffffb6b205dd>] ret_from_fork_nospec_begin+0x7/0x21
Sep 22 09:37:39 warble2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
Sep 22 09:37:39 warble2 kernel: LustreError: dumping log to /tmp/lustre-log.1537573059.458124

there was a subnet manager crash and restart about 15 minutes before the MDS threads hung this time, but I don't think that's related.

first lustre-log for warble2 and syslog for the cluster are attached.

I also did a sysrq 't' and 'w' before resetting warble2, so that may be of help to you.
those start at
Sep 22 16:26:15
in messages.
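
(for reference, that kind of sysrq dump can be triggered from a shell with something like:)

echo t > /proc/sysrq-trigger
echo w > /proc/sysrq-trigger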

please let us know if you'd like anything else.
would a kernel crashdump help?
we are getting closer to being able to capture one of these.

cheers,
robin



 Comments   
Comment by Peter Jones [ 23/Sep/18 ]

Lai

Can you please advise?

Robin

Even if Lai can proceed without a crash dump on this occasion, I think that having this option is bound to be useful at some point.

Peter

Comment by SC Admin (Inactive) [ 24/Sep/18 ]

Hi,

there was a -116 in the logs in today's hang/crash. syslog attached.

in case it's not clear, this ticket is the highest priority of all our tickets.
thanks.

Peter - we have some compute node crash dumps working now, and will start to work on the servers.

cheers,
robin

Comment by SC Admin (Inactive) [ 24/Sep/18 ]

another hang. I'll attach the syslog. this hang has no rc = -116.
first hung process at
Sep 24 14:57:22

for the 2 crashes today, I can't see any chmod/chgrp's on the system. there are about 8 rsyncs running on the login nodes from various users, so I suspect they are probably triggering it.

probably tangential, but there are new mdt_cross_open errors in the logs as of today too. can't see them in previous logs.

# zegrep 'declared for insert|invalid length 0 in llog|rc = -116|mdt_cross_open' /var/log/messages-201809* /var/log/messages
/var/log/messages-20180901.gz:Sep  1 02:19:25 warble1 kernel: Lustre: 400208:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180904.gz:Sep  3 18:51:57 warble1 kernel: LustreError: 400206:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180904.gz:Sep  3 19:46:28 warble2 kernel: Lustre: 254384:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180904.gz:Sep  3 19:48:17 warble2 kernel: LustreError: 254382:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180904.gz:Sep  3 20:42:32 warble2 kernel: Lustre: 23441:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180904.gz:Sep  3 22:13:47 warble2 kernel: Lustre: 20681:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180904.gz:Sep  3 22:30:36 warble2 kernel: LustreError: 19030:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180904.gz:Sep  3 23:01:33 warble2 kernel: Lustre: 20333:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180905.gz:Sep  5 02:04:43 warble2 kernel: LustreError: 25207:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180905.gz:Sep  5 02:14:27 warble2 kernel: Lustre: 27387:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180905.gz:Sep  5 02:26:36 warble2 kernel: LustreError: 20423:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180905.gz:Sep  5 02:46:13 warble2 kernel: Lustre: 20430:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180905.gz:Sep  5 02:47:44 warble2 kernel: LustreError: 20428:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180905.gz:Sep  5 03:00:55 warble2 kernel: Lustre: 21170:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180905.gz:Sep  5 03:01:01 warble2 kernel: LustreError: 34530:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x28003a5b7:0x4589:0x0] wasn't declared for insert
/var/log/messages-20180906.gz:Sep  5 08:02:43 warble2 kernel: LustreError: 21168:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180906.gz:Sep  5 11:29:53 warble2 kernel: Lustre: 25097:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180906.gz:Sep  5 11:29:55 warble2 kernel: LustreError: 39383:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x680036878:0x1f73f:0x0] wasn't declared for insert
/var/log/messages-20180906.gz:Sep  5 11:36:19 warble2 kernel: LustreError: 21644:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages-20180906.gz:Sep  5 12:15:49 warble2 kernel: Lustre: 20745:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180912.gz:Sep 12 02:30:12 warble2 kernel: Lustre: 433349:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180919.gz:Sep 19 01:04:01 warble2 kernel: Lustre: 21812:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180919.gz:Sep 19 01:04:09 warble2 kernel: LustreError: 19129:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x68003d573:0x10e9d:0x0] wasn't declared for insert
/var/log/messages-20180923.gz:Sep 22 16:40:23 warble2 kernel: Lustre: 21330:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180923.gz:Sep 22 16:52:47 warble2 kernel: Lustre: 30865:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages-20180923.gz:Sep 22 16:52:47 warble2 kernel: LustreError: 29262:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x68003e09a:0xb1cb:0x0] wasn't declared for insert
/var/log/messages-20180923.gz:Sep 23 02:15:37 warble2 kernel: Lustre: 20665:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages:Sep 24 12:11:13 warble2 kernel: LustreError: 31813:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0002: [0x68003e07a:0x17284:0x0] doesn't exist!: rc = -14
/var/log/messages:Sep 24 12:11:16 warble2 kernel: LustreError: 31746:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0001: [0x28004037a:0x12fdc:0x0] doesn't exist!: rc = -14
/var/log/messages:Sep 24 12:11:16 warble2 kernel: LustreError: 31746:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message
/var/log/messages:Sep 24 12:17:02 warble2 kernel: LustreError: 24787:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116
/var/log/messages:Sep 24 13:57:21 warble2 kernel: Lustre: 23318:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages:Sep 24 14:04:10 warble2 kernel: LustreError: 44226:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0002: [0x68003e07a:0x17284:0x0] doesn't exist!: rc = -14
/var/log/messages:Sep 24 14:05:27 warble2 kernel: LustreError: 44595:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0001: [0x28004037a:0x12fdc:0x0] doesn't exist!: rc = -14
/var/log/messages:Sep 24 14:05:27 warble2 kernel: LustreError: 44595:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message
/var/log/messages:Sep 24 14:27:55 warble2 kernel: LustreError: 47736:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0002: [0x68003e07a:0x17284:0x0] doesn't exist!: rc = -14
/var/log/messages:Sep 24 14:27:56 warble2 kernel: LustreError: 44449:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0001: [0x28004037a:0x12fdc:0x0] doesn't exist!: rc = -14
/var/log/messages:Sep 24 14:27:56 warble2 kernel: LustreError: 44449:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message
/var/log/messages:Sep 24 17:59:59 warble2 kernel: Lustre: 21379:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
/var/log/messages:Sep 24 17:59:59 warble2 kernel: LustreError: 42530:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x68003e097:0x669a:0x0] wasn't declared for insert

cheers,
robin

Comment by Lai Siyao [ 25/Sep/18 ]

The log looks similar to that of LU-11301, so the previous fix doesn't really solve this issue; I'm checking the related code.

Comment by Gerrit Updater [ 27/Sep/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33248
Subject: LU-11418 osd-zfs: stop trans if transaction abort
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 518eff16a77d077d82bc92b85912327dbfe6a271

Comment by Lai Siyao [ 27/Sep/18 ]

I just uploaded a patch which fixes a corner case when a transaction aborts, and also adds a few debug messages. You can cherry-pick it to 2.10 and test again.
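
For reference, one way to pull the Gerrit patch into a local tree is something like the following (the refs/changes path uses the last two digits of the change number plus the patch set number, here patch set 1):

git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/48/33248/1
git cherry-pick FETCH_HEAD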

Comment by SC Admin (Inactive) [ 28/Sep/18 ]

Hi Lai,

thanks very much!

I was just about to add that in, but then I noticed sanity-sec fails in gerrit. is that something we should be worried about?

cheers,
robin

Comment by SC Admin (Inactive) [ 05/Oct/18 ]

Hi Lai,

do you think this patch has stabilised enough for us to try it out now?

cheers,
robin

Comment by Lai Siyao [ 08/Oct/18 ]

Yes, you can try it now.

Comment by SC Admin (Inactive) [ 09/Oct/18 ]

Hi Lai,

ok. that does seem to have helped a lot.
the MDTs were "repeatably" hanging today, and then I added the version 3 patch and they seem ok so far.
success?

longer version: there are a lot of rsyncs running on login nodes and data-movers today, and warble2 (which has all 3 DNE dagg MDTs on it) hung again (as it did 2 days ago). warble2 needed to be reset to fix it, but it just hung again a few minutes later. and then the same again.
the one odd thing about today's hangs was that for the first two hangs all 3 MDTs wouldn't umount. usually it's just one.
so then I added your version 3 patch from https://review.whamcloud.com/#/c/33248/ and warble2 hasn't hung again yet.

only warble2 is running this patched lustre at the moment.

dmesg below.
there are multiple -116 messages, but no stuck MDT threads.

[  107.621762] LNet: Using FMR for registration
[  107.626523] LNetError: 6808:0:(o2iblnd_cb.c:2299:kiblnd_passive_connect()) Can't accept conn from 192.168.44.13@o2ib44 on NA (ib0:0:192.168.44.22): bad dst nid 192.168.44.22@o2ib44
[  107.684310] LNet: Added LNI 192.168.44.22@o2ib44 [128/2048/0/180]
[  186.847844] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.191@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
[  186.925648] Lustre: dagg-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  186.964659] Lustre: dagg-MDT0000: Connection restored to 192.168.44.36@o2ib44 (at 192.168.44.36@o2ib44)
[  187.483723] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.145@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
[  187.937707] Lustre: dagg-MDT0000: Will be in recovery for at least 2:30, or until 130 clients reconnect
[  187.947555] Lustre: dagg-MDT0000: Connection restored to 2d8f72a9-431d-332b-5ffa-aa828d3be0ee (at 192.168.44.121@o2ib44)
[  188.550374] LustreError: 137-5: dagg-MDT0001_UUID: not available for connect from 192.168.44.129@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
[  188.569071] LustreError: Skipped 2 previous similar messages
[  188.795236] LustreError: 11-0: dagg-MDT0000-osp-MDT0001: operation mds_connect to node 0@lo failed: rc = -114
[  188.853949] Lustre: dagg-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  190.146916] Lustre: dagg-MDT0001: Will be in recovery for at least 2:30, or until 129 clients reconnect
[  190.156945] Lustre: dagg-MDT0001: Connection restored to 652c9ef9-2a5d-b195-3eda-b821ae8894d0 (at 192.168.44.110@o2ib44)
[  190.168392] Lustre: Skipped 1 previous similar message
[  190.829441] LustreError: 11-0: dagg-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114
[  190.947606] Lustre: dagg-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  191.348107] Lustre: dagg-MDT0002: Will be in recovery for at least 2:30, or until 128 clients reconnect
[  192.215123] Lustre: dagg-MDT0000: Connection restored to 9b72ae35-e33b-a946-9669-fa831346959d (at 192.168.44.115@o2ib44)
[  192.226665] Lustre: Skipped 30 previous similar messages
[  196.317682] Lustre: dagg-MDT0002: Connection restored to 8f4e7961-71a6-0de0-25cd-4be32d12d7d6 (at 192.168.44.116@o2ib44)
[  196.329207] Lustre: Skipped 215 previous similar messages
[  204.357812] Lustre: dagg-MDT0000: Connection restored to 5bac3166-dadc-89ba-730d-d36504f6ca68 (at 192.168.44.218@o2ib44)
[  204.369347] Lustre: Skipped 70 previous similar messages
[  216.020013] LustreError: 32173:0:(llog_cat.c:269:llog_cat_id2handle()) dagg-MDT0001-osp-MDT0002: error opening log id [0x46a:0x80046900:0x2]:0: rc = -2
[  216.034963] LustreError: 32173:0:(llog_cat.c:824:llog_cat_process_cb()) dagg-MDT0001-osp-MDT0002: cannot find handle for llog [0x46a:0x80046900:0x2]: rc = -2
[  220.462569] Lustre: dagg-MDT0000: Connection restored to c3d7aae9-47e8-dbda-61fe-2422c114c40a (at 192.168.44.140@o2ib44)
[  220.474173] Lustre: Skipped 70 previous similar messages
[  234.910484] Lustre: 32489:0:(ldlm_lib.c:1773:extend_recovery_timer()) dagg-MDT0002: extended recovery timer reaching hard limit: 900, extend: 1
[  236.043117] Lustre: 22198:0:(ldlm_lib.c:1773:extend_recovery_timer()) dagg-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
[  236.057457] Lustre: 22198:0:(ldlm_lib.c:1773:extend_recovery_timer()) Skipped 1 previous similar message
[  241.008948] Lustre: dagg-MDT0002: Recovery over after 0:50, of 130 clients 130 recovered and 0 were evicted.
[  241.046477] LustreError: 19852:0:(tgt_handler.c:509:tgt_filter_recovery_request()) @@@ not permitted during recovery  req@ffff921bcc313f00 x1613823206242752/t0(0) o601->dagg-MDT0000-lwp-MDT0002_UUID@0@lo:417/0 lens 336/0 e 0 to 0 dl 1539062632 ref 1 fl Interpret:/0/ffffffff rc 0/-1
[  241.073940] LustreError: 11-0: dagg-MDT0000-lwp-MDT0002: operation quota_acquire to node 0@lo failed: rc = -11
[  241.642213] Lustre: dagg-MDT0000: Recovery over after 0:54, of 130 clients 130 recovered and 0 were evicted.
[  241.672756] Lustre: 21509:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
[  241.688069] LustreError: 21509:0:(lod_dev.c:420:lod_sub_recovery_thread()) dagg-MDT0000-osp-MDT0001 getting update log failed: rc = -22
[  248.792951] Lustre: 22198:0:(ldlm_lib.c:2059:target_recovery_overseer()) recovery is aborted, evict exports in recovery
[  248.804911] Lustre: dagg-MDT0001: disconnecting 6 stale clients
[  248.812515] LustreError: 22198:0:(ldlm_lib.c:1640:abort_lock_replay_queue()) @@@ aborted:  req@ffff921bd2851b00 x1613136754616240/t0(0) o101->9b72ae35-e33b-a946-9669-fa831346959d@192.168.44.115@o2ib44:368/0 lens 328/0 e 0 to 0 dl 1539062583 ref 1 fl Complete:/40/ffffffff rc 0/-1
[  248.854008] Lustre: dagg-MDT0001: Denying connection for new client 10e4c1fa-928c-56a4-d78e-017ed3a2d89c(at 192.168.44.129@o2ib44), waiting for 130 known clients (124 recovered, 0 in progress, and 6 evicted) to recover in 14:01
[  248.869553] Lustre: dagg-MDT0001: Recovery over after 0:59, of 130 clients 124 recovered and 6 were evicted.
[  248.886553] Lustre: Skipped 3 previous similar messages
[  250.354374] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  254.079712] Lustre: dagg-MDT0000: Connection restored to 192.168.44.35@o2ib44 (at 192.168.44.35@o2ib44)
[  254.090024] Lustre: Skipped 39 previous similar messages
[  282.212958] LustreError: 51902:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  282.226582] LustreError: 51902:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message
[  419.076513] LustreError: 51723:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  419.090148] LustreError: 51723:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 4 previous similar messages
[  450.899773] LustreError: 51713:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  450.913417] LustreError: 51713:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message
[  451.353612] LustreError: 32170:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116
[  588.128553] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  588.142200] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 4 previous similar messages
[  619.967048] LustreError: 55105:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  619.980680] LustreError: 55105:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message
[  755.646673] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  755.660306] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 4 previous similar messages
[  810.287028] Lustre: dagg-MDT0000: Connection restored to aca1fdbe-fe45-b9db-701e-ccf7c902ae6e (at 192.168.44.199@o2ib44)
[  810.298854] Lustre: Skipped 14 previous similar messages
[  924.215258] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[  924.228884] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages
[ 1092.752842] LustreError: 54995:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[ 1092.766459] LustreError: 54995:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages
[ 1261.277065] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[ 1261.290649] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages
[ 1575.449329] LustreError: 32170:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116
[ 1575.464371] LustreError: 32170:0:(llog_cat.c:795:llog_cat_cancel_records()) Skipped 3 previous similar messages
[ 1586.959098] LustreError: 20233:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116
[ 1633.295825] LustreError: 19843:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[ 1633.309398] LustreError: 19843:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages
[ 2276.211972] LustreError: 54995:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14
[ 2276.225522] LustreError: 54995:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 20 previous similar messages

there are definitely new corrupted directories from today's resets/hangs, but I can't tell if they are worse than usual or not.

cheers,
robin

Comment by SC Admin (Inactive) [ 09/Oct/18 ]

so the corruption is perhaps different today...

I'm putting it in this ticket as it's perhaps related to these changes.
there are high level directories (not new leaf dirs) that now report "stale file handle".
2 users have reported this so far.

these are not new directories, but have been there for weeks/months . eg.

[root@john99 ~]# mkdir /fred/oz002/mlower/blah
mkdir: cannot create directory ‘/fred/oz002/mlower/blah’: Stale file handle
[root@john99 ~]# ls -ld  /fred/oz002/mlower/
drwxrwsr-x 6 mlower oz002 76800 Oct  9 17:18 /fred/oz002/mlower/
[root@john99 ~]# lfs getdirstripe /fred/oz002/mlower
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx    FID[seq:oid:ver]
     1    [0x280024621:0x196c4:0x0]
     2    [0x680024df2:0x19733:0x0]
     0    [0x2000153f2:0x196fb:0x0]
[root@john99 ~]# ls -l /fred/oz002/mlower/
total 324
drwx--S--- 9 mlower oz002 76800 Oct  5 16:50 glitches
drwxr-sr-x 5 mlower oz002 76800 Sep 21 14:18 mysoft
drwxr-sr-x 6 mlower oz002 76800 Oct  9 17:18 ppta_dr2
-rw-r--r-- 1 mlower oz002     5 Oct  9 17:18 test.txt
drwxrwsr-x 7 mlower oz002 76800 Sep 24 10:34 timing

and there are 1000's of old files under that directory tree.

it seems unlikely that these directories should be "normal" corruption from the MDS resets or client evicts today.

I'm going to reboot back into the unpatched version and see if the directories come back to life.

cheers,
robin

Comment by SC Admin (Inactive) [ 09/Oct/18 ]

under the unpatched lustre the directory is ok

[root@john99 oz002]# mkdir mlower/blah
[root@john99 oz002]# ls -ld mlower
drwxrwsr-x 7 mlower oz002 76800 Oct  9 18:28 mlower
[root@john99 oz002]# lfs getdirstripe mlower
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280024621:0x196c4:0x0]
     2           [0x680024df2:0x19733:0x0]
     0           [0x2000153f2:0x196fb:0x0]
[root@john99 oz002]# rmdir mlower/blah

also another directory that the other user reported as "stale"

[root@john99 oz002]# mkdir /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift/blah
[root@john99 oz002]# lfs getdirstripe /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x6800426d1:0xf9ff:0x0]
     0           [0x20003117b:0xfa5e:0x0]
     1           [0x280043a4c:0xfa30:0x0]
[root@john99 oz002]# lfs getdirstripe /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift/blah
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x200034fc0:0x15:0x0] 
     1           [0x2800478a0:0x15:0x0] 
     2           [0x680046518:0x15:0x0] 
[root@john99 oz002]# rmdir /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift/blah

please advise.

cheers,
robin

Comment by Lai Siyao [ 09/Oct/18 ]

Can you apply the patch https://review.whamcloud.com/#/c/33248/, and collect a debug log on the MDS when mkdir fails with -ESTALE? It's better to enable more debugging with 'lctl set_param debug=+trace' on the MDS.
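
Something along these lines on the MDS should capture it (the file name and the 1024 MB buffer size below are just examples):

lctl set_param debug=-1                    # or at least: lctl set_param debug=+trace
lctl debug_daemon start /tmp/lu11418-debug 1024
# ... reproduce the mkdir -ESTALE failure from a client ...
lctl debug_daemon stop
lctl debug_file /tmp/lu11418-debug /tmp/lu11418-debug.txt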

Comment by SC Admin (Inactive) [ 09/Oct/18 ]

I rebooted back into the patched lustre and now there are no "stale" messages from any of those 3 dirs. WTF.

I also created a test dir in all

*/*/

level dirs in the basic hierarchy, and they worked ok.
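
(that check is easy to script, e.g. something like this; the test dir name is arbitrary)

for d in /fred/*/*/ ; do
    mkdir "$d/lu11418_test" && rmdir "$d/lu11418_test" || echo "FAIL: $d"
done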

dmesg for this boot is

[  109.053212] LNet: Using FMR for registration
[  109.057484] LNetError: 105:0:(o2iblnd_cb.c:2299:kiblnd_passive_connect()) Can't accept conn from 192.168.44.112@o2ib44 on NA (ib0:0:192.168.44.22): bad dst nid 192.168.44.22@o2ib44
[  109.128113] LNet: Added LNI 192.168.44.22@o2ib44 [128/2048/0/180]
[  109.221450] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.117@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
[  109.280702] Lustre: dagg-MDT0000: Not available for connect from 192.168.44.151@o2ib44 (not set up)
[  109.290251] Lustre: Skipped 1 previous similar message
[  109.391658] Lustre: dagg-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  109.448235] Lustre: dagg-MDT0000: Will be in recovery for at least 2:30, or until 130 clients reconnect
[  109.458172] Lustre: dagg-MDT0000: Connection restored to bc8ef6f5-4660-0761-54cd-06d7cfb2235b (at 192.168.44.125@o2ib44)
[  109.722365] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.115@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
[  109.741256] LustreError: Skipped 62 previous similar messages
[  109.980629] Lustre: dagg-MDT0000: Connection restored to d50b151f-16b0-b385-88bc-e4b15e5bb52f (at 192.168.44.180@o2ib44)
[  109.992059] Lustre: Skipped 28 previous similar messages
[  110.259366] LustreError: 11-0: dagg-MDT0000-osp-MDT0001: operation mds_connect to node 0@lo failed: rc = -114
[  110.322967] Lustre: dagg-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  110.847170] Lustre: dagg-MDT0001: Will be in recovery for at least 2:30, or until 129 clients reconnect
[  111.108931] Lustre: dagg-MDT0001: Connection restored to 8e7cc0d0-30b4-27c2-b52a-ce7fce8d8cf6 (at 192.168.44.152@o2ib44)
[  111.120439] Lustre: Skipped 6 previous similar messages
[  111.748516] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.127@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server.
[  111.767612] LustreError: Skipped 39 previous similar messages
[  112.246792] LustreError: 11-0: dagg-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114
[  112.302956] Lustre: dagg-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  112.818342] Lustre: dagg-MDT0002: Will be in recovery for at least 2:30, or until 129 clients reconnect
[  113.469392] Lustre: dagg-MDT0000: Connection restored to  (at 192.168.44.117@o2ib44)
[  113.477819] Lustre: Skipped 11 previous similar messages
[  117.503259] Lustre: dagg-MDT0001: Connection restored to 016d10c9-cf22-7918-dae9-afb4faddd7b8 (at 192.168.44.108@o2ib44)
[  117.514808] Lustre: Skipped 188 previous similar messages
[  127.124365] Lustre: dagg-MDT0002: Connection restored to d0cfc850-c50a-96c5-3cf8-857dda21371f (at 192.168.44.135@o2ib44)
[  127.135932] Lustre: Skipped 149 previous similar messages
[  137.464707] LustreError: 19037:0:(llog_cat.c:269:llog_cat_id2handle()) dagg-MDT0002-osp-MDT0000: error opening log id [0x396:0x80046518:0x6]:0: rc = -2
[  137.479741] LustreError: 19037:0:(llog_cat.c:824:llog_cat_process_cb()) dagg-MDT0002-osp-MDT0000: cannot find handle for llog [0x396:0x80046518:0x6]: rc = -2
[  145.320362] Lustre: dagg-MDT0000: Connection restored to 192.168.44.31@o2ib44 (at 192.168.44.31@o2ib44)
[  145.330575] Lustre: Skipped 27 previous similar messages
[  155.674050] Lustre: 33352:0:(ldlm_lib.c:1773:extend_recovery_timer()) dagg-MDT0002: extended recovery timer reaching hard limit: 900, extend: 1
[  162.208865] Lustre: dagg-MDT0000: Recovery over after 0:53, of 130 clients 130 recovered and 0 were evicted.
[  162.252259] Lustre: 20858:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4
[  162.265526] LustreError: 49123:0:(osd_oi.c:859:osd_idc_find_or_init()) dagg-MDT0000: can't lookup: rc = -2
[  162.277914] LustreError: 20858:0:(lod_dev.c:420:lod_sub_recovery_thread()) dagg-MDT0000-osp-MDT0001 getting update log failed: rc = -22
[  163.190156] Lustre: 21405:0:(ldlm_lib.c:2059:target_recovery_overseer()) recovery is aborted, evict exports in recovery
[  163.202218] Lustre: dagg-MDT0001: disconnecting 1 stale clients
[  163.236719] Lustre: dagg-MDT0001: Recovery over after 0:53, of 130 clients 129 recovered and 1 was evicted.
[  163.247382] Lustre: Skipped 1 previous similar message

warble2 had hung again before I rebooted it into this patched version, the same as the previous time we ran this patched kernel.

some more info on the "stale" state ->
the "stale" was repeatable and from multiple nodes.
all dirs below the "stale" dir were also "stale" when trying to create a new dir in them.
files and dirs in the "stale" hierarchy were readable, just basically immutable.

I'm very unsure whether it's safe to stay on this lustre version...
it's not really practical to go over every dir and create a test file or dir in there and see if it succeeds or not.

any thoughts?

cheers,
robin

Comment by SC Admin (Inactive) [ 09/Oct/18 ]

Hi Lai,

I've done some more sweeps through what dirs I can, and (I think) somewhere in

/fred/*/*/*/*/

the below messages appear but I haven't narrowed down where.

I haven't found a definitively "stale" dir tree again yet.

[ 2156.824871] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116
[ 2161.808418] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
[ 2166.803554] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116
[ 2166.818497] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) Skipped 39 previous similar messages
[ 2686.442216] LustreError: 51719:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000237f5:0xefeb:0x0] doesn't exist!: rc = -14
[ 3164.577969] LustreError: 18521:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000237f5:0xefeb:0x0] doesn't exist!: rc = -14
[ 3274.971014] LustreError: 51719:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000237f5:0xefeb:0x0] doesn't exist!: rc = -14

anyway, I've left it in this patched lustre version for now.
I don't understand why "stale" directory trees would come and go though.

further advice would be appreciated.
thanks

cheers,
robin

Comment by Lai Siyao [ 09/Oct/18 ]

The error messages report that a distributed log record failed to cancel, which is not fatal and normally won't cause anything wrong. I'll review the code to see whether the mdt_cross_open() failure is related.

Next time you hit any failure, if you can collect debug logs in time, it may help.

Comment by SC Admin (Inactive) [ 10/Oct/18 ]

Hi Lai,

can you please explain more about why we would get "stale" directories on all clients on one boot and not on another? we have never seen this behaviour before.

I will try to remember to gather logs more diligently, but obviously I was expecting the same behaviour the next time, so I thought I could gather logs for you, if requested, by rebooting back into the patched lustre again. at the time it was more important to stop users' jobs dying. as I said, we haven't seen behaviour change randomly from boot to boot like this before, and are concerned that it will just randomly happen again.

I can't see any significant differences in the two dmesg connection logs posted above (same lustre, one "stale", one not), but hopefully you folks can.

thanks.

cheers,
robin

Comment by Lai Siyao [ 10/Oct/18 ]

llog_cat_cancel_records() is called after a distributed transaction commits, to remove the replay logs. If for some unknown reason the record is not found, it returns -ESTALE, but since the transaction was already committed, it won't cause any trouble.

mdt_cross_open() opens a remote file (one whose inode and dirent are located on different MDTs). If the inode corresponding to the FID doesn't exist, it returns -ESTALE; this normally happens when a concurrent unlink removed the inode.

But for lack of debug logs, I can't really tell what happened in your case.

Comment by SC Admin (Inactive) [ 10/Oct/18 ]

Hi Lai,

thanks, but those are just the ongoing messages.

instead can you please look at the 2 dmesg outputs in the comments above that start with "LNet: Using FMR for registration".

these are the mount logs for the 2 attempts with the same lustre version, but one is "stale" and the other isn't.

as the "stale" issue started at boot time, this is the place to look. no?

thanks.

cheers,
robin

Comment by SC Admin (Inactive) [ 13/Oct/18 ]

Hi Lai,

the stale directory problem has re-emerged.
I've uploaded debug logs to ftp in uploads/lu11418/
I'll attach warble2/john99/Lustre syslog for the day.

I started debug_daemon on the MDS that has all 3 dagg MDTs on it, with -1 debugging.
then on the client ran

[root@john99 ~]# cd /fred/oz073/sjoudaki/g2files ; ls ; mkdir blah101 ; ls ; cd
athena_1.54_nov2013  athena_1.54_simulations  athena_1.54.tar.gz  blah99  cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection  cosmomcjuly2015sj  g2home
athena_1.54_nov2013      athena_1.54.tar.gz  blah99                                                              cosmomcjuly2015sj
athena_1.54_simulations  blah101             cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection  g2home
[root@john99 ~]# cd /fred/oz073/sjoudaki/g2files ; ls ; mkdir blah102 ; ls ; cd
athena_1.54_nov2013      athena_1.54.tar.gz  blah99                                                              cosmomcjuly2015sj
athena_1.54_simulations  blah101             cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection  g2home
athena_1.54_nov2013      athena_1.54.tar.gz  blah102  cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection  g2home
athena_1.54_simulations  blah101             blah99   cosmomcjuly2015sj
[root@john99 ~]# cd /fred/oz073/sjoudaki/g2files ; ls ; mkdir blah103 ; ls ; cd
athena_1.54_nov2013      athena_1.54.tar.gz  blah102  cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection  g2home
athena_1.54_simulations  blah101             blah99   cosmomcjuly2015sj
mkdir: cannot create directory ‘blah103’: Stale file handle
athena_1.54_nov2013      athena_1.54.tar.gz  blah102  cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection  g2home
athena_1.54_simulations  blah101             blah99   cosmomcjuly2015sj

so a couple of directory creates succeeded, but then "blah103" failed.
maybe 30% to 50% of the time they fail.

I also picked a random directory on the filesystem, made a dir for myself, and created 21 dirs and the results are below.

[root@john99 oz025]# mkdir rjh
[root@john99 oz025]# cd rjh
[root@john99 rjh]# mkdir 00
[root@john99 rjh]# mkdir 01
mkdir: cannot create directory ‘01’: Stale file handle
[root@john99 rjh]# mkdir 02
[root@john99 rjh]# mkdir 03
[root@john99 rjh]# mkdir 04
mkdir: cannot create directory ‘04’: Stale file handle
[root@john99 rjh]# mkdir 05
[root@john99 rjh]# mkdir 06
[root@john99 rjh]# mkdir 07
mkdir: cannot create directory ‘07’: Stale file handle
[root@john99 rjh]# mkdir 08
[root@john99 rjh]# mkdir 09
[root@john99 rjh]# mkdir 10
[root@john99 rjh]# mkdir 11
mkdir: cannot create directory ‘11’: Stale file handle
[root@john99 rjh]# mkdir 12
[root@john99 rjh]# mkdir 13
[root@john99 rjh]# mkdir 14
[root@john99 rjh]# mkdir 15
[root@john99 rjh]# mkdir 16
mkdir: cannot create directory ‘16’: Stale file handle
[root@john99 rjh]# mkdir 17
[root@john99 rjh]# mkdir 18
mkdir: cannot create directory ‘18’: Stale file handle
[root@john99 rjh]# mkdir 19
[root@john99 rjh]# mkdir 20
mkdir: cannot create directory ‘20’: Stale file handle
[root@john99 rjh]# lfs getdirstripe *
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed0:0x0]
     2           [0x680046cea:0x1bf38:0x0]
     0           [0x200035792:0x1bf5f:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x680046ce9:0x1aab1:0x0]
     0           [0x200035791:0x1ab20:0x0]
     1           [0x280048071:0x1ab38:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed1:0x0]
     2           [0x680046cea:0x1bf39:0x0]
     0           [0x200035792:0x1bf60:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x680046ce9:0x1aab2:0x0]
     0           [0x200035791:0x1ab21:0x0]
     1           [0x280048071:0x1ab39:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed2:0x0]
     2           [0x680046cea:0x1bf3a:0x0]
     0           [0x200035792:0x1bf61:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x680046ce9:0x1aab3:0x0]
     0           [0x200035791:0x1ab22:0x0]
     1           [0x280048071:0x1ab3a:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed3:0x0]
     2           [0x680046cea:0x1bf3b:0x0]
     0           [0x200035792:0x1bf62:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x680046ce9:0x1aab4:0x0]
     0           [0x200035791:0x1ab23:0x0]
     1           [0x280048071:0x1ab3b:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed4:0x0]
     2           [0x680046cea:0x1bf3c:0x0]
     0           [0x200035792:0x1bf63:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x680046ce9:0x1aab5:0x0]
     0           [0x200035791:0x1ab24:0x0]
     1           [0x280048071:0x1ab3c:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed5:0x0]
     2           [0x680046cea:0x1bf3d:0x0]
     0           [0x200035792:0x1bf64:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x680046ce9:0x1aab6:0x0]
     0           [0x200035791:0x1ab25:0x0]
     1           [0x280048071:0x1ab3d:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed6:0x0]
     2           [0x680046cea:0x1bf3e:0x0]
     0           [0x200035792:0x1bf65:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280048072:0x1bed7:0x0]
     2           [0x680046cea:0x1bf3f:0x0]
     0           [0x200035792:0x1bf66:0x0]

so it looks like all attempts to mkdir on mdt0 are failing?
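
(a scripted version of that check, if anyone wants to repeat it, could look like this; the dir names are arbitrary)

for i in $(seq -w 0 20); do
    if mkdir "$i" 2>/tmp/mkdir.err; then
        lfs getdirstripe "$i" | grep lmv_stripe_offset
    else
        echo "$i: $(cat /tmp/mkdir.err)"
    fi
done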

this "stale" problem has only just started again this evening.
the filesystem has been fine for a few days now until this.
I ran chgrp/chmod sweeps and they completed ok.
I ran a namespace lfsck this afternoon and that corrected a bunch of things, and seemed to complete ok too.

lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r
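
(lfsck status can be watched with something like the following, assuming the usual mdd proc entry)

lctl get_param -n mdd.dagg-MDT0000.lfsck_namespace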

sometime during the lfsck, lustre started getting quota enforcement badly wrong, so I turned off inode group quota enforcing on the dagg filesystem until we can look at it further, and now these stale directories seem to be appearing.
no idea if that is all coincidence or is interrelated, so I'm just writing it all here for you to decide about.

I suspect nobody is around to help on a weekend, so I'm going to reboot the MDS now.
I'll try rebooting back into this same osd-zfs patched lustre and see if they're still stale.

cheers,
robin

Comment by SC Admin (Inactive) [ 14/Oct/18 ]

Hi Lai,

MDS crashes on reboot into any of our recent lustre versions. see LU-11516.
I've applied Andreas's patch to allow the filesystem to mount.

I've now removed the version 3 patch from https://review.whamcloud.com/#/c/33248/ from our build.
we only started to see "stale" directories after applying that patch.

please let us know what you glean from the debug logs about stale dirs above.

cheers,
robin

Comment by SC Admin (Inactive) [ 15/Oct/18 ]

Hi Lai,

so firstly, I've just noticed that the chmod/chgrp sweep that we did on the filesystem starting on 2018-10-13-14:44:52 also came up with a bunch of "stale" directories. I started the lfsck around Oct 13 19:49:28 which is well after the chmod/chgrp sweeps had finished.

so that means that the "stale" issue is definitively (IMHO) related to the patch in this ticket, and not to the lfsck.

secondly, all 1739 of these "Stale file handle" errors were on directories (no files), and all on mdt0.

thirdly, this sweep also hit 396 "Disk quota exceeded" errors, also all on directories (no files) (which I think should never happen as root), and these were heavily on mdt1 and mdt2 and not on mdt0, i.e.

 count mdtidx
      3 0
    177 1
    170 2

more information is below...

so I actually did 2 chmod/chgrp sweeps. the first ran from 2018-10-10-14:11:22 to Oct 13 07:37 and it saw no "stale" or "quota" problems.

the second sweep was expected to be quick, as almost all incorrect mode/ownership had already been fixed by the first. it ran from 2018-10-13-14:44:52 to Oct 13 18:18. now I have noticed that this second sweep did see "stale" and "quota" errors on directories.

everything from Oct 13 about warble2 in syslog in this time interval is already attached to this ticket in messages-john99.warble2.Lustre.20181014.txt.gz
please let me know if you'd like everything from 2018-10-10 to say, 2018-10-14.

(perhaps fourthly), the other thing I've noticed is that all the groups (== directories on Lustre) that saw "Stale file handle" on some tiny fraction of their mdt0 directories in this second sweep were the same as the groups that saw the "Disk quota exceeded" errors ->

[root@john99 resetGroups]# grep -v ^oz *2018-10-13-14:44:52 | grep -v 'No such file or directory' | grep 'Stale file handle' | cut -d. -f1 | uniq -c
    177 oz037
     74 oz050
    667 oz060
    105 oz065
     19 oz071
     48 oz074
   1150 oz100
    103 oz114
      5 oz123
     12 oz992
[root@john99 resetGroups]# grep -v ^oz *2018-10-13-14:44:52 | grep -v 'No such file or directory' | grep 'Disk quota exceeded' | cut -d. -f1 | uniq -c
     54 oz037
     17 oz050
     50 oz060
    106 oz065
      3 oz071
      7 oz074
    137 oz100
     20 oz114
      2 oz992

the exception in the above is the group oz123, but that kinda makes sense 'cos the number of "Disk quota exceeded" messages for each group's sweep is less than their "Stale file handle" errors, and oz123 had the least of those.

so perhaps the "stale" problem and the "quota" problem are related somehow.

cheers,
robin

Comment by Lai Siyao [ 16/Oct/18 ]

-116 comes from llog_osd_declare_write_rec() --> osp_md_declare_write(), where 'opo_stale' is set:

00000040:00000001:9.0:1539446568.707951:0:51694:0:(llog_osd.c:322:llog_osd_declare_write_rec()) Process entered
00000040:00000001:9.0:1539446568.707952:0:51694:0:(llog_osd.c:340:llog_osd_declare_write_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000040:00000001:9.0:1539446568.707952:0:51694:0:(llog.c:960:llog_declare_write_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000040:00000001:9.0:1539446568.707953:0:51694:0:(llog_cat.c:141:llog_cat_new_log()) Process leaving via out (rc=18446744073709551500 : -116 : 0xffffffffffffff8c)
00000004:00000001:9.0:1539446568.707954:0:51694:0:(osp_trans.c:1567:osp_trans_stop()) Process entered
00000004:00080000:9.0:1539446568.707954:0:51694:0:(osp_trans.c:1595:osp_trans_stop()) dagg-MDT0002-osp-MDT0000: add oth ffff8c79137fbcc0 with version 0
00000004:00000010:9.0:1539446568.707955:0:51694:0:(osp_trans.c:224:osp_update_request_destroy()) kfreed 'ours->ours_req': 4096 at ffff8c42a03e0000.
00000004:00000010:9.0:1539446568.707955:0:51694:0:(osp_trans.c:225:osp_update_request_destroy()) kfreed 'ours': 32 at ffff8c4219dfe760.
00000004:00000010:9.0:1539446568.707956:0:51694:0:(osp_trans.c:252:osp_update_request_destroy()) kfreed 'our': 120 at ffff8c78f5bb1900.
00000004:00000010:9.0:1539446568.707956:0:51694:0:(osp_trans.c:935:osp_thandle_destroy()) kfreed 'oth': 88 at ffff8c79137fbcc0.
00000004:00000001:9.0:1539446568.707957:0:51694:0:(osp_trans.c:1616:osp_trans_stop()) Process leaving (rc=0 : 0 : 0)
00000040:00000001:9.0:1539446568.707957:0:51694:0:(llog_cat.c:207:llog_cat_new_log()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000040:00000001:9.0:1539446568.707958:0:51694:0:(llog_cat.c:674:llog_cat_declare_add_rec()) Process leaving via out (rc=18446744073709551500 : -116 : 0xffffffffffffff8c)
00000040:00000001:9.0:1539446568.707959:0:51694:0:(llog_cat.c:688:llog_cat_declare_add_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000040:00000001:9.0:1539446568.707959:0:51694:0:(llog.c:1047:llog_declare_add()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000020:00000001:9.0:1539446568.707960:0:51694:0:(update_trans.c:136:sub_declare_updates_write()) Process leaving via out_put (rc=18446744073709551500 : -116 : 0xffffffffffffff8c)
00000020:00000040:9.0:1539446568.707961:0:51694:0:(lustre_log.h:380:llog_ctxt_put()) PUTting ctxt ffff8c79b7fffc80 : new refcount 1
00000020:00000001:9.0:1539446568.707962:0:51694:0:(update_trans.c:770:prepare_multiple_node_trans()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000020:00000001:9.0:1539446568.707962:0:51694:0:(update_trans.c:808:top_trans_start()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000004:00000001:9.0:1539446568.707963:0:51694:0:(mdd_dir.c:2460:mdd_create()) Process leaving via out_stop (rc=18446744073709551500 : -116 : 0xffffffffffffff8c)

I'm still trying to understand why it's set, and because the stale flag is set for dagg-MDT0002-osp-MDT0000, all creates starting from MDT0 failed.

Comment by Gerrit Updater [ 19/Oct/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33401
Subject: LU-11418 llog: refresh remote llog upon -ESTALE
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a46b01230162e12b82e85c1c84d9f07c9350a71d

Comment by SC Admin (Inactive) [ 22/Oct/18 ]

Hi Lai,

so is the idea to apply just https://review.whamcloud.com/33401 or is that in addition to https://review.whamcloud.com/33248 ?

cheers,
robin

Comment by Lai Siyao [ 22/Oct/18 ]

I think both are needed.

Comment by SC Admin (Inactive) [ 23/Oct/18 ]

Hi Lai,

do you think this also explains the quota errors that we were seeing, or could that be a separate (also stale?) problem?

I've built an image with these 2 patches in, and will reboot all lustre servers into it later today. I guess I just really have to do the MDS's though... ?

cheers,
robin

Comment by Gerrit Updater [ 23/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33248/
Subject: LU-11418 osd-zfs: call stop_cb if transaction start fail
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d614ff561cb6bd9a9b35310c99dd991c55797d97

Comment by SC Admin (Inactive) [ 23/Oct/18 ]

Hi Lai,

we've been running with these 2 patches for a few hours now. so far so good, but one client LBUG was seen. presumably it's not a coincidence, but I guess it might be.

Message from syslogd@john98 at Oct 24 02:11:26 ...
 kernel: LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) ASSERTION( inode->i_data.nrpages == 0 ) failed: inode=[0x680049431:0x159b8:0x0](ffff9dfe6cb87590) nr
pages=1, see LU-118

Message from syslogd@john98 at Oct 24 02:11:26 ...
 kernel: LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) LBUG
<ConMan> Console [j98] log at 2018-10-24 02:00:00 AEDT.
2018-10-24 02:11:15 [1500494.537166] bash (421992): drop_caches: 2
2018-10-24 02:11:26 [1500505.349268] LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) ASSERTION( inode->i_data.nrpages == 0 ) failed: inode=[0x680049431:0x159b8:0x0](ffff9dfe6cb87590) nrpages=1, see LU-118
2018-10-24 02:11:26 [1500505.371195] LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) LBUG
2018-10-24 02:11:26 [1500505.379176] Pid: 423673, comm: rm 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018
2018-10-24 02:11:26 [1500505.388966] Call Trace:
2018-10-24 02:11:26 [1500505.392442]  [<ffffffffc03f97cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
2018-10-24 02:11:26 [1500505.399994]  [<ffffffffc03f987c>] lbug_with_loc+0x4c/0xa0 [libcfs]
2018-10-24 02:11:26 [1500505.407191]  [<ffffffffc0accc87>] ll_delete_inode+0x1b7/0x1c0 [lustre]
2018-10-24 02:11:26 [1500505.414783]  [<ffffffff88c3c504>] evict+0xb4/0x180
2018-10-24 02:11:26 [1500505.420646]  [<ffffffff88c3ce0c>] iput+0xfc/0x190
2018-10-24 02:11:26 [1500505.426410]  [<ffffffff88c3126e>] do_unlinkat+0x1ae/0x2d0
2018-10-24 02:11:26 [1500505.432773]  [<ffffffff88c322eb>] SyS_unlinkat+0x1b/0x40
2018-10-24 02:11:26 [1500505.439065]  [<ffffffff8912579b>] system_call_fastpath+0x22/0x27
2018-10-24 02:11:26 [1500505.446021]  [<ffffffffffffffff>] 0xffffffffffffffff
2018-10-24 02:11:26 [1500505.452010] Kernel panic - not syncing: LBUG
2018-10-24 02:11:26 [1500505.457170] CPU: 27 PID: 423673 Comm: rm Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-862.14.4.el7.x86_64 #1
2018-10-24 02:11:26 [1500505.469966] Hardware name: Dell Inc. PowerEdge R740/06G98X, BIOS 1.4.9 06/29/2018
2018-10-24 02:11:26 [1500505.478319] Call Trace:
2018-10-24 02:11:26 [1500505.481653]  [<ffffffff89113754>] dump_stack+0x19/0x1b
2018-10-24 02:11:26 [1500505.487655]  [<ffffffff8910d29f>] panic+0xe8/0x21f
2018-10-24 02:11:26 [1500505.493301]  [<ffffffffc03f98cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
2018-10-24 02:11:26 [1500505.500339]  [<ffffffffc0accc87>] ll_delete_inode+0x1b7/0x1c0 [lustre]
2018-10-24 02:11:26 [1500505.507709]  [<ffffffff88c3c504>] evict+0xb4/0x180
2018-10-24 02:11:26 [1500505.513354]  [<ffffffff88c3ce0c>] iput+0xfc/0x190
2018-10-24 02:11:26 [1500505.518909]  [<ffffffff88c3126e>] do_unlinkat+0x1ae/0x2d0
2018-10-24 02:11:26 [1500505.525140]  [<ffffffff88c25304>] ? SYSC_newfstatat+0x34/0x60
2018-10-24 02:11:26 [1500505.531703]  [<ffffffff88c322eb>] SyS_unlinkat+0x1b/0x40
2018-10-24 02:11:26 [1500505.537817]  [<ffffffff8912579b>] system_call_fastpath+0x22/0x27
2018-10-24 02:11:26 [    0.000000] Initializing cgroup subsys cpuset

cheers,
robin

Comment by SC Admin (Inactive) [ 24/Oct/18 ]

Hi Lai,

2 more MDT hung threads/crashes today. will get logs in a min.

at the moment this process is using 100% cpu on the DNE MDS.

root 50259 99.7 0.0 0 0 ? R 12:44 53:47 [orph_cleanup_da]

load is 28, and perf top says

Samples: 64K of event 'cycles:ppp', Event count (approx.): 38756058742
Overhead  Shared O  Symbol
   6.29%  [kernel]  [k] mutex_lock
   5.86%  [kernel]  [k] _raw_qspin_lock
   4.94%  [kernel]  [k] mutex_unlock
   4.36%  [kernel]  [k] vsnprintf
   4.04%  [kernel]  [k] arc_buf_access
   3.30%  [kernel]  [k] format_decode
   2.77%  [kernel]  [k] zap_leaf_lookup_closest
   2.57%  [kernel]  [k] down_read
   2.29%  [kernel]  [k] up_read
   2.12%  [kernel]  [k] number.isra.2
   1.89%  [kernel]  [k] zap_hash
   1.79%  [kernel]  [k] kfree
   1.67%  [kernel]  [k] dbuf_hash
   1.62%  [kernel]  [k] __dbuf_hold_impl
   1.52%  [kernel]  [k] dbuf_rele_and_unlock
   1.48%  [kernel]  [k] __kmalloc_node
   1.35%  [kernel]  [k] zrl_add_impl
   1.25%  [kernel]  [k] arc_access.isra.22
   1.22%  [kernel]  [k] strlen
   1.21%  [kernel]  [k] _cond_resched
   1.07%  [kernel]  [k] lprocfs_counter_add
   1.03%  [kernel]  [k] zrl_remove
   0.93%  [kernel]  [k] __memcpy

cheers,
robin

Comment by SC Admin (Inactive) [ 24/Oct/18 ]

Hi Lai,

conman for today on warble2 is attached.

summary:

I rebooted warble2 (and warble1) into the Lustre build with the 2 patches from this ticket at 2018-10-23 23:20

it lasted ok until 4:40am at which point the filesystem was hung. a find for directories that I was running hung at that time.

the hung MDT thread stack trace was something we haven't seen before.

2018-10-24 04:08:38 [17238.179626] LustreError: 52406:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
2018-10-24 04:16:13 [17692.238162] LustreError: 52406:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
2018-10-24 04:18:40 [17838.835695] LustreError: 52406:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
2018-10-24 04:26:12 [18290.135146] LustreError: 52406:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
2018-10-24 04:46:41 [19515.898070] INFO: task mdt00_000:19928 blocked for more than 120 seconds.
2018-10-24 04:46:41 [19515.905759] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2018-10-24 04:46:41 [19515.914443] mdt00_000       D ffff8a5495781fa0     0 19928      2 0x00000000
2018-10-24 04:46:41 [19515.922347] Call Trace:
2018-10-24 04:46:41 [19515.925601]  [<ffffffff99314029>] schedule+0x29/0x70
2018-10-24 04:46:41 [19515.931347]  [<ffffffff99315915>] rwsem_down_write_failed+0x225/0x3a0
2018-10-24 04:46:41 [19515.938591]  [<ffffffffc138aa30>] ? dmu_tx_hold_spill+0x20/0x40 [zfs]
2018-10-24 04:46:41 [19515.945787]  [<ffffffff98f5b267>] call_rwsem_down_write_failed+0x17/0x30
2018-10-24 04:46:41 [19515.953239]  [<ffffffffc07611e8>] ? spl_kmem_zalloc+0xd8/0x180 [spl]
2018-10-24 04:46:41 [19515.960330]  [<ffffffff9931319d>] down_write+0x2d/0x3d
2018-10-24 04:46:41 [19515.966232]  [<ffffffffc0cfbd3d>] llog_cat_declare_add_rec+0x4fd/0x630 [obdclass]
2018-10-24 04:46:41 [19515.974464]  [<ffffffffc0cf3618>] llog_declare_add+0x78/0x1a0 [obdclass]
2018-10-24 04:46:41 [19515.981938]  [<ffffffffc0f920fe>] top_trans_start+0x17e/0x940 [ptlrpc]
2018-10-24 04:46:41 [19515.989164]  [<ffffffffc16ee3f1>] lod_trans_start+0x31/0x40 [lod]
2018-10-24 04:46:41 [19515.995939]  [<ffffffffc178e1ea>] mdd_trans_start+0x1a/0x20 [mdd]
2018-10-24 04:46:41 [19516.002715]  [<ffffffffc177bcbc>] mdd_rename+0x65c/0x14a0 [mdd]
2018-10-24 04:46:41 [19516.009307]  [<ffffffffc12cf07a>] mdt_reint_rename_internal.isra.36+0x166a/0x20c0 [mdt]
2018-10-24 04:46:41 [19516.017985]  [<ffffffffc12d13fb>] mdt_reint_rename_or_migrate.isra.39+0x19b/0x860 [mdt]
2018-10-24 04:46:41 [19516.026666]  [<ffffffffc0eeaff0>] ? ldlm_blocking_ast_nocheck+0x310/0x310 [ptlrpc]
2018-10-24 04:46:41 [19516.034870]  [<ffffffffc0ee5440>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc]
2018-10-24 04:46:41 [19516.043310]  [<ffffffffc12d1af3>] mdt_reint_rename+0x13/0x20 [mdt]
2018-10-24 04:46:41 [19516.050113]  [<ffffffffc12d5b33>] mdt_reint_rec+0x83/0x210 [mdt]
2018-10-24 04:46:41 [19516.056716]  [<ffffffffc12b736b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
2018-10-24 04:46:41 [19516.063843]  [<ffffffffc12c2f07>] mdt_reint+0x67/0x140 [mdt]
2018-10-24 04:46:41 [19516.070116]  [<ffffffffc0f7e38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
2018-10-24 04:46:41 [19516.077586]  [<ffffffffc0f26e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
2018-10-24 04:46:41 [19516.085825]  [<ffffffffc0f23478>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
2018-10-24 04:46:41 [19516.093172]  [<ffffffff98ccf682>] ? default_wake_function+0x12/0x20
2018-10-24 04:46:41 [19516.099979]  [<ffffffff98cc52ab>] ? __wake_up_common+0x5b/0x90
2018-10-24 04:46:41 [19516.106364]  [<ffffffffc0f2a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
2018-10-24 04:46:41 [19516.113165]  [<ffffffffc0f29b00>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
2018-10-24 04:46:41 [19516.121062]  [<ffffffff98cbb621>] kthread+0xd1/0xe0
2018-10-24 04:46:41 [19516.126438]  [<ffffffff98cbb550>] ? insert_kthread_work+0x40/0x40
2018-10-24 04:46:41 [19516.133034]  [<ffffffff993205dd>] ret_from_fork_nospec_begin+0x7/0x21
2018-10-24 04:46:41 [19516.139970]  [<ffffffff98cbb550>] ? insert_kthread_work+0x40/0x40

the MDS was powered off (as usual, an MDT wouldn't umount, so we had to power it off) and rebooted at ~11:20 am.

almost immediately multiple MDT threads were hung

2018-10-24 11:26:26 [ 1303.399664] Lustre: Skipped 2 previous similar messages
2018-10-24 11:26:26 [ 1303.420867] Lustre: dagg-MDT0001: Recovery over after 0:58, of 130 clients 127 recovered and 3 were evicted.
2018-10-24 11:26:52 [ 1329.318263] Lustre: dagg-MDT0001: Connection restored to 2e6d8ce0-afbe-2146-2dc7-13fe07e6b623 (at 192.168.44.209@o2ib44)
2018-10-24 11:26:52 [ 1329.330126] Lustre: Skipped 14 previous similar messages
2018-10-24 11:30:43 [ 1559.865427] INFO: task mdt_rdpg01_001:22882 blocked for more than 120 seconds.
2018-10-24 11:30:43 [ 1559.873617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2018-10-24 11:30:43 [ 1559.882363] mdt_rdpg01_001  D ffff98b37adbcf10     0 22882      2 0x00000000
2018-10-24 11:30:43 [ 1559.890324] Call Trace:
2018-10-24 11:30:43 [ 1559.893659]  [<ffffffffc088b1e8>] ? spl_kmem_zalloc+0xd8/0x180 [spl]
2018-10-24 11:30:43 [ 1559.900865]  [<ffffffff90514029>] schedule+0x29/0x70
2018-10-24 11:30:43 [ 1559.906672]  [<ffffffff90515915>] rwsem_down_write_failed+0x225/0x3a0
2018-10-24 11:30:43 [ 1559.913942]  [<ffffffff9015b267>] call_rwsem_down_write_failed+0x17/0x30
2018-10-24 11:30:43 [ 1559.921469]  [<ffffffffc088b1e8>] ? spl_kmem_zalloc+0xd8/0x180 [spl]
2018-10-24 11:30:43 [ 1559.928636]  [<ffffffff9051319d>] down_write+0x2d/0x3d
2018-10-24 11:30:43 [ 1559.934620]  [<ffffffffc1010d3d>] llog_cat_declare_add_rec+0x4fd/0x630 [obdclass]
2018-10-24 11:30:43 [ 1559.942896]  [<ffffffffc0d882e7>] ? osp_update_request_create+0xa7/0x200 [osp]
2018-10-24 11:30:43 [ 1559.950934]  [<ffffffffc1008618>] llog_declare_add+0x78/0x1a0 [obdclass]
2018-10-24 11:30:43 [ 1559.958479]  [<ffffffffc17f70fe>] top_trans_start+0x17e/0x940 [ptlrpc]
2018-10-24 11:30:43 [ 1559.965779]  [<ffffffffc11d93f1>] lod_trans_start+0x31/0x40 [lod]
2018-10-24 11:30:43 [ 1559.972625]  [<ffffffffc16061ea>] mdd_trans_start+0x1a/0x20 [mdd]
2018-10-24 11:30:43 [ 1559.979462]  [<ffffffffc15fbb69>] mdd_attr_set+0x589/0xce0 [mdd]
2018-10-24 11:30:43 [ 1559.986244]  [<ffffffffc177fd57>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
2018-10-24 11:30:43 [ 1559.994119]  [<ffffffffc0f4b206>] mdt_mfd_close+0x1a6/0x610 [mdt]
2018-10-24 11:30:43 [ 1560.000932]  [<ffffffffc0f50981>] mdt_close_internal+0x121/0x220 [mdt]
2018-10-24 11:30:43 [ 1560.008164]  [<ffffffffc0f50ca0>] mdt_close+0x220/0x780 [mdt]
2018-10-24 11:30:43 [ 1560.014616]  [<ffffffffc17e338a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
2018-10-24 11:30:43 [ 1560.022177]  [<ffffffffc178be4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
2018-10-24 11:30:43 [ 1560.030502]  [<ffffffffc1788478>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
2018-10-24 11:30:43 [ 1560.037939]  [<ffffffff8fecf682>] ? default_wake_function+0x12/0x20
2018-10-24 11:30:43 [ 1560.044843]  [<ffffffff8fec52ab>] ? __wake_up_common+0x5b/0x90
2018-10-24 11:30:43 [ 1560.051313]  [<ffffffffc178f592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
2018-10-24 11:30:43 [ 1560.058208]  [<ffffffffc178eb00>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
2018-10-24 11:30:43 [ 1560.066197]  [<ffffffff8febb621>] kthread+0xd1/0xe0
2018-10-24 11:30:43 [ 1560.071670]  [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40
2018-10-24 11:30:43 [ 1560.078344]  [<ffffffff905205dd>] ret_from_fork_nospec_begin+0x7/0x21
2018-10-24 11:30:43 [ 1560.085351]  [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40
2018-10-24 11:30:43 [ 1560.092021] INFO: task dist_txn-2:37466 blocked for more than 120 seconds.
2018-10-24 11:30:43 [ 1560.099440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2018-10-24 11:30:43 [ 1560.107812] dist_txn-2      D ffff985343136eb0     0 37466      2 0x00000000
2018-10-24 11:30:43 [ 1560.115448] Call Trace:
2018-10-24 11:30:43 [ 1560.118444]  [<ffffffff90514029>] schedule+0x29/0x70
2018-10-24 11:30:43 [ 1560.123940]  [<ffffffff90515915>] rwsem_down_write_failed+0x225/0x3a0
2018-10-24 11:30:43 [ 1560.130911]  [<ffffffffc0918718>] ? libcfs_debug_vmsg2+0x6d8/0xb40 [libcfs]
2018-10-24 11:30:43 [ 1560.138385]  [<ffffffff9015b267>] call_rwsem_down_write_failed+0x17/0x30
2018-10-24 11:30:43 [ 1560.145605]  [<ffffffff9051319d>] down_write+0x2d/0x3d
2018-10-24 11:30:43 [ 1560.151260]  [<ffffffffc10111b9>] llog_cat_id2handle+0x69/0x5b0 [obdclass]
2018-10-24 11:30:43 [ 1560.158616]  [<ffffffffc10124e0>] llog_cat_cancel_records+0x120/0x3d0 [obdclass]
2018-10-24 11:30:43 [ 1560.166479]  [<ffffffffc0918bd7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
2018-10-24 11:30:43 [ 1560.173580]  [<ffffffffc17f55e9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc]
2018-10-24 11:30:43 [ 1560.181686]  [<ffffffff8fecf670>] ? wake_up_state+0x20/0x20
2018-10-24 11:30:43 [ 1560.187730]  [<ffffffffc17f5050>] ? sub_trans_commit_cb+0x20/0x20 [ptlrpc]
2018-10-24 11:30:43 [ 1560.195039]  [<ffffffff8febb621>] kthread+0xd1/0xe0
2018-10-24 11:30:43 [ 1560.200369]  [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40
2018-10-24 11:30:43 [ 1560.206892]  [<ffffffff905205dd>] ret_from_fork_nospec_begin+0x7/0x21
2018-10-24 11:30:43 [ 1560.213761]  [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40
2018-10-24 11:30:43 [ 1560.220277] INFO: task mdt01_008:39252 blocked for more than 120 seconds.
2018-10-24 11:30:43 [ 1560.227487] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2018-10-24 11:30:43 [ 1560.235740] mdt01_008       D ffff97f5ece00fd0     0 39252      2 0x00000000
2018-10-24 11:30:43 [ 1560.243269] Call Trace:
2018-10-24 11:30:43 [ 1560.246198]  [<ffffffffc12840f7>] ? dbuf_rele_and_unlock+0x357/0x4b0 [zfs]
...

and the same again after another reboot at 12:45

I took a -1 debug log at ~13:30.

at 14:00 we gave up on this, and rebooted warble2 back into the centos-7.5-minimal-server-06/ image, which is the same as 07/ except it does not have these 2 patches in it.

we then got a "normal" MDT thread hang very quickly after boot

2018-10-24 14:08:30 [  453.651553] LNet: Service thread pid 18826 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
2018-10-24 14:08:30 [  453.670221] Pid: 18826, comm: mdt_rdpg01_000 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
2018-10-24 14:08:30 [  453.680691] Call Trace:
2018-10-24 14:08:30 [  453.683975]  [<ffffffffc1137047>] top_trans_wait_result+0xa6/0x155 [ptlrpc]
2018-10-24 14:08:30 [  453.691845]  [<ffffffffc111891b>] top_trans_stop+0x42b/0x930 [ptlrpc]
2018-10-24 14:08:30 [  453.699177]  [<ffffffffc14775f9>] lod_trans_stop+0x259/0x340 [lod]
2018-10-24 14:08:30 [  453.706189]  [<ffffffffc0de621a>] mdd_trans_stop+0x2a/0x46 [mdd]
2018-10-24 14:08:30 [  453.713014]  [<ffffffffc0ddbbcb>] mdd_attr_set+0x5eb/0xce0 [mdd]
2018-10-24 14:08:30 [  453.719810]  [<ffffffffc13f6206>] mdt_mfd_close+0x1a6/0x610 [mdt]
2018-10-24 14:08:30 [  453.726703]  [<ffffffffc13fb981>] mdt_close_internal+0x121/0x220 [mdt]
2018-10-24 14:08:30 [  453.734030]  [<ffffffffc13fbca0>] mdt_close+0x220/0x780 [mdt]
2018-10-24 14:08:30 [  453.740547]  [<ffffffffc110538a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
2018-10-24 14:08:30 [  453.748254]  [<ffffffffc10ade4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
2018-10-24 14:08:30 [  453.756697]  [<ffffffffc10b1592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
2018-10-24 14:08:30 [  453.763736]  [<ffffffff8b2bb621>] kthread+0xd1/0xe0
2018-10-24 14:08:30 [  453.769363]  [<ffffffff8b9205dd>] ret_from_fork_nospec_begin+0x7/0x21
2018-10-24 14:08:30 [  453.776543]  [<ffffffffffffffff>] 0xffffffffffffffff
2018-10-24 14:08:30 [  453.782255] LustreError: dumping log to /tmp/lustre-log.1540350510.18826

I suspended all jobs, then rebooted warble1 into this 06/ image without these 2 patches and then rebooted warble2 again at ~2018-10-24 14:39

better luck this time - it's been running jobs and has been ok for half an hour or so now... presumably the reduced load from having jobs suspended when it mounted the filesystems helped.

I'll upload the -1 debug daemon logs to the ftp now.

additional information: when umount'ing the MDTs at 14:30 I noticed the same [orph_cleanup_da] process running as in the previous comment in this ticket (from around ~13:30). so it kinda looked like the MDTs were umounting/reconnecting even though they were up but hung.

cheers,
robin

Comment by SC Admin (Inactive) [ 24/Oct/18 ]

darn, sorry. forgot to upload the debug log until now. kinda assumed it would be useless to you so it slipped my mind.
it's on the ftp at /uploads/lu11418/20181024/

cheers,
robin

Comment by Lai Siyao [ 25/Oct/18 ]

https://review.whamcloud.com/33401 is not correct and may cause such a hang; I'll update it soon.

Comment by SC Admin (Inactive) [ 31/Oct/18 ]

Hi Lai,

should we be giving patch set 2 a try?

BTW for some reason the machine has been fairly stable (without the patches in this ticket) for almost a couple of weeks now. perhaps the namespace lfsck that finished and corrected some things has helped.

cheers,
robin

Comment by Lai Siyao [ 31/Oct/18 ]

yes, I added a test for patch set 2, which shouldn't cause the system to hang any more; you can give it a try now.

Comment by SC Admin (Inactive) [ 07/Nov/18 ]

Hi Lai,

we've just started running version 3 of the patch now. will let you know how it goes.
thanks!

cheers,
robin

Comment by SC Admin (Inactive) [ 08/Nov/18 ]

Hi Lai,

no obvious signs of problems yet. I'll start a chgrp/chmod correcting sweep soon I think.

FYI syslog messages so far are

Nov  7 15:00:58 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  7 15:01:16 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  7 15:03:13 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  7 16:03:56 warble2 kernel: LustreError: 23402:0:(llog_cat.c:737:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0002: fail to cancel 1 of 1 llog-records: rc = -116
Nov  7 21:30:22 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov  7 21:36:20 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov  7 21:36:21 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov  8 09:44:06 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov  8 09:47:23 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov  8 10:20:37 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov  8 10:28:13 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov  8 12:05:28 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov  8 12:06:26 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:06:27 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:07:11 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:07:35 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:07:43 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:07:43 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) Skipped 1 previous similar message
Nov  8 12:07:58 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:07:58 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) Skipped 1 previous similar message
Nov  8 12:08:07 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov  8 12:08:21 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:10:09 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov  8 12:10:09 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) Skipped 1 previous similar message

as the last namespace lfsck completed ok (with the previous version of the patches in this ticket), should we now try a lfsck with -C to try and get rid of some of the above?

we currently mount with -o skip_lfsck. should we umount and mount without this first, in order that an oi scrub runs first? or just try an "all" types lfsck (dry-run to start with I guess)?
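
for reference, I assume remounting without the option would look roughly like this (the dataset and mountpoint names below are just placeholders, not our real ones):

umount /mnt/dagg-mdt0
mount -t lustre dagg-mdt0pool/mdt0 /mnt/dagg-mdt0                  # without skip_lfsck, a previously paused lfsck can resume at mount time
mount -t lustre -o skip_lfsck dagg-mdt0pool/mdt0 /mnt/dagg-mdt0    # what we do now: don't resume any previous lfsck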

cheers,
robin

Comment by Lai Siyao [ 08/Nov/18 ]

Do you know which operation generated these messages? And did it cause any error?

'lfsck' with '-C' creates missing MDT objects; I'm afraid it won't fix the orphan_destroy() failure. I'd suggest you do a dry-run and post the result here. IMHO for the orphan issue, we need to wait for https://review.whamcloud.com/#/c/32959/ to become ready.

Comment by SC Admin (Inactive) [ 09/Nov/18 ]

Hi Lai,

perhaps those error messages could print a bit more info eg. uid/gid/fid/path? playing "guess which file or dir" across a fs with 100M files and 100's of users is kinda hard...

or if you have a systemtap script or similar then I could try that...

hmm. so I have just noticed that this process has been running 100% of the time since boot.
from 'top':

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+  P WCHAN      COMMAND                                                                                                                                           
 62076 root      20   0       0      0      0 R 100.0  0.0   3671:06 10 -          orph_cleanup_da                                                                                                                                   
[warble2]root: uptime
 03:41:49 up 2 days, 13:28,  1 user,  load average: 4.37, 5.05, 3.71

ie. ~= 3688 minutes uptime and 3671 minutes of system time used by "orph_cleanup_da".

the process is only running on the MDS that has the 3 DNE MDTs on it, and not on the other MDS that has non-DNE MDTs for the smaller filesystems.

I presume this process is a problem? I haven't noticed it using a whole core (or any time) previously, but TBH I haven't run 'top' that many times on the MDS.
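
if it would help, I can grab the in-kernel stack of that thread at any time with something like the following (62076 is the pid from the top output above):

for i in 1 2 3; do cat /proc/62076/stack; sleep 1; done    # sample the thread's kernel stack a few times to see if it's stuck in a loop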

otherwise, things seem to be going well... no hangs, no crashes, no reports from users, and the chgrp/chmod script ran and finished ok.

thanks for the lfsck advice. I ran this on the DNE MDS

lctl lfsck_start -M dagg-MDT0000 -A -r -s 2000 -C -n
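
for anyone following along, my understanding of those flags (per the lctl lfsck_start usage) is:

# -M dagg-MDT0000 : MDT device to start the lfsck on
# -A              : extend the scan to all targets
# -r              : reset and scan from the beginning
# -s 2000         : limit the scanning speed to ~2000 items/sec
# -C              : create missing MDT objects for dangling name entries
# -n              : dry-run, report inconsistencies without repairing them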

and got this from layout (which is confusing 'cos it seems to never have started a phase2, nor completed)

[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_layout
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: scanning-phase2
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: 1541733352
time_since_latest_start: 48563 seconds
last_checkpoint_time: 1541770550
time_since_last_checkpoint: 11365 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 454780
success_count: 0
inconsistent_dangling: 0
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 485295
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 30949901
checked_phase2: 0
run_time_phase1: 37186 seconds
run_time_phase2: 11365 seconds
average_speed_phase1: 832 items/sec
average_speed_phase2: 0 items/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: 0 items/sec
current_position: [0x0:0x0:0x0]
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: scanning-phase2
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: 1541733353
time_since_latest_start: 48562 seconds
last_checkpoint_time: 1541769672
time_since_last_checkpoint: 12243 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 100527
success_count: 0
inconsistent_dangling: 0
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 476352
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 30021155
checked_phase2: 0
run_time_phase1: 36305 seconds
run_time_phase2: 12243 seconds
average_speed_phase1: 826 items/sec
average_speed_phase2: 0 items/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: 0 items/sec
current_position: [0x0:0x0:0x0]
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: scanning-phase2
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: N/A
time_since_last_completed: N/A
latest_start_time: 1541733353
time_since_latest_start: 48562 seconds
last_checkpoint_time: 1541769893
time_since_last_checkpoint: 12022 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 103676
success_count: 0
inconsistent_dangling: 1
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 477885
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 30231327
checked_phase2: 0
run_time_phase1: 36526 seconds
run_time_phase2: 12022 seconds
average_speed_phase1: 827 items/sec
average_speed_phase2: 0 items/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: 0 items/sec
current_position: [0x0:0x0:0x0]

and this from namespace

[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags: inconsistent
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541777443
time_since_last_completed: 4530 seconds
latest_start_time: 1541733353
time_since_latest_start: 48620 seconds
last_checkpoint_time: 1541777443
time_since_last_checkpoint: 4530 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: 5603881, [0x200011573:0x1:0x0], 0x2e1b6311f67b0000
checked_phase1: 35926809
checked_phase2: 13462985
inconsistent_phase1: 5
inconsistent_phase2: 23
failed_phase1: 0
failed_phase2: 1
directories: 4854362
dirent_inconsistent: 0
linkea_inconsistent: 0
nlinks_inconsistent: 0
multiple_linked_checked: 548868
multiple_linked_inconsistent: 0
unknown_inconsistency: 0
unmatched_pairs_inconsistent: 0
dangling_inconsistent: 34
multiple_referenced_inconsistent: 0
bad_file_type_inconsistent: 0
lost_dirent_inconsistent: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 1
striped_dirs_inconsistent: 1
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 3625256
striped_shards_inconsistent: 0
striped_shards_failed: 0
striped_shards_skipped: 261
name_hash_inconsistent: 0
linkea_overflow_inconsistent: 0
success_count: 13
run_time_phase1: 37186 seconds
run_time_phase2: 6892 seconds
average_speed_phase1: 966 items/sec
average_speed_phase2: 1953 objs/sec
average_speed_total: 1120 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags: inconsistent
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541777356
time_since_last_completed: 4617 seconds
latest_start_time: 1541733353
time_since_latest_start: 48620 seconds
last_checkpoint_time: 1541777356
time_since_last_checkpoint: 4617 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: 557060, [0x280020f73:0x1:0x0], 0x637c6569a8f78000
checked_phase1: 35257340
checked_phase2: 13279657
inconsistent_phase1: 5
inconsistent_phase2: 0
failed_phase1: 0
failed_phase2: 2
directories: 4833782
dirent_inconsistent: 0
linkea_inconsistent: 0
nlinks_inconsistent: 0
multiple_linked_checked: 548660
multiple_linked_inconsistent: 0
unknown_inconsistency: 0
unmatched_pairs_inconsistent: 0
dangling_inconsistent: 90
multiple_referenced_inconsistent: 0
bad_file_type_inconsistent: 0
lost_dirent_inconsistent: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 2
striped_dirs_inconsistent: 2
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 3625273
striped_shards_inconsistent: 0
striped_shards_failed: 0
striped_shards_skipped: 261
name_hash_inconsistent: 0
linkea_overflow_inconsistent: 0
success_count: 17
run_time_phase1: 36305 seconds
run_time_phase2: 6805 seconds
average_speed_phase1: 971 items/sec
average_speed_phase2: 1951 objs/sec
average_speed_total: 1125 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags: inconsistent
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541777353
time_since_last_completed: 4620 seconds
latest_start_time: 1541733353
time_since_latest_start: 48620 seconds
last_checkpoint_time: 1541777353
time_since_last_checkpoint: 4620 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: 24644277, [0x68003ff89:0x20:0x0], 0x1a11d7ef27b38000
checked_phase1: 35471422
checked_phase2: 13283300
inconsistent_phase1: 26
inconsistent_phase2: 0
failed_phase1: 0
failed_phase2: 1
directories: 4841332
dirent_inconsistent: 0
linkea_inconsistent: 0
nlinks_inconsistent: 0
multiple_linked_checked: 549112
multiple_linked_inconsistent: 0
unknown_inconsistency: 0
unmatched_pairs_inconsistent: 0
dangling_inconsistent: 102
multiple_referenced_inconsistent: 0
bad_file_type_inconsistent: 0
lost_dirent_inconsistent: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 1
striped_dirs_inconsistent: 1
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 3625341
striped_shards_inconsistent: 0
striped_shards_failed: 0
striped_shards_skipped: 261
name_hash_inconsistent: 0
linkea_overflow_inconsistent: 0
success_count: 17
run_time_phase1: 36526 seconds
run_time_phase2: 6802 seconds
average_speed_phase1: 971 items/sec
average_speed_phase2: 1952 objs/sec
average_speed_total: 1125 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

further advice would be much appreciated, but I will try not to do anything further today 'cos I don't want to break things over the weekend.

cheers,
robin

Comment by SC Admin (Inactive) [ 10/Nov/18 ]

Hi Lai,

FYI the "orph_cleanup_da" thread seems to be permanently looping doing ~900k reads/second to ZFS.

[warble2]root: arcstat.py 5
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
16:45:07   766     0      0     0    0     0    0     0    0    73G   74G
16:45:12  899K    54      0    54    0     0    0    54    0    73G   74G
16:45:17  899K    52      0    52    0     0    0    52    0    73G   74G
16:45:22  900K    50      0    50    0     0    0    50    0    73G   74G
...

the 900k rate is independent of (and at least 50x greater than) the actual iops we graph going to the dagg MDTs (typically 1k to 5k iops - probably 766 over the lifetime of the pool - the first number in arcstat). so I think the reads/s to ZFS's ARC are just that thread doing something crazy and nothing to do with any actual iops to the MDTs.

presumably it's in an infinite loop of some sort, and 900k is just as fast as it can go. hence the 100% system time on that core.

would it help to get a sysrq stack listing or a 'perf top' or something for you?
I don't see any large increase in IB traffic to the MDS that would explain 900k anything, but I could probably grab a lustre debug log if you like too.
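
e.g. I could sample just that thread for a while with something like this (62076 is the orph_cleanup_da pid; the 30s duration and output path are arbitrary):

perf record -g -p 62076 -- sleep 30          # sample that thread with call graphs for 30 seconds
perf report --stdio > /tmp/orph_cleanup_perf.txt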

cheers,
robin

Comment by SC Admin (Inactive) [ 10/Nov/18 ]

ah, ok - the MDS graphs of ZFS stats seem to indicate that the LU-11516 https://review.whamcloud.com/33368 workaround is probably what is causing the thread to use 100% system time and do all the ZFS reads (rather than hitting the LBUG). ARC reads have been at 900k for most of the last month, but with a break of a week until this latest reboot.

cheers,
robin

Comment by Lai Siyao [ 12/Nov/18 ]

Yes, it looks to be so; I'll check the code to see how to improve it.

As for the lfsck result, I need some time to verify it; I'll update you later.

Comment by Gerrit Updater [ 13/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33401/
Subject: LU-11418 llog: refresh remote llog upon -ESTALE
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 71f409c9b31b90fa432f1f46ad4e612fb65c7fcc

Comment by Peter Jones [ 13/Nov/18 ]

So do I understand correctly that there will likely be at least one more patch to come tracked by this ticket?

Comment by Lai Siyao [ 13/Nov/18 ]

yes, Peter.

Comment by Lai Siyao [ 15/Nov/18 ]

I just uploaded a patch to fix the orphan cleanup dead-loop issue:
master branch: https://review.whamcloud.com/#/c/33661/
2.10 branch: https://review.whamcloud.com/#/c/33662/

You can wait for autotest to finish and then update the MDS if everything is okay. Then you can try lfsck in dry-run mode again and post the result here.

Comment by SC Admin (Inactive) [ 22/Nov/18 ]

Hi Lai,

thanks. we've been running the 2nd patch in https://review.whamcloud.com/#/c/33662/ on the MDS's for almost 2 days now.
all good so far

I will update the OSS's to this version too and lfsck when there's a quiet time - unfortunately the filesystem is very busy at the moment.

cheers,
robin

Comment by Lai Siyao [ 06/Dec/18 ]

Any update?

Comment by SC Admin (Inactive) [ 06/Dec/18 ]

Hi Lai,

lustre's been stable for the last 2 weeks since that patch.
all OSS's were rebooted to match the MDS version about 10 days ago.
so looking really good.

I haven't run the lfsck yet, or explicitly tried to run stress tests to try to break anything.
just enjoying the stability for a while

I'll kick off a lfsck dry run today.

cheers,
robin

Comment by Peter Jones [ 09/Dec/18 ]

Robin

Is no news good news?

Peter

Comment by SC Admin (Inactive) [ 09/Dec/18 ]

Hi Peter and Lai,

not sure if it's good or bad
the lfsck just takes a while 'cos I set it to a slow rate...

this is namespace which has completed now

[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags: inconsistent
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1544252914
time_since_last_completed: 75442 seconds
latest_start_time: 1544075850
time_since_latest_start: 252506 seconds
last_checkpoint_time: 1544252914
time_since_last_checkpoint: 75442 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: 5603849, [0x200011573:0x1:0x0], 0x3d0b02e56e00000
checked_phase1: 150469049
checked_phase2: 21386549
inconsistent_phase1: 6
inconsistent_phase2: 22
failed_phase1: 0
failed_phase2: 1
directories: 10159430
dirent_inconsistent: 0
linkea_inconsistent: 0
nlinks_inconsistent: 0
multiple_linked_checked: 886537
multiple_linked_inconsistent: 0
unknown_inconsistency: 0
unmatched_pairs_inconsistent: 0
dangling_inconsistent: 35
multiple_referenced_inconsistent: 0
bad_file_type_inconsistent: 0
lost_dirent_inconsistent: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 1
striped_dirs_inconsistent: 1
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 4492951
striped_shards_inconsistent: 0
striped_shards_failed: 0
striped_shards_skipped: 261
name_hash_inconsistent: 0
linkea_overflow_inconsistent: 0
success_count: 14
run_time_phase1: 164481 seconds
run_time_phase2: 12404 seconds
average_speed_phase1: 914 items/sec
average_speed_phase2: 1724 objs/sec
average_speed_total: 971 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags: inconsistent
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1544251904
time_since_last_completed: 76452 seconds
latest_start_time: 1544075851
time_since_latest_start: 252505 seconds
last_checkpoint_time: 1544251904
time_since_last_checkpoint: 76452 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: 557060, [0x280020f73:0x1:0x0], 0x198ca8b508be8000
checked_phase1: 48829719
checked_phase2: 20829132
inconsistent_phase1: 5
inconsistent_phase2: 0
failed_phase1: 0
failed_phase2: 2
directories: 5965672
dirent_inconsistent: 0
linkea_inconsistent: 0
nlinks_inconsistent: 0
multiple_linked_checked: 478400
multiple_linked_inconsistent: 0
unknown_inconsistency: 0
unmatched_pairs_inconsistent: 0
dangling_inconsistent: 163
multiple_referenced_inconsistent: 0
bad_file_type_inconsistent: 0
lost_dirent_inconsistent: 0
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 2
striped_dirs_inconsistent: 2
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 4474054
striped_shards_inconsistent: 0
striped_shards_failed: 0
striped_shards_skipped: 261
name_hash_inconsistent: 0
linkea_overflow_inconsistent: 0
success_count: 18
run_time_phase1: 56418 seconds
run_time_phase2: 11394 seconds
average_speed_phase1: 865 items/sec
average_speed_phase2: 1828 objs/sec
average_speed_total: 1027 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A
name: lfsck_namespace
magic: 0xa0621a0b
version: 2
status: completed
flags: inconsistent
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1544251913
time_since_last_completed: 76443 seconds
latest_start_time: 1544075851
time_since_latest_start: 252505 seconds
last_checkpoint_time: 1544251913
time_since_last_checkpoint: 76443 seconds
latest_start_position: 266, N/A, N/A
last_checkpoint_position: 35184372088832, N/A, N/A
first_failure_position: 24644277, [0x68003ff89:0x20:0x0], 0x1a11d7ef27b38000
checked_phase1: 49406594
checked_phase2: 20846709
inconsistent_phase1: 26
inconsistent_phase2: 1
failed_phase1: 0
failed_phase2: 1
directories: 5984370
dirent_inconsistent: 0
linkea_inconsistent: 0
nlinks_inconsistent: 0
multiple_linked_checked: 479388
multiple_linked_inconsistent: 0
unknown_inconsistency: 0
unmatched_pairs_inconsistent: 0
dangling_inconsistent: 160
multiple_referenced_inconsistent: 0
bad_file_type_inconsistent: 0
lost_dirent_inconsistent: 1
local_lost_found_scanned: 0
local_lost_found_moved: 0
local_lost_found_skipped: 0
local_lost_found_failed: 0
striped_dirs_scanned: 1
striped_dirs_inconsistent: 1
striped_dirs_failed: 0
striped_dirs_disabled: 0
striped_dirs_skipped: 0
striped_shards_scanned: 4474199
striped_shards_inconsistent: 0
striped_shards_failed: 0
striped_shards_skipped: 261
name_hash_inconsistent: 0
linkea_overflow_inconsistent: 0
success_count: 18
run_time_phase1: 57069 seconds
run_time_phase2: 11403 seconds
average_speed_phase1: 865 items/sec
average_speed_phase2: 1828 objs/sec
average_speed_total: 1026 items/sec
real_time_speed_phase1: N/A
real_time_speed_phase2: N/A
current_position: N/A

layout is still running. I think it's stuck

[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_layout
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: scanning-phase2
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541789825
time_since_last_completed: 2538610 seconds
latest_start_time: 1544075847
time_since_latest_start: 252588 seconds
last_checkpoint_time: 1544240510
time_since_last_checkpoint: 87925 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 436180
success_count: 1
inconsistent_dangling: 0
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 532160
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 140103851
checked_phase2: 0
run_time_phase1: 164481 seconds
run_time_phase2: 87926 seconds
average_speed_phase1: 851 items/sec
average_speed_phase2: 0 items/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: 0 items/sec
current_position: [0x0:0x0:0x0]
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: scanning-phase2
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541789834
time_since_last_completed: 2538601 seconds
latest_start_time: 1544075851
time_since_latest_start: 252584 seconds
last_checkpoint_time: 1544132317
time_since_last_checkpoint: 196118 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 100570
success_count: 1
inconsistent_dangling: 0
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 519932
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 42500398
checked_phase2: 0
run_time_phase1: 56418 seconds
run_time_phase2: 196119 seconds
average_speed_phase1: 753 items/sec
average_speed_phase2: 0 items/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: 0 items/sec
current_position: [0x0:0x0:0x0]
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: scanning-phase2
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541789846
time_since_last_completed: 2538589 seconds
latest_start_time: 1544075851
time_since_latest_start: 252584 seconds
last_checkpoint_time: 1544132971
time_since_last_checkpoint: 195464 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 103878
success_count: 1
inconsistent_dangling: 2
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 521695
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 43095736
checked_phase2: 0
run_time_phase1: 57069 seconds
run_time_phase2: 195464 seconds
average_speed_phase1: 755 items/sec
average_speed_phase2: 0 items/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: 0 items/sec
current_position: [0x0:0x0:0x0]

the only thing changing in the output above is the time counters. the lfsck/lfsck_layout processes are still there but not showing up as being active in 'top'.

I've stopped it now ->

[warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_layout
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: stopped
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541789825
time_since_last_completed: 2539497 seconds
latest_start_time: 1544075847
time_since_latest_start: 253475 seconds
last_checkpoint_time: 1544329255
time_since_last_checkpoint: 67 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 436180
success_count: 1
inconsistent_dangling: 0
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 532160
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 140103851
checked_phase2: 0
run_time_phase1: 164481 seconds
run_time_phase2: 88745 seconds
average_speed_phase1: 851 items/sec
average_speed_phase2: 0 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: stopped
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541789834
time_since_last_completed: 2539488 seconds
latest_start_time: 1544075851
time_since_latest_start: 253471 seconds
last_checkpoint_time: 1544329255
time_since_last_checkpoint: 67 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 100570
success_count: 1
inconsistent_dangling: 0
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 519932
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 42500398
checked_phase2: 0
run_time_phase1: 56418 seconds
run_time_phase2: 196938 seconds
average_speed_phase1: 753 items/sec
average_speed_phase2: 0 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A
name: lfsck_layout
magic: 0xb17371b9
version: 2
status: stopped
flags: scanned-once
param: dryrun,all_targets,create_mdtobj
last_completed_time: 1541789846
time_since_last_completed: 2539476 seconds
latest_start_time: 1544075851
time_since_latest_start: 253471 seconds
last_checkpoint_time: 1544329255
time_since_last_checkpoint: 67 seconds
latest_start_position: 266
last_checkpoint_position: 35184372088832
first_failure_position: 103878
success_count: 1
inconsistent_dangling: 2
inconsistent_unmatched_pair: 0
inconsistent_multiple_referenced: 0
inconsistent_orphan: 0
inconsistent_inconsistent_owner: 521695
inconsistent_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 43095736
checked_phase2: 0
run_time_phase1: 57069 seconds
run_time_phase2: 196283 seconds
average_speed_phase1: 755 items/sec
average_speed_phase2: 0 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A
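
for reference, the stop itself is just lctl lfsck_stop, along the lines of:

lctl lfsck_stop -M dagg-MDT0000 -A    # stop the lfsck on all targets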

I should point out that we're still mounting these 3 MDTs with -o skip_lfsck.
should we mount them in the normal way first, before doing these lfsck's? I'm not sure of the difference.

cheers,
robin

Comment by Lai Siyao [ 10/Dec/18 ]

Robin, I'm still working on a patch to fix bugs in striped directory lfsck; I'll update you when it's finished, and I think it may help because many of the inconsistencies in your system are in striped directories. And next time the system seems to stall, can you use sysrq to dump all process backtraces on all servers? It also helps to dump the debug logs.
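
For example, roughly something like this on each server (the output path is just an example):

echo t > /proc/sysrq-trigger                  # dump backtraces of all tasks to the console/dmesg
lctl set_param debug=-1                       # enable full Lustre debugging
lctl dk /tmp/lustre-debug.$(hostname).log     # dump and clear the Lustre kernel debug buffer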

Comment by Gerrit Updater [ 23/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33661/
Subject: LU-11418 mdd: delete name if orphan doesn't exist
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fffef5c29e3bdf0f96168abc3d0488bad06f33bb

Comment by Peter Jones [ 23/Jan/19 ]

The existing patch has landed for 2.13 and could now potentially be included in 2.10.x or 2.12.x maintenance releases. Lai, if you're still working on a further patch, what ticket is it being tracked under?

Comment by Lai Siyao [ 24/Jan/19 ]

Peter, it's tracked under LU-11681; once it passes review, I'll backport it to 2.10.

Comment by Gerrit Updater [ 26/Feb/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34326
Subject: LU-11418 mdd: delete name if orphan doesn't exist
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ec0944fc22a351b44332984050606f0efb1d3b63

Comment by Gerrit Updater [ 26/Feb/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34327
Subject: LU-11418 mdd: delete name if orphan doesn't exist
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 694a92ec774d5bd958a61f457fc64380feb95db2

Comment by Gerrit Updater [ 01/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34326/
Subject: LU-11418 mdd: delete name if orphan doesn't exist
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 6a412a8671d3d76b5da55c08ada011e7aeea1e8c
