[LU-11418] hung threads on MDT and MDT won't umount Created: 22/Sep/18 Updated: 02/May/22 Resolved: 23/Jan/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.4 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | SC Admin (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
x86_64, zfs, 3 MDTs, all on 1 MDS, 2.10.4 + many patches ~= 2.10.5 to 2.12 |
||
| Attachments: |
|
| Issue Links: |
|
| Epic/Theme: | dne |
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hi, unfortunately once again we have similar/same symptoms: a chgrp/chmod sweep across files and directories results in an eventual total hang of the filesystem, with hung MDT threads and one MDT that won't umount; the MDS has to be powered off to fix the fs. the processes that are stuck on the client doing the sweep are
root 142716 0.0 0.0 108252 116 pts/1 S 01:33 0:34 xargs -0 -n5 chgrp -h oz044
root 236217 0.0 0.0 108252 116 pts/1 S 01:15 0:25 xargs -0 -n5 chgrp -h oz065
root 385816 0.0 0.0 108252 116 pts/1 S 05:34 0:15 xargs -0 -n5 chgrp -h oz100
root 418923 0.0 0.0 120512 136 pts/1 S 09:34 0:00 chgrp -h oz100 oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/catalogs/candidates.cat oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/catalogs/candidates_ranked.cat oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/catalogs/candidates_full.cat oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd5/images oz100/pipes/DWF_PIPE/MARY_WORK/Antlia_170206_msystembis5_8/ccd46
root 418944 0.0 0.0 120512 136 pts/1 S 09:34 0:01 chgrp -h oz044 oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys_msdos/msdriv.f oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys_msdos/grexec.f oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys_msdos/grdos.f oz044/mbernet/c_cpp/dust_prc/src/pgplot/makemake oz044/mbernet/c_cpp/dust_prc/src/pgplot/sys
root 418947 0.0 0.0 120512 136 pts/1 S 09:34 0:00 chgrp -h oz065 oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/uniform/functionObjects oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/uniform/functionObjects/functionObjectProperties oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/alpha.water oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/Ur oz065/OpenFOAM/szhu-v1806/run/Deen/LES/run09_multiperforation_periodic/3x3/fine_GraceDrag_constantLift_defaultLES_probe2_u_0p005_ozstar/processor14/85/p...
I can't see any rc=-116 in the logs this time. the first hung thread is
Sep 22 09:37:39 warble2 kernel: LNet: Service thread pid 458124 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Sep 22 09:37:39 warble2 kernel: Pid: 458124, comm: mdt01_095 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
Sep 22 09:37:39 warble2 kernel: Call Trace:
Sep 22 09:37:39 warble2 kernel: [<ffffffffc159c047>] top_trans_wait_result+0xa6/0x155 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc157d91b>] top_trans_stop+0x42b/0x930 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc16d65f9>] lod_trans_stop+0x259/0x340 [lod]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc177423a>] mdd_trans_stop+0x2a/0x46 [mdd]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc1769bcb>] mdd_attr_set+0x5eb/0xce0 [mdd]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0ff65f5>] mdt_reint_setattr+0xba5/0x1060 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0ff6b33>] mdt_reint_rec+0x83/0x210 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0fd836b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc0fe3f07>] mdt_reint+0x67/0x140 [mdt]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc156a38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc1512e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffc1516592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Sep 22 09:37:39 warble2 kernel: [<ffffffffb64bb621>] kthread+0xd1/0xe0
Sep 22 09:37:39 warble2 kernel: [<ffffffffb6b205dd>] ret_from_fork_nospec_begin+0x7/0x21
Sep 22 09:37:39 warble2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
Sep 22 09:37:39 warble2 kernel: LustreError: dumping log to /tmp/lustre-log.1537573059.458124
there was a subnet manager crash and restart about 15 minutes before the MDS threads hung this time, but I don't think that's related. the first lustre-log for warble2 and syslog for the cluster are attached. I also did a sysrq 't' and 'w' before resetting warble2, so that may be of help to you. please let us know if you'd like anything else. cheers, |
| Comments |
| Comment by Peter Jones [ 23/Sep/18 ] |
|
Lai, can you please advise? Robin, even if Lai can proceed without a crash dump on this occasion, I think that having this option is bound to be useful at some point. Peter |
| Comment by SC Admin (Inactive) [ 24/Sep/18 ] |
|
Hi, there was a -116 in the logs in today's hang/crash. syslog attached. in case it's not clear - this ticket is the highest priority of all our tickets. Peter - we have some compute node crash dumps working now, and will start to work on the servers. cheers, |
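(For reference, a minimal sketch of enabling kdump on a CentOS 7 server so the next panic/LBUG leaves a vmcore; the crashkernel size and dump path below are illustrative assumptions, not the site's actual settings.)
# reserve memory for the crash kernel, then enable the kdump service (a reboot is needed for the reservation to take effect)
grubby --update-kernel=ALL --args="crashkernel=512M"
# check /etc/kdump.conf: the default "path /var/crash" needs enough local disk for a server-sized vmcore
systemctl enable --now kdump
# after the next panic/LBUG (or a manual 'echo c > /proc/sysrq-trigger') the vmcore should land under /var/crash for analysis with crash(8)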
| Comment by SC Admin (Inactive) [ 24/Sep/18 ] |
|
another hang. I'll attach the syslog. this hang has no rc = -116. for the 2 crashes today, I can't see any chmod/chgrp's on the system. there are a about 8 rsync's running on the login nodes from various users, so I suspect they are probably triggering it. probably tangential, but there are new mdt_cross_open errors in the logs as of today too. can't see them in previous logs. # zegrep 'declared for insert|invalid length 0 in llog|rc = -116|mdt_cross_open' /var/log/messages-201809* /var/log/messages /var/log/messages-20180901.gz:Sep 1 02:19:25 warble1 kernel: Lustre: 400208:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180904.gz:Sep 3 18:51:57 warble1 kernel: LustreError: 400206:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180904.gz:Sep 3 19:46:28 warble2 kernel: Lustre: 254384:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180904.gz:Sep 3 19:48:17 warble2 kernel: LustreError: 254382:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180904.gz:Sep 3 20:42:32 warble2 kernel: Lustre: 23441:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180904.gz:Sep 3 22:13:47 warble2 kernel: Lustre: 20681:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180904.gz:Sep 3 22:30:36 warble2 kernel: LustreError: 19030:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180904.gz:Sep 3 23:01:33 warble2 kernel: Lustre: 20333:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180905.gz:Sep 5 02:04:43 warble2 kernel: LustreError: 25207:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180905.gz:Sep 5 02:14:27 warble2 kernel: Lustre: 27387:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180905.gz:Sep 5 02:26:36 warble2 kernel: LustreError: 20423:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180905.gz:Sep 5 02:46:13 warble2 kernel: Lustre: 20430:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180905.gz:Sep 5 02:47:44 warble2 kernel: LustreError: 20428:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180905.gz:Sep 5 03:00:55 warble2 kernel: Lustre: 21170:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180905.gz:Sep 5 03:01:01 warble2 kernel: LustreError: 34530:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x28003a5b7:0x4589:0x0] wasn't declared for insert /var/log/messages-20180906.gz:Sep 5 08:02:43 warble2 kernel: 
LustreError: 21168:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180906.gz:Sep 5 11:29:53 warble2 kernel: Lustre: 25097:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180906.gz:Sep 5 11:29:55 warble2 kernel: LustreError: 39383:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x680036878:0x1f73f:0x0] wasn't declared for insert /var/log/messages-20180906.gz:Sep 5 11:36:19 warble2 kernel: LustreError: 21644:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages-20180906.gz:Sep 5 12:15:49 warble2 kernel: Lustre: 20745:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180912.gz:Sep 12 02:30:12 warble2 kernel: Lustre: 433349:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180919.gz:Sep 19 01:04:01 warble2 kernel: Lustre: 21812:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180919.gz:Sep 19 01:04:09 warble2 kernel: LustreError: 19129:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x68003d573:0x10e9d:0x0] wasn't declared for insert /var/log/messages-20180923.gz:Sep 22 16:40:23 warble2 kernel: Lustre: 21330:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180923.gz:Sep 22 16:52:47 warble2 kernel: Lustre: 30865:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages-20180923.gz:Sep 22 16:52:47 warble2 kernel: LustreError: 29262:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x68003e09a:0xb1cb:0x0] wasn't declared for insert /var/log/messages-20180923.gz:Sep 23 02:15:37 warble2 kernel: Lustre: 20665:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages:Sep 24 12:11:13 warble2 kernel: LustreError: 31813:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0002: [0x68003e07a:0x17284:0x0] doesn't exist!: rc = -14 /var/log/messages:Sep 24 12:11:16 warble2 kernel: LustreError: 31746:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0001: [0x28004037a:0x12fdc:0x0] doesn't exist!: rc = -14 /var/log/messages:Sep 24 12:11:16 warble2 kernel: LustreError: 31746:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message /var/log/messages:Sep 24 12:17:02 warble2 kernel: LustreError: 24787:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116 /var/log/messages:Sep 24 13:57:21 warble2 kernel: Lustre: 23318:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages:Sep 24 14:04:10 warble2 kernel: LustreError: 44226:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0002: [0x68003e07a:0x17284:0x0] doesn't exist!: rc = -14 /var/log/messages:Sep 24 14:05:27 warble2 kernel: LustreError: 44595:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0001: [0x28004037a:0x12fdc:0x0] doesn't exist!: rc = -14 /var/log/messages:Sep 24 14:05:27 warble2 
kernel: LustreError: 44595:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message /var/log/messages:Sep 24 14:27:55 warble2 kernel: LustreError: 47736:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0002: [0x68003e07a:0x17284:0x0] doesn't exist!: rc = -14 /var/log/messages:Sep 24 14:27:56 warble2 kernel: LustreError: 44449:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0001: [0x28004037a:0x12fdc:0x0] doesn't exist!: rc = -14 /var/log/messages:Sep 24 14:27:56 warble2 kernel: LustreError: 44449:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message /var/log/messages:Sep 24 17:59:59 warble2 kernel: Lustre: 21379:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 /var/log/messages:Sep 24 17:59:59 warble2 kernel: LustreError: 42530:0:(osd_index.c:588:osd_dir_insert()) dagg-MDT0000-osd: [0x68003e097:0x669a:0x0] wasn't declared for insert cheers, |
| Comment by Lai Siyao [ 25/Sep/18 ] |
|
The log looks similar to that of |
| Comment by Gerrit Updater [ 27/Sep/18 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33248 |
| Comment by Lai Siyao [ 27/Sep/18 ] |
|
I just uploaded a patch which fixes a corner case where a transaction aborts, and also adds a few debug messages. You can cherry-pick it to 2.10 and test again. |
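(A minimal sketch of cherry-picking the Gerrit change onto a local 2.10-based tree; the patchset number in the ref is an assumption, check the change page for the current one.)
cd lustre-release      # local 2.10-based tree
git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/48/33248/1
git cherry-pick FETCH_HEAD     # then rebuild and redeploy the server packages as usual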
| Comment by SC Admin (Inactive) [ 28/Sep/18 ] |
|
Hi Lai, thanks very much! I was just about to add that in, but then I noticed sanity-sec fails in gerrit. is that something we should be worried about? cheers, |
| Comment by SC Admin (Inactive) [ 05/Oct/18 ] |
|
Hi Lai, has this patch stabilised enough to try it out now do you think? cheers, |
| Comment by Lai Siyao [ 08/Oct/18 ] |
|
Yes, you can try it now. |
| Comment by SC Admin (Inactive) [ 09/Oct/18 ] |
|
Hi Lai, ok. that does seem to have helped a lot. longer version: there are a lot of rsync's running on login nodes and data-movers today, and warble2 (which has all 3 DNE dagg MDTs on it) hung again (as it did 2 days ago). warble2 needed to be reset to fix it but it just hung again a few minutes later. and then the same again. only warble2 is running this patched lustre at the moment. dmesg below. [ 107.621762] LNet: Using FMR for registration [ 107.626523] LNetError: 6808:0:(o2iblnd_cb.c:2299:kiblnd_passive_connect()) Can't accept conn from 192.168.44.13@o2ib44 on NA (ib0:0:192.168.44.22): bad dst nid 192.168.44.22@o2ib44 [ 107.684310] LNet: Added LNI 192.168.44.22@o2ib44 [128/2048/0/180] [ 186.847844] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.191@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 186.925648] Lustre: dagg-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [ 186.964659] Lustre: dagg-MDT0000: Connection restored to 192.168.44.36@o2ib44 (at 192.168.44.36@o2ib44) [ 187.483723] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.145@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 187.937707] Lustre: dagg-MDT0000: Will be in recovery for at least 2:30, or until 130 clients reconnect [ 187.947555] Lustre: dagg-MDT0000: Connection restored to 2d8f72a9-431d-332b-5ffa-aa828d3be0ee (at 192.168.44.121@o2ib44) [ 188.550374] LustreError: 137-5: dagg-MDT0001_UUID: not available for connect from 192.168.44.129@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 188.569071] LustreError: Skipped 2 previous similar messages [ 188.795236] LustreError: 11-0: dagg-MDT0000-osp-MDT0001: operation mds_connect to node 0@lo failed: rc = -114 [ 188.853949] Lustre: dagg-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [ 190.146916] Lustre: dagg-MDT0001: Will be in recovery for at least 2:30, or until 129 clients reconnect [ 190.156945] Lustre: dagg-MDT0001: Connection restored to 652c9ef9-2a5d-b195-3eda-b821ae8894d0 (at 192.168.44.110@o2ib44) [ 190.168392] Lustre: Skipped 1 previous similar message [ 190.829441] LustreError: 11-0: dagg-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114 [ 190.947606] Lustre: dagg-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [ 191.348107] Lustre: dagg-MDT0002: Will be in recovery for at least 2:30, or until 128 clients reconnect [ 192.215123] Lustre: dagg-MDT0000: Connection restored to 9b72ae35-e33b-a946-9669-fa831346959d (at 192.168.44.115@o2ib44) [ 192.226665] Lustre: Skipped 30 previous similar messages [ 196.317682] Lustre: dagg-MDT0002: Connection restored to 8f4e7961-71a6-0de0-25cd-4be32d12d7d6 (at 192.168.44.116@o2ib44) [ 196.329207] Lustre: Skipped 215 previous similar messages [ 204.357812] Lustre: dagg-MDT0000: Connection restored to 5bac3166-dadc-89ba-730d-d36504f6ca68 (at 192.168.44.218@o2ib44) [ 204.369347] Lustre: Skipped 70 previous similar messages [ 216.020013] LustreError: 32173:0:(llog_cat.c:269:llog_cat_id2handle()) dagg-MDT0001-osp-MDT0002: error opening log id [0x46a:0x80046900:0x2]:0: rc = -2 [ 216.034963] LustreError: 32173:0:(llog_cat.c:824:llog_cat_process_cb()) dagg-MDT0001-osp-MDT0002: cannot find handle for llog [0x46a:0x80046900:0x2]: rc = -2 [ 220.462569] 
Lustre: dagg-MDT0000: Connection restored to c3d7aae9-47e8-dbda-61fe-2422c114c40a (at 192.168.44.140@o2ib44) [ 220.474173] Lustre: Skipped 70 previous similar messages [ 234.910484] Lustre: 32489:0:(ldlm_lib.c:1773:extend_recovery_timer()) dagg-MDT0002: extended recovery timer reaching hard limit: 900, extend: 1 [ 236.043117] Lustre: 22198:0:(ldlm_lib.c:1773:extend_recovery_timer()) dagg-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1 [ 236.057457] Lustre: 22198:0:(ldlm_lib.c:1773:extend_recovery_timer()) Skipped 1 previous similar message [ 241.008948] Lustre: dagg-MDT0002: Recovery over after 0:50, of 130 clients 130 recovered and 0 were evicted. [ 241.046477] LustreError: 19852:0:(tgt_handler.c:509:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff921bcc313f00 x1613823206242752/t0(0) o601->dagg-MDT0000-lwp-MDT0002_UUID@0@lo:417/0 lens 336/0 e 0 to 0 dl 1539062632 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [ 241.073940] LustreError: 11-0: dagg-MDT0000-lwp-MDT0002: operation quota_acquire to node 0@lo failed: rc = -11 [ 241.642213] Lustre: dagg-MDT0000: Recovery over after 0:54, of 130 clients 130 recovered and 0 were evicted. [ 241.672756] Lustre: 21509:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 [ 241.688069] LustreError: 21509:0:(lod_dev.c:420:lod_sub_recovery_thread()) dagg-MDT0000-osp-MDT0001 getting update log failed: rc = -22 [ 248.792951] Lustre: 22198:0:(ldlm_lib.c:2059:target_recovery_overseer()) recovery is aborted, evict exports in recovery [ 248.804911] Lustre: dagg-MDT0001: disconnecting 6 stale clients [ 248.812515] LustreError: 22198:0:(ldlm_lib.c:1640:abort_lock_replay_queue()) @@@ aborted: req@ffff921bd2851b00 x1613136754616240/t0(0) o101->9b72ae35-e33b-a946-9669-fa831346959d@192.168.44.115@o2ib44:368/0 lens 328/0 e 0 to 0 dl 1539062583 ref 1 fl Complete:/40/ffffffff rc 0/-1 [ 248.854008] Lustre: dagg-MDT0001: Denying connection for new client 10e4c1fa-928c-56a4-d78e-017ed3a2d89c(at 192.168.44.129@o2ib44), waiting for 130 known clients (124 recovered, 0 in progress, and 6 evicted) to recover in 14:01 [ 248.869553] Lustre: dagg-MDT0001: Recovery over after 0:59, of 130 clients 124 recovered and 6 were evicted. 
[ 248.886553] Lustre: Skipped 3 previous similar messages [ 250.354374] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 254.079712] Lustre: dagg-MDT0000: Connection restored to 192.168.44.35@o2ib44 (at 192.168.44.35@o2ib44) [ 254.090024] Lustre: Skipped 39 previous similar messages [ 282.212958] LustreError: 51902:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 282.226582] LustreError: 51902:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message [ 419.076513] LustreError: 51723:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 419.090148] LustreError: 51723:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 4 previous similar messages [ 450.899773] LustreError: 51713:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 450.913417] LustreError: 51713:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message [ 451.353612] LustreError: 32170:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116 [ 588.128553] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 588.142200] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 4 previous similar messages [ 619.967048] LustreError: 55105:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 619.980680] LustreError: 55105:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 1 previous similar message [ 755.646673] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 755.660306] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 4 previous similar messages [ 810.287028] Lustre: dagg-MDT0000: Connection restored to aca1fdbe-fe45-b9db-701e-ccf7c902ae6e (at 192.168.44.199@o2ib44) [ 810.298854] Lustre: Skipped 14 previous similar messages [ 924.215258] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 924.228884] LustreError: 51708:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages [ 1092.752842] LustreError: 54995:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 1092.766459] LustreError: 54995:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages [ 1261.277065] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 1261.290649] LustreError: 32739:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages [ 1575.449329] LustreError: 32170:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0001-osp-MDT0002: fail to cancel 0 of 1 llog-records: rc = -116 [ 1575.464371] LustreError: 32170:0:(llog_cat.c:795:llog_cat_cancel_records()) Skipped 3 previous similar messages [ 1586.959098] LustreError: 20233:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0002-osp-MDT0000: fail to cancel 0 of 1 llog-records: rc = -116 [ 1633.295825] LustreError: 19843:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 1633.309398] LustreError: 19843:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 6 previous similar messages [ 2276.211972] LustreError: 
54995:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000330be:0x28c9:0x0] doesn't exist!: rc = -14 [ 2276.225522] LustreError: 54995:0:(mdt_open.c:1167:mdt_cross_open()) Skipped 20 previous similar messages there are definitely new corrupted directories from today's resets/hangs, but I can't tell if they are worse than usual or not. cheers, |
| Comment by SC Admin (Inactive) [ 09/Oct/18 ] |
|
so the corruption is perhaps different today... I'm putting it in this ticket as it's perhaps related to these changes. these are not new directories, but have been there for weeks/months. eg.
[root@john99 ~]# mkdir /fred/oz002/mlower/blah
mkdir: cannot create directory ‘/fred/oz002/mlower/blah’: Stale file handle
[root@john99 ~]# ls -ld /fred/oz002/mlower/
drwxrwsr-x 6 mlower oz002 76800 Oct 9 17:18 /fred/oz002/mlower/
[root@john99 ~]# lfs getdirstripe /fred/oz002/mlower
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280024621:0x196c4:0x0]
2 [0x680024df2:0x19733:0x0]
0 [0x2000153f2:0x196fb:0x0]
[root@john99 ~]# ls -l /fred/oz002/mlower/
total 324
drwx--S--- 9 mlower oz002 76800 Oct 5 16:50 glitches
drwxr-sr-x 5 mlower oz002 76800 Sep 21 14:18 mysoft
drwxr-sr-x 6 mlower oz002 76800 Oct 9 17:18 ppta_dr2
-rw-r--r-- 1 mlower oz002 5 Oct 9 17:18 test.txt
drwxrwsr-x 7 mlower oz002 76800 Sep 24 10:34 timing
and there are thousands of old files under that directory tree. it seems unlikely that these directories are "normal" corruption from the MDS resets or client evicts today. I'm going to reboot back into the unpatched version and see if the directories come back to life. cheers, |
| Comment by SC Admin (Inactive) [ 09/Oct/18 ] |
|
under the unpatched lustre the directory is ok
[root@john99 oz002]# mkdir mlower/blah
[root@john99 oz002]# ls -ld mlower
drwxrwsr-x 7 mlower oz002 76800 Oct 9 18:28 mlower
[root@john99 oz002]# lfs getdirstripe mlower
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280024621:0x196c4:0x0]
2 [0x680024df2:0x19733:0x0]
0 [0x2000153f2:0x196fb:0x0]
[root@john99 oz002]# rmdir mlower/blah
also another directory that the other user reported as "stale"
[root@john99 oz002]# mkdir /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift/blah
[root@john99 oz002]# lfs getdirstripe /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x6800426d1:0xf9ff:0x0]
0 [0x20003117b:0xfa5e:0x0]
1 [0x280043a4c:0xfa30:0x0]
[root@john99 oz002]# lfs getdirstripe /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift/blah
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x200034fc0:0x15:0x0]
1 [0x2800478a0:0x15:0x0]
2 [0x680046518:0x15:0x0]
[root@john99 oz002]# rmdir /fred/oz013/smutch/mhysa_paper/tiamat_runs/ha_said_that_before/smf-reion/redshift/blah
please advise. cheers, |
| Comment by Lai Siyao [ 09/Oct/18 ] |
|
Can you apply the patch https://review.whamcloud.com/#/c/33248/, and collect a debug log on the MDS when mkdir fails with -ESTALE? It's better to enable more debug with 'lctl set_param debug=+trace' on the MDS. |
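(For reference, a minimal sketch of one way to capture such a debug log on the MDS; the spool file path and size below are illustrative assumptions.)
# widen the debug mask and spool the kernel debug log to disk while the failure is reproduced
lctl set_param debug=+trace
lctl debug_daemon start /tmp/lustre-debug.dmp 4096    # size in MB; pick something the MDS can spare
# ... reproduce the failing mkdir from a client ...
lctl debug_daemon stop
lctl debug_file /tmp/lustre-debug.dmp /tmp/lustre-debug.txt    # convert the binary dump to readable text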
| Comment by SC Admin (Inactive) [ 09/Oct/18 ] |
|
I rebooted back into the patched lustre and now there are no "stale" messages from either of those 3 dirs. WTF. I also created a test dir in all */*/ level dirs in the basic hierarchy, and they worked ok. dmesg for this boot is [ 109.053212] LNet: Using FMR for registration [ 109.057484] LNetError: 105:0:(o2iblnd_cb.c:2299:kiblnd_passive_connect()) Can't accept conn from 192.168.44.112@o2ib44 on NA (ib0:0:192.168.44.22): bad dst nid 192.168.44.22@o2ib44 [ 109.128113] LNet: Added LNI 192.168.44.22@o2ib44 [128/2048/0/180] [ 109.221450] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.117@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 109.280702] Lustre: dagg-MDT0000: Not available for connect from 192.168.44.151@o2ib44 (not set up) [ 109.290251] Lustre: Skipped 1 previous similar message [ 109.391658] Lustre: dagg-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [ 109.448235] Lustre: dagg-MDT0000: Will be in recovery for at least 2:30, or until 130 clients reconnect [ 109.458172] Lustre: dagg-MDT0000: Connection restored to bc8ef6f5-4660-0761-54cd-06d7cfb2235b (at 192.168.44.125@o2ib44) [ 109.722365] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.115@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 109.741256] LustreError: Skipped 62 previous similar messages [ 109.980629] Lustre: dagg-MDT0000: Connection restored to d50b151f-16b0-b385-88bc-e4b15e5bb52f (at 192.168.44.180@o2ib44) [ 109.992059] Lustre: Skipped 28 previous similar messages [ 110.259366] LustreError: 11-0: dagg-MDT0000-osp-MDT0001: operation mds_connect to node 0@lo failed: rc = -114 [ 110.322967] Lustre: dagg-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [ 110.847170] Lustre: dagg-MDT0001: Will be in recovery for at least 2:30, or until 129 clients reconnect [ 111.108931] Lustre: dagg-MDT0001: Connection restored to 8e7cc0d0-30b4-27c2-b52a-ce7fce8d8cf6 (at 192.168.44.152@o2ib44) [ 111.120439] Lustre: Skipped 6 previous similar messages [ 111.748516] LustreError: 137-5: dagg-MDT0002_UUID: not available for connect from 192.168.44.127@o2ib44 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[ 111.767612] LustreError: Skipped 39 previous similar messages [ 112.246792] LustreError: 11-0: dagg-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114 [ 112.302956] Lustre: dagg-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [ 112.818342] Lustre: dagg-MDT0002: Will be in recovery for at least 2:30, or until 129 clients reconnect [ 113.469392] Lustre: dagg-MDT0000: Connection restored to (at 192.168.44.117@o2ib44) [ 113.477819] Lustre: Skipped 11 previous similar messages [ 117.503259] Lustre: dagg-MDT0001: Connection restored to 016d10c9-cf22-7918-dae9-afb4faddd7b8 (at 192.168.44.108@o2ib44) [ 117.514808] Lustre: Skipped 188 previous similar messages [ 127.124365] Lustre: dagg-MDT0002: Connection restored to d0cfc850-c50a-96c5-3cf8-857dda21371f (at 192.168.44.135@o2ib44) [ 127.135932] Lustre: Skipped 149 previous similar messages [ 137.464707] LustreError: 19037:0:(llog_cat.c:269:llog_cat_id2handle()) dagg-MDT0002-osp-MDT0000: error opening log id [0x396:0x80046518:0x6]:0: rc = -2 [ 137.479741] LustreError: 19037:0:(llog_cat.c:824:llog_cat_process_cb()) dagg-MDT0002-osp-MDT0000: cannot find handle for llog [0x396:0x80046518:0x6]: rc = -2 [ 145.320362] Lustre: dagg-MDT0000: Connection restored to 192.168.44.31@o2ib44 (at 192.168.44.31@o2ib44) [ 145.330575] Lustre: Skipped 27 previous similar messages [ 155.674050] Lustre: 33352:0:(ldlm_lib.c:1773:extend_recovery_timer()) dagg-MDT0002: extended recovery timer reaching hard limit: 900, extend: 1 [ 162.208865] Lustre: dagg-MDT0000: Recovery over after 0:53, of 130 clients 130 recovered and 0 were evicted. [ 162.252259] Lustre: 20858:0:(llog.c:572:llog_process_thread()) dagg-MDT0000-osp-MDT0001: invalid length 0 in llog [0x1:0x19a42:0x2]record for index 0/4 [ 162.265526] LustreError: 49123:0:(osd_oi.c:859:osd_idc_find_or_init()) dagg-MDT0000: can't lookup: rc = -2 [ 162.277914] LustreError: 20858:0:(lod_dev.c:420:lod_sub_recovery_thread()) dagg-MDT0000-osp-MDT0001 getting update log failed: rc = -22 [ 163.190156] Lustre: 21405:0:(ldlm_lib.c:2059:target_recovery_overseer()) recovery is aborted, evict exports in recovery [ 163.202218] Lustre: dagg-MDT0001: disconnecting 1 stale clients [ 163.236719] Lustre: dagg-MDT0001: Recovery over after 0:53, of 130 clients 129 recovered and 1 was evicted. [ 163.247382] Lustre: Skipped 1 previous similar message warble2 was hung again before I rebooted it into this patched version, which is the same as the previous time we were in this patched kernel. some more info on the "stale" state -> I'm very unsure whether it's safe to leave it in this lustre version... any thoughts? cheers, |
| Comment by SC Admin (Inactive) [ 09/Oct/18 ] |
|
Hi Lai, I've done some more sweeps through what dirs I can, and (I think) somewhere in /fred/*/*/*/*/ the below messages appear but I haven't narrowed down where. I haven't found a definitively "stale" dir tree again yet. [ 2156.824871] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 0 of 1 llog-records: rc = -116 [ 2161.808418] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116 [ 2166.803554] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116 [ 2166.818497] LustreError: 20853:0:(llog_cat.c:795:llog_cat_cancel_records()) Skipped 39 previous similar messages [ 2686.442216] LustreError: 51719:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000237f5:0xefeb:0x0] doesn't exist!: rc = -14 [ 3164.577969] LustreError: 18521:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000237f5:0xefeb:0x0] doesn't exist!: rc = -14 [ 3274.971014] LustreError: 51719:0:(mdt_open.c:1167:mdt_cross_open()) dagg-MDT0000: [0x2000237f5:0xefeb:0x0] doesn't exist!: rc = -14 anyway, I've left it in this patched lustre version for now. further advice would be appreciated. cheers, |
| Comment by Lai Siyao [ 09/Oct/18 ] |
|
The error messages report that a distributed log record failed to cancel, which is not fatal and normally won't cause anything wrong. I'll review the code to see whether the mdt_cross_open() failure is related. Next time you hit any failure, if you can collect debug logs in time, that may help. |
| Comment by SC Admin (Inactive) [ 10/Oct/18 ] |
|
Hi Lai, can you please explain more about why we would get "stale" directories on all clients on one boot and not on another? we have never seen this behaviour before. I will try to remember to gather logs more diligently, but obviously I was expecting the same behaviour the next time, so I thought I could gather logs for you on request by rebooting back into the patched lustre again. at the time it was more important to stop users' jobs dying. as I said, we haven't seen behaviour change randomly from boot to boot like this before, and are concerned that it will just randomly happen again. I can't see any significant differences in the two dmesg connection logs posted above (same lustre, one "stale", one not), but hopefully you folks can. thanks. cheers, |
| Comment by Lai Siyao [ 10/Oct/18 ] |
|
llog_cat_cancel_records() is called after a distributed transaction commits, to remove the replay logs; if for some unknown reason the record is not found, it returns -ESTALE, but since the transaction was already committed, it won't cause any trouble. mdt_cross_open() opens a remote file (whose inode and dirent are located on different MDTs); if the inode corresponding to the FID doesn't exist, it returns -ESTALE, which normally happens when a concurrent unlink removed the inode. But for lack of debug logs, I can't really know what happened in your case. |
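(As a side note, the FIDs printed in those mdt_cross_open messages can be mapped back to pathnames from a client to see which directory trees are affected; a minimal sketch, using the /fred mount point and one FID quoted in the console logs above, with a placeholder directory path.)
# resolve a FID from the console messages to a path (this fails with ENOENT if the inode really is gone, which is itself informative)
lfs fid2path /fred '[0x2000330be:0x28c9:0x0]'
# then show how the containing directory is striped across the MDTs
lfs getdirstripe /fred/path/to/that/directory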
| Comment by SC Admin (Inactive) [ 10/Oct/18 ] |
|
Hi Lai, thanks, but those are just the ongoing messages. instead can you please look at the 2 dmesg's in the comments above that start with "LNet: Using FMR for registration". these are the mount logs for the 2 attempts with the same lustre version, but one is "stale" and the other isn't. as the "stale" issue started at boot time, this is the place to look. no? thanks. cheers, |
| Comment by SC Admin (Inactive) [ 13/Oct/18 ] |
|
Hi Lai, the stale directory problem has re-emerged. I started debug_daemon on the MDS that has all 3 dagg MDTs on it, with -1 debugging. [root@john99 ~]# cd /fred/oz073/sjoudaki/g2files ; ls ; mkdir blah101 ; ls ; cd athena_1.54_nov2013 athena_1.54_simulations athena_1.54.tar.gz blah99 cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection cosmomcjuly2015sj g2home athena_1.54_nov2013 athena_1.54.tar.gz blah99 cosmomcjuly2015sj athena_1.54_simulations blah101 cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection g2home [root@john99 ~]# cd /fred/oz073/sjoudaki/g2files ; ls ; mkdir blah102 ; ls ; cd athena_1.54_nov2013 athena_1.54.tar.gz blah99 cosmomcjuly2015sj athena_1.54_simulations blah101 cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection g2home athena_1.54_nov2013 athena_1.54.tar.gz blah102 cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection g2home athena_1.54_simulations blah101 blah99 cosmomcjuly2015sj [root@john99 ~]# cd /fred/oz073/sjoudaki/g2files ; ls ; mkdir blah103 ; ls ; cd athena_1.54_nov2013 athena_1.54.tar.gz blah102 cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection g2home athena_1.54_simulations blah101 blah99 cosmomcjuly2015sj mkdir: cannot create directory ‘blah103’: Stale file handle athena_1.54_nov2013 athena_1.54.tar.gz blah102 cosmomc2015febsjallocate7binstrapzdatameadfitmargtrapz_ccorrection g2home athena_1.54_simulations blah101 blah99 cosmomcjuly2015sj so a couple of directory creates succeeded, but then "blah102" failed. I also picked a random directory on the filesystem, made a dir for myself, and created 21 dirs and the results are below. [root@john99 oz025]# mkdir rjh [root@john99 oz025]# cd rjh [root@john99 rjh]# mkdir 00 [root@john99 rjh]# mkdir 01 mkdir: cannot create directory ‘01’: Stale file handle [root@john99 rjh]# mkdir 02 [root@john99 rjh]# mkdir 03 [root@john99 rjh]# mkdir 04 mkdir: cannot create directory ‘04’: Stale file handle [root@john99 rjh]# mkdir 05 [root@john99 rjh]# mkdir 06 [root@john99 rjh]# mkdir 07 mkdir: cannot create directory ‘07’: Stale file handle [root@john99 rjh]# mkdir 08 [root@john99 rjh]# mkdir 09 [root@john99 rjh]# mkdir 10 [root@john99 rjh]# mkdir 11 mkdir: cannot create directory ‘11’: Stale file handle [root@john99 rjh]# mkdir 12 [root@john99 rjh]# mkdir 13 [root@john99 rjh]# mkdir 14 [root@john99 rjh]# mkdir 15 [root@john99 rjh]# mkdir 16 mkdir: cannot create directory ‘16’: Stale file handle [root@john99 rjh]# mkdir 17 [root@john99 rjh]# mkdir 18 mkdir: cannot create directory ‘18’: Stale file handle [root@john99 rjh]# mkdir 19 [root@john99 rjh]# mkdir 20 mkdir: cannot create directory ‘20’: Stale file handle [root@john99 rjh]# lfs getdirstripe *
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed0:0x0]
2 [0x680046cea:0x1bf38:0x0]
0 [0x200035792:0x1bf5f:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x680046ce9:0x1aab1:0x0]
0 [0x200035791:0x1ab20:0x0]
1 [0x280048071:0x1ab38:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed1:0x0]
2 [0x680046cea:0x1bf39:0x0]
0 [0x200035792:0x1bf60:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x680046ce9:0x1aab2:0x0]
0 [0x200035791:0x1ab21:0x0]
1 [0x280048071:0x1ab39:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed2:0x0]
2 [0x680046cea:0x1bf3a:0x0]
0 [0x200035792:0x1bf61:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x680046ce9:0x1aab3:0x0]
0 [0x200035791:0x1ab22:0x0]
1 [0x280048071:0x1ab3a:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed3:0x0]
2 [0x680046cea:0x1bf3b:0x0]
0 [0x200035792:0x1bf62:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x680046ce9:0x1aab4:0x0]
0 [0x200035791:0x1ab23:0x0]
1 [0x280048071:0x1ab3b:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed4:0x0]
2 [0x680046cea:0x1bf3c:0x0]
0 [0x200035792:0x1bf63:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x680046ce9:0x1aab5:0x0]
0 [0x200035791:0x1ab24:0x0]
1 [0x280048071:0x1ab3c:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed5:0x0]
2 [0x680046cea:0x1bf3d:0x0]
0 [0x200035792:0x1bf64:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x680046ce9:0x1aab6:0x0]
0 [0x200035791:0x1ab25:0x0]
1 [0x280048071:0x1ab3d:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed6:0x0]
2 [0x680046cea:0x1bf3e:0x0]
0 [0x200035792:0x1bf65:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280048072:0x1bed7:0x0]
2 [0x680046cea:0x1bf3f:0x0]
0 [0x200035792:0x1bf66:0x0]
so it looks like all attempts to mkdir on mdt0 are failing? this "stale" problem has only just started again this evening. earlier I had started an lfsck with
lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r
sometime during the lfsck, lustre started getting quota enforcement badly wrong, so I turned off inode group quota enforcing on the dagg filesystem until we can look at it further, and now these stale directories seem to be appearing. I suspect nobody is around to help on a weekend, so I'm going to reboot the MDS now. cheers, |
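(One way to confirm the per-MDT pattern is to pin test directories to each MDT index explicitly; a minimal sketch from a client, with an illustrative throwaway path, assuming remote directory creation is permitted for the user running it.)
# create one test dir per MDT index and see which ones return ESTALE
cd /fred/oz025/rjh
for i in 0 1 2; do lfs mkdir -i $i test-mdt$i || echo "mkdir on MDT$i failed"; done
lfs getdirstripe test-mdt*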
| Comment by SC Admin (Inactive) [ 14/Oct/18 ] |
|
Hi Lai, MDS crashes on reboot into any of our recent lustre versions. see I've now removed the version 3 patch from https://review.whamcloud.com/#/c/33248/ from our build. please let us know what you glean from the debug logs about stale dirs above. cheers, |
| Comment by SC Admin (Inactive) [ 15/Oct/18 ] |
|
Hi Lai, so firstly, I've just noticed that the chmod/chgrp sweep that we did on the filesystem starting on 2018-10-13-14:44:52 also came up with a bunch of "stale" directories. I started the lfsck around Oct 13 19:49:28, which is well after the chmod/chgrp sweeps had finished, so that means that the "stale" issue is definitively (IMHO) related to the patch in this ticket, and not to the lfsck.
secondly, all 1739 of these "Stale file handle" errors were on directories (no files) and all on mdt0.
thirdly, this sweep also hit 396 "Disk quota exceeded" errors, again on directories (no files), which I think should never happen as root, and these were heavily on mdt1 and mdt2 and not on mdt0. ie.
count mdtidx
3 0
177 1
170 2
more information is below... so I actually did 2 chmod/chgrp sweeps. the first ran from 2018-10-10-14:11:22 to Oct 13 07:37 and it saw no "stale" or "quota" problems. the second sweep was expected to be quick as almost all incorrect mode/ownership had already been fixed by the first. it ran from 2018-10-13-14:44:52 to Oct 13 18:18. now I have noticed that this second sweep did see "stale" and "quota" error directories. everything from Oct 13 about warble2 in syslog in this time interval is already attached to this ticket in messages-john99.warble2.Lustre.20181014.txt.gz.
perhaps fourthly, the other thing I've noticed is that all the groups (== directories on Lustre) that saw "Stale file handle" on some tiny fraction of their mdt0 directories in this second sweep were the same as the groups that saw the "Disk quota exceeded" errors ->
[root@john99 resetGroups]# grep -v ^oz *2018-10-13-14:44:52 | grep -v 'No such file or directory' | grep 'Stale file handle' | cut -d. -f1 | uniq -c
177 oz037
74 oz050
667 oz060
105 oz065
19 oz071
48 oz074
1150 oz100
103 oz114
5 oz123
12 oz992
[root@john99 resetGroups]# grep -v ^oz *2018-10-13-14:44:52 | grep -v 'No such file or directory' | grep 'Disk quota exceeded' | cut -d. -f1 | uniq -c
54 oz037
17 oz050
50 oz060
106 oz065
3 oz071
7 oz074
137 oz100
20 oz114
2 oz992
the exception in the above is the group oz123, but that kinda makes sense because the number of "Disk quota exceeded" messages for each group's sweep is less than their "Stale file handle" errors, and oz123 had the least of those. so perhaps the "stale" problem and the "quota" problem are related somehow. cheers, |
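(For what it's worth, a minimal sketch of how the per-MDT counts above could be reproduced from a sweep log; the errors.log name and the single-quoted path format are illustrative assumptions about the sweep output.)
# for each path that hit ESTALE, print the MDT index its directory stripe starts on, then tally
grep 'Stale file handle' errors.log | sed "s/.*'\(.*\)'.*/\1/" | while read d; do
    lfs getdirstripe "$d" | awk '/lmv_stripe_offset/ {print $4}'
done | sort -n | uniq -c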
| Comment by Lai Siyao [ 16/Oct/18 ] |
|
-116 comes from llog_osd_declare_write_rec() --> osp_md_declare_write(), where 'opo_stale' is set: 00000040:00000001:9.0:1539446568.707951:0:51694:0:(llog_osd.c:322:llog_osd_declare_write_rec()) Process entered 00000040:00000001:9.0:1539446568.707952:0:51694:0:(llog_osd.c:340:llog_osd_declare_write_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000040:00000001:9.0:1539446568.707952:0:51694:0:(llog.c:960:llog_declare_write_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000040:00000001:9.0:1539446568.707953:0:51694:0:(llog_cat.c:141:llog_cat_new_log()) Process leaving via out (rc=18446744073709551500 : -116 : 0xffffffffffffff8c) 00000004:00000001:9.0:1539446568.707954:0:51694:0:(osp_trans.c:1567:osp_trans_stop()) Process entered 00000004:00080000:9.0:1539446568.707954:0:51694:0:(osp_trans.c:1595:osp_trans_stop()) dagg-MDT0002-osp-MDT0000: add oth ffff8c79137fbcc0 with version 0 00000004:00000010:9.0:1539446568.707955:0:51694:0:(osp_trans.c:224:osp_update_request_destroy()) kfreed 'ours->ours_req': 4096 at ffff8c42a03e0000. 00000004:00000010:9.0:1539446568.707955:0:51694:0:(osp_trans.c:225:osp_update_request_destroy()) kfreed 'ours': 32 at ffff8c4219dfe760. 00000004:00000010:9.0:1539446568.707956:0:51694:0:(osp_trans.c:252:osp_update_request_destroy()) kfreed 'our': 120 at ffff8c78f5bb1900. 00000004:00000010:9.0:1539446568.707956:0:51694:0:(osp_trans.c:935:osp_thandle_destroy()) kfreed 'oth': 88 at ffff8c79137fbcc0. 00000004:00000001:9.0:1539446568.707957:0:51694:0:(osp_trans.c:1616:osp_trans_stop()) Process leaving (rc=0 : 0 : 0) 00000040:00000001:9.0:1539446568.707957:0:51694:0:(llog_cat.c:207:llog_cat_new_log()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000040:00000001:9.0:1539446568.707958:0:51694:0:(llog_cat.c:674:llog_cat_declare_add_rec()) Process leaving via out (rc=18446744073709551500 : -116 : 0xffffffffffffff8c) 00000040:00000001:9.0:1539446568.707959:0:51694:0:(llog_cat.c:688:llog_cat_declare_add_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000040:00000001:9.0:1539446568.707959:0:51694:0:(llog.c:1047:llog_declare_add()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000020:00000001:9.0:1539446568.707960:0:51694:0:(update_trans.c:136:sub_declare_updates_write()) Process leaving via out_put (rc=18446744073709551500 : -116 : 0xffffffffffffff8c) 00000020:00000040:9.0:1539446568.707961:0:51694:0:(lustre_log.h:380:llog_ctxt_put()) PUTting ctxt ffff8c79b7fffc80 : new refcount 1 00000020:00000001:9.0:1539446568.707962:0:51694:0:(update_trans.c:770:prepare_multiple_node_trans()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000020:00000001:9.0:1539446568.707962:0:51694:0:(update_trans.c:808:top_trans_start()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000004:00000001:9.0:1539446568.707963:0:51694:0:(mdd_dir.c:2460:mdd_create()) Process leaving via out_stop (rc=18446744073709551500 : -116 : 0xffffffffffffff8c) I'm still trying to understand why it's set, and because the stale flag is set for dagg-MDT0002-osp-MDT0000, all the create starting from MDT0 failed. |
| Comment by Gerrit Updater [ 19/Oct/18 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33401 |
| Comment by SC Admin (Inactive) [ 22/Oct/18 ] |
|
Hi Lai, so is the idea to apply just https://review.whamcloud.com/33401 or is that in addition to https://review.whamcloud.com/33248 ? cheers, |
| Comment by Lai Siyao [ 22/Oct/18 ] |
|
I think both are needed. |
| Comment by SC Admin (Inactive) [ 23/Oct/18 ] |
|
Hi Lai, do you think this also explains the quota errors that we were seeing, or could that be a separate (also stale?) problem? I've built an image with these 2 patches in, and will reboot all lustre servers into it later today. I guess I just really have to do the MDS's though... ? cheers, |
| Comment by Gerrit Updater [ 23/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33248/ |
| Comment by SC Admin (Inactive) [ 23/Oct/18 ] |
|
Hi Lai, we've been running with these 2 patches for a few hours now. so far so good, but one client LBUG seen. presumably it's not a coincidence, but I guess it might be. Message from syslogd@john98 at Oct 24 02:11:26 ... kernel: LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) ASSERTION( inode->i_data.nrpages == 0 ) failed: inode=[0x680049431:0x159b8:0x0](ffff9dfe6cb87590) nr pages=1, see LU-118 Message from syslogd@john98 at Oct 24 02:11:26 ... kernel: LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) LBUG <ConMan> Console [j98] log at 2018-10-24 02:00:00 AEDT. 2018-10-24 02:11:15 [1500494.537166] bash (421992): drop_caches: 2 2018-10-24 02:11:26 [1500505.349268] LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) ASSERTION( inode->i_data.nrpages == 0 ) failed: inode=[0x680049431:0x159b8:0x0](ffff9dfe6cb87590) nrpages=1, see LU-118 2018-10-24 02:11:26 [1500505.371195] LustreError: 423673:0:(llite_lib.c:2047:ll_delete_inode()) LBUG 2018-10-24 02:11:26 [1500505.379176] Pid: 423673, comm: rm 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 2018-10-24 02:11:26 [1500505.388966] Call Trace: 2018-10-24 02:11:26 [1500505.392442] [<ffffffffc03f97cc>] libcfs_call_trace+0x8c/0xc0 [libcfs] 2018-10-24 02:11:26 [1500505.399994] [<ffffffffc03f987c>] lbug_with_loc+0x4c/0xa0 [libcfs] 2018-10-24 02:11:26 [1500505.407191] [<ffffffffc0accc87>] ll_delete_inode+0x1b7/0x1c0 [lustre] 2018-10-24 02:11:26 [1500505.414783] [<ffffffff88c3c504>] evict+0xb4/0x180 2018-10-24 02:11:26 [1500505.420646] [<ffffffff88c3ce0c>] iput+0xfc/0x190 2018-10-24 02:11:26 [1500505.426410] [<ffffffff88c3126e>] do_unlinkat+0x1ae/0x2d0 2018-10-24 02:11:26 [1500505.432773] [<ffffffff88c322eb>] SyS_unlinkat+0x1b/0x40 2018-10-24 02:11:26 [1500505.439065] [<ffffffff8912579b>] system_call_fastpath+0x22/0x27 2018-10-24 02:11:26 [1500505.446021] [<ffffffffffffffff>] 0xffffffffffffffff 2018-10-24 02:11:26 [1500505.452010] Kernel panic - not syncing: LBUG 2018-10-24 02:11:26 [1500505.457170] CPU: 27 PID: 423673 Comm: rm Kdump: loaded Tainted: P OE ------------ T 3.10.0-862.14.4.el7.x86_64 #1 2018-10-24 02:11:26 [1500505.469966] Hardware name: Dell Inc. PowerEdge R740/06G98X, BIOS 1.4.9 06/29/2018 2018-10-24 02:11:26 [1500505.478319] Call Trace: 2018-10-24 02:11:26 [1500505.481653] [<ffffffff89113754>] dump_stack+0x19/0x1b 2018-10-24 02:11:26 [1500505.487655] [<ffffffff8910d29f>] panic+0xe8/0x21f 2018-10-24 02:11:26 [1500505.493301] [<ffffffffc03f98cb>] lbug_with_loc+0x9b/0xa0 [libcfs] 2018-10-24 02:11:26 [1500505.500339] [<ffffffffc0accc87>] ll_delete_inode+0x1b7/0x1c0 [lustre] 2018-10-24 02:11:26 [1500505.507709] [<ffffffff88c3c504>] evict+0xb4/0x180 2018-10-24 02:11:26 [1500505.513354] [<ffffffff88c3ce0c>] iput+0xfc/0x190 2018-10-24 02:11:26 [1500505.518909] [<ffffffff88c3126e>] do_unlinkat+0x1ae/0x2d0 2018-10-24 02:11:26 [1500505.525140] [<ffffffff88c25304>] ? SYSC_newfstatat+0x34/0x60 2018-10-24 02:11:26 [1500505.531703] [<ffffffff88c322eb>] SyS_unlinkat+0x1b/0x40 2018-10-24 02:11:26 [1500505.537817] [<ffffffff8912579b>] system_call_fastpath+0x22/0x27 2018-10-24 02:11:26 [ 0.000000] Initializing cgroup subsys cpuset cheers, |
| Comment by SC Admin (Inactive) [ 24/Oct/18 ] |
|
Hi Lai, 2 more MDT hung threads/crashes today. will get logs in a min. at the moment this process is using 100% cpu on the DNE MDS. root 50259 99.7 0.0 0 0 ? R 12:44 53:47 [orph_cleanup_da] load is 28, and perf top says Samples: 64K of event 'cycles:ppp', Event count (approx.): 38756058742 Overhead Shared O Symbol 6.29% [kernel] [k] mutex_lock 5.86% [kernel] [k] _raw_qspin_lock 4.94% [kernel] [k] mutex_unlock 4.36% [kernel] [k] vsnprintf 4.04% [kernel] [k] arc_buf_access 3.30% [kernel] [k] format_decode 2.77% [kernel] [k] zap_leaf_lookup_closest 2.57% [kernel] [k] down_read 2.29% [kernel] [k] up_read 2.12% [kernel] [k] number.isra.2 1.89% [kernel] [k] zap_hash 1.79% [kernel] [k] kfree 1.67% [kernel] [k] dbuf_hash 1.62% [kernel] [k] __dbuf_hold_impl 1.52% [kernel] [k] dbuf_rele_and_unlock 1.48% [kernel] [k] __kmalloc_node 1.35% [kernel] [k] zrl_add_impl 1.25% [kernel] [k] arc_access.isra.22 1.22% [kernel] [k] strlen 1.21% [kernel] [k] _cond_resched 1.07% [kernel] [k] lprocfs_counter_add 1.03% [kernel] [k] zrl_remove 0.93% [kernel] [k] __memcpy cheers, |
| Comment by SC Admin (Inactive) [ 24/Oct/18 ] |
|
Hi Lai, conman for today on warble2 is attached. summary: I rebooted warble2 (and warble1) into the lustre with the 2 patches in this ticket at 2018-10-23 23:20 it lasted ok until 4:40am at which point the filesystem was hung. a find for directories that I was running hung at that time. the hung MDT thread stack trace was something we haven't seen before. 2018-10-24 04:08:38 [17238.179626] LustreError: 52406:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17 2018-10-24 04:16:13 [17692.238162] LustreError: 52406:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2 2018-10-24 04:18:40 [17838.835695] LustreError: 52406:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2 2018-10-24 04:26:12 [18290.135146] LustreError: 52406:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17 2018-10-24 04:46:41 [19515.898070] INFO: task mdt00_000:19928 blocked for more than 120 seconds. 2018-10-24 04:46:41 [19515.905759] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2018-10-24 04:46:41 [19515.914443] mdt00_000 D ffff8a5495781fa0 0 19928 2 0x00000000 2018-10-24 04:46:41 [19515.922347] Call Trace: 2018-10-24 04:46:41 [19515.925601] [<ffffffff99314029>] schedule+0x29/0x70 2018-10-24 04:46:41 [19515.931347] [<ffffffff99315915>] rwsem_down_write_failed+0x225/0x3a0 2018-10-24 04:46:41 [19515.938591] [<ffffffffc138aa30>] ? dmu_tx_hold_spill+0x20/0x40 [zfs] 2018-10-24 04:46:41 [19515.945787] [<ffffffff98f5b267>] call_rwsem_down_write_failed+0x17/0x30 2018-10-24 04:46:41 [19515.953239] [<ffffffffc07611e8>] ? spl_kmem_zalloc+0xd8/0x180 [spl] 2018-10-24 04:46:41 [19515.960330] [<ffffffff9931319d>] down_write+0x2d/0x3d 2018-10-24 04:46:41 [19515.966232] [<ffffffffc0cfbd3d>] llog_cat_declare_add_rec+0x4fd/0x630 [obdclass] 2018-10-24 04:46:41 [19515.974464] [<ffffffffc0cf3618>] llog_declare_add+0x78/0x1a0 [obdclass] 2018-10-24 04:46:41 [19515.981938] [<ffffffffc0f920fe>] top_trans_start+0x17e/0x940 [ptlrpc] 2018-10-24 04:46:41 [19515.989164] [<ffffffffc16ee3f1>] lod_trans_start+0x31/0x40 [lod] 2018-10-24 04:46:41 [19515.995939] [<ffffffffc178e1ea>] mdd_trans_start+0x1a/0x20 [mdd] 2018-10-24 04:46:41 [19516.002715] [<ffffffffc177bcbc>] mdd_rename+0x65c/0x14a0 [mdd] 2018-10-24 04:46:41 [19516.009307] [<ffffffffc12cf07a>] mdt_reint_rename_internal.isra.36+0x166a/0x20c0 [mdt] 2018-10-24 04:46:41 [19516.017985] [<ffffffffc12d13fb>] mdt_reint_rename_or_migrate.isra.39+0x19b/0x860 [mdt] 2018-10-24 04:46:41 [19516.026666] [<ffffffffc0eeaff0>] ? ldlm_blocking_ast_nocheck+0x310/0x310 [ptlrpc] 2018-10-24 04:46:41 [19516.034870] [<ffffffffc0ee5440>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc] 2018-10-24 04:46:41 [19516.043310] [<ffffffffc12d1af3>] mdt_reint_rename+0x13/0x20 [mdt] 2018-10-24 04:46:41 [19516.050113] [<ffffffffc12d5b33>] mdt_reint_rec+0x83/0x210 [mdt] 2018-10-24 04:46:41 [19516.056716] [<ffffffffc12b736b>] mdt_reint_internal+0x5fb/0x9c0 [mdt] 2018-10-24 04:46:41 [19516.063843] [<ffffffffc12c2f07>] mdt_reint+0x67/0x140 [mdt] 2018-10-24 04:46:41 [19516.070116] [<ffffffffc0f7e38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] 2018-10-24 04:46:41 [19516.077586] [<ffffffffc0f26e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] 2018-10-24 04:46:41 [19516.085825] [<ffffffffc0f23478>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc] 2018-10-24 04:46:41 [19516.093172] [<ffffffff98ccf682>] ? 
default_wake_function+0x12/0x20 2018-10-24 04:46:41 [19516.099979] [<ffffffff98cc52ab>] ? __wake_up_common+0x5b/0x90 2018-10-24 04:46:41 [19516.106364] [<ffffffffc0f2a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] 2018-10-24 04:46:41 [19516.113165] [<ffffffffc0f29b00>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc] 2018-10-24 04:46:41 [19516.121062] [<ffffffff98cbb621>] kthread+0xd1/0xe0 2018-10-24 04:46:41 [19516.126438] [<ffffffff98cbb550>] ? insert_kthread_work+0x40/0x40 2018-10-24 04:46:41 [19516.133034] [<ffffffff993205dd>] ret_from_fork_nospec_begin+0x7/0x21 2018-10-24 04:46:41 [19516.139970] [<ffffffff98cbb550>] ? insert_kthread_work+0x40/0x40 the MDS was powered off (a MDT wouldn't umount like usual, had to power off) and rebooted at ~11:20 am. almost immediately multiple MDT threads were hung 2018-10-24 11:26:26 [ 1303.399664] Lustre: Skipped 2 previous similar messages 2018-10-24 11:26:26 [ 1303.420867] Lustre: dagg-MDT0001: Recovery over after 0:58, of 130 clients 127 recovered and 3 were evicted. 2018-10-24 11:26:52 [ 1329.318263] Lustre: dagg-MDT0001: Connection restored to 2e6d8ce0-afbe-2146-2dc7-13fe07e6b623 (at 192.168.44.209@o2ib44) 2018-10-24 11:26:52 [ 1329.330126] Lustre: Skipped 14 previous similar messages 2018-10-24 11:30:43 [ 1559.865427] INFO: task mdt_rdpg01_001:22882 blocked for more than 120 seconds. 2018-10-24 11:30:43 [ 1559.873617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2018-10-24 11:30:43 [ 1559.882363] mdt_rdpg01_001 D ffff98b37adbcf10 0 22882 2 0x00000000 2018-10-24 11:30:43 [ 1559.890324] Call Trace: 2018-10-24 11:30:43 [ 1559.893659] [<ffffffffc088b1e8>] ? spl_kmem_zalloc+0xd8/0x180 [spl] 2018-10-24 11:30:43 [ 1559.900865] [<ffffffff90514029>] schedule+0x29/0x70 2018-10-24 11:30:43 [ 1559.906672] [<ffffffff90515915>] rwsem_down_write_failed+0x225/0x3a0 2018-10-24 11:30:43 [ 1559.913942] [<ffffffff9015b267>] call_rwsem_down_write_failed+0x17/0x30 2018-10-24 11:30:43 [ 1559.921469] [<ffffffffc088b1e8>] ? spl_kmem_zalloc+0xd8/0x180 [spl] 2018-10-24 11:30:43 [ 1559.928636] [<ffffffff9051319d>] down_write+0x2d/0x3d 2018-10-24 11:30:43 [ 1559.934620] [<ffffffffc1010d3d>] llog_cat_declare_add_rec+0x4fd/0x630 [obdclass] 2018-10-24 11:30:43 [ 1559.942896] [<ffffffffc0d882e7>] ? osp_update_request_create+0xa7/0x200 [osp] 2018-10-24 11:30:43 [ 1559.950934] [<ffffffffc1008618>] llog_declare_add+0x78/0x1a0 [obdclass] 2018-10-24 11:30:43 [ 1559.958479] [<ffffffffc17f70fe>] top_trans_start+0x17e/0x940 [ptlrpc] 2018-10-24 11:30:43 [ 1559.965779] [<ffffffffc11d93f1>] lod_trans_start+0x31/0x40 [lod] 2018-10-24 11:30:43 [ 1559.972625] [<ffffffffc16061ea>] mdd_trans_start+0x1a/0x20 [mdd] 2018-10-24 11:30:43 [ 1559.979462] [<ffffffffc15fbb69>] mdd_attr_set+0x589/0xce0 [mdd] 2018-10-24 11:30:43 [ 1559.986244] [<ffffffffc177fd57>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc] 2018-10-24 11:30:43 [ 1559.994119] [<ffffffffc0f4b206>] mdt_mfd_close+0x1a6/0x610 [mdt] 2018-10-24 11:30:43 [ 1560.000932] [<ffffffffc0f50981>] mdt_close_internal+0x121/0x220 [mdt] 2018-10-24 11:30:43 [ 1560.008164] [<ffffffffc0f50ca0>] mdt_close+0x220/0x780 [mdt] 2018-10-24 11:30:43 [ 1560.014616] [<ffffffffc17e338a>] tgt_request_handle+0x92a/0x1370 [ptlrpc] 2018-10-24 11:30:43 [ 1560.022177] [<ffffffffc178be4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] 2018-10-24 11:30:43 [ 1560.030502] [<ffffffffc1788478>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc] 2018-10-24 11:30:43 [ 1560.037939] [<ffffffff8fecf682>] ? 
default_wake_function+0x12/0x20 2018-10-24 11:30:43 [ 1560.044843] [<ffffffff8fec52ab>] ? __wake_up_common+0x5b/0x90 2018-10-24 11:30:43 [ 1560.051313] [<ffffffffc178f592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] 2018-10-24 11:30:43 [ 1560.058208] [<ffffffffc178eb00>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc] 2018-10-24 11:30:43 [ 1560.066197] [<ffffffff8febb621>] kthread+0xd1/0xe0 2018-10-24 11:30:43 [ 1560.071670] [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40 2018-10-24 11:30:43 [ 1560.078344] [<ffffffff905205dd>] ret_from_fork_nospec_begin+0x7/0x21 2018-10-24 11:30:43 [ 1560.085351] [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40 2018-10-24 11:30:43 [ 1560.092021] INFO: task dist_txn-2:37466 blocked for more than 120 seconds. 2018-10-24 11:30:43 [ 1560.099440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2018-10-24 11:30:43 [ 1560.107812] dist_txn-2 D ffff985343136eb0 0 37466 2 0x00000000 2018-10-24 11:30:43 [ 1560.115448] Call Trace: 2018-10-24 11:30:43 [ 1560.118444] [<ffffffff90514029>] schedule+0x29/0x70 2018-10-24 11:30:43 [ 1560.123940] [<ffffffff90515915>] rwsem_down_write_failed+0x225/0x3a0 2018-10-24 11:30:43 [ 1560.130911] [<ffffffffc0918718>] ? libcfs_debug_vmsg2+0x6d8/0xb40 [libcfs] 2018-10-24 11:30:43 [ 1560.138385] [<ffffffff9015b267>] call_rwsem_down_write_failed+0x17/0x30 2018-10-24 11:30:43 [ 1560.145605] [<ffffffff9051319d>] down_write+0x2d/0x3d 2018-10-24 11:30:43 [ 1560.151260] [<ffffffffc10111b9>] llog_cat_id2handle+0x69/0x5b0 [obdclass] 2018-10-24 11:30:43 [ 1560.158616] [<ffffffffc10124e0>] llog_cat_cancel_records+0x120/0x3d0 [obdclass] 2018-10-24 11:30:43 [ 1560.166479] [<ffffffffc0918bd7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] 2018-10-24 11:30:43 [ 1560.173580] [<ffffffffc17f55e9>] distribute_txn_commit_thread+0x599/0xca0 [ptlrpc] 2018-10-24 11:30:43 [ 1560.181686] [<ffffffff8fecf670>] ? wake_up_state+0x20/0x20 2018-10-24 11:30:43 [ 1560.187730] [<ffffffffc17f5050>] ? sub_trans_commit_cb+0x20/0x20 [ptlrpc] 2018-10-24 11:30:43 [ 1560.195039] [<ffffffff8febb621>] kthread+0xd1/0xe0 2018-10-24 11:30:43 [ 1560.200369] [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40 2018-10-24 11:30:43 [ 1560.206892] [<ffffffff905205dd>] ret_from_fork_nospec_begin+0x7/0x21 2018-10-24 11:30:43 [ 1560.213761] [<ffffffff8febb550>] ? insert_kthread_work+0x40/0x40 2018-10-24 11:30:43 [ 1560.220277] INFO: task mdt01_008:39252 blocked for more than 120 seconds. 2018-10-24 11:30:43 [ 1560.227487] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2018-10-24 11:30:43 [ 1560.235740] mdt01_008 D ffff97f5ece00fd0 0 39252 2 0x00000000 2018-10-24 11:30:43 [ 1560.243269] Call Trace: 2018-10-24 11:30:43 [ 1560.246198] [<ffffffffc12840f7>] ? dbuf_rele_and_unlock+0x357/0x4b0 [zfs] ... and the same again after another reboot at 12:45 I took a -1 debug log at ~13:30. at 14:00 we gave up on this, and rebooted warble2 back into the centos-7.5-minimal-server-06/ image which is the same as 07/ except does not have these 2 patches in it. we then got a "normal" MDT thread hang very quickly after boot 2018-10-24 14:08:30 [ 453.651553] LNet: Service thread pid 18826 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
2018-10-24 14:08:30 [ 453.670221] Pid: 18826, comm: mdt_rdpg01_000 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
2018-10-24 14:08:30 [ 453.680691] Call Trace:
2018-10-24 14:08:30 [ 453.683975] [<ffffffffc1137047>] top_trans_wait_result+0xa6/0x155 [ptlrpc]
2018-10-24 14:08:30 [ 453.691845] [<ffffffffc111891b>] top_trans_stop+0x42b/0x930 [ptlrpc]
2018-10-24 14:08:30 [ 453.699177] [<ffffffffc14775f9>] lod_trans_stop+0x259/0x340 [lod]
2018-10-24 14:08:30 [ 453.706189] [<ffffffffc0de621a>] mdd_trans_stop+0x2a/0x46 [mdd]
2018-10-24 14:08:30 [ 453.713014] [<ffffffffc0ddbbcb>] mdd_attr_set+0x5eb/0xce0 [mdd]
2018-10-24 14:08:30 [ 453.719810] [<ffffffffc13f6206>] mdt_mfd_close+0x1a6/0x610 [mdt]
2018-10-24 14:08:30 [ 453.726703] [<ffffffffc13fb981>] mdt_close_internal+0x121/0x220 [mdt]
2018-10-24 14:08:30 [ 453.734030] [<ffffffffc13fbca0>] mdt_close+0x220/0x780 [mdt]
2018-10-24 14:08:30 [ 453.740547] [<ffffffffc110538a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
2018-10-24 14:08:30 [ 453.748254] [<ffffffffc10ade4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
2018-10-24 14:08:30 [ 453.756697] [<ffffffffc10b1592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
2018-10-24 14:08:30 [ 453.763736] [<ffffffff8b2bb621>] kthread+0xd1/0xe0
2018-10-24 14:08:30 [ 453.769363] [<ffffffff8b9205dd>] ret_from_fork_nospec_begin+0x7/0x21
2018-10-24 14:08:30 [ 453.776543] [<ffffffffffffffff>] 0xffffffffffffffff
2018-10-24 14:08:30 [ 453.782255] LustreError: dumping log to /tmp/lustre-log.1540350510.18826
I suspended all jobs, then rebooted warble1 into this 06/ image without these 2 patches, and then rebooted warble2 again at ~2018-10-24 14:39. Better luck this time: it's been running jobs and ok for half an hour or so now... presumably the reduced load from having jobs suspended when it mounted the fs's helped. I'll upload the -1 debug daemon logs to the ftp now. Additional information: when umount'ing the MDTs at 14:30 I noticed the same [orph_cleanup_da] process running as in the previous comment in this ticket (which was at around ~13:30), so it kinda looked like the MDTs were umounting/reconnecting even though they were up but hung. cheers, |
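For reference, the "-1 debug daemon logs" mentioned above are typically captured along the following lines; the file path and buffer size here are illustrative examples only, not the exact invocation used on warble2:
# enable all Lustre debug flags and start the in-kernel debug daemon (path/size are examples)
lctl set_param debug=-1
lctl debug_daemon start /tmp/lustre-debug-daemon.bin 2048
# ... reproduce the hang, then stop the daemon ...
lctl debug_daemon stop
# convert the binary dump to readable text before uploading
lctl debug_file /tmp/lustre-debug-daemon.bin /tmp/lustre-debug-daemon.txt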
| Comment by SC Admin (Inactive) [ 24/Oct/18 ] |
|
Darn, sorry - I forgot to upload the debug log until now. I assumed it would be useless to you, so it slipped my mind. cheers, |
| Comment by Lai Siyao [ 25/Oct/18 ] |
|
https://review.whamcloud.com/33401 is not correct and may cause such a hang; I'll update it soon. |
| Comment by SC Admin (Inactive) [ 31/Oct/18 ] |
|
Hi Lai, should we be giving patch set 2 a try? BTW for some reason the machine has been fairly stable (without the patches in this ticket) for almost a couple of weeks now. perhaps the namespace lfsck that finished and corrected some things has helped. cheers, |
| Comment by Lai Siyao [ 31/Oct/18 ] |
|
Yes, I added a test for patch set 2, which shouldn't cause the system to hang any more; you can give it a try now. |
| Comment by SC Admin (Inactive) [ 07/Nov/18 ] |
|
Hi Lai, we've just started running version 3 of the patch now. will let you know how it goes. cheers, |
| Comment by SC Admin (Inactive) [ 08/Nov/18 ] |
|
Hi Lai, no obvious signs of problems yet. I'll start a chgrp/chmod correcting sweep soon, I think. FYI, syslog messages so far are:
Nov 7 15:00:58 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 7 15:01:16 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 7 15:03:13 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 7 16:03:56 warble2 kernel: LustreError: 23402:0:(llog_cat.c:737:llog_cat_cancel_records()) dagg-MDT0000-osp-MDT0002: fail to cancel 1 of 1 llog-records: rc = -116
Nov 7 21:30:22 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov 7 21:36:20 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov 7 21:36:21 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov 8 09:44:06 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov 8 09:47:23 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov 8 10:20:37 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov 8 10:28:13 warble2 kernel: LustreError: 62076:0:(osd_handler.c:228:osd_trans_start()) dagg-MDT0000: can't assign tx: rc = -17
Nov 8 12:05:28 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov 8 12:06:26 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:06:27 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:07:11 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:07:35 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:07:43 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:07:43 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) Skipped 1 previous similar message
Nov 8 12:07:58 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:07:58 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) Skipped 1 previous similar message
Nov 8 12:08:07 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:473:orph_index_iterate()) dagg-MDD0000: fail to get FID for orphan it: rc = -2
Nov 8 12:08:21 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:10:09 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) could not delete object: rc = -2
Nov 8 12:10:09 warble2 kernel: LustreError: 62076:0:(mdd_orphans.c:361:orphan_destroy()) Skipped 1 previous similar message
As the last namespace lfsck completed ok (with the previous version of the patches in this ticket), should we now try an lfsck with -C to try and get rid of some of the above? We currently mount with -o skip_lfsck. Should we umount and mount without this first, in order that an OI scrub can run? cheers, |
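For context on the skip_lfsck question above, dropping that option is just a remount of the MDT dataset without it. A minimal sketch, assuming the usual ZFS-backed MDT mount syntax; the pool/dataset and mountpoint names are placeholders, not the real warble2 ones:
# umount the MDT, then mount it again without -o skip_lfsck (names are hypothetical)
umount /mnt/dagg-MDT0000
mount -t lustre dagg-mdt0/mdt0 /mnt/dagg-MDT0000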
| Comment by Lai Siyao [ 08/Nov/18 ] |
|
Do you know which operation generated these messages? And did it cause any error? 'lfsck' with '-C' is to create missing MDT objects; I'm afraid it won't fix the orphan_destroy() failure. I'd suggest you do a dry-run and post the result here. IMHO, for the orphan issue we need to wait for https://review.whamcloud.com/#/c/32959/ to become ready. |
| Comment by SC Admin (Inactive) [ 09/Nov/18 ] |
|
Hi Lai, perhaps those error messages could print a bit more info eg. uid/gid/fid/path? playing "guess which file or dir" across a fs with 100M files and 100's of users is kinda hard... or if you have a systemtap script or similar then I could try that... hmm. so I have just noticed that this process has been running 100% of the time since boot. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P WCHAN COMMAND 62076 root 20 0 0 0 0 R 100.0 0.0 3671:06 10 - orph_cleanup_da [warble2]root: uptime 03:41:49 up 2 days, 13:28, 1 user, load average: 4.37, 5.05, 3.71 ie. ~= 3688 minutes uptime and 3671 minutes of system time used by "orph_cleanup_da". the process is only running on the MDS that has the 3 DNE MDTs on it. and not the other MDS that has non-DNE MDTs for the smaller filesystems. I presume this process is a problem? I haven't noticed it using a whole core (or any time) previously, but TBH I haven't run 'top' than many times on the MDS. otherwise, things seem to be going well... no hangs, no crashes, no reports from users, and the chgrp/chmod script ran and finished ok. thanks for the lfsck advice. I ran this on the DNE MDS lctl lfsck_start -M dagg-MDT0000 -A -r -s 2000 -C -n and got this from layout (which is confusing 'cos it seems to never have started a phase2, nor completed) [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_layout name: lfsck_layout magic: 0xb17371b9 version: 2 status: scanning-phase2 flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: N/A time_since_last_completed: N/A latest_start_time: 1541733352 time_since_latest_start: 48563 seconds last_checkpoint_time: 1541770550 time_since_last_checkpoint: 11365 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 454780 success_count: 0 inconsistent_dangling: 0 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 485295 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 30949901 checked_phase2: 0 run_time_phase1: 37186 seconds run_time_phase2: 11365 seconds average_speed_phase1: 832 items/sec average_speed_phase2: 0 items/sec real-time_speed_phase1: N/A real-time_speed_phase2: 0 items/sec current_position: [0x0:0x0:0x0] name: lfsck_layout magic: 0xb17371b9 version: 2 status: scanning-phase2 flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: N/A time_since_last_completed: N/A latest_start_time: 1541733353 time_since_latest_start: 48562 seconds last_checkpoint_time: 1541769672 time_since_last_checkpoint: 12243 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 100527 success_count: 0 inconsistent_dangling: 0 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 476352 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 30021155 checked_phase2: 0 run_time_phase1: 36305 seconds run_time_phase2: 12243 seconds average_speed_phase1: 826 items/sec average_speed_phase2: 0 items/sec real-time_speed_phase1: N/A real-time_speed_phase2: 0 items/sec current_position: [0x0:0x0:0x0] name: lfsck_layout magic: 0xb17371b9 version: 2 status: scanning-phase2 flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: N/A time_since_last_completed: N/A latest_start_time: 1541733353 time_since_latest_start: 48562 seconds last_checkpoint_time: 1541769893 
time_since_last_checkpoint: 12022 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 103676 success_count: 0 inconsistent_dangling: 1 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 477885 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 30231327 checked_phase2: 0 run_time_phase1: 36526 seconds run_time_phase2: 12022 seconds average_speed_phase1: 827 items/sec average_speed_phase2: 0 items/sec real-time_speed_phase1: N/A real-time_speed_phase2: 0 items/sec current_position: [0x0:0x0:0x0] and this from namespace [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: inconsistent param: dryrun,all_targets,create_mdtobj last_completed_time: 1541777443 time_since_last_completed: 4530 seconds latest_start_time: 1541733353 time_since_latest_start: 48620 seconds last_checkpoint_time: 1541777443 time_since_last_checkpoint: 4530 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: 5603881, [0x200011573:0x1:0x0], 0x2e1b6311f67b0000 checked_phase1: 35926809 checked_phase2: 13462985 inconsistent_phase1: 5 inconsistent_phase2: 23 failed_phase1: 0 failed_phase2: 1 directories: 4854362 dirent_inconsistent: 0 linkea_inconsistent: 0 nlinks_inconsistent: 0 multiple_linked_checked: 548868 multiple_linked_inconsistent: 0 unknown_inconsistency: 0 unmatched_pairs_inconsistent: 0 dangling_inconsistent: 34 multiple_referenced_inconsistent: 0 bad_file_type_inconsistent: 0 lost_dirent_inconsistent: 1 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 1 striped_dirs_inconsistent: 1 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 3625256 striped_shards_inconsistent: 0 striped_shards_failed: 0 striped_shards_skipped: 261 name_hash_inconsistent: 0 linkea_overflow_inconsistent: 0 success_count: 13 run_time_phase1: 37186 seconds run_time_phase2: 6892 seconds average_speed_phase1: 966 items/sec average_speed_phase2: 1953 objs/sec average_speed_total: 1120 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: inconsistent param: dryrun,all_targets,create_mdtobj last_completed_time: 1541777356 time_since_last_completed: 4617 seconds latest_start_time: 1541733353 time_since_latest_start: 48620 seconds last_checkpoint_time: 1541777356 time_since_last_checkpoint: 4617 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: 557060, [0x280020f73:0x1:0x0], 0x637c6569a8f78000 checked_phase1: 35257340 checked_phase2: 13279657 inconsistent_phase1: 5 inconsistent_phase2: 0 failed_phase1: 0 failed_phase2: 2 directories: 4833782 dirent_inconsistent: 0 linkea_inconsistent: 0 nlinks_inconsistent: 0 multiple_linked_checked: 548660 multiple_linked_inconsistent: 0 unknown_inconsistency: 0 unmatched_pairs_inconsistent: 0 dangling_inconsistent: 90 multiple_referenced_inconsistent: 0 bad_file_type_inconsistent: 0 lost_dirent_inconsistent: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 2 striped_dirs_inconsistent: 2 striped_dirs_failed: 0 
striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 3625273 striped_shards_inconsistent: 0 striped_shards_failed: 0 striped_shards_skipped: 261 name_hash_inconsistent: 0 linkea_overflow_inconsistent: 0 success_count: 17 run_time_phase1: 36305 seconds run_time_phase2: 6805 seconds average_speed_phase1: 971 items/sec average_speed_phase2: 1951 objs/sec average_speed_total: 1125 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: inconsistent param: dryrun,all_targets,create_mdtobj last_completed_time: 1541777353 time_since_last_completed: 4620 seconds latest_start_time: 1541733353 time_since_latest_start: 48620 seconds last_checkpoint_time: 1541777353 time_since_last_checkpoint: 4620 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: 24644277, [0x68003ff89:0x20:0x0], 0x1a11d7ef27b38000 checked_phase1: 35471422 checked_phase2: 13283300 inconsistent_phase1: 26 inconsistent_phase2: 0 failed_phase1: 0 failed_phase2: 1 directories: 4841332 dirent_inconsistent: 0 linkea_inconsistent: 0 nlinks_inconsistent: 0 multiple_linked_checked: 549112 multiple_linked_inconsistent: 0 unknown_inconsistency: 0 unmatched_pairs_inconsistent: 0 dangling_inconsistent: 102 multiple_referenced_inconsistent: 0 bad_file_type_inconsistent: 0 lost_dirent_inconsistent: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 1 striped_dirs_inconsistent: 1 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 3625341 striped_shards_inconsistent: 0 striped_shards_failed: 0 striped_shards_skipped: 261 name_hash_inconsistent: 0 linkea_overflow_inconsistent: 0 success_count: 17 run_time_phase1: 36526 seconds run_time_phase2: 6802 seconds average_speed_phase1: 971 items/sec average_speed_phase2: 1952 objs/sec average_speed_total: 1125 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A further advice would be much appreciated, but I will try not to do anything further today 'cos I don't want to break things over the weekend. cheers, |
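As an aside on the "guess which file or dir" problem above: when a log line or an lfsck report does include a FID, it can be mapped back to pathnames from a client. A minimal sketch, using one of the FIDs from the lfsck output above and a hypothetical client mount point:
# resolve a FID to pathname(s) on a client (the mount point is an example)
lfs fid2path /mnt/dagg [0x200011573:0x1:0x0]
# reverse direction: path -> FID (the path is an example)
lfs path2fid /mnt/dagg/some/file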
| Comment by SC Admin (Inactive) [ 10/Nov/18 ] |
|
Hi Lai, FYI the "orph_cleanup_da" thread seems to be permanently looping doing ~900k reads/second to ZFS. [warble2]root: arcstat.py 5
time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
16:45:07 766 0 0 0 0 0 0 0 0 73G 74G
16:45:12 899K 54 0 54 0 0 0 54 0 73G 74G
16:45:17 899K 52 0 52 0 0 0 52 0 73G 74G
16:45:22 900K 50 0 50 0 0 0 50 0 73G 74G
...
the 900k rate is independent of (and at least 50x greater than) the actual iops we graph going to the dagg MDTs (typically 1k to 5k iops - probably 766 over the lifetime of the pool - the first number in arcstat). so I think the reads/s to ZFS's ARC are just that thread doing something crazy and nothing to do with any actual iops to the MDTs. presumably it's in an infinite loop of some sort, and 900k is just as fast as it can go. hence the 100% system time on that core. would it help to get a sysrq stack listing or a 'perf top' or something for you? cheers, |
| Comment by SC Admin (Inactive) [ 10/Nov/18 ] |
|
ah, ok - the MDS graphs of ZFS stats seem to indicate that the cheers, |
| Comment by Lai Siyao [ 12/Nov/18 ] |
|
Yes, it looks to be so; I'll check the code to see how to improve it. As for the lfsck result, I need some time to verify it and will update you later. |
| Comment by Gerrit Updater [ 13/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33401/ |
| Comment by Peter Jones [ 13/Nov/18 ] |
|
So do I understand correctly that there will likely be at least one more patch to come tracked by this ticket? |
| Comment by Lai Siyao [ 13/Nov/18 ] |
|
yes, Peter. |
| Comment by Lai Siyao [ 15/Nov/18 ] |
|
I just uploaded a patch to fix the orphan cleanup dead loop: https://review.whamcloud.com/#/c/33662/. You can wait for autotest to finish and then update the MDS if everything is okay. Then you can try lfsck in dry-run mode again and post the result here. |
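For anyone following along, pulling a specific patch set from Whamcloud Gerrit for a local build usually looks like the sketch below (change 33662, patch set 2 as an example), assuming the standard Gerrit refs/changes layout and the public lustre-release repository:
# fetch patch set 2 of change 33662 and check it out
git clone git://git.whamcloud.com/fs/lustre-release.git && cd lustre-release
git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/62/33662/2
git checkout FETCH_HEAD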
| Comment by SC Admin (Inactive) [ 22/Nov/18 ] |
|
Hi Lai, thanks. We've been running the 2nd patch in https://review.whamcloud.com/#/c/33662/ on the MDS's for almost 2 days now. I will update the OSS's to this version too and run an lfsck when there's a quiet time - unfortunately the filesystem is very busy at the moment. cheers, |
| Comment by Lai Siyao [ 06/Dec/18 ] |
|
Any update? |
| Comment by SC Admin (Inactive) [ 06/Dec/18 ] |
|
Hi Lai, Lustre's been stable for the last 2 weeks since that patch. I haven't run the lfsck yet, or explicitly tried to run stress tests to try to break anything. I'll kick off an lfsck dry run today. cheers, |
| Comment by Peter Jones [ 09/Dec/18 ] |
|
Robin
Is no news good news?
Peter |
| Comment by SC Admin (Inactive) [ 09/Dec/18 ] |
|
Hi Peter and Lai, not sure if it's good or bad this is namespace which has completed now [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: inconsistent param: dryrun,all_targets,create_mdtobj last_completed_time: 1544252914 time_since_last_completed: 75442 seconds latest_start_time: 1544075850 time_since_latest_start: 252506 seconds last_checkpoint_time: 1544252914 time_since_last_checkpoint: 75442 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: 5603849, [0x200011573:0x1:0x0], 0x3d0b02e56e00000 checked_phase1: 150469049 checked_phase2: 21386549 inconsistent_phase1: 6 inconsistent_phase2: 22 failed_phase1: 0 failed_phase2: 1 directories: 10159430 dirent_inconsistent: 0 linkea_inconsistent: 0 nlinks_inconsistent: 0 multiple_linked_checked: 886537 multiple_linked_inconsistent: 0 unknown_inconsistency: 0 unmatched_pairs_inconsistent: 0 dangling_inconsistent: 35 multiple_referenced_inconsistent: 0 bad_file_type_inconsistent: 0 lost_dirent_inconsistent: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 1 striped_dirs_inconsistent: 1 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 4492951 striped_shards_inconsistent: 0 striped_shards_failed: 0 striped_shards_skipped: 261 name_hash_inconsistent: 0 linkea_overflow_inconsistent: 0 success_count: 14 run_time_phase1: 164481 seconds run_time_phase2: 12404 seconds average_speed_phase1: 914 items/sec average_speed_phase2: 1724 objs/sec average_speed_total: 971 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: inconsistent param: dryrun,all_targets,create_mdtobj last_completed_time: 1544251904 time_since_last_completed: 76452 seconds latest_start_time: 1544075851 time_since_latest_start: 252505 seconds last_checkpoint_time: 1544251904 time_since_last_checkpoint: 76452 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: 557060, [0x280020f73:0x1:0x0], 0x198ca8b508be8000 checked_phase1: 48829719 checked_phase2: 20829132 inconsistent_phase1: 5 inconsistent_phase2: 0 failed_phase1: 0 failed_phase2: 2 directories: 5965672 dirent_inconsistent: 0 linkea_inconsistent: 0 nlinks_inconsistent: 0 multiple_linked_checked: 478400 multiple_linked_inconsistent: 0 unknown_inconsistency: 0 unmatched_pairs_inconsistent: 0 dangling_inconsistent: 163 multiple_referenced_inconsistent: 0 bad_file_type_inconsistent: 0 lost_dirent_inconsistent: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 2 striped_dirs_inconsistent: 2 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 4474054 striped_shards_inconsistent: 0 striped_shards_failed: 0 striped_shards_skipped: 261 name_hash_inconsistent: 0 linkea_overflow_inconsistent: 0 success_count: 18 run_time_phase1: 56418 seconds run_time_phase2: 11394 seconds average_speed_phase1: 865 items/sec average_speed_phase2: 1828 objs/sec average_speed_total: 1027 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: inconsistent param: 
dryrun,all_targets,create_mdtobj last_completed_time: 1544251913 time_since_last_completed: 76443 seconds latest_start_time: 1544075851 time_since_latest_start: 252505 seconds last_checkpoint_time: 1544251913 time_since_last_checkpoint: 76443 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: 24644277, [0x68003ff89:0x20:0x0], 0x1a11d7ef27b38000 checked_phase1: 49406594 checked_phase2: 20846709 inconsistent_phase1: 26 inconsistent_phase2: 1 failed_phase1: 0 failed_phase2: 1 directories: 5984370 dirent_inconsistent: 0 linkea_inconsistent: 0 nlinks_inconsistent: 0 multiple_linked_checked: 479388 multiple_linked_inconsistent: 0 unknown_inconsistency: 0 unmatched_pairs_inconsistent: 0 dangling_inconsistent: 160 multiple_referenced_inconsistent: 0 bad_file_type_inconsistent: 0 lost_dirent_inconsistent: 1 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 1 striped_dirs_inconsistent: 1 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 4474199 striped_shards_inconsistent: 0 striped_shards_failed: 0 striped_shards_skipped: 261 name_hash_inconsistent: 0 linkea_overflow_inconsistent: 0 success_count: 18 run_time_phase1: 57069 seconds run_time_phase2: 11403 seconds average_speed_phase1: 865 items/sec average_speed_phase2: 1828 objs/sec average_speed_total: 1026 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A layout is still running. I think it's stuck [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_layout name: lfsck_layout magic: 0xb17371b9 version: 2 status: scanning-phase2 flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: 1541789825 time_since_last_completed: 2538610 seconds latest_start_time: 1544075847 time_since_latest_start: 252588 seconds last_checkpoint_time: 1544240510 time_since_last_checkpoint: 87925 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 436180 success_count: 1 inconsistent_dangling: 0 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 532160 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 140103851 checked_phase2: 0 run_time_phase1: 164481 seconds run_time_phase2: 87926 seconds average_speed_phase1: 851 items/sec average_speed_phase2: 0 items/sec real-time_speed_phase1: N/A real-time_speed_phase2: 0 items/sec current_position: [0x0:0x0:0x0] name: lfsck_layout magic: 0xb17371b9 version: 2 status: scanning-phase2 flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: 1541789834 time_since_last_completed: 2538601 seconds latest_start_time: 1544075851 time_since_latest_start: 252584 seconds last_checkpoint_time: 1544132317 time_since_last_checkpoint: 196118 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 100570 success_count: 1 inconsistent_dangling: 0 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 519932 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 42500398 checked_phase2: 0 run_time_phase1: 56418 seconds run_time_phase2: 196119 seconds average_speed_phase1: 753 items/sec average_speed_phase2: 0 items/sec real-time_speed_phase1: N/A real-time_speed_phase2: 0 
items/sec current_position: [0x0:0x0:0x0] name: lfsck_layout magic: 0xb17371b9 version: 2 status: scanning-phase2 flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: 1541789846 time_since_last_completed: 2538589 seconds latest_start_time: 1544075851 time_since_latest_start: 252584 seconds last_checkpoint_time: 1544132971 time_since_last_checkpoint: 195464 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 103878 success_count: 1 inconsistent_dangling: 2 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 521695 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 43095736 checked_phase2: 0 run_time_phase1: 57069 seconds run_time_phase2: 195464 seconds average_speed_phase1: 755 items/sec average_speed_phase2: 0 items/sec real-time_speed_phase1: N/A real-time_speed_phase2: 0 items/sec current_position: [0x0:0x0:0x0] the only thing changing in the output above is the time counters. the lfsck/lfsk_layout processes are still there but not showing up as being active in 'top'. I've stopped it now -> [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_layout name: lfsck_layout magic: 0xb17371b9 version: 2 status: stopped flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: 1541789825 time_since_last_completed: 2539497 seconds latest_start_time: 1544075847 time_since_latest_start: 253475 seconds last_checkpoint_time: 1544329255 time_since_last_checkpoint: 67 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 436180 success_count: 1 inconsistent_dangling: 0 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 532160 inconsistent_others: 0 inconsistent_inconsistent_owner: 532160 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 140103851 checked_phase2: 0 run_time_phase1: 164481 seconds run_time_phase2: 88745 seconds average_speed_phase1: 851 items/sec average_speed_phase2: 0 objs/sec real-time_speed_phase1: N/A real-time_speed_phase2: N/A current_position: N/A name: lfsck_layout magic: 0xb17371b9 version: 2 status: stopped flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: 1541789834 time_since_last_completed: 2539488 seconds latest_start_time: 1544075851 time_since_latest_start: 253471 seconds last_checkpoint_time: 1544329255 time_since_last_checkpoint: 67 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 100570 success_count: 1 inconsistent_dangling: 0 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 519932 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 42500398 checked_phase2: 0 run_time_phase1: 56418 seconds run_time_phase2: 196938 seconds run_time_phase1: 56418 seconds run_time_phase2: 196938 seconds average_speed_phase1: 753 items/sec average_speed_phase2: 0 objs/sec real-time_speed_phase1: N/A real-time_speed_phase2: N/A current_position: N/A name: lfsck_layout magic: 0xb17371b9 version: 2 status: stopped flags: scanned-once param: dryrun,all_targets,create_mdtobj last_completed_time: 1541789846 time_since_last_completed: 2539476 seconds latest_start_time: 1544075851 time_since_latest_start: 253471 seconds last_checkpoint_time: 1544329255 
time_since_last_checkpoint: 67 seconds latest_start_position: 266 last_checkpoint_position: 35184372088832 first_failure_position: 103878 success_count: 1 inconsistent_dangling: 2 inconsistent_unmatched_pair: 0 inconsistent_multiple_referenced: 0 inconsistent_orphan: 0 inconsistent_inconsistent_owner: 521695 inconsistent_others: 0 skipped: 0 failed_phase1: 0 failed_phase2: 0 checked_phase1: 43095736 checked_phase2: 0 run_time_phase1: 57069 seconds run_time_phase2: 196283 seconds average_speed_phase1: 755 items/sec average_speed_phase2: 0 objs/sec real-time_speed_phase1: N/A real-time_speed_phase2: N/A current_position: N/A I should point out that we're still mounting these 3 MDTs with -o skip_lfsck. cheers, |
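For the record, "I've stopped it now" above would typically be the lfsck_stop counterpart of the lfsck_start command used earlier; the exact invocation isn't shown in the ticket, so treat this as a sketch:
# stop lfsck on this MDT and, with -A, on all targets (mirrors the -A used at start)
lctl lfsck_stop -M dagg-MDT0000 -A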
| Comment by Lai Siyao [ 10/Dec/18 ] |
|
Robin, I'm still working on a patch to fix bugs in striped directory lfsck; I'll update you when it's finished, and I think it may help because many of the inconsistencies in your system are in striped directories. Next time you think the system is stalled, can you use sysrq to dump all process backtraces on all servers? It also helps to dump the debug logs. |
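A minimal sketch of the capture being asked for here, run on each server while the stall is happening (the output paths are just examples):
# dump backtraces of all tasks into the kernel log, then save it
echo t > /proc/sysrq-trigger
dmesg > /tmp/sysrq-t.$(hostname).txt
# dump the Lustre kernel debug buffer
lctl dk /tmp/lustre-debug.$(hostname).$(date +%s).txt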
| Comment by Gerrit Updater [ 23/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33661/ |
| Comment by Peter Jones [ 23/Jan/19 ] |
|
The existing patch has landed for 2.13 and could now potentially be included in 2.10.x or 2.12.x maintenance releases. Lai, if you're still working on a further patch, what ticket is it being tracked under? |
| Comment by Lai Siyao [ 24/Jan/19 ] |
|
Peter, it's tracked under |
| Comment by Gerrit Updater [ 26/Feb/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34326 |
| Comment by Gerrit Updater [ 26/Feb/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34327 |
| Comment by Gerrit Updater [ 01/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34326/ |