[LU-1797] MDS deactivates/reactivates OSTs Created: 28/Aug/12 Updated: 09/Jan/20 Resolved: 09/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Cliff White (Inactive) | Assignee: | Oleg Drokin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: | Lustre 2.2.93, LLNL/Hyperion |
| Attachments: | |
| Severity: | 3 |
| Rank (Obsolete): | 8543 |
| Description |
|
Running the SWL test suite (a mixture of IO jobs spread across all clients), I am seeing this on multiple OSTs:
Aug 28 05:11:41 ehyperion-dit34 kernel: Lustre: lustre-OST000d: Client lustre-MDT0000-mdtlov_UUID (at 192.168.127.6@o2ib1) reconnecting
Aug 28 05:11:41 ehyperion-dit34 kernel: Lustre: lustre-OST000d: received MDS connection from 192.168.127.6@o2ib1
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: 3947:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346155795/real 1346155795] req@ffff880093a2d000 x1411378033913049/t0(0) o5->lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346155901 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans
Aug 28 05:11:55 ehyperion-rst6 kernel: LustreError: 3947:0:(osc_create.c:169:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc req@ffff880093a2d000 x1411378033913049/t0(0) o5->lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346155901 ref 1 fl Interpret:RXN/0/ffffffff rc -107/-1
Aug 28 05:12:31 ehyperion-rst6 kernel: LustreError: 12495:0:(mds_lov.c:883:__mds_lov_synchronize()) lustre-OST000d_UUID failed at mds_lov_clear_orphans: -5
Aug 28 05:12:31 ehyperion-rst6 kernel: LustreError: 12495:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST000d_UUID sync failed -5, deactivating
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: 3948:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346156476/real 1346156476] req@ffff8801fa613400 x1411378034235068/t0(0) o400->lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:28/4 lens 224/224 e 0 to 1 dl 1346156587 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 28 05:23:07 ehyperion-dit34 kernel: Lustre: lustre-OST000d: Client lustre-MDT0000-mdtlov_UUID (at 192.168.127.6@o2ib1) reconnecting
Aug 28 05:23:07 ehyperion-dit34 kernel: Lustre: lustre-OST000d: received MDS connection from 192.168.127.6@o2ib1
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans
In this case the OST recovered without intervention. In other cases 'lctl --device NN recover' has fixed the problem, resulting in messages like:
Aug 28 07:56:04 ehyperion-dit33 kernel: Lustre: 7426:0:(llog_net.c:162:llog_receptor_accept()) changing the import ffff8802dbaaf800 - ffff8802cf6ff000
Aug 28 07:56:04 ehyperion-dit33 kernel: Lustre: 7426:0:(llog_net.c:162:llog_receptor_accept()) Skipped 1 previous similar message
This does not appear to be causing SWL tests to fail at this time. |
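For reference, a minimal sketch of the manual recovery mentioned above, run on the MDS node; the OST name is taken from the logs above, NN stands for the device number reported by lctl dl, and exact parameter paths may differ between Lustre versions:
# find the MDS-side OSC device for the affected OST
lctl dl | grep lustre-OST000d-osc-MDT0000
# force a recover of that import (NN = device number from the first column above)
lctl --device NN recover
# confirm the MDT considers the OST active again
lctl get_param osc.lustre-OST000d-osc-MDT0000.active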
| Comments |
| Comment by Cliff White (Inactive) [ 28/Aug/12 ] |
|
The OSS node does not report errors, nor does it inactivate connections. There are no client-side errors reported. |
| Comment by Peter Jones [ 28/Aug/12 ] |
|
Oleg, could you please look into this one? Peter |
| Comment by Oleg Drokin [ 28/Aug/12 ] |
|
Hm, it almost looks like we go through orphan deletion and all is fine, we progress further into precreate and get disconnected there again for some reason, and then the check for the OSC being in recovery fails and we bail out with EIO. The other, separate question is how come the OST did not reply to the create request; even if the OST is busy on the backend, at least early replies should still be flowing. Would be cool to get a matching lctl dk from the OST as well, but I guess under this kind of load the buffer rotates quite quickly and it would be useless. |
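One way to get a longer-lived OSS-side debug log before the buffer wraps is to enlarge it and add RPC tracing; a sketch using standard lctl parameters, with the buffer size and debug flags as illustrative values rather than anything prescribed in this ticket:
# on the OSS: enlarge the kernel debug buffer and enable RPC tracing
lctl set_param debug_mb=512
lctl set_param debug=+rpctrace
# ... wait for the create timeout to reproduce ...
# dump (and clear) the debug buffer to a file for upload
lctl dk /tmp/lu-1797-oss.dk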
| Comment by Cliff White (Inactive) [ 28/Aug/12 ] |
|
Attached is lctl dk from a failure (lustre-OST0032) |
| Comment by Cliff White (Inactive) [ 28/Aug/12 ] |
|
I have uploaded lu-1797-1538.txt.gz to FTP. That should be an lctl dk with all the calls included.
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: 3942:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346193234/real 1346193234] req@ffff880014297400 x1411378058874282/t0(0) o5->lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346193340 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: Skipped 1 previous similar message
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans
Aug 28 15:35:51 ehyperion-rst6 kernel: LustreError: 3942:0:(osc_create.c:169:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc req@ffff880014297400 x1411378058874282/t0(0) o5->lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346193340 ref 1 fl Interpret:RXN/0/ffffffff rc -107/-1
Aug 28 15:36:30 ehyperion-rst6 kernel: LustreError: 15560:0:(lov_obd.c:1063:lov_clear_orphans()) error in orphan recovery on OST idx 13/60: rc = -5
Aug 28 15:36:30 ehyperion-rst6 kernel: LustreError: 15560:0:(mds_lov.c:883:__mds_lov_synchronize()) lustre-OST000d_UUID failed at mds_lov_clear_orphans: -5
Aug 28 15:36:31 ehyperion-rst6 kernel: LustreError: 15560:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST000d_UUID sync failed -5, deactivating
Aug 28 15:38:20 ehyperion-rst6 kernel: cannot allocate a tage (1)
Aug 28 15:38:20 ehyperion-rst6 kernel: cannot allocate a tage (4)
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: 3942:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1346193440/real 1346193440] req@ffff8801f8cca800 x1411378059090858/t0(0) o400->lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:28/4 lens 224/224 e 0 to 1 dl 1346193546 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans |
| Comment by Oleg Drokin [ 28/Aug/12 ] |
|
The latest logs shed some light on what's going on. The OST becomes very overloaded and creates start to take a long time (due to other disk activity, I guess). Later on, in the original set of logs for case #1, we see that a ping times out for some reason (network packet loss or something), which triggers a reconnect and clears the inactive flag. So in the end I imagine the "fix" is to schedule a later reconnect (or even precreate) attempt for such busy OSTs and, once the load eases, enable them back. Some of the messages would need to be silenced as well. |
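Until such a fix exists, one possible interim workaround (an assumption on my part, not something prescribed in this ticket) is to keep the affected OSC deactivated on the MDS while the OST is overloaded, so the MDS stops allocating new objects there, and to reactivate it once the load eases; the OST name and NN are placeholders:
# on the MDS: find the device number of the OSC for the busy OST
lctl dl | grep lustre-OST000d-osc-MDT0000
# stop new object allocations to that OST while it is overloaded
lctl --device NN deactivate
# once the load eases, bring it back and resync orphans
lctl --device NN activate
lctl --device NN recover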
| Comment by Peter Jones [ 30/Aug/12 ] |
|
Dropping priority as this appears to be a long-standing issue. |
| Comment by Cliff White (Inactive) [ 30/Aug/12 ] |
|
Could not replicate with 1 to 3 OSSes and 2 to 24 OSTs. Returned to normal testing; in between parallel-scale runs, one OST dropped. Obtained an lctl dk from the OST, but I suspect it was too old.
LustreError: 26333:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114
192.168.127.63: Lustre: DEBUG MARKER: == parallel-scale test iorfpp: iorfpp == 18:30:49 (1346376649) |
| Comment by Cliff White (Inactive) [ 30/Sep/12 ] |
|
We are hitting this bug again with the lustre-review 9573 build:
hyperion-rst6 login: Lustre: 4171:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1349040375/real 1349040375] req@ffff88020629a000 x1414547145605102/t0(0) o5->lustre-OST0034-osc-MDT0000@192.168.127.64@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1349040481 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Lustre: 4171:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
Lustre: lustre-OST0034-osc-MDT0000: Connection to lustre-OST0034 (at 192.168.127.64@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 6 previous similar messages
Lustre: lustre-OST0034-osc-MDT0000: Connection restored to lustre-OST0034 (at 192.168.127.64@o2ib1)
Lustre: Skipped 1 previous similar message
Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0034_UUID now active, resetting orphans
Lustre: Skipped 19 previous similar messages
LustreError: 4171:0:(osc_create.c:169:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc req@ffff88020629a000 x1414547145605102/t0(0) o5->lustre-OST0034-osc-MDT0000@192.168.127.64@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1349040481 ref 1 fl Interpret:RXN/0/ffffffff rc -107/-1
LustreError: 7487:0:(lov_obd.c:1063:lov_clear_orphans()) error in orphan recovery on OST idx 52/60: rc = -5
LustreError: 7487:0:(mds_lov.c:883:__mds_lov_synchronize()) lustre-OST0034_UUID failed at mds_lov_clear_orphans: -5
LustreError: 7487:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST0034_UUID sync failed -5, deactivating |
| Comment by Alexandre Louvet [ 31/Jan/13 ] |
|
One of our systems is showing this behaviour with a high reproduction rate: about one disconnection every 5 minutes for the past 2 days, and I cannot find any evidence of high load on the OSSes or the MDS. The filesystem is made of 540 OSTs equally distributed across 36 OSSes. The OSTs that get disconnected are not in any particular location. The interconnect between nodes and servers does not report problems, and there are no messages in the logs other than the ones reported in this LU.
The pattern is always the same: one request expires for slow reply after obd_timeout / 2 (i.e. 200s for me), then a reconnect fails during orphan recovery. If I force a recover it works like a charm, even if the forced recover is scheduled immediately after the deactivation. This is surprising: I can accept that something went wrong at some point, and that obd_timeout / 2 seconds later it was still wrong and the reconnection failed, but I cannot convince myself that I am lucky enough for the problem to disappear one second later, when I schedule the forced recover. Am I wrong?
The site is running Lustre 2.1.3 + patches. We have just upgraded the patch list last week to add
The site is classified, so exporting data is not that easy. What can I check to try to make progress? Thanks, |
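A minimal sketch of what could be checked on the MDS when an OST gets deactivated this way; the OST name is a placeholder and parameter paths may differ slightly on 2.1.3:
# is the MDS-side OSC still marked active, and what state is its import in?
lctl get_param osc.lustre-OST000d-osc-MDT0000.active
lctl get_param osc.lustre-OST000d-osc-MDT0000.import
# current obd_timeout; the 200s above corresponds to obd_timeout / 2, i.e. timeout=400
lctl get_param timeout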
| Comment by Alexandre Louvet [ 08/Feb/13 ] |
|
While I was not able to find any evidence of an OST being overloaded, after having lost the connection to over 400 OSTs we rebooted everything. Now everything is back to normal. |
| Comment by Oleg Drokin [ 21/Feb/13 ] |
|
Well, I suspect the fact that extending the filesystem helped hints that you have a filesystem that is mostly full (inode-wise), so in order to satisfy precreates a lot of inode groups need to be traversed, which takes a lot of time because they are not cached. Extending the filesystem added some nearly empty inode groups where a single block read is enough to satisfy an entire precreate request, which is a lot faster. You can probably confirm this by looking at debugfs output; I suspect you will see inode groups very sparsely populated with free inodes right up to about the point of the pre-extension size. |
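Oleg suggests looking at debugfs output; an equivalent per-group view can be had with dumpe2fs, sketched below against a placeholder ldiskfs backend device on the OSS:
# dump per-group statistics; each group prints its free inode count, so the
# mostly-full groups up to the pre-extension point should stand out
dumpe2fs /dev/mapper/ost000d 2>/dev/null | grep -E 'Group [0-9]+|Free inodes:'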
| Comment by Andreas Dilger [ 09/Jan/20 ] |
|
Close old bug |