[LU-10392] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115 Created: 14/Dec/17 Updated: 26/Feb/18 Resolved: 09/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Giuseppe Di Natale (Inactive) | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I've set up a Lustre 2.10 filesystem for testing which has 3 MDS and 2 OSS servers. Each server has two targets, for a total of 6 MDTs and 4 OSTs. After formatting the new Lustre filesystem (called lglossy below) and starting it, I see the following Lustre errors in the console logs.

[Wed Dec 13 12:51:27 2017] Lustre: srv-lglossy-MDT0002: No data found on store. Initialize space
[Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: new disk, initializing
[Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: Imperative Recovery not enabled, recovery window 300-900
[Wed Dec 13 12:51:27 2017] LustreError: 82500:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0002-osd: unsupported quota oid: 0x16
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0002: Can't allocate new meta-sequence,rc -115
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0002: Can't allocate new sequence: rc = -115
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0002-osd getting update log failed: rc = -115
[Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0002: Connection restored to (at 134.9.38.12@tcp5)
[Wed Dec 13 12:51:28 2017] Lustre: Skipped 5 previous similar messages
[Wed Dec 13 12:51:28 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:28 2017] Lustre: srv-lglossy-MDT0003: No data found on store. Initialize space
[Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: new disk, initializing
[Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: Imperative Recovery not enabled, recovery window 300-900
[Wed Dec 13 12:51:28 2017] LustreError: 83170:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0003-osd: unsupported quota oid: 0x16
[Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0003: Can't allocate new meta-sequence,rc -115
[Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0003: Can't allocate new sequence: rc = -115
[Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0003-osd getting update log failed: rc = -115
[Wed Dec 13 12:51:32 2017] Lustre: lglossy-MDT0003: Connection restored to (at 134.9.38.12@tcp5)
[Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 2 previous similar messages
[Wed Dec 13 12:51:32 2017] Lustre: Skipped 3 previous similar messages
[Wed Dec 13 12:51:55 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0000-mdtlov_UUID (at 134.9.38.10@tcp5)
[Wed Dec 13 12:51:55 2017] Lustre: Skipped 2 previous similar messages
[Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 4 previous similar messages
[Wed Dec 13 12:51:56 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:51:57 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.20@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:51:59 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.18@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:51:59 2017] LustreError: Skipped 4 previous similar messages
[Wed Dec 13 12:52:21 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:52:21 2017] LustreError: Skipped 1 previous similar message
[Wed Dec 13 12:52:26 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.23@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:52:26 2017] LustreError: Skipped 6 previous similar messages
[Wed Dec 13 12:52:29 2017] Lustre: cli-ctl-lglossy-MDT0002: Allocated super-sequence [0x00000002c0000400-0x0000000300000400]:2:mdt]
[Wed Dec 13 12:52:33 2017] Lustre: cli-ctl-lglossy-MDT0003: Allocated super-sequence [0x0000000340000400-0x0000000380000400]:3:mdt]
[Wed Dec 13 12:52:46 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:52:53 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0001-lwp-OST0000_UUID (at 134.9.38.14@tcp5)
[Wed Dec 13 12:52:53 2017] Lustre: Skipped 5 previous similar messages
[Wed Dec 13 12:58:37 2017] Lustre: lglossy-MDT0002: Connection restored to b6a69b56-0d6a-4087-5e21-742e78e50c81 (at 134.9.38.17@tcp5)
[Wed Dec 13 12:58:37 2017] Lustre: Skipped 9 previous similar messages

It's worth noting that the errors aren't fatal and the filesystem does come up. lglossy has had a test suite running against it since yesterday. I did notice

esilverrock4: lglossy-MDT0004
esilverrock4: lglossy-MDT0005
esilverrock3: lglossy-MDT0002
esilverrock3: lglossy-MDT0003
esilverrock2: lglossy-MDT0000
esilverrock2: lglossy-MDT0001

Version information:
Lustre: Build Version: 2.10.0_1.chaos
SPL: Loaded module v0.7.2-1llnl
ZFS: Loaded module v0.7.2-1llnl |
| Comments |
| Comment by Peter Jones [ 15/Dec/17 ] |
|
Giuseppe Are you really testing with 2.10.0 rather than 2.10.2? Are any patches applied here? If so, where is your branch accessible? Peter |
| Comment by Andreas Dilger [ 15/Dec/17 ] |
|
This looks like MDT0000 is not available when MDT0002 and MDT0003 are first mounted after reformat? The very first time an MDT is mounted it needs to contact the master sequence server (on MDT0000) to get a super sequence, which is good for a billion client mounts or 2^47 file creates, or combinations thereof. It looks like MDT0002 and MDT0003 have super sequences allocated eventually. |
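The figures in the comment above can be checked against the console log. Both "Allocated super-sequence" ranges in the description are 0x40000000 (2^30, roughly a billion) sequences wide. The sketch below is illustrative arithmetic, not Lustre code: `super_seq_width` and `fids_per_super_seq` are hypothetical helper names, the range is assumed to be half-open, and the figure of 2^17 objects per sequence is an assumption inferred from 2^47 / 2^30 rather than something stated in this ticket.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of Lustre: number of sequences in the
 * range [start, end) printed by the "Allocated super-sequence" message.
 * Treating the range as half-open is an assumption.
 */
static uint64_t super_seq_width(uint64_t start, uint64_t end)
{
	return end - start;
}

/*
 * Assumption: each sequence can name 2^17 objects. This value is
 * inferred from the "2^47 file creates" figure divided by the 2^30
 * sequences in one super-sequence.
 */
#define ASSUMED_FIDS_PER_SEQ (UINT64_C(1) << 17)

/* Total object identifiers covered by one super-sequence range. */
static uint64_t fids_per_super_seq(uint64_t start, uint64_t end)
{
	return super_seq_width(start, end) * ASSUMED_FIDS_PER_SEQ;
}
```

Calling `super_seq_width(0x00000002c0000400, 0x0000000300000400)` with the MDT0002 range from the log gives 2^30, and `fids_per_super_seq()` on the same range gives 2^47 under the stated assumption.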
| Comment by Giuseppe Di Natale (Inactive) [ 18/Dec/17 ] |
|
Andreas, thanks for the explanation. While the super sequences do end up allocated, I have a problem with it being reported as an error. I feel like labelling it as an error is misleading and may cause someone to believe there is a problem when in fact there isn't. |
| Comment by Olaf Faaland [ 18/Dec/17 ] |
|
Peter,
Yes, the test really was performed with 2.10.0+3 patches, because it was expedient for Joe (Giuseppe) to get that going. He's rebasing now and will be testing with 2.10.2 in the near future. Our 2.10 build has only 3 patches beyond the 2.10.x we get from you, see below. Engineers need not consider our patches for this issue, but I'll figure out a better long term answer for future tickets re: 2.10.

* 6e16bd8 (HEAD, tag: 2.10.0_1.chaos, czstash/2.10.0-llnl) LU-4009 osd-zfs: Add tunables to disable sync (DEBUG)
| lustre/osd-zfs/osd_handler.c | 13 +++++++++++--
| lustre/osd-zfs/osd_object.c | 12 ++++++++++--
| 2 files changed, 21 insertions(+), 4 deletions(-)
* 1e66cc7 LLNL build customizations
| lustre.spec.in | 20 +++++++++++++-------
| rpm/kmp-lustre-osd-zfs.preamble | 1 +
| 2 files changed, 14 insertions(+), 7 deletions(-)
* 7bbcefa Don't install lustre init script on systemd systems
| lustre.spec.in | 4 +++-
| lustre/conf/Makefile.am | 5 ++++-
| lustre/scripts/Makefile.am | 4 ++--
| 3 files changed, 9 insertions(+), 4 deletions(-)
* 58fd06e (tag: v2_10_0_0, tag: v2_10_0, tag: 2.10.0) New release 2.10.0
| lustre/ChangeLog | 2 +-
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| Comment by Peter Jones [ 19/Dec/17 ] |
|
Emoly Can you please help improve this behaviour as Andreas is about to lay out Peter |
| Comment by Peter Jones [ 19/Dec/17 ] |
|
Olaf Could you please open tickets to track the review/landing of the two commits that do not seem unique to LLNL 6e16bd8 Although the former is currently tracked under LU-4009 that reference is really for tracking a much broader effort (ZIL) Peter |
| Comment by Andreas Dilger [ 19/Dec/17 ] |
|
Emoly, it looks like these errors are all -EINPROGRESS = -115, which means that the MDTs are waiting for MDT0000 to start the master sequence server and to be granted a meta sequence for the first time. Rather than printing this via CERROR(), it would be better to print it something like the following in __seq_server_alloc_meta():

	if (rc == -EINPROGRESS) {
		static int printed;

		if (printed++ % 8 == 0)
			LCONSOLE_INFO("%s: waiting to contact MDT0000 to allocate new meta-sequence\n", ...);
	} else {
		CERROR("%s: cannot allocate new meta-sequence: rc = %d\n", ...);
	}

That prints the initial message in a way that is clear what server is involved and that this is not a problem, but doesn't print too many messages. Also, in seq_client_alloc_seq() please fix the CERROR() format to use the proper ": rc = %d\n" format, and no message should be printed if rc == -EINPROGRESS, since that is already handled in __seq_server_alloc_meta(). |
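The throttling suggested above can be tried outside the kernel. The following is a minimal userspace sketch under stated assumptions, not the patch that landed: `seq_alloc_log_action` and `seq_alloc_report` are hypothetical names, `printf`/`fprintf` stand in for `LCONSOLE_INFO()` and `CERROR()`, and the counter is passed in by the caller instead of being a function-local `static` so that the behaviour is easy to test. It also assumes the logic sits on a path where `rc == 0` means nothing needs to be logged.

```c
#include <errno.h>
#include <stdio.h>

/* What the caller should log for a given return code. */
enum seq_log_action {
	SEQ_LOG_NONE,	/* stay quiet */
	SEQ_LOG_INFO,	/* console notice: still waiting for MDT0000 */
	SEQ_LOG_ERROR,	/* genuine failure */
};

/*
 * Hypothetical userspace model of the suggested logic. *waits counts
 * -EINPROGRESS results so that only every eighth one, starting with
 * the first, produces a notice. Any other non-zero rc is an error.
 */
static enum seq_log_action seq_alloc_log_action(int rc, unsigned int *waits)
{
	if (rc == 0)
		return SEQ_LOG_NONE;
	if (rc == -EINPROGRESS)
		return ((*waits)++ % 8 == 0) ? SEQ_LOG_INFO : SEQ_LOG_NONE;
	return SEQ_LOG_ERROR;
}

/* Stand-in for LCONSOLE_INFO()/CERROR(), which exist only in the kernel. */
static void seq_alloc_report(const char *name, int rc, unsigned int *waits)
{
	switch (seq_alloc_log_action(rc, waits)) {
	case SEQ_LOG_INFO:
		printf("%s: waiting to contact MDT0000 to allocate new meta-sequence\n",
		       name);
		break;
	case SEQ_LOG_ERROR:
		fprintf(stderr, "%s: cannot allocate new meta-sequence: rc = %d\n",
			name, rc);
		break;
	case SEQ_LOG_NONE:
		break;
	}
}
```

A caller would keep one `unsigned int waits = 0;` and call `seq_alloc_report("srv-lglossy-MDT0002", rc, &waits)` after each allocation attempt. Keeping the counter with the caller is a design choice of this sketch; the snippet in the comment uses a single function-level `static`, which shares one counter across all targets on the server.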
| Comment by Gerrit Updater [ 21/Dec/17 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/30623 |
| Comment by Gerrit Updater [ 09/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30623/ |
| Comment by Minh Diep [ 09/Jan/18 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 09/Jan/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30813 |
| Comment by Gerrit Updater [ 26/Feb/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30813/ |