[LU-10392] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115 Created: 14/Dec/17  Updated: 26/Feb/18  Resolved: 09/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug
Priority: Minor
Reporter: Giuseppe Di Natale (Inactive)
Assignee: Emoly Liu
Resolution: Fixed
Votes: 0
Labels: llnl

Issue Links:
Duplicate
Related
is related to LU-4009 Add ZIL support to osd-zfs Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I've set up a Lustre 2.10 filesystem for testing which has 3 MDS and 2 OSS servers. Each server has two targets, for a total of 6 MDTs and 4 OSTs. After formatting the new Lustre filesystem (called lglossy below) and starting it, I see the following Lustre errors in the console logs.

[Wed Dec 13 12:51:27 2017] Lustre: srv-lglossy-MDT0002: No data found on store. Initialize space
[Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: new disk, initializing
[Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: Imperative Recovery not enabled, recovery window 300-900
[Wed Dec 13 12:51:27 2017] LustreError: 82500:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0002-osd: unsupported quota oid: 0x16
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0002: Can't allocate new meta-sequence,rc -115
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0002: Can't allocate new sequence: rc = -115
[Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0002-osd getting update log failed: rc = -115
[Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0002: Connection restored to  (at 134.9.38.12@tcp5)
[Wed Dec 13 12:51:28 2017] Lustre: Skipped 5 previous similar messages
[Wed Dec 13 12:51:28 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:28 2017] Lustre: srv-lglossy-MDT0003: No data found on store. Initialize space
[Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: new disk, initializing
[Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: Imperative Recovery not enabled, recovery window 300-900
[Wed Dec 13 12:51:28 2017] LustreError: 83170:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0003-osd: unsupported quota oid: 0x16
[Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0003: Can't allocate new meta-sequence,rc -115
[Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0003: Can't allocate new sequence: rc = -115
[Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0003-osd getting update log failed: rc = -115
[Wed Dec 13 12:51:32 2017] Lustre: lglossy-MDT0003: Connection restored to  (at 134.9.38.12@tcp5)
[Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 2 previous similar messages
[Wed Dec 13 12:51:32 2017] Lustre: Skipped 3 previous similar messages
[Wed Dec 13 12:51:55 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0000-mdtlov_UUID (at 134.9.38.10@tcp5)
[Wed Dec 13 12:51:55 2017] Lustre: Skipped 2 previous similar messages
[Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
[Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 4 previous similar messages
[Wed Dec 13 12:51:56 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:51:57 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.20@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:51:59 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.18@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:51:59 2017] LustreError: Skipped 4 previous similar messages
[Wed Dec 13 12:52:21 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:52:21 2017] LustreError: Skipped 1 previous similar message
[Wed Dec 13 12:52:26 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.23@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:52:26 2017] LustreError: Skipped 6 previous similar messages
[Wed Dec 13 12:52:29 2017] Lustre: cli-ctl-lglossy-MDT0002: Allocated super-sequence [0x00000002c0000400-0x0000000300000400]:2:mdt]
[Wed Dec 13 12:52:33 2017] Lustre: cli-ctl-lglossy-MDT0003: Allocated super-sequence [0x0000000340000400-0x0000000380000400]:3:mdt]
[Wed Dec 13 12:52:46 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Dec 13 12:52:53 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0001-lwp-OST0000_UUID (at 134.9.38.14@tcp5)
[Wed Dec 13 12:52:53 2017] Lustre: Skipped 5 previous similar messages
[Wed Dec 13 12:58:37 2017] Lustre: lglossy-MDT0002: Connection restored to b6a69b56-0d6a-4087-5e21-742e78e50c81 (at 134.9.38.17@tcp5)
[Wed Dec 13 12:58:37 2017] Lustre: Skipped 9 previous similar messages

It's worth noting that the errors aren't fatal and the filesystem does come up. lglossy has had a test suite running against it since yesterday.

I did notice LU-9408 was marked resolved and the resolution appears to have been related to an incorrect index during filesystem creation time. I am experiencing this even with MDTs that are correctly indexed as is evident below:

esilverrock4: lglossy-MDT0004
esilverrock4: lglossy-MDT0005
esilverrock3: lglossy-MDT0002
esilverrock3: lglossy-MDT0003
esilverrock2: lglossy-MDT0000
esilverrock2: lglossy-MDT0001

Version information:

Lustre: Build Version: 2.10.0_1.chaos
SPL: Loaded module v0.7.2-1llnl
ZFS: Loaded module v0.7.2-1llnl


 Comments   
Comment by Peter Jones [ 15/Dec/17 ]

Giuseppe

Are you really testing with 2.10.0 rather than 2.10.2? Are any patches applied here? If so, where is your branch accessible?

Peter

Comment by Andreas Dilger [ 15/Dec/17 ]

This looks like MDT0000 is not available when MDT0002 and MDT0003 are first mounted after reformat? The very first time an MDT is mounted it needs to contact the master sequence server (on MDT0000) to get a super sequence, which is good for a billion client mounts or 2^47 file creates, or combinations thereof. It looks like MDT0002 and MDT0003 have super sequences allocated eventually.
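
As a rough check of the "billion" figure, the two super-sequence ranges that the console log in the description eventually reports for MDT0002 and MDT0003 can be subtracted. The sketch below is plain userspace C, not Lustre code: `struct seq_range_sketch` and `seq_range_width()` are hypothetical names, and treating the logged range as half-open is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a sequence range; not the Lustre structure. */
struct seq_range_sketch {
	uint64_t start;	/* first sequence in the range */
	uint64_t end;	/* assumed to be one past the last sequence */
};

/* Number of sequences covered by the range, assuming [start, end). */
uint64_t seq_range_width(const struct seq_range_sketch *r)
{
	assert(r != 0);
	assert(r->end >= r->start);
	return r->end - r->start;
}
```

For the MDT0002 range from the log, 0x0000000300000400 - 0x00000002c0000400 = 0x40000000, which is 2^30, or about 1.07 billion sequences. The MDT0003 range has the same width.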

Comment by Giuseppe Di Natale (Inactive) [ 18/Dec/17 ]

Andreas, thanks for the explanation. While the super sequences do end up allocated, I have a problem with this being reported as an error. I feel that labelling it as an error is misleading and may cause someone to believe there is a problem when in fact there isn't.

Comment by Olaf Faaland [ 18/Dec/17 ]

Peter,

Yes, the test really was performed with 2.10.0+3 patches, because it was expedient for Joe (Giuseppe) to get that going. He's rebasing now and will be testing with 2.10.2 in the near future.

Our 2.10 build has only 3 patches beyond the 2.10.x we get from you, see below. Engineers need not consider our patches for this issue, but I'll figure out a better long term answer for future tickets re: 2.10.

* 6e16bd8 (HEAD, tag: 2.10.0_1.chaos, czstash/2.10.0-llnl) LU-4009 osd-zfs: Add tunables to disable sync (DEBUG)
|  lustre/osd-zfs/osd_handler.c | 13 +++++++++++--
|  lustre/osd-zfs/osd_object.c  | 12 ++++++++++--
|  2 files changed, 21 insertions(+), 4 deletions(-)
* 1e66cc7 LLNL build customizations
|  lustre.spec.in                  | 20 +++++++++++++-------
|  rpm/kmp-lustre-osd-zfs.preamble |  1 +
|  2 files changed, 14 insertions(+), 7 deletions(-)
* 7bbcefa Don't install lustre init script on systemd systems
|  lustre.spec.in             | 4 +++-
|  lustre/conf/Makefile.am    | 5 ++++-
|  lustre/scripts/Makefile.am | 4 ++--
|  3 files changed, 9 insertions(+), 4 deletions(-)
* 58fd06e (tag: v2_10_0_0, tag: v2_10_0, tag: 2.10.0) New release 2.10.0
|  lustre/ChangeLog | 2 +-
|  1 file changed, 1 insertion(+), 1 deletion(-)


Comment by Peter Jones [ 19/Dec/17 ]

Emoly

Can you please help improve this behaviour, as Andreas is about to lay out?

Peter

Comment by Peter Jones [ 19/Dec/17 ]

Olaf

Could you please open tickets to track the review/landing of the two commits that do not seem unique to LLNL?

6e16bd8
7bbcefa

Although the former is currently tracked under LU-4009, that reference is really for tracking a much broader effort (ZIL).

Thanks

Peter

Comment by Andreas Dilger [ 19/Dec/17 ]

Emoly, it looks like these errors are all -EINPROGRESS = -115, which means that the MDTs are waiting for MDT0000 to start the master sequence server and to be granted a meta sequence for the first time. Rather than printing this via CERROR(), it would be better to print something like the following in __seq_server_alloc_meta():

        if (rc == -EINPROGRESS) {
                static int printed;

                if (printed++ % 8 == 0)
                        LCONSOLE_INFO("%s: waiting to contact MDT0000 to allocate new meta-sequence\n", ...);
        } else {
                CERROR("%s: cannot allocate new meta-sequence: rc = %d\n", ...);
        }

That prints the initial message in a way that makes clear which server is involved and that this is not a problem, without printing too many messages. Also, in seq_client_alloc_seq() please fix the CERROR() format to use the proper ": rc = %d\n" format, and print no message there if rc == -EINPROGRESS, since that case is already handled in __seq_server_alloc_meta().
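
To see how often the informational message would appear under this scheme, the counter logic can be exercised outside the kernel. This is only a userspace sketch under stated assumptions: `report_alloc_meta()` and `enum msg_kind` are hypothetical names, `fprintf()` stands in for LCONSOLE_INFO() and CERROR(), and the real patch may differ. On Linux, EINPROGRESS is errno 115, which matches the "rc = -115" seen in the logs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical message kinds, standing in for LCONSOLE_INFO() and CERROR(). */
enum msg_kind { MSG_NONE, MSG_INFO, MSG_ERROR };

/*
 * Report the result of a meta-sequence allocation attempt.
 * -EINPROGRESS is treated as "still waiting for MDT0000" and is
 * announced only on every 8th occurrence; any other non-zero rc
 * is reported as an error every time.
 */
enum msg_kind report_alloc_meta(const char *name, int rc, FILE *out)
{
	static int printed;	/* shared across calls, as in the suggestion */

	assert(name != NULL && out != NULL);

	if (rc == 0)
		return MSG_NONE;	/* success: nothing to report */

	if (rc == -EINPROGRESS) {
		if (printed++ % 8 == 0) {
			fprintf(out, "%s: waiting to contact MDT0000 to allocate new meta-sequence\n",
				name);
			return MSG_INFO;
		}
		return MSG_NONE;	/* suppressed repeat */
	}

	fprintf(out, "%s: cannot allocate new meta-sequence: rc = %d\n", name, rc);
	return MSG_ERROR;
}
```

With this sketch, sixteen consecutive -EINPROGRESS results produce two informational lines (the 1st and the 9th), while any other failure is still reported on every call. Because the counter is a single static variable, it is shared by all targets on the node, which is also true of the snippet as suggested.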

Comment by Gerrit Updater [ 21/Dec/17 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/30623
Subject: LU-10392 fid: improve seq allocation error messages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1978d0b19d84d5998972ea4592eb6be5b0edb724

Comment by Gerrit Updater [ 09/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30623/
Subject: LU-10392 fid: improve seq allocation error messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6f86519b3483b4cc754b42bddc98617de14cef2b

Comment by Minh Diep [ 09/Jan/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 09/Jan/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30813
Subject: LU-10392 fid: improve seq allocation error messages
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 9076890294c9f5cab3294f885f2a7c0fc775663c

Comment by Gerrit Updater [ 26/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30813/
Subject: LU-10392 fid: improve seq allocation error messages
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: bb23c0343473376cd103de869e8a39ca1abf0f0a
