
LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115


    Description

      I've set up a Lustre 2.10 filesystem for testing with 3 MDS and 2 OSS servers. Each server has two targets, for a total of 6 MDTs and 4 OSTs. After formatting the new filesystem (called lglossy below) and starting it, I see the following Lustre errors in the console logs.

      [Wed Dec 13 12:51:27 2017] Lustre: srv-lglossy-MDT0002: No data found on store. Initialize space
      [Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: new disk, initializing
      [Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: Imperative Recovery not enabled, recovery window 300-900
      [Wed Dec 13 12:51:27 2017] LustreError: 82500:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0002-osd: unsupported quota oid: 0x16
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0002: Can't allocate new meta-sequence,rc -115
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0002: Can't allocate new sequence: rc = -115
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0002-osd getting update log failed: rc = -115
      [Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0002: Connection restored to  (at 134.9.38.12@tcp5)
      [Wed Dec 13 12:51:28 2017] Lustre: Skipped 5 previous similar messages
      [Wed Dec 13 12:51:28 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:28 2017] Lustre: srv-lglossy-MDT0003: No data found on store. Initialize space
      [Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: new disk, initializing
      [Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: Imperative Recovery not enabled, recovery window 300-900
      [Wed Dec 13 12:51:28 2017] LustreError: 83170:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0003-osd: unsupported quota oid: 0x16
      [Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0003: Can't allocate new meta-sequence,rc -115
      [Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0003: Can't allocate new sequence: rc = -115
      [Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0003-osd getting update log failed: rc = -115
      [Wed Dec 13 12:51:32 2017] Lustre: lglossy-MDT0003: Connection restored to  (at 134.9.38.12@tcp5)
      [Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 2 previous similar messages
      [Wed Dec 13 12:51:32 2017] Lustre: Skipped 3 previous similar messages
      [Wed Dec 13 12:51:55 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0000-mdtlov_UUID (at 134.9.38.10@tcp5)
      [Wed Dec 13 12:51:55 2017] Lustre: Skipped 2 previous similar messages
      [Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 4 previous similar messages
      [Wed Dec 13 12:51:56 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:51:57 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.20@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:51:59 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.18@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:51:59 2017] LustreError: Skipped 4 previous similar messages
      [Wed Dec 13 12:52:21 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:52:21 2017] LustreError: Skipped 1 previous similar message
      [Wed Dec 13 12:52:26 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.23@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:52:26 2017] LustreError: Skipped 6 previous similar messages
      [Wed Dec 13 12:52:29 2017] Lustre: cli-ctl-lglossy-MDT0002: Allocated super-sequence [0x00000002c0000400-0x0000000300000400]:2:mdt]
      [Wed Dec 13 12:52:33 2017] Lustre: cli-ctl-lglossy-MDT0003: Allocated super-sequence [0x0000000340000400-0x0000000380000400]:3:mdt]
      [Wed Dec 13 12:52:46 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:52:53 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0001-lwp-OST0000_UUID (at 134.9.38.14@tcp5)
      [Wed Dec 13 12:52:53 2017] Lustre: Skipped 5 previous similar messages
      [Wed Dec 13 12:58:37 2017] Lustre: lglossy-MDT0002: Connection restored to b6a69b56-0d6a-4087-5e21-742e78e50c81 (at 134.9.38.17@tcp5)
      [Wed Dec 13 12:58:37 2017] Lustre: Skipped 9 previous similar messages
      
      

      It's worth noting that the errors aren't fatal and the filesystem does come up. lglossy has had a test suite running against it since yesterday.

      I did notice LU-9408 was marked resolved and the resolution appears to have been related to an incorrect index during filesystem creation time. I am experiencing this even with MDTs that are correctly indexed as is evident below:

      esilverrock4: lglossy-MDT0004
      esilverrock4: lglossy-MDT0005
      esilverrock3: lglossy-MDT0002
      esilverrock3: lglossy-MDT0003
      esilverrock2: lglossy-MDT0000
      esilverrock2: lglossy-MDT0001
      

      Version information:

      Lustre: Build Version: 2.10.0_1.chaos
      SPL: Loaded module v0.7.2-1llnl
      ZFS: Loaded module v0.7.2-1llnl
      

          Activity

            [LU-10392] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30813
            Subject: LU-10392 fid: improve seq allocation error messages
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 9076890294c9f5cab3294f885f2a7c0fc775663c

            mdiep Minh Diep added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30623/
            Subject: LU-10392 fid: improve seq allocation error messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6f86519b3483b4cc754b42bddc98617de14cef2b

            gerrit Gerrit Updater added a comment -

            Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/30623
            Subject: LU-10392 fid: improve seq allocation error messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1978d0b19d84d5998972ea4592eb6be5b0edb724

            adilger Andreas Dilger added a comment -

            Emoly, it looks like these errors are all -EINPROGRESS = -115, which means that the MDTs are waiting for MDT0000 to start the master sequence server and to be granted a meta-sequence for the first time. Rather than printing this via CERROR(), it would be better to print something like the following in __seq_server_alloc_meta():

                    if (rc == -EINPROGRESS) {
                            /* not a failure: still waiting for MDT0000, so
                             * print an informational message every 8th time */
                            static int printed;

                            if (printed++ % 8 == 0)
                                    LCONSOLE_INFO("%s: waiting to contact MDT0000 to allocate new meta-sequence\n", ...);
                    } else {
                            CERROR("%s: cannot allocate new meta-sequence: rc = %d\n", ...);
                    }

            That prints the initial message in a way that makes clear which server is involved and that this is not a problem, without printing too many messages. Also, in seq_client_alloc_seq() please fix the CERROR() format to use the proper ": rc = %d\n" format, and no message should be printed if rc == -EINPROGRESS, since that is already handled in __seq_server_alloc_meta().

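The logging policy suggested above can be tried outside the kernel. The following is a minimal user-space sketch, not the patch that landed: `seq_log_state`, `seq_log_decide()` and `seq_log_report()` are hypothetical names, `printf()`/`fprintf()` stand in for `LCONSOLE_INFO()`/`CERROR()`, and treating `rc == 0` as "print nothing" is an assumption about the surrounding code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the "static int printed" counter in the
 * suggestion; kept in a struct so the behaviour can be tested. */
struct seq_log_state {
	unsigned int calls;	/* -EINPROGRESS results seen so far */
};

enum seq_log_action {
	SEQ_LOG_NONE,		/* nothing printed */
	SEQ_LOG_INFO,		/* informational "still waiting" message */
	SEQ_LOG_ERROR,		/* genuine allocation failure */
};

/* Decide what to print for a given return code. -EINPROGRESS is
 * reported once every `interval` occurrences; any other non-zero rc
 * is always an error. */
static enum seq_log_action seq_log_decide(struct seq_log_state *st, int rc,
					  unsigned int interval)
{
	assert(interval > 0);

	if (rc == 0)
		return SEQ_LOG_NONE;	/* assumption: success is silent */
	if (rc == -EINPROGRESS)
		return (st->calls++ % interval == 0) ? SEQ_LOG_INFO
						     : SEQ_LOG_NONE;
	return SEQ_LOG_ERROR;
}

/* Print according to the decision, using the interval of 8 from the
 * suggestion, and return the decision that was taken. */
static enum seq_log_action seq_log_report(struct seq_log_state *st,
					  const char *name, int rc)
{
	enum seq_log_action act = seq_log_decide(st, rc, 8);

	if (act == SEQ_LOG_INFO)
		printf("%s: waiting to contact MDT0000 to allocate new meta-sequence\n",
		       name);
	else if (act == SEQ_LOG_ERROR)
		fprintf(stderr, "%s: cannot allocate new meta-sequence: rc = %d\n",
			name, rc);
	return act;
}
```

With this sketch, 16 consecutive -EINPROGRESS results produce the informational line twice (on the 1st and 9th call), while any other non-zero rc is always reported as an error.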
            pjones Peter Jones added a comment -

            Olaf

            Could you please open tickets to track the review/landing of the two commits that do not seem unique to LLNL:

            6e16bd8
            7bbcefa

            Although the former is currently tracked under LU-4009, that reference is really for tracking a much broader effort (ZIL).
            Thanks

            Peter

            pjones Peter Jones added a comment -

            Emoly

            Can you please help improve this behaviour as Andreas is about to lay out?

            Peter

            ofaaland Olaf Faaland added a comment - - edited

            Peter,

            Yes, the test really was performed with 2.10.0+3 patches, because it was expedient for Joe (Giuseppe) to get that going. He's rebasing now and will be testing with 2.10.2 in the near future.

            Our 2.10 build has only 3 patches beyond the 2.10.x we get from you, see below. Engineers need not consider our patches for this issue, but I'll figure out a better long term answer for future tickets re: 2.10.

            * 6e16bd8 (HEAD, tag: 2.10.0_1.chaos, czstash/2.10.0-llnl) LU-4009 osd-zfs: Add tunables to disable sync (DEBUG)
            |  lustre/osd-zfs/osd_handler.c | 13 +++++++++++--
            |  lustre/osd-zfs/osd_object.c  | 12 ++++++++++--
            |  2 files changed, 21 insertions(+), 4 deletions(-)
            * 1e66cc7 LLNL build customizations
            |  lustre.spec.in                  | 20 +++++++++++++-------
            |  rpm/kmp-lustre-osd-zfs.preamble |  1 +
            |  2 files changed, 14 insertions(+), 7 deletions(-)
            * 7bbcefa Don't install lustre init script on systemd systems
            |  lustre.spec.in             | 4 +++-
            |  lustre/conf/Makefile.am    | 5 ++++-
            |  lustre/scripts/Makefile.am | 4 ++--
            |  3 files changed, 9 insertions(+), 4 deletions(-)
            * 58fd06e (tag: v2_10_0_0, tag: v2_10_0, tag: 2.10.0) New release 2.10.0
            |  lustre/ChangeLog | 2 +-
            |  1 file changed, 1 insertion(+), 1 deletion(-)
            
            
            

            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Andreas, thanks for the explanation. While the super-sequences do end up allocated, I have a problem with this being reported as an error. Labelling it as an error is misleading and may cause someone to believe there is a problem when in fact there isn't.

            adilger Andreas Dilger added a comment -

            It looks like MDT0000 was not yet available when MDT0002 and MDT0003 were first mounted after the reformat. The very first time an MDT is mounted it needs to contact the master sequence server (on MDT0000) to get a super-sequence, which is good for a billion client mounts or 2^47 file creates, or combinations thereof. It looks like MDT0002 and MDT0003 do eventually have super-sequences allocated.

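The "billion" and "2^47" figures above can be checked against the ranges printed in the console log. The sketch below computes the width of the super-sequence allocated to MDT0002, `[0x00000002c0000400-0x0000000300000400]`. It treats the range as half-open, and the per-sequence object count of 2^17 is an assumption inferred from the quoted totals (2^47 / 2^30), not a value taken from this ticket. The names `seq_range_width()`, `super_seq_capacity()` and `ASSUMED_OBJS_PER_SEQ` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: objects per sequence, inferred as 2^47 / 2^30 from the
 * figures quoted in the comment above. */
#define ASSUMED_OBJS_PER_SEQ (UINT64_C(1) << 17)

/* Number of sequences in a range, treated here as half-open
 * [start, end). */
static uint64_t seq_range_width(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start;
}

/* Approximate number of objects a super-sequence can cover under the
 * assumption above. */
static uint64_t super_seq_capacity(uint64_t start, uint64_t end)
{
	return seq_range_width(start, end) * ASSUMED_OBJS_PER_SEQ;
}

/* Print both figures for one range taken from the log. */
static void super_seq_print(uint64_t start, uint64_t end)
{
	printf("sequences: %llu, objects: %llu\n",
	       (unsigned long long)seq_range_width(start, end),
	       (unsigned long long)super_seq_capacity(start, end));
}
```

For the MDT0002 range the width is 0x40000000 = 2^30 sequences, roughly 1.07 billion, which matches "a billion client mounts" if each mount consumes one sequence. Multiplying by the assumed 2^17 objects per sequence gives 2^47.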
            pjones Peter Jones added a comment -

            Giuseppe

            Are you really testing with 2.10.0 rather than 2.10.2? Are any patches applied here? If so, where is your branch accessible?

            Peter


            People

              emoly.liu Emoly Liu
              dinatale2 Giuseppe Di Natale (Inactive)