LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115

    Description

      I've set up a Lustre 2.10 filesystem for testing with 3 MDS and 2 OSS servers. Each server has two targets, for a total of 6 MDTs and 4 OSTs. After formatting the new Lustre filesystem (called lglossy below) and starting it, I see the following Lustre errors in the console logs.

      [Wed Dec 13 12:51:27 2017] Lustre: srv-lglossy-MDT0002: No data found on store. Initialize space
      [Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: new disk, initializing
      [Wed Dec 13 12:51:27 2017] Lustre: lglossy-MDT0002: Imperative Recovery not enabled, recovery window 300-900
      [Wed Dec 13 12:51:27 2017] LustreError: 82500:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0002-osd: unsupported quota oid: 0x16
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0002: Can't allocate new meta-sequence,rc -115
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0002: Can't allocate new sequence: rc = -115
      [Wed Dec 13 12:51:27 2017] LustreError: 82980:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0002-osd getting update log failed: rc = -115
      [Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0002: Connection restored to  (at 134.9.38.12@tcp5)
      [Wed Dec 13 12:51:28 2017] Lustre: Skipped 5 previous similar messages
      [Wed Dec 13 12:51:28 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:28 2017] Lustre: srv-lglossy-MDT0003: No data found on store. Initialize space
      [Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: new disk, initializing
      [Wed Dec 13 12:51:28 2017] Lustre: lglossy-MDT0003: Imperative Recovery not enabled, recovery window 300-900
      [Wed Dec 13 12:51:28 2017] LustreError: 83170:0:(osd_oi.c:503:osd_oid()) lglossy-MDT0003-osd: unsupported quota oid: 0x16
      [Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:227:seq_client_alloc_seq()) cli-lglossy-MDT0003: Can't allocate new meta-sequence,rc -115
      [Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(fid_request.c:383:seq_client_alloc_fid()) cli-lglossy-MDT0003: Can't allocate new sequence: rc = -115
      [Wed Dec 13 12:51:28 2017] LustreError: 83529:0:(lod_dev.c:419:lod_sub_recovery_thread()) lglossy-MDT0003-osd getting update log failed: rc = -115
      [Wed Dec 13 12:51:32 2017] Lustre: lglossy-MDT0003: Connection restored to  (at 134.9.38.12@tcp5)
      [Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:32 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 2 previous similar messages
      [Wed Dec 13 12:51:32 2017] Lustre: Skipped 3 previous similar messages
      [Wed Dec 13 12:51:55 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0000-mdtlov_UUID (at 134.9.38.10@tcp5)
      [Wed Dec 13 12:51:55 2017] Lustre: Skipped 2 previous similar messages
      [Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115
      [Wed Dec 13 12:51:55 2017] LustreError: 82647:0:(fid_handler.c:329:__seq_server_alloc_meta()) Skipped 4 previous similar messages
      [Wed Dec 13 12:51:56 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:51:57 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.20@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:51:59 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.18@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:51:59 2017] LustreError: Skipped 4 previous similar messages
      [Wed Dec 13 12:52:21 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:52:21 2017] LustreError: Skipped 1 previous similar message
      [Wed Dec 13 12:52:26 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.23@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:52:26 2017] LustreError: Skipped 6 previous similar messages
      [Wed Dec 13 12:52:29 2017] Lustre: cli-ctl-lglossy-MDT0002: Allocated super-sequence [0x00000002c0000400-0x0000000300000400]:2:mdt]
      [Wed Dec 13 12:52:33 2017] Lustre: cli-ctl-lglossy-MDT0003: Allocated super-sequence [0x0000000340000400-0x0000000380000400]:3:mdt]
      [Wed Dec 13 12:52:46 2017] LustreError: 137-5: lglossy-MDT0001_UUID: not available for connect from 134.9.38.22@tcp5 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Dec 13 12:52:53 2017] Lustre: lglossy-MDT0002: Connection restored to lglossy-MDT0001-lwp-OST0000_UUID (at 134.9.38.14@tcp5)
      [Wed Dec 13 12:52:53 2017] Lustre: Skipped 5 previous similar messages
      [Wed Dec 13 12:58:37 2017] Lustre: lglossy-MDT0002: Connection restored to b6a69b56-0d6a-4087-5e21-742e78e50c81 (at 134.9.38.17@tcp5)
      [Wed Dec 13 12:58:37 2017] Lustre: Skipped 9 previous similar messages
      
      

      It's worth noting that the errors aren't fatal and the filesystem does come up. lglossy has had a test suite running against it since yesterday.

      I did notice that LU-9408 was marked resolved, and the resolution appears to have been related to an incorrect index at filesystem creation time. I am experiencing this even with MDTs that are correctly indexed, as is evident below:

      esilverrock4: lglossy-MDT0004
      esilverrock4: lglossy-MDT0005
      esilverrock3: lglossy-MDT0002
      esilverrock3: lglossy-MDT0003
      esilverrock2: lglossy-MDT0000
      esilverrock2: lglossy-MDT0001
      

      Version information:

      Lustre: Build Version: 2.10.0_1.chaos
      SPL: Loaded module v0.7.2-1llnl
      ZFS: Loaded module v0.7.2-1llnl
      

          Activity

            [LU-10392] LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30813/
            Subject: LU-10392 fid: improve seq allocation error messages
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: bb23c0343473376cd103de869e8a39ca1abf0f0a


            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30813
            Subject: LU-10392 fid: improve seq allocation error messages
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 9076890294c9f5cab3294f885f2a7c0fc775663c

            mdiep Minh Diep added a comment -

            Landed for 2.11


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30623/
            Subject: LU-10392 fid: improve seq allocation error messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6f86519b3483b4cc754b42bddc98617de14cef2b


            Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/30623
            Subject: LU-10392 fid: improve seq allocation error messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1978d0b19d84d5998972ea4592eb6be5b0edb724


            adilger Andreas Dilger added a comment -

            Emoly, it looks like these errors are all -EINPROGRESS = -115, which means that the MDTs are waiting for MDT0000 to start the master sequence server and to be granted a meta-sequence for the first time. Rather than printing this via CERROR(), it would be better to print something like the following in __seq_server_alloc_meta():

                    if (rc == -EINPROGRESS) {
                            static int printed;

                            if (printed++ % 8 == 0)
                                    LCONSOLE_INFO("%s: waiting to contact MDT0000 to allocate new meta-sequence\n", ...);
                    } else {
                            CERROR("%s: cannot allocate new meta-sequence: rc = %d\n", ...);
                    }

            That prints the initial message in a way that makes clear which server is involved and that this is not a problem, without printing too many messages. Also, in seq_client_alloc_seq() please fix the CERROR() format to use the proper ": rc = %d\n" format, and no message should be printed if rc == -EINPROGRESS, since that is already handled in __seq_server_alloc_meta().
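For readers who want to try the suggested logging behaviour outside the kernel, here is a minimal user-space sketch. It is not the patch that landed. The helpers `console_info()` and `log_error()` are hypothetical stand-ins for LCONSOLE_INFO() and CERROR(), and `server_report_alloc()` and `client_report_alloc()` are illustrative names, not Lustre functions. The early return on `rc == 0` is also an assumption added so the functions can be called with any result. On Linux, EINPROGRESS is 115, which matches the `rc = -115` in the logs above.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for LCONSOLE_INFO() and CERROR(): they count
 * calls and print to stderr so the pattern can be exercised in user space. */
static int info_count;
static int error_count;

static void console_info(const char *fmt, ...)
{
	va_list ap;

	info_count++;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

static void log_error(const char *fmt, ...)
{
	va_list ap;

	error_count++;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* Server side (illustrative name): -EINPROGRESS is reported as an
 * informational message on every 8th occurrence, starting with the first.
 * Any other failure is reported as an error every time. */
static void server_report_alloc(const char *name, int rc)
{
	assert(name != NULL);

	if (rc == 0)	/* assumption: success prints nothing */
		return;

	if (rc == -EINPROGRESS) {
		static int printed;

		if (printed++ % 8 == 0)
			console_info("%s: waiting to contact MDT0000 to allocate new meta-sequence\n",
				     name);
	} else {
		log_error("%s: cannot allocate new meta-sequence: rc = %d\n",
			  name, rc);
	}
}

/* Client side (illustrative name): stays silent on -EINPROGRESS because
 * the server path has already reported it, and uses the ": rc = %d" form. */
static void client_report_alloc(const char *name, int rc)
{
	assert(name != NULL);

	if (rc != 0 && rc != -EINPROGRESS)
		log_error("%s: cannot allocate new meta-sequence: rc = %d\n",
			  name, rc);
}
```

Calling `server_report_alloc()` sixteen times in a row with `-EINPROGRESS` prints the informational message twice (on the first and ninth calls), while `client_report_alloc()` prints nothing for the same return code. Note that the `static int printed` counter is shared across all targets on a node, as in the snippet above, so the rate limit is per process rather than per MDT.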
            pjones Peter Jones added a comment -

            Olaf

            Could you please open tickets to track the review/landing of the two commits that do not seem unique to LLNL?

            6e16bd8
            7bbcefa

            Although the former is currently tracked under LU-4009 that reference is really for tracking a much broader effort (ZIL)
            Thanks

            Peter

            pjones Peter Jones added a comment -

            Emoly

            Can you please help improve this behaviour as Andreas is about to lay out

            Peter

            ofaaland Olaf Faaland added a comment - - edited

            Peter,

            Yes, the test really was performed with 2.10.0+3 patches, because it was expedient for Joe (Giuseppe) to get that going. He's rebasing now and will be testing with 2.10.2 in the near future.

            Our 2.10 build has only 3 patches beyond the 2.10.x we get from you, see below. Engineers need not consider our patches for this issue, but I'll figure out a better long term answer for future tickets re: 2.10.

            * 6e16bd8 (HEAD, tag: 2.10.0_1.chaos, czstash/2.10.0-llnl) LU-4009 osd-zfs: Add tunables to disable sync (DEBUG)
            |  lustre/osd-zfs/osd_handler.c | 13 +++++++++++--
            |  lustre/osd-zfs/osd_object.c  | 12 ++++++++++--
            |  2 files changed, 21 insertions(+), 4 deletions(-)
            * 1e66cc7 LLNL build customizations
            |  lustre.spec.in                  | 20 +++++++++++++-------
            |  rpm/kmp-lustre-osd-zfs.preamble |  1 +
            |  2 files changed, 14 insertions(+), 7 deletions(-)
            * 7bbcefa Don't install lustre init script on systemd systems
            |  lustre.spec.in             | 4 +++-
            |  lustre/conf/Makefile.am    | 5 ++++-
            |  lustre/scripts/Makefile.am | 4 ++--
            |  3 files changed, 9 insertions(+), 4 deletions(-)
            * 58fd06e (tag: v2_10_0_0, tag: v2_10_0, tag: 2.10.0) New release 2.10.0
            |  lustre/ChangeLog | 2 +-
            |  1 file changed, 1 insertion(+), 1 deletion(-)
            
            
            

            dinatale2 Giuseppe Di Natale (Inactive) added a comment -

            Andreas, thanks for the explanation. While the super-sequences do end up allocated, I have a problem with this being reported as an error. Labelling it as an error is misleading and may cause someone to believe there is a problem when in fact there isn't.

            People

              emoly.liu Emoly Liu
              dinatale2 Giuseppe Di Natale (Inactive)