
sanity-lfsck test_1a: FAIL: (3) Fail to start LFSCK for namespace!


    Description

      sanity-lfsck test 1a failed as follows:

      CMD: shadow-5vm12 /usr/sbin/lctl set_param fail_loc=0x1501
      CMD: shadow-5vm12 /usr/sbin/lctl set_param fail_loc=0
      10.1.4.53@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr 0 0
      CMD: shadow-5vm1.shadow.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
      Stopping client shadow-5vm1.shadow.whamcloud.com /mnt/lustre (opts:)
      CMD: shadow-5vm1.shadow.whamcloud.com lsof -t /mnt/lustre
      CMD: shadow-5vm1.shadow.whamcloud.com umount  /mnt/lustre 2>&1
      CMD: shadow-5vm12 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r
      shadow-5vm12: Fail to start LFSCK: Read-only file system
       sanity-lfsck test_1a: @@@@@@ FAIL: (3) Fail to start LFSCK for namespace! 
      

      Maloo report: https://testing.hpdd.intel.com/test_sets/8520310e-108d-11e5-a2d3-5254006e85c2

          Activity

            [LU-6722] sanity-lfsck test_1a: FAIL: (3) Fail to start LFSCK for namespace!

            simmonsja James A Simmons added a comment -

            We are going to need a patch for SLES12 now that it is supported.

            pjones Peter Jones added a comment -

            Landed for 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15334/
            Subject: LU-6722 ldiskfs: declare credits for quota when destroy inode
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e065d04c3cdcf40ae4d4fd4aee0b2548aca4d045

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/15401/
            Subject: LU-6722 jbd: double minimum journal size for RHEL7
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set:
            Commit: 15d2f58cd493a73f860a4415cac0da48c932e72a

            gerrit Gerrit Updater added a comment - - edited

            nevermind

            gerrit Gerrit Updater added a comment - - edited

            Deleted irrelevant comment.


            adilger Andreas Dilger added a comment -

            Fan Yong,
            there is also a generic issue: RHEL7 only allows transactions half the size of those on RHEL6. So in addition to your patch to fix the transaction credits for setxattr, a second patch is needed to double the minimum journal size from 4MB to 8MB when running on kernels 3.10 and later.

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/15334
            Subject: LU-6722 ldiskfs: more credits to destroy inode with large EA
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4f54dc215328cc9dbc2cab1236db84de29afe6ad

            yong.fan nasf (Inactive) added a comment - - edited

            The patch http://review.whamcloud.com/10376 mainly causes OSD-ldiskfs to complain that someone declared too many credits, but that is not fatal. The root cause of this failure is inside ldiskfs itself: it does not consider quota modification in ldiskfs_xattr_delete_inode().

            sarah Sarah Liu added a comment -

            This issue caused many failures on EL7 client/server:

            https://testing.hpdd.intel.com/test_sessions/9d7790da-135d-11e5-b4b0-5254006e85c2

            adilger Andreas Dilger added a comment - - edited

            It is complaining because of a change to the underlying JBD2 transaction limits:

            static int osd_param_is_not_sane(const struct osd_device *dev,
                                             const struct thandle *th)
            {               
                    struct osd_thandle *oh = container_of(th, typeof(*oh), ot_super);
                            
                    return oh->ot_credits > osd_transaction_size(dev);
            }
            
            #ifdef LDISKFS_HT_MISC
            # define osd_transaction_size(dev) (osd_journal(dev)->j_max_transaction_buffers / 2)
            #else
            # define osd_transaction_size(dev) (osd_journal(dev)->j_max_transaction_buffers)
            #endif
            
            static int osd_trans_start(const struct lu_env *env, struct dt_device *d,
                                       struct thandle *th)
            {
                    :
                    :
                            CWARN("%.16s: too many transaction credits (%d > %d)\n",
                                  LDISKFS_SB(osd_sb(dev))->s_es->s_volume_name,
                                  oh->ot_credits,   
                                  osd_journal(dev)->j_max_transaction_buffers);
            

            The CWARN() message is misleading: it should print osd_transaction_size(dev) instead of accessing j_max_transaction_buffers directly, so that the reported limit matches the one actually checked. This was missed from http://review.whamcloud.com/10376 originally.

            I expect that LDISKFS_HT_MISC is only defined for RHEL 7 (the original upstream patch 8f7d89f36 is in 3.10 and later), which cuts the maximum transaction size in half. I don't think this will be a problem in real usage, since this is a single-transaction limit; it only matters if the journal size is very small.

            An easy solution would be to increase the minimum journal size from 4MB to 8MB for RHEL7. Does the test environment specify the journal size explicitly? It seems that the default journal size should be 4096 blocks for filesystems over 128MB (according to ext2fs_default_journal_size() in mke2fs). According to lustre/tests/cfg/local.sh, the default OSTSIZE and MDSSIZE are 200MB.


            People

              yong.fan nasf (Inactive)
              yujian Jian Yu