Lustre / LU-2979

sanity 133a: proc counter for mkdir on mds1 was not incremented

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Severity: 3
    • Rank (Obsolete): 7262

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/9b412d48-8dbe-11e2-bb99-52540035b04c.

      The sub-test test_133a failed with the following error:

      The counter for mkdir on mds1 was not incremented

      Info required for matching: sanity 133a

      == sanity test 133a: Verifying MDT stats ========================================== 09:55:50 (1363366550)
      CMD: client-20-ib /usr/sbin/lctl list_param mdt.*.rename_stats
      mdt.lustre-MDT0000.rename_stats
      CMD: client-20-ib /usr/sbin/lctl set_param mdt.*.md_stats=clear
      mdt.lustre-MDT0000.md_stats=clear
      CMD: client-21-ib /usr/sbin/lctl set_param obdfilter.*.stats=clear
      obdfilter.lustre-OST0000.stats=clear
      obdfilter.lustre-OST0001.stats=clear
      obdfilter.lustre-OST0002.stats=clear
      obdfilter.lustre-OST0003.stats=clear
      obdfilter.lustre-OST0004.stats=clear
      obdfilter.lustre-OST0005.stats=clear
      obdfilter.lustre-OST0006.stats=clear
      CMD: client-20-ib /usr/sbin/lctl get_param mdt.lustre-MDT0000.md_stats
      
       sanity test_133a: @@@@@@ FAIL: The counter for mkdir on mds1 was not incremented 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:3973:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:3996:error()
        = /usr/lib64/lustre/tests/sanity.sh:8033:check_stats()
        = /usr/lib64/lustre/tests/sanity.sh:8056:test_133a()
        = /usr/lib64/lustre/tests/test-framework.sh:4251:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4284:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4139:run_test()
        = /usr/lib64/lustre/tests/sanity.sh:8083:main()
      Dumping lctl log to /logdir/test_logs/2013-03-15/lustre-reviews-el6-x86_64--review--1_2_1__14053__-70011914322200-085246/sanity.test_133a.*.1363366552.log
      CMD: client-20-ib,client-21-ib,client-22-ib,client-23-ib.lab.whamcloud.com /usr/sbin/lctl dk > /logdir/test_logs/2013-03-15/lustre-reviews-el6-x86_64--review--1_2_1__14053__-70011914322200-085246/sanity.test_133a.debug_log.\$(hostname -s).1363366552.log;
               dmesg > /logdir/test_logs/2013-03-15/lustre-reviews-el6-x86_64--review--1_2_1__14053__-70011914322200-085246/sanity.test_133a.dmesg.\$(hostname -s).1363366552.log
      

          Activity

            simmonsja James A Simmons added a comment - Old blocker for unsupported version
            adilger Andreas Dilger added a comment - http://review.whamcloud.com/6136 is still not landed
            sarah Sarah Liu added a comment - Verified with the latest tag-2.4.50RC1 (client running tag-2.4.50RC1, server running 2.3.0).

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to master
            jhammond John Hammond added a comment (edited) - Please see http://review.whamcloud.com/6328
            di.wang Di Wang added a comment - Ah, quite possible. I spent a lot of time investigating the collection side of the problem but did not notice the listing side. Thank you.
            jhammond John Hammond added a comment -

            I believe this is a bug related to on-demand allocation of per-cpu stats. If nothing has yet happened on cpu 0, then no stats will have been allocated for cpu 0, and hence lprocfs_stats_counter_get(stats, 0, index) will return NULL. This NULL will in turn be returned from lprocfs_stats_seq_start() and will be interpreted by the seq_file code as an early EOF.

            static inline struct lprocfs_counter *
            lprocfs_stats_counter_get(struct lprocfs_stats *stats, unsigned int cpuid,
                                      int index)
            {
                    struct lprocfs_counter *cntr;
            
                    cntr = &stats->ls_percpu[cpuid]->lp_cntr[index];
            
                    if ((stats->ls_flags & LPROCFS_STATS_FLAG_IRQ_SAFE) != 0)
                            cntr = (void *)cntr + index * sizeof(__s64);
            
                    return cntr;
            }
            
            static void *lprocfs_stats_seq_start(struct seq_file *p, loff_t *pos)
            {
                    struct lprocfs_stats *stats = p->private;
                    /* return 1st cpu location */
                    return (*pos >= stats->ls_num) ? NULL :
                            lprocfs_stats_counter_get(stats, 0, *pos);
            }
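The failure mode described in this comment can be reproduced with a small userspace model. This is a sketch under stated assumptions, not Lustre code: every name below (model_stats, model_start_cpu0, model_start_fixed, and so on) is hypothetical, the explicit NULL check in model_start_cpu0 stands in for the "returns NULL when cpu 0 has no stats" behaviour described above, and the reader loop is a simplified stand-in for seq_file (in the Linux seq_file interface, a NULL return from ->start ends the read). model_start_fixed is one possible approach, chosen for illustration; it is not necessarily what http://review.whamcloud.com/6328 does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model of lazily allocated per-cpu stats.
 * None of these names exist in Lustre. */
#define MODEL_NR_CPUS     4
#define MODEL_NR_COUNTERS 3

struct model_counter { long count; };
struct model_percpu  { struct model_counter cntr[MODEL_NR_COUNTERS]; };
struct model_stats {
	int num;                                    /* counters per CPU */
	struct model_percpu *percpu[MODEL_NR_CPUS]; /* NULL until first use */
};

/* Record one event on 'cpu', allocating that CPU's slot on demand. */
static int model_incr(struct model_stats *s, int cpu, int idx)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(idx >= 0 && idx < s->num);
	if (s->percpu[cpu] == NULL) {
		s->percpu[cpu] = calloc(1, sizeof(*s->percpu[cpu]));
		if (s->percpu[cpu] == NULL)
			return -1;
	}
	s->percpu[cpu]->cntr[idx].count++;
	return 0;
}

/* Sum one counter over every CPU slot that has been allocated. */
static long model_sum(const struct model_stats *s, int idx)
{
	long sum = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (s->percpu[cpu] != NULL)
			sum += s->percpu[cpu]->cntr[idx].count;
	return sum;
}

typedef void *(*model_start_fn)(struct model_stats *s, long pos);

/* Models the reported bug: the iterator is anchored on CPU 0, so an
 * unallocated CPU 0 slot yields NULL even for a valid position. */
static void *model_start_cpu0(struct model_stats *s, long pos)
{
	if (pos >= s->num || s->percpu[0] == NULL)
		return NULL;
	return &s->percpu[0]->cntr[pos];
}

/* One possible fix (an assumption): only pos >= num means EOF, and the
 * cookie is the stats pointer itself, so no CPU slot needs to exist. */
static void *model_start_fixed(struct model_stats *s, long pos)
{
	return pos < s->num ? (void *)s : NULL;
}

/* Simplified seq_file-like reader: calls 'start' for each position and
 * stops at the first NULL (EOF). Stores each listed counter's total in
 * out[] and returns the number of lines listed. */
static int model_list(struct model_stats *s, model_start_fn start, long out[])
{
	int lines = 0;
	for (long pos = 0; start(s, pos) != NULL; pos++)
		out[lines++] = model_sum(s, (int)pos);
	return lines;
}

/* Release all lazily allocated CPU slots. */
static void model_free(struct model_stats *s)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		free(s->percpu[cpu]);
		s->percpu[cpu] = NULL;
	}
}
```

With a single event recorded on CPU 2 only, the CPU-0-anchored reader lists nothing (an empty stats file, which would look like a counter that was "not incremented"), while the fixed reader lists all three counters with the event summed in. Once CPU 0 records an event of its own, both readers agree.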
            
            rread Robert Read added a comment - IMHO, not having properly functioning performance counters is a pretty major problem.
            di.wang Di Wang added a comment - http://review.whamcloud.com/6136 adds some debug information for this failure to help me understand the problem.
            sarah Sarah Liu added a comment - Another instance, seen in interop testing between a 2.3.0 client and a 2.4 server: https://maloo.whamcloud.com/test_sets/491428e6-8fed-11e2-9b28-52540035b04c

            People

              Assignee: di.wang Di Wang
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 8
