[LU-2979] sanity 133a: proc counter for mkdir on mds1 was not incremented Created: 18/Mar/13  Updated: 14/Aug/16  Resolved: 14/Aug/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Di Wang
Resolution: Won't Fix Votes: 0
Labels: LB

Issue Links:
Duplicate
is duplicated by LU-3296 fs/lustre/mdt/*/md_stats not showing ... Resolved
Related
is related to LU-1282 Lustre 2.1 client memory usage at mou... Closed
is related to LU-2902 sanity test_156: NOT IN CACHE: before... Resolved
Severity: 3
Rank (Obsolete): 7262

 Description   

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/9b412d48-8dbe-11e2-bb99-52540035b04c.

The sub-test test_133a failed with the following error:

The counter for mkdir on mds1 was not incremented

Info required for matching: sanity 133a

== sanity test 133a: Verifying MDT stats ========================================== 09:55:50 (1363366550)
CMD: client-20-ib /usr/sbin/lctl list_param mdt.*.rename_stats
mdt.lustre-MDT0000.rename_stats
CMD: client-20-ib /usr/sbin/lctl set_param mdt.*.md_stats=clear
mdt.lustre-MDT0000.md_stats=clear
CMD: client-21-ib /usr/sbin/lctl set_param obdfilter.*.stats=clear
obdfilter.lustre-OST0000.stats=clear
obdfilter.lustre-OST0001.stats=clear
obdfilter.lustre-OST0002.stats=clear
obdfilter.lustre-OST0003.stats=clear
obdfilter.lustre-OST0004.stats=clear
obdfilter.lustre-OST0005.stats=clear
obdfilter.lustre-OST0006.stats=clear
CMD: client-20-ib /usr/sbin/lctl get_param mdt.lustre-MDT0000.md_stats

 sanity test_133a: @@@@@@ FAIL: The counter for mkdir on mds1 was not incremented 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:3973:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:3996:error()
  = /usr/lib64/lustre/tests/sanity.sh:8033:check_stats()
  = /usr/lib64/lustre/tests/sanity.sh:8056:test_133a()
  = /usr/lib64/lustre/tests/test-framework.sh:4251:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4284:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4139:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:8083:main()
Dumping lctl log to /logdir/test_logs/2013-03-15/lustre-reviews-el6-x86_64--review--1_2_1__14053__-70011914322200-085246/sanity.test_133a.*.1363366552.log
CMD: client-20-ib,client-21-ib,client-22-ib,client-23-ib.lab.whamcloud.com /usr/sbin/lctl dk > /logdir/test_logs/2013-03-15/lustre-reviews-el6-x86_64--review--1_2_1__14053__-70011914322200-085246/sanity.test_133a.debug_log.\$(hostname -s).1363366552.log;
         dmesg > /logdir/test_logs/2013-03-15/lustre-reviews-el6-x86_64--review--1_2_1__14053__-70011914322200-085246/sanity.test_133a.dmesg.\$(hostname -s).1363366552.log


 Comments   
Comment by Jodi Levi (Inactive) [ 18/Mar/13 ]

Di,
Could you have a quick look at this one and give your opinion on the severity?
Thank you!

Comment by Di Wang [ 18/Mar/13 ]

Jodi, minor is fine with me. IMHO, it is probably a test script or counter-tracking problem, which does not impact the "real" functionality.

Comment by Sarah Liu [ 21/Mar/13 ]

Another instance was seen in interop testing between a 2.3.0 client and a 2.4 server:
https://maloo.whamcloud.com/test_sets/491428e6-8fed-11e2-9b28-52540035b04c

Comment by Di Wang [ 23/Apr/13 ]

http://review.whamcloud.com/6136 adds some debug information for this failure to help me understand the problem.

Comment by Robert Read (Inactive) [ 10/May/13 ]

IMHO, not having properly functioning performance counters is a pretty major problem.

Comment by John Hammond [ 13/May/13 ]

I believe this is a bug related to on demand allocation of per-cpu stats. If nothing has yet happened on cpu 0 then no stats will have been allocated for cpu 0, and hence lprocfs_stats_counter_get(stats, 0, index) will return NULL. This NULL will in turn be returned from lprocfs_stats_seq_start() and will be interpreted by the seq_file code as an early EOF.

static inline struct lprocfs_counter *
lprocfs_stats_counter_get(struct lprocfs_stats *stats, unsigned int cpuid,
                          int index)
{
        struct lprocfs_counter *cntr;

        cntr = &stats->ls_percpu[cpuid]->lp_cntr[index];

        if ((stats->ls_flags & LPROCFS_STATS_FLAG_IRQ_SAFE) != 0)
                cntr = (void *)cntr + index * sizeof(__s64);

        return cntr;
}

static void *lprocfs_stats_seq_start(struct seq_file *p, loff_t *pos)
{
        struct lprocfs_stats *stats = p->private;
        /* return 1st cpu location */
        return (*pos >= stats->ls_num) ? NULL :
                lprocfs_stats_counter_get(stats, 0, *pos);
}

Comment by Di Wang [ 13/May/13 ]

Ah, quite possible. I spent so much time investigating the collection side of the problem that I did not notice the listing side. Thank you.

Comment by John Hammond [ 13/May/13 ]

Please see http://review.whamcloud.com/6328.

Comment by Jodi Levi (Inactive) [ 14/May/13 ]

Patch landed to master

Comment by Sarah Liu [ 21/May/13 ]

Verified with the latest tag-2.4.50RC1: the client is running tag-2.4.50RC1 and the server is running 2.3.0.

Comment by Andreas Dilger [ 12/Jul/13 ]

http://review.whamcloud.com/6136 has still not landed.

Comment by James A Simmons [ 14/Aug/16 ]

Old blocker for unsupported version

Generated at Sat Feb 10 01:29:55 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.