[LU-577] REPLAY_SINGLE test_70b failed due to $MOUNT not being passed on to rundbench Created: 08/Aug/11  Updated: 08/May/12  Resolved: 08/May/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: Lustre 2.3.0, Lustre 2.1.2, Lustre 1.8.8

Type: Bug Priority: Minor
Reporter: Jay Lan (Inactive) Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None
Environment:

Server: centos5.6, client: sles11sp1


Attachments: File rundbench_MOUNT.patch    
Severity: 3
Rank (Obsolete): 4636

 Description   

When I run "ACC_SM_ONLY=REPLAY_SINGLE NAME=my_config sh acceptance-small.sh",
test 70b fails if my $MOUNT is not /mnt/lustre.

The $MOUNT variable is set and exported in the my_config script. It is
picked up correctly in replay-single.sh, but it is not passed on to rundbench.

The attached patch fixes the problem.
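The failure mode described above can be illustrated with a minimal sketch. It is not the attached patch: the child command is a hypothetical stand-in for rundbench, the /mnt/lustre fallback is an assumption based on the symptom reported here, and "env -i" merely simulates a launch that does not inherit the caller's environment.

```shell
#!/bin/sh
# Hypothetical stand-in for rundbench: falls back to a default mount point
# when MOUNT is missing from its environment (the default is an assumption).
child='echo "${MOUNT:-/mnt/lustre}"'

# The value a config script such as my_config would set and export.
MOUNT=/mnt/mylustre
export MOUNT

# Launched with a clean environment, the child never sees the exported value.
lost=$(env -i /bin/sh -c "$child")

# Passing MOUNT explicitly on the command line carries it across.
kept=$(env -i MOUNT="$MOUNT" /bin/sh -c "$child")

echo "without MOUNT: $lost"
echo "with MOUNT:    $kept"
```

Under these assumptions the first call falls back to the default path while the second keeps the configured one, which matches the symptom that the test only passes when $MOUNT is /mnt/lustre.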



 Comments   
Comment by James A Simmons [ 19/Aug/11 ]

This is also an issue with Lustre 2.X.

Comment by James A Simmons [ 24/Aug/11 ]

This test fails for me when I use more than 32 nodes. check_for_process reports rundbench falsely, which causes the test to fail. Also, the test doesn't clean up its directory after itself.

Comment by Peter Jones [ 29/Aug/11 ]

YuJian

Could you please help with this one?

Thanks

Peter

Comment by James A Simmons [ 12/Jan/12 ]

Doing some testing with a potential patch. What I'm seeing on the MDS side is

JBD: mdt_"X" wants too many credits (3296 > 1024)

At this point we get /bin/rm: cannot remove directory `./clients/client0/~dmtmp/SEED': No such file or directory

from running rundbench.

Comment by Jian Yu [ 12/Jan/12 ]

Hi James,

JBD: mdt_"X" wants too many credits (3296 > 1024)

        if (nblocks > journal->j_max_transaction_buffers) {
                printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n",
                       current->comm, nblocks,
                       journal->j_max_transaction_buffers);
                ret = -ENOSPC;
                goto out;
        }

You have to increase the journal size. If "-J size=64" is specified, then with 4KB blocks:

j_maxlen = 64MB / 4KB = 16384 blocks
j_max_transaction_buffers = j_maxlen / 4 = 16384 / 4 = 4096 blocks (16MB of max transaction credit blocks)
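The arithmetic above can be checked with a small shell sketch. The function name is hypothetical, and the 4KB block size is the one used in the calculation above; only the "j_maxlen / 4" rule comes from this comment.

```shell
#!/bin/sh
# max_transaction_buffers JOURNAL_MB BLOCK_KB
# Prints the per-transaction credit limit (in blocks) for a journal of the given size.
max_transaction_buffers() {
    journal_mb=$1
    block_kb=$2
    # Journal length in blocks.
    j_maxlen=$(( journal_mb * 1024 / block_kb ))
    # A single transaction may use at most a quarter of the journal.
    echo $(( j_maxlen / 4 ))
}

limit_64=$(max_transaction_buffers 64 4)
limit_16=$(max_transaction_buffers 16 4)
echo "64MB journal: $limit_64 blocks"
echo "16MB journal: $limit_16 blocks"
```

With 4KB blocks, a 16MB journal gives the limit of 1024 seen in the error message, and a 64MB journal gives 4096, which is above the 3296 credits requested.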
Comment by James A Simmons [ 13/Jan/12 ]

Yes, setting the journal size to 64 fixed those errors. I saw this error no matter how many clients I used. Perhaps as a part of a patch change local.sh to set a journal size of 64MB always? Now to figure out why this test fails with many clients.

Comment by Jian Yu [ 16/Jan/12 ]

Perhaps as a part of a patch change local.sh to set a journal size of 64MB always?

I think we'd better not make this change.

Currently, in lustre/utils/mkfs_lustre.c, the default journal size is calculated according to the device size as follows:

                /* Journal size in MB */
                if (strstr(mop->mo_mkfsopts, "-J") == NULL) {
                        /* Choose our own default journal size */
                        long journal_sz = 0, max_sz;
                        if (device_sz > 1024 * 1024) /* 1GB */
                                journal_sz = (device_sz / 102400) * 4;
                        /* cap journal size at 1GB */
                        if (journal_sz > 1024L)
                                journal_sz = 1024L;
                        /* man mkfs.ext3 */
                        max_sz = (102400 * L_BLOCK_SIZE) >> 20; /* 400MB */
                        if (journal_sz > max_sz)
                                journal_sz = max_sz;
                        if (journal_sz) {
                                sprintf(buf, " -J size=%ld", journal_sz);
                                strscat(mop->mo_mkfsopts, buf,
                                        sizeof(mop->mo_mkfsopts));
                        }
                }

It would be better to improve the code above so that it calculates a more accurate default journal size, or to figure out why more transaction buffer credits are being requested now.
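To see what the quoted calculation yields for a few device sizes, here is a shell transcription of the same arithmetic. It is a sketch, not the mkfs_lustre.c code: the function name is hypothetical, device_sz is assumed to be in KB (as the "1GB" comment suggests), and L_BLOCK_SIZE is assumed to be 4096 (as the "400MB" comment implies).

```shell
#!/bin/sh
# default_journal_mb DEVICE_KB [BLOCK_SIZE]
# Prints the journal size in MB that the quoted logic would choose,
# or 0 when it would add no "-J size=" option at all.
default_journal_mb() {
    device_kb=$1
    block_size=${2:-4096}   # assumed value of L_BLOCK_SIZE
    journal_mb=0
    # Only devices larger than 1GB get an explicit journal size.
    if [ "$device_kb" -gt $(( 1024 * 1024 )) ]; then
        journal_mb=$(( device_kb / 102400 * 4 ))
    fi
    # First cap: 1024MB.
    if [ "$journal_mb" -gt 1024 ]; then
        journal_mb=1024
    fi
    # Second cap: 102400 blocks, i.e. 400MB with 4096-byte blocks.
    max_mb=$(( (102400 * block_size) >> 20 ))
    if [ "$journal_mb" -gt "$max_mb" ]; then
        journal_mb=$max_mb
    fi
    echo "$journal_mb"
}

echo "1GB device:  $(default_journal_mb 1048576) MB"
echo "2GB device:  $(default_journal_mb 2097152) MB"
echo "20GB device: $(default_journal_mb 20971520) MB"
```

Under these assumptions a device of 1GB or less gets no "-J" option from this code, so the journal size is left to the mkfs default, which may be relevant for small test devices.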

Comment by James A Simmons [ 18/Jan/12 ]

The patch posted at http://review.whamcloud.com/#change,252 fixes the issues I have been seeing. As for the problem with more than 32 nodes, I tracked that down to pdsh itself. Pdsh fans out to 32 nodes at a time by default, so not all of the dbench instances were starting in time. So I set PDSH to "pdsh -S -f 64 -Rssh -w". The -f option raises the number of simultaneous connections to 64.

Comment by Jian Yu [ 12/Apr/12 ]

The patch for the b1_8 branch is at http://review.whamcloud.com/2518.

Comment by James A Simmons [ 12/Apr/12 ]

I will give it a try today. Also, I'd like to suggest that we set the pdsh -f flag to the number of hosts available.
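One way the suggestion could look is sketched below. The client list and the count_hosts helper are hypothetical, and the sketch only handles a plain comma-separated list, not pdsh range syntax such as c[1-8]; the pdsh options are the ones quoted earlier in this ticket.

```shell
#!/bin/sh
# Hypothetical client list; a real run would take it from the test configuration.
CLIENTS="client1,client2,client3"

# Count the hosts in a plain comma-separated list.
count_hosts() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

nhosts=$(count_hosts "$CLIENTS")

# Set the fan-out to the host count so every client can be contacted at once.
PDSH="pdsh -S -f $nhosts -Rssh -w"
echo "$PDSH $CLIENTS <command>"
```

The script only prints the resulting command line; it does not invoke pdsh.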

Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 19/Apr/12 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #184
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision 4bd1b85a9fc80b7370229318e947d6cf4cfc927a)

Result = SUCCESS
Johann Lombardi : 4bd1b85a9fc80b7370229318e947d6cf4cfc927a
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,client,el6,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,server,el5,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,client,el5,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/test-framework.sh
  • lustre/tests/replay-single.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » i686,server,el6,ofa #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 30/Apr/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #498
LU-577 tests: FAIL replay-single test_70b rundbench load (Revision d2c1a397c4d450ccdb0006e8b52dc0b418f54466)

Result = SUCCESS
Oleg Drokin : d2c1a397c4d450ccdb0006e8b52dc0b418f54466
Files :

  • lustre/tests/replay-single.sh
  • lustre/tests/test-framework.sh
Comment by Jian Yu [ 02/May/12 ]

Patches have been landed on b1_8 and master branches.

Comment by James A Simmons [ 02/May/12 ]

b2_1 patch is still left to merge.

Comment by Jian Yu [ 02/May/12 ]

Hello Oleg,
Could you please land the patch at http://review.whamcloud.com/#change,2538 on the b2_1 branch? Thanks!

Comment by James A Simmons [ 08/May/12 ]

Landed to b2_1 branch. Ticket can be closed.

Comment by Peter Jones [ 08/May/12 ]

Landed for 1.8.8, 2.1.2, and 2.3

Generated at Sat Feb 10 01:08:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.