[LU-577] REPLAY_SINGLE test_70b failed due to $MOUNT not pass on to rundbench Created: 08/Aug/11 Updated: 08/May/12 Resolved: 08/May/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.1.2, Lustre 1.8.8 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jay Lan (Inactive) | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Server: centos5.6, client: sles11sp1 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4636 |
| Description |
|
When I run "ACC_SM_ONLY=REPLAY_SINGLE NAME=my_config sh acceptance-small.sh", the $MOUNT variable was set and exported in the my_config script, but it was not passed on to rundbench. The attached patch fixed the problem. |
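A minimal sketch of this failure mode, assuming the variable is lost because rundbench ends up being launched in an environment that does not inherit the caller's exports (the mount path and commands here are illustrative, not taken from the actual test framework):

```shell
#!/bin/sh
# my_config-style setup: set and export the mount point
MOUNT=/mnt/lustre
export MOUNT

# A child shell started normally inherits the export:
sh -c 'echo "child sees MOUNT=${MOUNT:-<unset>}"'

# But a shell started with a clean environment (which is effectively
# what a remote pdsh/ssh invocation gets) does not:
env -i sh -c 'echo "remote sees MOUNT=${MOUNT:-<unset>}"'

# Passing the value explicitly on the command line works either way:
env -i sh -c "echo \"remote sees MOUNT=$MOUNT\""
```

The general point is that exporting a variable only covers direct children of the current shell; anything that crosses a clean-environment boundary needs the value passed explicitly.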
| Comments |
| Comment by James A Simmons [ 19/Aug/11 ] |
|
This is an issue with Lustre 2.X as well. |
| Comment by James A Simmons [ 24/Aug/11 ] |
|
This test fails for me when I use more than 32 nodes. check_for_process falsely reports that rundbench is still running, which causes the test to fail. Also, the test doesn't clean up its directory after itself. |
| Comment by Peter Jones [ 29/Aug/11 ] |
|
YuJian, could you please help with this one? Thanks, Peter |
| Comment by James A Simmons [ 12/Jan/12 ] |
|
Doing some testing with a potential patch. What I'm seeing on the MDS side is:

JBD: mdt_"X" wants too many credits (3296 > 1024)

At that point, running rundbench produces:

/bin/rm: cannot remove directory `./clients/client0/~dmtmp/SEED': No such file or directory |
| Comment by Jian Yu [ 12/Jan/12 ] |
|
Hi James, the error message comes from this check in jbd:
if (nblocks > journal->j_max_transaction_buffers) {
printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n",
current->comm, nblocks,
journal->j_max_transaction_buffers);
ret = -ENOSPC;
goto out;
}
You have to increase the journal size. If "-J size=64" is specified (a 64MB journal with 4KB blocks), then j_maxlen = 64MB / 4KB = 16384 blocks, and j_max_transaction_buffers = j_maxlen / 4 = 4096 (max transaction credit blocks). |
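The arithmetic above can be checked with a few lines of shell (assuming 4KB journal blocks, as in the comment):

```shell
#!/bin/sh
journal_mb=64   # from "-J size=64"
block_kb=4      # 4KB journal blocks

# Journal length in blocks, then jbd's per-transaction credit limit
j_maxlen=$(( journal_mb * 1024 / block_kb ))
j_max_transaction_buffers=$(( j_maxlen / 4 ))

echo "j_maxlen=$j_maxlen max_credits=$j_max_transaction_buffers"
# → j_maxlen=16384 max_credits=4096
```

With a 64MB journal the limit becomes 4096 credits, comfortably above the 3296 requested in the error message above.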
| Comment by James A Simmons [ 13/Jan/12 ] |
|
Yes, setting the journal size to 64MB fixed those errors. I saw this error no matter how many clients I used. Perhaps, as part of a patch, local.sh could be changed to always set a journal size of 64MB? Now to figure out why this test fails with many clients. |
| Comment by Jian Yu [ 16/Jan/12 ] |
I think we'd better not make this change. Currently, in lustre/utils/mkfs_lustre.c, the default journal size is calculated according to the device size as follows: /* Journal size in MB */
if (strstr(mop->mo_mkfsopts, "-J") == NULL) {
/* Choose our own default journal size */
long journal_sz = 0, max_sz;
if (device_sz > 1024 * 1024) /* 1GB */
journal_sz = (device_sz / 102400) * 4;
/* cap journal size at 1GB */
if (journal_sz > 1024L)
journal_sz = 1024L;
/* man mkfs.ext3 */
max_sz = (102400 * L_BLOCK_SIZE) >> 20; /* 400MB */
if (journal_sz > max_sz)
journal_sz = max_sz;
if (journal_sz) {
sprintf(buf, " -J size=%ld", journal_sz);
strscat(mop->mo_mkfsopts, buf,
sizeof(mop->mo_mkfsopts));
}
}
It would be better to improve the above code to calculate a more appropriate default journal size, or to figure out why more transaction buffer credits are requested now. |
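As a rough illustration of that sizing logic, the calculation can be replayed in shell (device_sz in KB and L_BLOCK_SIZE = 4096 are assumptions inferred from the code; this is a sketch, not the actual mkfs_lustre implementation):

```shell
#!/bin/sh
# Mirror of the default-journal-size calculation; device_sz in KB,
# result in MB (0 means mke2fs picks its own default).
default_journal_mb() {
    device_sz=$1
    journal_sz=0
    if [ "$device_sz" -gt $(( 1024 * 1024 )) ]; then   # device > 1GB
        journal_sz=$(( device_sz / 102400 * 4 ))
    fi
    [ "$journal_sz" -gt 1024 ] && journal_sz=1024      # cap at 1GB
    max_sz=$(( 102400 * 4096 >> 20 ))                  # 400MB, per man mkfs.ext3
    [ "$journal_sz" -gt "$max_sz" ] && journal_sz=$max_sz
    echo "$journal_sz"
}

default_journal_mb $((        500 * 1024 ))   # 500MB device → 0
default_journal_mb $((  2 * 1024 * 1024 ))    # 2GB device   → 80
```

Note that devices at or below 1GB fall through with journal_sz = 0, so small test devices end up with the mke2fs default journal, which is one way to arrive at a low credit limit like the 1024 seen in the error above.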
| Comment by James A Simmons [ 18/Jan/12 ] |
|
The patch posted at http://review.whamcloud.com/#change,252 fixes the issues I have been seeing. As for the more-than-32-node problem: I tracked that down to pdsh itself. pdsh fans out in units of 32 connections, which was causing not all of the dbench instances to start in time. So I set PDSH to "pdsh -S -f 64 -Rssh -w"; the -f option increases the number of simultaneous connections that can be started to 64. |
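The resulting setting could look like the following fragment (the exact variable name and the host list appended after -w depend on the local test configuration, so treat this as illustrative):

```shell
# -S      propagate the largest remote exit status back to the caller
# -f 64   raise the fanout from pdsh's default of 32 simultaneous
#         connections, so all dbench instances start in time
# -Rssh   use the ssh rcmd module
export PDSH="pdsh -S -f 64 -Rssh -w"
```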
| Comment by Jian Yu [ 12/Apr/12 ] |
|
Patch for b1_8 branch is in http://review.whamcloud.com/2518. |
| Comment by James A Simmons [ 12/Apr/12 ] |
|
I will give it a try today. I would also like to suggest we set up pdsh with the -f flag equal to the number of hosts available. |
| Comment by Build Master (Inactive) [ 19/Apr/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Jian Yu [ 02/May/12 ] |
|
Patches have been landed on b1_8 and master branches. |
| Comment by James A Simmons [ 02/May/12 ] |
|
The b2_1 patch is still left to merge. |
| Comment by Jian Yu [ 02/May/12 ] |
|
Hello Oleg, |
| Comment by James A Simmons [ 08/May/12 ] |
|
Landed to b2_1 branch. Ticket can be closed. |
| Comment by Peter Jones [ 08/May/12 ] |
|
Landed for 1.8.8, 2.1.2, and 2.3 |