[LU-1096] Test failure on test suite lfsck,Segmentation fault Created: 12/Feb/12 Updated: 30/May/12 Resolved: 30/May/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 6460 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/1bfcbe72-553d-11e1-9aa8-5254004bbbd3. |
| Comments |
| Comment by Peter Jones [ 16/Feb/12 ] |
|
Andreas will look into this one |
| Comment by Peter Jones [ 19/Mar/12 ] |
|
Niu, Andreas is out on vacation for the next two weeks, so could you please make an initial assessment of this issue? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 19/Mar/12 ] |
lfsck -c -l --mdsdb /scratch/mdsdb --ostdb /scratch/ostdb-0 /scratch/ostdb-1 /scratch/ostdb-2 /scratch/ostdb-3 /scratch/ostdb-4 /scratch/ostdb-5 /mnt/lustre
lfsck 1.41.90.wc4 (01-Sep-2011)
/scratch/mdsdb:mdshdr : Invalid argument
/usr/lib64/lustre/tests/test-framework.sh: line 2679: 19558 Segmentation fault lfsck -c -l --mdsdb /scratch/mdsdb --ostdb /scratch/ostdb-0 /scratch/ostdb-1 /scratch/ostdb-2 /scratch/ostdb-3 /scratch/ostdb-4 /scratch/ostdb-5 /mnt/lustre
It looks like opening the mdsdb failed, and that caused the segmentation fault.
+ if ((rc = dbp->open(dbp, NULL, fname, dbname, DB_HASH,
+ DB_CREATE | DB_THREAD, 0664)) != 0)
+ {
+ dbp->err(dbp, rc, "%s:%s\n", fname, dbname);
+ dbp->close(dbp, 0);
+ return (EIO);
I suspect this failure is similar to the
Maloo shows that the test server and client run different systems: the server is CentOS release 6.2, but the client is CentOS release 5.7, so I think their db4 versions differ.
I think we may need to improve the test script to make sure the same db4 version is used to generate and check the db file, and lfsck also needs to be improved to handle a db4 version mismatch gracefully. But I don't think this should be a blocker. |
| Comment by Peter Jones [ 09/Apr/12 ] |
|
Yangsheng, could you please look into how to address this problem? Andreas had some ideas about how to deal with this situation more gracefully. Thanks, Peter |
| Comment by Yang Sheng [ 12/Apr/12 ] |
|
This issue isn't related to the db4 library. It is just a wrong invocation of log_write(...) after lfsck_opendb(): the format string has three conversion specifiers, but only two arguments are passed.
rc = lfsck_opendb(mds_file, MDS_HDR, &mds_hdrdb, 0, 0, 0);
if (rc != 0) {
>>>>>>> log_write("%s: error opening mds_hdr in %s: rc %d\n",
                  mds_file, rc);
        return(-EINVAL);
Is there any guidance for committing a patch to e2fsprogs? |
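Passing fewer arguments than a printf-style format string requires is undefined behavior in C, which is consistent with the segmentation fault reported here. The following is a minimal sketch of how such a call could be caught at compile time, not the lfsck code itself: `log_write_sketch`, `log_buf`, and `report_open_error` are hypothetical names, the `format` attribute is a GCC/Clang extension, and passing a program name as the first argument is an assumption, since the landed patch is not shown in this ticket.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical in-memory log target so the sketch needs no log file. */
static char log_buf[256];

/* GCC/Clang extension: check the variadic arguments against the format
 * string (parameter 1), starting at parameter 2. */
static int log_write_sketch(const char *fmt, ...)
#if defined(__GNUC__)
	__attribute__((format(printf, 1, 2)))
#endif
	;

static int log_write_sketch(const char *fmt, ...)
{
	va_list args;
	int n;

	assert(fmt != NULL);
	va_start(args, fmt);
	n = vsnprintf(log_buf, sizeof(log_buf), fmt, args);
	va_end(args);
	return n;
}

/* One argument per conversion specifier. The "lfsck" prefix is an
 * assumption made for illustration only. */
static int report_open_error(const char *mds_file, int rc)
{
	return log_write_sketch("%s: error opening mds_hdr in %s: rc %d\n",
				"lfsck", mds_file, rc);
}
```

With the attribute in place, building with `-Wall` (which enables `-Wformat`) should make GCC or Clang warn about a call that supplies only `mds_file` and `rc` for this format string.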
| Comment by Yang Sheng [ 12/Apr/12 ] |
|
For the record:
int log_close(int status)
{
        time_t tm;
        if (logfile == NULL)
                return(0);
        time(&tm);
        if (status < 0) {
                fprintf(logfile, "ERROR: lfsck aborted\n");
        } else {
                fprintf(logfile, "lfsck run completed: %s\n",ctime(&tm));
        }
        fprintf(logfile, "===============================================\n\n");
        fclose(logfile);
>>>>>>>>logfile = NULL;
        return(0);
}
Otherwise, a second call to log_close() may cause a double free in some failure cases. |
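The effect of the marked line can be sketched as follows. Calling `fclose()` twice on the same stream is undefined behavior, and clearing the pointer lets a second close return early. `log_open_sketch`, `log_close_sketch`, and `fail_then_cleanup` are hypothetical stand-ins that use `tmpfile()` rather than the real lfsck log path.

```c
#include <assert.h>
#include <stdio.h>

static FILE *logfile;

/* Hypothetical opener: a temporary file stands in for the real log. */
static int log_open_sketch(void)
{
	logfile = tmpfile();
	return logfile != NULL ? 0 : -1;
}

/* Safe to call more than once: the stream is closed exactly once. */
static int log_close_sketch(int status)
{
	if (logfile == NULL)
		return 0;
	if (status < 0)
		fputs("ERROR: lfsck aborted\n", logfile);
	else
		fputs("lfsck run completed\n", logfile);
	fclose(logfile);
	logfile = NULL;	/* without this, a second call would fclose() again */
	return 0;
}

/* Mimics an error path that closes the log, followed by a generic
 * cleanup path that closes it again. */
static void fail_then_cleanup(void)
{
	log_close_sketch(-1);	/* error path */
	log_close_sketch(-1);	/* cleanup: now a no-op */
	assert(logfile == NULL);
}
```

Resetting the pointer inside the close routine keeps the guard in one place, so callers on different failure paths do not each need to track whether the log is still open.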
| Comment by Andreas Dilger [ 13/Apr/12 ] |
|
YS, you can submit patches to Gerrit with the tools/e2fsprogs repo, on the master-lustre branch. I'm just in the middle of rebasing this tree, so any patch would need to be rebased again before it could land. |
| Comment by Yang Sheng [ 30/May/12 ] |
|
Patch landed. Closing bug. |