[LU-2980] sanity.sh test_17b: Read-only file system Created: 18/Mar/13 Updated: 20/May/13 Resolved: 20/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 7263 |
| Description |
|
This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>.
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/43161786-8f2c-11e2-92ff-52540035b04c.
The sub-test test_17b failed with the following error in the MDS dmesg log:
It seems that test_17a is somehow corrupting the filesystem, since the block number is the same, 106889676224884, in the few MDS dmesg logs I looked at, and this seems to be ASCII text from the test_17a() run.
(gdb) p /x 106889676224884
$1 = 0x6137312e7974
This is the ASCII string "ty.17a<NUL><NUL>", which might be a fragment from $tdir or similar "sani[ty.17a]".
Info required for matching: sanity 17b |
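The gdb step above can be reproduced without a debugger. The following is a minimal sketch, assuming the block number is stored as a 64-bit little-endian value; the helper name `block_number_to_bytes` is hypothetical and only serves the illustration.

```python
def block_number_to_bytes(value, width=8):
    """Return the in-memory bytes of an unsigned integer, assuming little-endian storage."""
    return value.to_bytes(width, "little")


# Block number reported in the MDS dmesg logs for this ticket.
BAD_BLOCK = 106889676224884

raw = block_number_to_bytes(BAD_BLOCK)
print(hex(BAD_BLOCK))  # same value gdb prints with "p /x"
print(raw)             # the bytes as they would sit in memory
```

Read in memory order, the bytes spell "ty.17a" followed by two NUL bytes, which is the fragment quoted in the description.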
| Comments |
| Comment by Andreas Dilger [ 18/Mar/13 ] |
|
Searching back over the past 4 weeks, this only started failing on 2013-03-12, so it is likely a new regression introduced by a patch that landed on that day or the day before: https://maloo.whamcloud.com/sub_tests/8d829c9c-8bfb-11e2-abec-52540035b04c |
| Comment by Di Wang [ 19/Mar/13 ] |
|
In sanity test_17a, the test tries to create a symlink with "/mnt/lustre/d0.sanity/d17/f.sanity.17a". Since that string is only 38 bytes, it should be written to i_data, but according to Andreas's comment i_file_acl is somehow being overwritten. The interesting thing is that i_data is 60 bytes long, so I do not know how it can overflow into the following fields.
struct ldiskfs_inode_info {
        __le32 i_data[15]; /* unconverted */
        __u32 i_dtime;
        ldiskfs_fsblk_t i_file_acl;
        .....
}
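One way to reason about the overwrite is to model the three fields quoted above as a flat byte buffer and ask at which offset the symlink string would have to land for i_file_acl to hold the observed value. This is only a sketch under stated assumptions: the fields are packed without padding (i_data at offset 0, i_dtime at 60, i_file_acl at 64), i_file_acl is a little-endian 64-bit value, and the corrupting bytes really are the NUL-terminated symlink string. The function name `file_acl_after_write` is hypothetical.

```python
import struct

# Assumed layout of the first fields of struct ldiskfs_inode_info (no padding).
I_DATA_SIZE = 15 * 4                # __le32 i_data[15] -> 60 bytes
I_DTIME_OFF = I_DATA_SIZE           # __u32 i_dtime at offset 60
I_FILE_ACL_OFF = I_DTIME_OFF + 4    # 64-bit i_file_acl at offset 64
TOTAL = I_FILE_ACL_OFF + 8

SYMLINK = b"/mnt/lustre/d0.sanity/d17/f.sanity.17a"
OBSERVED = 106889676224884          # block number from the MDS dmesg logs


def file_acl_after_write(start):
    """Write the NUL-terminated string at byte offset `start`, return i_file_acl."""
    buf = bytearray(TOTAL)
    data = SYMLINK + b"\0"
    end = min(start + len(data), TOTAL)   # truncate at the end of the modelled fields
    buf[start:end] = data[:end - start]
    return struct.unpack_from("<Q", buf, I_FILE_ACL_OFF)[0]


matching = [off for off in range(TOTAL) if file_acl_after_write(off) == OBSERVED]
print(matching)
```

In this model a write at offset 0 leaves i_file_acl untouched, and the only offset that reproduces the observed value is 32 bytes into i_data. That would also put "sani" into i_dtime, matching the "sani[ty.17a]" guess in the description, though it does not explain what shifted the write.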
|
| Comment by Andreas Dilger [ 19/Mar/13 ] |
|
If this can be reproduced, it would be useful to dump the inode contents, either with debugfs, or "od -tx4" to see what else is being written into the i_data field to offset the symlink data. |
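For readers unfamiliar with the "od -tx4" style of dump, the sketch below formats raw bytes as 32-bit little-endian hex words, so ASCII text that ended up in an inode field is easier to spot. It is an illustration only: `dump_le32_words` is a hypothetical helper, it uses hexadecimal byte offsets rather than od's octal ones, and the input here is the symlink string from test_17a rather than a real inode dump.

```python
import struct


def dump_le32_words(raw, words_per_line=4):
    """Format bytes as lines of 32-bit little-endian hex words with a hex byte offset."""
    padded = raw + b"\0" * (-len(raw) % 4)          # pad to a whole number of words
    words = struct.unpack("<%dI" % (len(padded) // 4), padded)
    lines = []
    for i in range(0, len(words), words_per_line):
        chunk = " ".join("%08x" % w for w in words[i:i + words_per_line])
        lines.append("%04x: %s" % (i * 4, chunk))
    return lines


for line in dump_le32_words(b"/mnt/lustre/d0.sanity/d17/f.sanity.17a"):
    print(line)
```

Because each word is printed most significant byte first, text appears byte-reversed within every group of four, for example "/mnt" shows up as 746e6d2f.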
| Comment by Andreas Dilger [ 19/Mar/13 ] |
|
Maybe a patch for test_17a() to see if ls -l works, and/or something like:
local mds_dev=$(mdsdevname $(($($LFS getstripe -M $DIR/$tdir/$tfile) + 1)))
do_facet $SINGLEMDS debugfs -c -R "stat /ROOT/$tdir/$tfile" $mds_dev
|
| Comment by Hongchao Zhang [ 21/Mar/13 ] |
|
No related patch has landed on "ldiskfs". Two patches could be related to the issue, but I found no place in them that would cause it. How about printing "i_data" and "i_dtime" alongside "i_file_acl" in ldiskfs_xattr_delete_inode (http://review.whamcloud.com/#change,5798)? |
| Comment by Hongchao Zhang [ 21/Mar/13 ] |
|
This issue could not be reproduced by running "sanity.sh" (from subtest 0 to subtest 17) repeatedly for a long time. |
| Comment by Zhenyu Xu [ 22/Mar/13 ] |
|
I think |
| Comment by James A Simmons [ 22/Mar/13 ] |
|
Yes it is a duplicate. I noticed as well it is a very difficult bug to reproduce. |
| Comment by James A Simmons [ 22/Mar/13 ] |
|
If I encounter this bug again what data should I collect? |
| Comment by Peter Jones [ 25/Mar/13 ] |
|
Hongchao, I see that you have created a debug patch (http://review.whamcloud.com/#change,5798). Is your intention to land this so that if anyone hits this issue again we have more info to go on? Peter |
| Comment by Hongchao Zhang [ 27/Mar/13 ] |
|
currently, this bug only occurs during test for patch review, |
| Comment by Hongchao Zhang [ 27/Mar/13 ] |
|
btw, when this issue occurs, most of the tests in sanity.sh fail! |
| Comment by James A Simmons [ 27/Mar/13 ] |
|
It would be really nice if maloo reported which patch was being tested in its subtest logs to avoid thinking that this bug was on the master branch. The failures started March 12th which is when I introduced the ldiskfs-config.h version of the patch. This makes sense if the source of the problem was the patch from |
| Comment by Peter Jones [ 27/Mar/13 ] |
|
Dropping priority given the latest information. Please close this ticket if no further work needs to be tracked by this ticket |
| Comment by James A Simmons [ 09/Apr/13 ] |
|
I assume we haven't seen this bug in some time. If that is the case, we can close this ticket and reopen it if it reappears for some reason. |
| Comment by James A Simmons [ 20/May/13 ] |
|
Peter, can you close this ticket? Thanks. |
| Comment by Peter Jones [ 20/May/13 ] |
|
ok thanks |