[LU-1294] Segmentation fault running lfsck Created: 09/Apr/12 Updated: 10/Apr/12 Resolved: 09/Apr/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Sarah Liu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
|
||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 6420 | ||||
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/59343d1e-8207-11e1-9c84-525400d2bfa6.

lfsck hit a segmentation fault after printing an invalid argument message. It appears that the "ostdb-7" file is cut off, but that may just be due to the way the output is logged.

lfsck -c -l --mdsdb /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/mdsdb --ostdb /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-0 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-1 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-2 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-3 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-4 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-5 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-6 /mnt/lustre
06:03:09:lfsck 1.41.90.wc4 (01-Sep-2011)
06:03:10:/home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/mdsdb:mdshdr
06:03:10:: Invalid argument
06:03:10:/usr/lib64/lustre/tests/test-framework.sh: line 2772: 8975 Segmentation fault lfsck -c -l --mdsdb /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/mdsdb --ostdb /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-0 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-1 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-2 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-3 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-4 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-5 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-6 /mnt/lustre
06:03:10: lfsck : @@@@@@ FAIL: lfsck -c -l --mdsdb /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/mdsdb --ostdb /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-0 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-1 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-2 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-3 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-4 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-5 /home/autotest/shared_dir/2012-04-07/230543-7f46bd3a7040/ostdb-6 /mnt/lustre returned 139, should be <= 1
06:03:11:Dumping lctl log to /logdir/test_logs/2012-04-07/lustre-master-el6-x86_64-el5-x86_64__479__-7f46bd3a7040/lfsck..*.1333879390.log
06:03:15:lfsck returned 0 |
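For readers unfamiliar with the "returned 139" value in the log: shells conventionally report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11, which is SIGSEGV on Linux. The helper below is a minimal sketch of that arithmetic; the function name `decode_status` is made up for illustration and is not part of test-framework.sh.

```shell
# Hypothetical helper (not from test-framework.sh): decode a shell exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

decode_status 139   # prints: killed by signal 11
```

This matches the "Segmentation fault" line that the shell printed for the lfsck process.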
| Comments |
| Comment by Andreas Dilger [ 09/Apr/12 ] |
|
TT-487 tracks automated collection of core dumps. Until that is finished, this test needs to be run by hand and a core file attached to this bug, and/or lfsck needs to be run under gdb to collect a stack trace. |
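One possible way to collect the stack trace by hand is to run the command under gdb in batch mode, which starts the program and prints a backtrace if it crashes. This is a sketch only: the wrapper name `run_under_gdb` and the `GDB` override variable are assumptions made for illustration, and the example paths are the ones used in the gdb session later in this ticket.

```shell
# Hypothetical wrapper: run any command under gdb in batch mode.
# "-ex run" starts the program; "-ex bt" prints a backtrace once it stops.
# GDB may be overridden (an assumption of this sketch) for dry runs.
run_under_gdb() {
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}

# Example invocation (left commented out; requires lfsck and the databases):
# run_under_gdb lfsck -c -l --mdsdb /scratch/mdsdb --ostdb /scratch/ostdb-0 /mnt/lustre
```

Alternatively, core dumps can be enabled in the shell with `ulimit -c unlimited` before rerunning the test, and the resulting core file attached to the ticket.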
| Comment by Andreas Dilger [ 09/Apr/12 ] |
|
The client and server are running different OS versions: RHEL5 on the client and RHEL6 on the server. This may be the root cause of the problem, since db4 is not a very portable database format. Typically this is not a problem for lfsck, since the databases are only useful for a very short time. |
| Comment by Peter Jones [ 09/Apr/12 ] |
|
Sarah

Could you please try and reproduce this failure and gather fuller data?

Thanks

Peter |
| Comment by Peter Jones [ 09/Apr/12 ] |
|
duplicate of |
| Comment by Sarah Liu [ 10/Apr/12 ] |
|
(gdb) run -c -l --mdsdb /scratch/mdsdb --ostdb /scratch/ostdb-0 /mnt/lustre
Program received signal SIGSEGV, Segmentation fault. |