[LU-5204] 2.6 DNE stress testing: EINVAL when attempting to delete file

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • Lustre 2.6.0
    • Severity: 3
    • 14529

    Description

      After our stress testing this weekend, we are unable to delete some (perhaps all?) of the files on a particular OST (OST 38). All of them give EINVAL.

      For example:
      [root@galaxy-esf-mds008 tmp]# rm -f posix_shm_open
      rm: cannot remove `posix_shm_open': Invalid argument
      [root@galaxy-esf-mds008 tmp]# lfs getstripe posix_shm_open
      posix_shm_open
      lmm_stripe_count: 1
      lmm_stripe_size: 1048576
      lmm_pattern: 1
      lmm_layout_gen: 0
      lmm_stripe_offset: 38
      obdidx objid objid group
      38 907263 0xdd7ff 0

      However, OST 38 (OST0026) is showing up in lctl dl, and as far as I know, there are no issues with it. (The dk logs on the OSS don't show any issues.)
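      For reference, OST target names use the hex form of the index, so stripe index 38 corresponds to OST0026. A minimal sketch of double-checking the mapping and the device state on the MDS (the grep pattern is only illustrative):

      printf 'OST%04x\n' 38     # prints OST0026
      lctl dl | grep OST0026    # the device should be listed and in the UP state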

      Here's the relevant part of the log from MDT0000:
      00000004:00020000:2.0:1402947131.685511:0:25039:0:(lod_lov.c:695:validate_lod_and_idx()) esfprod-MDT0000-mdtlov: bad idx: 38 of 64
      00000004:00000001:2.0:1402947131.685513:0:25039:0:(lod_lov.c:757:lod_initialize_objects()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
      00000004:00000010:2.0:1402947131.685515:0:25039:0:(lod_lov.c:782:lod_initialize_objects()) kfreed 'stripe': 8 at ffff8807fc208a00.
      00000004:00000001:2.0:1402947131.685516:0:25039:0:(lod_lov.c:788:lod_initialize_objects()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685519:0:25039:0:(lod_lov.c:839:lod_parse_striping()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685520:0:25039:0:(lod_lov.c:885:lod_load_striping_locked()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685522:0:25039:0:(lod_object.c:2754:lod_declare_object_destroy()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685524:0:25039:0:(mdd_dir.c:1586:mdd_unlink()) Process leaving via stop (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
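      In case it helps anyone trying to reproduce this, a minimal sketch of how a trace like the one above can be captured on the MDS (the debug mask and output path are only illustrative):

      lctl set_param debug=+trace      # enable function entry/exit tracing
      lctl clear                       # drop the existing debug buffer contents
      # ... attempt the failing rm ...
      lctl dk /tmp/mdt0000_unlink.dk   # dump the kernel debug log to a file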

      I don't know for certain if this is related to DNE2 or not, but this is not an error I've seen before. The file system and objects are still around, so I can provide further data if needed.

      Any thoughts?

      Attachments

        1. invalid_object_client_mdt0007
          506 kB
        2. invalid_object_mds_mdt0000
          133 kB
        3. lctl_dl_from_client
          4 kB
        4. lctl_dl_from_mds001_mdt0000
          4 kB
        5. LU-5204_mds0_start_log.tar.gz
          0.2 kB
        6. LU-5204_start_log_with_oss.tar.gz
          0.3 kB
        7. mdt0.config.log
          57 kB


          Activity


            adilger Andreas Dilger added a comment -

            The one obvious problem that I see is that it should ALWAYS be possible to delete a file, even if the OST is unavailable or configured out of the system. Regardless of what the root cause of the problem is, there needs to be a patch to allow the file to be deleted.

            paf Patrick Farrell (Inactive) added a comment -

            Opened LU-5233 for the MDS1 LBUG I mentioned above.

            paf Patrick Farrell (Inactive) added a comment -

            Andreas,

            It's really unlikely. No one should have been mucking with the system. I can't say it's impossible, but...

            Now that we've tracked it down to such a strange error, I'm planning to go ahead and fix it, and not worry about it unless it occurs again in further stress testing. In fact, I'm going to do exactly that unless someone has further information they'd like from the system. (Speak up soon - I'm going to fix it for our stress testing slot tonight.)

            I've also hit, in further testing, an MDS0 crash bug that could possibly be related to this one; I'm going to open a ticket for it shortly and will reference that LU here once I've got it open.

            adilger Andreas Dilger added a comment -

            Is it possible that OST0026 was ever deactivated during testing (e.g. lctl conf_param esfprod-OST0026.osc.active=0 or similar)? That would permanently disable the OST in the config log, and seems to me to be the most likely cause of this problem.
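            If that is what happened, a minimal sketch of how one might check for and revert such a record, assuming the esfprod fsname from the logs above (run on the MGS node; llog output format varies by version, and the grep pattern is only illustrative):

            lctl --device MGS llog_print esfprod-client | grep -i active   # look for an "active=0" record for OST0026
            lctl conf_param esfprod-OST0026.osc.active=1                   # re-enable the OSC if it was permanently disabled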

            paf Patrick Farrell (Inactive) added a comment -

            Emoly,

            Unfortunately, I don't really know how many are enough. We have 8 MDSes with 8 MDTs, and 4 OSSes with 40 OSTs. It's a test bed system for DNE, which is why it's such a weird configuration.

            We do have a separate MGT and MDT.

            As for other details: all I know about what we did is that we ran a bunch of different IO tests, like IOR and a large number of tests from the Linux Test Project, in various configurations, all with mkdir replaced by a script that would randomly create striped or remote directories. It would also sometimes create normal directories.

            We did that last weekend, and had this problem on Monday. No idea what was running when it started.

            Sorry for not having many specifics on the testing; it's a large test suite.

            We're probably going to fix the system soon by doing a writeconf, so we can continue stress testing DNE2. Let me know if there's anything else I can give you first.
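            For context, a minimal sketch of the kind of mkdir wrapper described above, assuming lfs mkdir with -i (remote directory on a given MDT) and -c (striped directory across several MDTs, DNE2); the MDT count and the probabilities are made up for illustration:

            #!/bin/bash
            # Hypothetical mkdir replacement: randomly create a normal, remote, or striped directory.
            dir="$1"
            case $((RANDOM % 3)) in
                0) mkdir "$dir" ;;                            # normal directory on the parent's MDT
                1) lfs mkdir -i $((RANDOM % 8)) "$dir" ;;     # remote directory on a random one of 8 MDTs
                2) lfs mkdir -c $((RANDOM % 8 + 1)) "$dir" ;; # striped directory across 1-8 MDTs
            esac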
            emoly.liu Emoly Liu added a comment -

            Patrick,

            I will try to upgrade a Lustre file system from 2.5.1 to 2.6 to reproduce this problem. Could you please suggest how many OSTs and MDTs are enough for this test? Also, I understand the MGS and MDS should be separate in this test; is there anything else I should pay attention to?

            Thanks.

            pjones Peter Jones added a comment -

            Emoly

            Could you please try reproducing this issue?

            Thanks

            Peter


            paf Patrick Farrell (Inactive) added a comment -

            Di,

            Yes, now that we know it's a config log issue, I figured we could fix it with a writeconf operation. But like you said, we'd like to understand the issue.

            It was not reformatted before the test. It WAS upgraded from 2.5, which required a writeconf operation at that time to get it to start.
            So it was originally formatted with Lustre 2.5.1, then upgraded to master.

            For the mkfs.lustre command for the MDT (I don't have the device name, but these are the options that were used):
            mkfs.lustre --reformat --mdt --fsname=esfprod --mgsnode=galaxy-esf-mds001 --index=0 --quiet --backfstype=ldiskfs --param sys.timeout=300 --param lov.stripesize=1048576 --param lov.stripecount=1 --mkfsoptions="-J size=400" [MDT device name]

            For the MGT:
            mkfs.lustre --reformat --mgs --quiet --backfstype=ldiskfs --param sys.timeout=300 [MGT device name]

            For one of the OSTs:
            mkfs.lustre --reformat --ost --fsname=esfprod --mgsnode=galaxy-esf-mds001 --index=1 --quiet --backfstype=ldiskfs --param sys.timeout=300 --mkfsoptions="-J size=400" --mountfsoptions="errors=remount-ro,extents,mballoc" [OST device name]
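            As an aside, one way to confirm what parameters and index a formatted target actually carries, without modifying anything, is a dry run of tunefs.lustre (a sketch; --dryrun only reads and prints the on-disk configuration):

            tunefs.lustre --dryrun [OST device name]   # prints the target name, index, flags and parameters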
            di.wang Di Wang added a comment -

            Patrick, please also provide the mkfs.lustre command line you used to create the filesystem. I checked the master code and did not find any issue there.

            di.wang Di Wang added a comment -

            Patrick, was this FS reformatted before this test? By the way, you can always erase the config log with tunefs.lustre --writeconf and remount the FS to fix this config log issue. But we still need to understand the issue here.

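            For completeness, a rough sketch of the writeconf procedure being suggested here (device names are placeholders; see the Lustre Operations Manual for the exact steps and ordering):

            # 1. Unmount all clients, then all MDTs and OSTs, then the MGS
            # 2. Regenerate the configuration logs on every target
            tunefs.lustre --writeconf [MGT device name]
            tunefs.lustre --writeconf [MDT device name]   # repeat for each MDT
            tunefs.lustre --writeconf [OST device name]   # repeat for each OST
            # 3. Remount the MGS first, then the MDTs, then the OSTs, then the clients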

            paf Patrick Farrell (Inactive) added a comment -

            No, definitely not. We did a stress run of 2.6 with DNE2 (2.6 clients as well), and when it was over and the system had been rebooted, we were in this state where some of the files created during that stress run could not be deleted. We didn't deliberately touch the config at any point in there.

            People

              Assignee: emoly.liu Emoly Liu
              Reporter: paf Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 7