Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.6.0
-
3
-
14529
Description
After our stress testing this weekend, we are unable to delete some (perhaps any?) of the files on a particular OST (OST 38). All of them give EINVAL.
For example:
[root@galaxy-esf-mds008 tmp]# rm -f posix_shm_open
rm: cannot remove `posix_shm_open': Invalid argument
[root@galaxy-esf-mds008 tmp]# lfs getstripe posix_shm_open
posix_shm_open
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 38
obdidx objid objid group
38 907263 0xdd7ff 0
However, OST 38 (OST0027) is showing up in lctl dl, and as far as I know, there are no issues with it. (The dk logs on the OSS don't show any issues.)
Here's the relevant part of the log from MDT000:
00000004:00020000:2.0:1402947131.685511:0:25039:0:(lod_lov.c:695:validate_lod_and_idx()) esfprod-MDT0000-mdtlov: bad idx: 38 of 64
00000004:00000001:2.0:1402947131.685513:0:25039:0:(lod_lov.c:757:lod_initialize_objects()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
00000004:00000010:2.0:1402947131.685515:0:25039:0:(lod_lov.c:782:lod_initialize_objects()) kfreed 'stripe': 8 at ffff8807fc208a00.
00000004:00000001:2.0:1402947131.685516:0:25039:0:(lod_lov.c:788:lod_initialize_objects()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685519:0:25039:0:(lod_lov.c:839:lod_parse_striping()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685520:0:25039:0:(lod_lov.c:885:lod_load_striping_locked()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685522:0:25039:0:(lod_object.c:2754:lod_declare_object_destroy()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685524:0:25039:0:(mdd_dir.c:1586:mdd_unlink()) Process leaving via stop (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
I don't know for certain if this is related to DNE2 or not, but this is not an error I've seen before. The file system and objects are still around, so I can provide further data if needed.
Any thoughts?
Attachments
Issue Links
- is related to
-
LU-5233 2.6 DNE stress testing: (lod_object.c:930:lod_declare_attr_set()) ASSERTION( lo->ldo_stripe ) failed
- Resolved