Details
- Bug
- Resolution: Cannot Reproduce
- Critical
- None
- Lustre 2.6.0
- 3
- 14529
Description
After our stress testing this weekend, we are unable to delete some (perhaps any?) of the files on a particular OST (OST 38). All of them give EINVAL.
For example:
[root@galaxy-esf-mds008 tmp]# rm -f posix_shm_open
rm: cannot remove `posix_shm_open': Invalid argument
[root@galaxy-esf-mds008 tmp]# lfs getstripe posix_shm_open
posix_shm_open
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 38
obdidx objid objid group
38 907263 0xdd7ff 0
However, OST 38 (OST0027) is showing up in lctl dl, and as far as I know, there are no issues with it. (The dk logs on the OSS don't show any issues.)
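As a cross-check on the getstripe output above, the following minimal Python sketch parses the object row and confirms that the decimal and hexadecimal object IDs agree. The helper names are hypothetical and not part of any Lustre tool. The name formatting assumes that target names encode the OST index as four hexadecimal digits; under that assumption index 38 formats as OST0026, so the target name is worth confirming against `lctl dl`.

```python
def parse_stripe_row(row: str) -> dict:
    """Parse one data row under the 'obdidx objid objid group' header."""
    obdidx, objid_dec, objid_hex, group = row.split()
    return {
        "obdidx": int(obdidx),           # OST index, decimal
        "objid": int(objid_dec),         # object ID, decimal column
        "objid_hex": int(objid_hex, 16), # object ID, hex column
        "group": group,                  # kept as text, not interpreted here
    }


def ost_target_suffix(index: int) -> str:
    # Assumption: the index is rendered as four lowercase hex digits.
    return f"OST{index:04x}"


row = parse_stripe_row("38 907263 0xdd7ff 0")
print("objid columns agree:", row["objid"] == row["objid_hex"])
print("assumed target suffix:", ost_target_suffix(row["obdidx"]))
```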
Here's the relevant part of the log from MDT0000:
00000004:00020000:2.0:1402947131.685511:0:25039:0:(lod_lov.c:695:validate_lod_and_idx()) esfprod-MDT0000-mdtlov: bad idx: 38 of 64
00000004:00000001:2.0:1402947131.685513:0:25039:0:(lod_lov.c:757:lod_initialize_objects()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
00000004:00000010:2.0:1402947131.685515:0:25039:0:(lod_lov.c:782:lod_initialize_objects()) kfreed 'stripe': 8 at ffff8807fc208a00.
00000004:00000001:2.0:1402947131.685516:0:25039:0:(lod_lov.c:788:lod_initialize_objects()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685519:0:25039:0:(lod_lov.c:839:lod_parse_striping()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685520:0:25039:0:(lod_lov.c:885:lod_load_striping_locked()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685522:0:25039:0:(lod_object.c:2754:lod_declare_object_destroy()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000004:00000001:2.0:1402947131.685524:0:25039:0:(mdd_dir.c:1586:mdd_unlink()) Process leaving via stop (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
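The rc values in the trace above are printed three ways: as an unsigned 64-bit number, as a signed number, and in hex. A small Python sketch (the function name is hypothetical) shows how the unsigned form maps back to the signed return code and to the errno name that `rm` reported:

```python
import errno
import os


def decode_rc(raw: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit return code."""
    raw &= (1 << 64) - 1
    return raw - (1 << 64) if raw >= (1 << 63) else raw


rc = decode_rc(18446744073709551594)
# Negative return codes carry the errno value with the sign flipped.
name = errno.errorcode.get(-rc, "unknown")
print(rc, name, os.strerror(-rc))
```

On Linux, errno 22 is EINVAL, which matches the "Invalid argument" message from `rm` in the example.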
I don't know for certain if this is related to DNE2 or not, but this is not an error I've seen before. The file system and objects are still around, so I can provide further data if needed.
Any thoughts?
Issue Links
- is related to: LU-5233 2.6 DNE stress testing: (lod_object.c:930:lod_declare_attr_set()) ASSERTION( lo->ldo_stripe ) failed (Resolved)
Since we cannot reproduce the problem locally, I cannot figure out why the config log is "corrupted". If it happens again during DNE testing, please note the steps that reproduce it. With those we will probably have more ideas.