Details
- Bug
- Resolution: Fixed
- Blocker
- None
- Lustre 2.8.0
- 3
Description
We have several directories containing entries for files that do not exist. For example:
[root@quartz2311:~]# ls -l /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0
ls: cannot access /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003: No such file or directory
total 3154
-rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.000
-rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.001
-rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.002
-????????? ? ?       ?             ?            ? filler.003
drwx------ 2 casses1 casses1   25600 Dec 21 16:43 ~dmtmp
The directory itself is a remote directory on one MDT:
[root@quartz2311:~]# lfs getdirstripe -d /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0
lmv_stripe_count: 0 lmv_stripe_offset: 3
We are able to get striping information for this file:
[root@quartz2311:~]# lfs getstripe /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003
/p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  27
    obdidx       objid       objid       group
        27    20538776   0x1396598   0xcc0000402
It looks like the OSS serving that OST was rebooted and the OST went through recovery around the time the missing file was created. In particular, the file's object ID (20538776) falls within the ranges of orphan objects that were deleted:
[root@zinci:~]# grep 0xcc0000402 /var/log/conman/console.zinc*
/var/log/conman/console.zinc43:2016-12-21 16:30:56 [189484.767900] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538706 to 0xcc0000402:20541649
/var/log/conman/console.zinc43:2016-12-21 16:33:30 [189639.110247] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
/var/log/conman/console.zinc43:2016-12-21 16:35:41 [189769.704490] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
/var/log/conman/console.zinc43:2016-12-21 16:40:19 [190047.449320] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
/var/log/conman/console.zinc43:2016-12-21 16:44:45 [190313.751155] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649
/var/log/conman/console.zinc44:2016-12-21 16:49:27 [ 159.838420] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649
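The cross-check described above (does the object reported by lfs getstripe fall inside a logged orphan-cleanup range?) could be scripted roughly as follows. This is only a sketch: the helper names ost_name and orphan_hits are hypothetical, the regular expression is written against the console lines quoted in this ticket, and it assumes the logged range is inclusive at both ends and that OST target names use a four-digit hexadecimal index (so obdidx 27 on filesystem lsh becomes lsh-OST001b).

```python
import re

# Pattern for the "deleting orphan objects" console lines quoted in this ticket.
ORPHAN_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"(?P<ost>\S+-OST[0-9a-f]{4}): deleting orphan objects from "
    r"(?P<seq>0x[0-9a-f]+):(?P<first>\d+) to 0x[0-9a-f]+:(?P<last>\d+)"
)


def ost_name(fsname, index):
    """Build a target name such as lsh-OST001b from the obdidx (hypothetical helper)."""
    return f"{fsname}-OST{index:04x}"


def orphan_hits(log_lines, ost, seq, objid):
    """Return (timestamp, first, last) for every logged range containing objid.

    Assumption: the range is treated as inclusive at both ends.
    """
    hits = []
    for line in log_lines:
        m = ORPHAN_RE.search(line)
        if m is None:
            continue
        if m["ost"] != ost or int(m["seq"], 16) != seq:
            continue
        first, last = int(m["first"]), int(m["last"])
        if first <= objid <= last:
            hits.append((m["ts"], first, last))
    return hits


# The six console lines from the grep output in this ticket.
LOG = [
    "/var/log/conman/console.zinc43:2016-12-21 16:30:56 [189484.767900] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538706 to 0xcc0000402:20541649",
    "/var/log/conman/console.zinc43:2016-12-21 16:33:30 [189639.110247] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649",
    "/var/log/conman/console.zinc43:2016-12-21 16:35:41 [189769.704490] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649",
    "/var/log/conman/console.zinc43:2016-12-21 16:40:19 [190047.449320] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649",
    "/var/log/conman/console.zinc43:2016-12-21 16:44:45 [190313.751155] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649",
    "/var/log/conman/console.zinc44:2016-12-21 16:49:27 [ 159.838420] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649",
]

# obdidx 27, group 0xcc0000402 and objid 20538776 come from the lfs getstripe output.
hits = orphan_hits(LOG, ost_name("lsh", 27), 0xCC0000402, 20538776)
for ts, first, last in hits:
    print(f"{ts}: objid 20538776 lies in [{first}, {last}]")
```

On the six lines quoted here, the four cleanup messages logged between 16:30:56 and 16:40:19 cover object 20538776; the two later messages start at 20538820 and do not.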
I will attach server console logs separately.
Attachments
Issue Links
- is duplicated by
  - LU-8562 osp_precreate_cleanup_orphans/osp_precreate_reserve race may cause data loss (Resolved)