[LU-8967] directory entries for nonexistent files


    Description

      We have several directories that contain entries for nonexistent files. For example:

      [root@quartz2311:~]# ls -l /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0                                                                                 
      ls: cannot access /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003: No such file or directory
      total 3154
      -rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.000
      -rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.001
      -rw------- 1 casses1 casses1 1048576 Dec 21 16:43 filler.002
      -????????? ? ?       ?             ?            ? filler.003
      drwx------ 2 casses1 casses1   25600 Dec 21 16:43 ~dmtmp
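
      To find every affected entry in a directory without reading ls output by eye, a check along the following lines could be used. This is a minimal sketch and not part of the original report: the function name find_dangling_entries and its injectable lstat parameter are hypothetical. It assumes a dangling entry shows up the same way it does for ls above, that is, the name is returned by the directory listing but lstat() fails with ENOENT.

```python
import os
from typing import Callable, List


def find_dangling_entries(directory: str,
                          lstat: Callable[[str], object] = os.lstat) -> List[str]:
    """Return names that the directory lists but lstat() reports as missing.

    The lstat argument is a hypothetical hook so the logic can be exercised
    on a healthy filesystem; by default the real os.lstat is used.
    """
    dangling = []
    for name in sorted(os.listdir(directory)):
        try:
            lstat(os.path.join(directory, name))
        except FileNotFoundError:
            # The name exists in the directory, but its attributes cannot be
            # fetched: the same condition ls reports as "No such file or directory".
            dangling.append(name)
    return dangling
```

      Calling find_dangling_entries() on the client0 directory shown above would be expected to return filler.003, provided the failure is still present when the script runs.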
      

      The directory itself is a remote directory on one MDT:

      [root@quartz2311:~]# lfs getdirstripe -d /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0
      lmv_stripe_count: 0 lmv_stripe_offset: 3
      

      We are able to get striping information for this file:

      [root@quartz2311:~]# lfs getstripe /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003
      /p/lscratchh/casses1/quartz-zinc_3/19519/dbench/quartz2322/clients/client0/filler.003
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  27
              obdidx           objid           objid           group
                  27        20538776      0x1396598      0xcc0000402
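
      The obdidx and objid columns are what tie this file to the OST console messages quoted below. The following sketch shows that arithmetic. The helper names are hypothetical, the filesystem name "lsh" is taken from the console messages rather than from the lfs output, and the naming rule (target index written as four hexadecimal digits) is stated here as an assumption about how the target name is formed.

```python
def ost_target_name(fsname: str, ost_index: int) -> str:
    """Build an OST target name, assuming a 4-digit hexadecimal index suffix."""
    return f"{fsname}-OST{ost_index:04x}"


def parse_objid(text: str) -> int:
    """Parse an object ID printed either in decimal or as 0x-prefixed hex."""
    return int(text, 0)


# obdidx 27 from the getstripe output is 0x1b in hexadecimal.
print(ost_target_name("lsh", 27))
# Both objid columns describe the same object.
print(parse_objid("0x1396598") == parse_objid("20538776"))
```

      Under these assumptions, obdidx 27 corresponds to lsh-OST001b, and the two objid columns (20538776 and 0x1396598) are the same number in decimal and hexadecimal.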
      

      It looks like the OSS serving that OST (obdidx 27, i.e. lsh-OST001b) was rebooted and the OST went through recovery around the time the missing file was created. In particular, we note that the object number (20538776) falls in the range of orphan objects that were deleted, for example 20538766 to 20541649:

      [root@zinci:~]# grep 0xcc0000402 /var/log/conman/console.zinc*
      /var/log/conman/console.zinc43:2016-12-21 16:30:56 [189484.767900] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538706 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:33:30 [189639.110247] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:35:41 [189769.704490] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:40:19 [190047.449320] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538766 to 0xcc0000402:20541649
      /var/log/conman/console.zinc43:2016-12-21 16:44:45 [190313.751155] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649
      /var/log/conman/console.zinc44:2016-12-21 16:49:27 [  159.838420] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20538820 to 0xcc0000402:20541649
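
      The range check described above can be automated for other affected files. The sketch below is illustrative only: the function names and the regular expression are hypothetical, the message format is inferred solely from the console lines quoted in this report, and treating both range endpoints as inclusive is an assumption.

```python
import re
from typing import Iterable, List, Tuple

# Pattern inferred from the console lines quoted in this report.
ORPHAN_RE = re.compile(
    r"deleting orphan objects from "
    r"(?P<seq>0x[0-9a-fA-F]+):(?P<first>\d+) to 0x[0-9a-fA-F]+:(?P<last>\d+)"
)


def orphan_ranges(lines: Iterable[str], seq: int) -> List[Tuple[int, int]]:
    """Collect (first, last) object ID ranges logged for the given sequence."""
    ranges = []
    for line in lines:
        match = ORPHAN_RE.search(line)
        if match and int(match.group("seq"), 16) == seq:
            ranges.append((int(match.group("first")), int(match.group("last"))))
    return ranges


def covering_ranges(lines: Iterable[str], seq: int, objid: int) -> List[Tuple[int, int]]:
    """Return the logged ranges that contain objid (endpoints assumed inclusive)."""
    return [(first, last) for first, last in orphan_ranges(lines, seq)
            if first <= objid <= last]
```

      Applied to the lines above with sequence 0xcc0000402 and object 20538776, this reports the ranges logged from 16:30:56 through 16:40:19 (starting at 20538706 or 20538766). The two later messages, whose ranges start at 20538820, do not cover the object.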
      

      I will attach server console logs separately.

          Activity

            nedbass Ned Bass (Inactive) made changes -
            Labels Original: llnl topllnl New: llnl
            bhoagland Brad Hoagland (Inactive) made changes -
            Link Original: This issue is related to JFC-21 [ JFC-21 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            mdiep Minh Diep made changes -
            Link New: This issue is duplicated by LU-8562 [ LU-8562 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to JFC-21 [ JFC-21 ]
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Mikhail Pershin [ tappro ]
            nedbass Ned Bass (Inactive) made changes -
            Attachment New: LU-8967.console.zinc4.mds [ 24629 ]
            Attachment New: LU-8967.console.zinc43 [ 24630 ]
            Attachment New: LU-8967.console.zinc44 [ 24631 ]
            nedbass Ned Bass (Inactive) made changes -
            Description Original: identical to the Description above, except that the grep output also contained this line:

            {noformat}
            /var/log/conman/console.zinc44:2016-12-19 11:59:57 [ 405.845014] Lustre: lsh-OST001b: deleting orphan objects from 0xcc0000402:20409728 to 0xcc0000402:20414161
            {noformat}

            New: identical to the Description above.
            nedbass Ned Bass (Inactive) created issue -

            People

              Assignee: tappro Mikhail Pershin
              Reporter: nedbass Ned Bass (Inactive)
              Votes: 0
              Watchers: 9
