osp_object_assign_fid()) ASSERTION( fid_is_zero(lu_object_fid(&o->opo_obj.do_lu)) ) failed:

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.4.0
    • 3
    • 6797

    Description

      Racy reproducer:

      # llmount.sh
      # cd /mnt/lustre
      # while true; do lfs setstripe -c 1 f0; rm f0; done &
      [1] 3814
      # while true; do truncate --size=1 f0; done
      cannot truncate `f0' to length 21705: No such file or directory
      cannot truncate `f0' to length 17399: No such file or directory
      cannot truncate `f0' to length 18024: No such file or directory
      cannot truncate `f0' to length 25593: No such file or directory
      cannot truncate `f0' to length 19126: No such file or directory
      cannot truncate `f0' to length 29680: No such file or directory
      cannot truncate `f0' to length 14928: No such file or directory
      cannot truncate `f0' to length 23877: No such file or directory
      cannot truncate `f0' to length 6911: No such file or directory
      cannot truncate `f0' to length 868: No such file or directory
      cannot truncate `f0' to length 791: No such file or directory
      cannot truncate `f0' to length 28593: No such file or directory
      cannot truncate `f0' to length 7330: No such file or directory
      cannot truncate `f0' to length 9708: No such file or directory
      
      Message from syslogd@m at Feb 13 10:29:39 ...
       kernel:LustreError: 3088:0:(osp_object.c:56:osp_object_assign_fid()) ASSERTION( fid_is_zero(lu_object_fid(&o->opo_obj.do_lu)) ) failed:
      
      Message from syslogd@m at Feb 13 10:29:39 ...
       kernel:LustreError: 3088:0:(osp_object.c:56:osp_object_assign_fid()) LBUG
      
      Message from syslogd@m at Feb 13 10:29:39 ...
       kernel:Kernel panic - not syncing: LBUG
      
      crash> bt
      PID: 31940  TASK: ffff8801663b5500  CPU: 0   COMMAND: "mdt00_001"
       #0 [ffff8801663b78e8] machine_kexec at ffffffff81031f7b
       #1 [ffff8801663b7948] crash_kexec at ffffffff810b8c22
       #2 [ffff8801663b7a18] panic at ffffffff814eae18
       #3 [ffff8801663b7a98] lbug_with_loc at ffffffffa0ef3eeb [libcfs]
       #4 [ffff8801663b7ab8] osp_object_assign_fid at ffffffffa0b98942 [osp]
       #5 [ffff8801663b7ae8] osp_declare_attr_set at ffffffffa0b98b11 [osp]
       #6 [ffff8801663b7b38] lod_declare_attr_set at ffffffffa0b68083 [lod]
       #7 [ffff8801663b7b88] mdd_attr_set at ffffffffa051f5e9 [mdd]
       #8 [ffff8801663b7c08] mdt_attr_set at ffffffffa0a4deb8 [mdt]
       #9 [ffff8801663b7c58] mdt_reint_setattr at ffffffffa0a4e7ad [mdt]
      #10 [ffff8801663b7cc8] mdt_reint_rec at ffffffffa0a486b1 [mdt]
      #11 [ffff8801663b7ce8] mdt_reint_internal at ffffffffa0a41d13 [mdt]
      #12 [ffff8801663b7d28] mdt_reint at ffffffffa0a42044 [mdt]
      #13 [ffff8801663b7d48] mdt_handle_common at ffffffffa0a32fb8 [mdt]
      #14 [ffff8801663b7d98] mds_regular_handle at ffffffffa0a6a5c5 [mdt]
      #15 [ffff8801663b7da8] ptlrpc_server_handle_request at ffffffffa062c00c [ptlrpc]
      #16 [ffff8801663b7ea8] ptlrpc_main at ffffffffa062d556 [ptlrpc]
      #17 [ffff8801663b7f48] kernel_thread at ffffffff8100c0ca
      

      A deterministic reproducer is attached. It does

      fd1 = open("f0", O_RDWR|O_CREAT|O_LOV_DELAY_CREATE, 0666);
      fd2 = open("f0", O_RDWR|O_CREAT, 0666);
      ftruncate(fd2, 1);
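
The three calls above can be wrapped in a small helper for experimentation. This is only a sketch of the sequence described here, not the attached reproducer. The helper name `lu2808_sequence` is hypothetical, and the fallback value for `O_LOV_DELAY_CREATE` is an assumption based on the 2.4-era `lustre_user.h`; prefer the definition from the installed Lustre headers when they are available.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: fallback value as in the 2.4-era lustre_user.h.
 * Use the real header's definition when it is available. */
#ifndef O_LOV_DELAY_CREATE
#define O_LOV_DELAY_CREATE 0100000000
#endif

/* Hypothetical helper: run the open/open/ftruncate sequence on path.
 * Returns 0 if all three calls succeed, otherwise a negative errno. */
int lu2808_sequence(const char *path)
{
	int fd1, fd2, rc = 0;

	assert(path != NULL);

	/* First open: create the file but delay OST object creation. */
	fd1 = open(path, O_RDWR | O_CREAT | O_LOV_DELAY_CREATE, 0666);
	if (fd1 < 0)
		return -errno;

	/* Second open of the same name, without the delay-create flag. */
	fd2 = open(path, O_RDWR | O_CREAT, 0666);
	if (fd2 < 0) {
		rc = -errno;
		close(fd1);
		return rc;
	}

	/* Nonzero-length truncate through the second descriptor. */
	if (ftruncate(fd2, 1) < 0)
		rc = -errno;

	close(fd2);
	close(fd1);
	return rc;
}
```

Calling `lu2808_sequence("f0")` from a trivial `main()` in /mnt/lustre would exercise the same path as the snippet above. On a non-Lustre filesystem the extra flag is expected to have no Lustre-specific effect, so the helper there only checks that the calls themselves succeed.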
      

      This seems to be independent of LU-2523. I tried this with and without the patch from LU-2523 and got the same result.

      Note that there is no LBUG if the call to ftruncate() uses fd1 instead of fd2, or if it uses a zero length. There is also no LBUG if truncate("f0", 1) is used.

      Interestingly, if MOUNT_2=yes is used and the first open is of "/mnt/lustre/f0" and the second of "/mnt/lustre2/f0", then there is no LBUG. However, while the setstripe ioctl() will appear to succeed, the default striping will in fact be applied to the file.

          Activity

            jhammond John Hammond made changes -
            Status Original: Resolved [ 5 ] New: Closed [ 6 ]
            jhammond John Hammond made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Labels Original: MB osp New: osp

            adilger Andreas Dilger added a comment - Creating objects on truncate is what LU-2808 is all about. Closing this bug, and assigning that one to John.
            adilger Andreas Dilger made changes -
            Link New: This issue is duplicated by LU-2399 [ LU-2399 ]

            jhammond John Hammond added a comment - Jinshan's #5291 has resolved the LBUG(). There remains the issue of creating objects on truncate discussed at http://review.whamcloud.com/5473.
            jhammond John Hammond made changes -
            Priority Original: Blocker [ 1 ] New: Minor [ 4 ]

            jay Jinshan Xiong (Inactive) added a comment - I did a similar thing in patch 5291 to support sending size info to the MDT in the truncate RPC.

            bzzz Alex Zhuravlev added a comment - I tend to think that removing the assertion will remove only one symptom and hide all the subsequent trouble we may get into. There are a number of issues in this area, and I hope they will be solved almost automatically once we start taking locks around transactions. As a temporary solution, could we recognize that the striping has already been created, return -EEXIST, and handle it in the caller?
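
The temporary approach suggested in this comment (detect an already-assigned FID, return -EEXIST, and let the caller decide) can be illustrated with a toy model. Everything below is hypothetical: the `toy_*` types and functions are stand-ins invented for illustration, not the OSP/LOD code, and treating -EEXIST as success in the caller is only one possible handling, which the comment leaves open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a FID; not the Lustre struct lu_fid. */
struct toy_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Toy stand-in for an object that carries a FID. */
struct toy_object {
	struct toy_fid fid;
};

static bool toy_fid_is_zero(const struct toy_fid *fid)
{
	return fid->seq == 0 && fid->oid == 0 && fid->ver == 0;
}

/* Instead of asserting that the FID is still zero, report -EEXIST
 * when the object already has one and leave it untouched. */
int toy_assign_fid(struct toy_object *obj, const struct toy_fid *fid)
{
	assert(obj != NULL && fid != NULL);

	if (!toy_fid_is_zero(&obj->fid))
		return -EEXIST;

	obj->fid = *fid;
	return 0;
}

/* Hypothetical caller: treats -EEXIST as "someone else already
 * created the striping" and carries on with the existing FID. */
int toy_declare_attr_set(struct toy_object *obj, const struct toy_fid *fid)
{
	int rc = toy_assign_fid(obj, fid);

	if (rc == -EEXIST)
		rc = 0;

	return rc;
}
```

The point of the sketch is only the control flow: the race becomes an error code the caller can see rather than an LBUG. It does not address the concern raised in the same comment that, without locking around transactions, the other symptoms remain.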
            jhammond John Hammond added a comment - Sure, I'll give it a shot. I had stopped work on this after Alex's comment, which I may have misunderstood. Is disabling the assertion the correct band-aid here?

            People

              Assignee: jhammond John Hammond
              Reporter: jhammond John Hammond
              Votes: 0
              Watchers: 4