Lustre / LU-13189

ASSERTION( obj->oo_with_projid ) failed with 2.12.3

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0, Lustre 2.15.1
    • Affects Version/s: Lustre 2.12.3, Lustre 2.14.0, Lustre 2.15.0
    • Environment: RHEL 7.7, zfs-0.8.2, kernel 3.10.0-1062.9.1.el7.x86_64
    • Severity: 3

    Description

      We are seeing a crash fairly frequently on one of our OSS nodes:

      [Feb 1 19:45] Lustre: work2-OST0002: Recovery over after 0:56, of 20 clients 20 recovered and 0 were evicted.
      [ +0.000279] Lustre: work2-OST0002: deleting orphan objects from 0x0:268076412 to 0x0:268081537
      [ +0.198454] LustreError: 14123:0:(osd_object.c:1345:osd_attr_set()) ASSERTION( obj->oo_with_projid ) failed:
      [ +0.000046] LustreError: 14123:0:(osd_object.c:1345:osd_attr_set()) LBUG
      [ +0.000064] Pid: 14123, comm: ll_ost_io01_013 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019
      [ +0.000035] Call Trace:
      [ +0.000018] [<ffffffffc10e87cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ +0.001388] [<ffffffffc10e887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ +0.001275] [<ffffffffc179b458>] osd_attr_set+0xdd8/0xe50 [osd_zfs]

      Message from syslogd@rit-ost1.las.iastate.edu at Feb 1 19:45:04 ...
      kernel:LustreError: 14123:0:(osd_object.c:1345:osd_attr_set()) ASSERTION( obj->oo_with_projid ) failed:
      [ +0.001272] [<ffffffffc190e622>] ofd_commitrw_write+0x13c2/0x1d40 [ofd]
      [ +0.001274] [<ffffffffc191212c>] ofd_commitrw+0x48c/0x9e0 [ofd]
      [ +0.001255] [<ffffffffc15ad0fa>] tgt_brw_write+0x10ba/0x1ce0 [ptlrpc]
      [ +0.001586] [<ffffffffc15ab2ea>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [ +0.001574] [<ffffffffc155029b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [ +0.001555] [<ffffffffc1553bfc>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
      [ +0.001560] [<ffffffffb28c61f1>] kthread+0xd1/0xe0
      [ +0.001500] [<ffffffffb2f8dd1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ +0.001487] [<ffffffffffffffff>] 0xffffffffffffffff
      [ +0.001498] Kernel panic - not syncing: LBUG

       

      I'm not sure what exactly is causing it. The stack trace above was captured after the server rebooted: the crash recurs as soon as recovery finishes and I/O starts again. I originally thought it was related to the recovery process and that aborting recovery would work around it, but the crash still occurs. I can't tell whether a particular file or an I/O pattern triggers it, and I have not been able to narrow it down to a specific job in our environment.

            People

              Assignee: Dongyang Li (dongyang)
              Reporter: Shane Nehring (snehring)