Lustre / LU-13980

Kernel panic on OST after removing files under '/O' folder

    Details

    • Type: Task
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.10.8, Lustre 2.12.4
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
      CentOS Linux release 7.7.1908 (Core) with kernel 3.10.0-957.1.3.el7_lustre.x86_64 for Lustre 2.10.8 and CentOS Linux release 7.7.1908 (Core) with 3.10.0-1062.9.1.el7_lustre.x86_64 for Lustre 2.12.4.

      Description

      I removed some data stripes under the '/O' folder on an OST and started LFSCK. The OST was then forced to reboot by a kernel panic. Looking into the vmcore, I found that the specific error is the LBUG raised in osd_object_release() (osd_handler.c:1982):

      [ 1057.367833] Lustre: lustre-OST0000: new disk, initializing
      [ 1057.367877] Lustre: srv-lustre-OST0000: No data found on store. Initialize space
      [ 1057.417121] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 300-900
      [ 1062.018722] Lustre: lustre-OST0000: Connection restored to lustre-MDT0000-mdtlov_UUID (at 10.0.0.122@tcp)
      [ 1089.010284] Lustre: lustre-OST0000: Connection restored to 89c68bff-12c8-9f48-f01e-f6306c666eb9 (at 10.0.0.98@tcp)
      [ 1281.516928] LustreError: 10410:0:(osd_handler.c:1982:osd_object_release()) LBUG
      [ 1281.516939] Pid: 10410, comm: ll_ost_out00_00 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Mon May 27 03:45:37 UTC 2019
      [ 1281.516944] Call Trace:
      [ 1281.516960]  [<ffffffffc05fd7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 1281.516986]  [<ffffffffc05fd87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 1281.517004]  [<ffffffffc0b93820>] osd_get_ldiskfs_dirent_param+0x0/0x130 [osd_ldiskfs]
      [ 1281.517173]  [<ffffffffc07442b0>] lu_object_put+0x190/0x3e0 [obdclass]
      [ 1281.517244]  [<ffffffffc09d8bc3>] out_handle+0x1503/0x1bc0 [ptlrpc]
      [ 1281.517369]  [<ffffffffc09ce7ca>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [ 1281.517481]  [<ffffffffc097705b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [ 1281.517582]  [<ffffffffc097a7a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      

      (The full dmesg log collected from the vmcore is attached.)
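
To make it easier to see which modules appear in the trace, the frames can be pulled out with a small script. This is only a minimal sketch: the regular expression and the names `TRACE`, `FRAME_RE`, and `parse_frames` are illustrative assumptions, not part of any Lustre tooling, and the input is limited to the frame lines quoted in this report.

```python
import re

# Frame lines copied verbatim from the call trace in this report.
TRACE = """\
[ 1281.516960]  [<ffffffffc05fd7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 1281.516986]  [<ffffffffc05fd87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 1281.517004]  [<ffffffffc0b93820>] osd_get_ldiskfs_dirent_param+0x0/0x130 [osd_ldiskfs]
[ 1281.517173]  [<ffffffffc07442b0>] lu_object_put+0x190/0x3e0 [obdclass]
[ 1281.517244]  [<ffffffffc09d8bc3>] out_handle+0x1503/0x1bc0 [ptlrpc]
[ 1281.517369]  [<ffffffffc09ce7ca>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[ 1281.517481]  [<ffffffffc097705b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[ 1281.517582]  [<ffffffffc097a7a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
"""

# Hypothetical pattern for "[<addr>] symbol+0xoff/0xsize [module]" frame lines.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(\w+)\]"
)


def parse_frames(text):
    """Return (symbol, module) pairs in the order they appear in the trace."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(text)]


frames = parse_frames(TRACE)

# Frames whose symbol or module name mentions LFSCK.
lfsck_frames = [
    (sym, mod) for sym, mod in frames
    if "lfsck" in sym.lower() or "lfsck" in mod.lower()
]

for sym, mod in frames:
    print(f"{mod:12s} {sym}")
print("LFSCK frames:", lfsck_frames)
```

Run against the quoted lines, no frame names an LFSCK symbol or module; the thread is `ll_ost_out00_00` and the path goes through `out_handle` and `lu_object_put`. The frame reported at offset `+0x0` may be a symbol-resolution artifact rather than a real caller, but the log alone does not confirm that.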

      I then found that, after removing files under '/O' on the OST, even a simple write operation triggers the same kernel panic.

      I'm curious why 'osd_object_release' is invoked in the situations above, and where the LFSCK functions sit in the error call trace.

      Thanks a lot! 

              People

              Assignee:
              adilger Andreas Dilger
              Reporter:
              rzhan Runzhou Han
              Votes:
              0
              Watchers:
              5
