LU-17012: conf-sanity test_113: crashed during conf-sanity test_113


Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/7a49f99b-6593-423d-97ce-6bb8790f459d

      test_113 failed with the following error:

      [13531.690472] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [13531.692557] Oops: 0000 [#1] SMP PTI
      [13531.693244] CPU: 1 PID: 1175920 Comm: llog_process_th 4.18.0-425.10.1.el8_lustre.x86_64 #1
      [13531.695593] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13531.696668] RIP: 0010:ls_device_get+0x1e3/0x3b0 [obdclass]
      [13531.712774] Call Trace:
      [13531.713318]  local_oid_storage_init+0xb8/0x16c0 [obdclass]
      [13531.714401]  llog_osd_setup+0x9d/0x400 [obdclass]
      [13531.715359]  llog_setup.part.6+0x146/0x840 [obdclass]
      [13531.716373]  osp_sync_llog_init+0x1e0/0xb10 [osp]
      [13531.718232]  osp_sync_init+0x262/0x770 [osp]
      [13531.719065]  osp_init0.isra.19+0x1689/0x19f0 [osp]
      [13531.720005]  osp_device_alloc+0xcb/0x180 [osp]
      [13531.720878]  obd_setup+0x119/0x300 [obdclass]
      [13531.721771]  class_setup+0x587/0x7a0 [obdclass]
      [13531.722687]  class_process_config+0x1248/0x2160 [obdclass]
      [13531.723771]  class_config_llog_handler+0x93b/0x12e0 [obdclass]
      [13531.724916]  llog_process_thread+0xedf/0x1b60 [obdclass]
      [13531.728786]  llog_process_thread_daemonize+0x9b/0xe0 [obdclass]
      [13531.729936]  kthread+0x10b/0x130
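
The fault address in the oops above (0x18) is small, which is consistent with a read of a field at offset 0x18 through a NULL structure pointer inside ls_device_get, though the trace alone does not say which pointer. For comparing this crash against the other failures, a minimal Python sketch along these lines could pull the fault address and the call chain out of such a console log. The helper names (`parse_oops`, `FRAME_RE`, `FAULT_RE`) are hypothetical, and the regular expressions assume the frame layout shown in this report rather than every kernel oops format.

```python
import re

# Hypothetical helpers: the patterns assume the "[timestamp] func+0xOFF/0xSIZE [module]"
# layout seen in this report, not every possible kernel oops format.
FRAME_RE = re.compile(
    r"^\s*\[\s*\d+\.\d+\]\s+"            # "[13531.713318]" timestamp
    r"(?:RIP: \d+:)?"                    # optional "RIP: 0010:" prefix
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"   # optional "[obdclass]" module tag
)
FAULT_RE = re.compile(r"NULL pointer dereference at (?P<addr>[0-9a-f]+)")


def parse_oops(log_text):
    """Return (fault_address, frames) parsed from a kernel oops excerpt."""
    fault = None
    frames = []
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            fault = int(m.group("addr"), 16)
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append((
                m.group("func"),
                int(m.group("off"), 16),
                int(m.group("size"), 16),
                m.group("module"),       # None for frames in the core kernel
            ))
    return fault, frames


# A few lines copied from the trace in this report.
SAMPLE = """\
[13531.690472] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[13531.696668] RIP: 0010:ls_device_get+0x1e3/0x3b0 [obdclass]
[13531.712774] Call Trace:
[13531.713318]  local_oid_storage_init+0xb8/0x16c0 [obdclass]
[13531.714401]  llog_osd_setup+0x9d/0x400 [obdclass]
[13531.729936]  kthread+0x10b/0x130
"""

fault, frames = parse_oops(SAMPLE)
print(hex(fault))
for func, off, size, module in frames:
    print(f"{func}+{off:#x}/{size:#x} [{module or 'kernel'}]")
```

The first parsed frame is the faulting instruction (the RIP line), followed by its callers. Two crash logs with the same function names and offsets on the same kernel build likely hit the same code path.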
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4447 - 5.15.0-52-generic
      servers: https://build.whamcloud.com/job/lustre-master/4447 - 4.18.0-425.10.1.el8_lustre.x86_64

      There have been 13 failures since 2023-07-08, but none in the two months before then.

      $ git log --oneline --after 2023-07-06 --before 2023-07-09
      51d62f2122fe LU-16637 llite: call truncate_inode_pages() in inode lock
      0cb7ebf22304 LU-16927 tests: improve sanity-quota
      629d6bca95f9 LU-8191 tests: convert functions to static
      97df1cba957b LU-16925 osd-ldiskfs: Remove unused bio_integrity_enabled
      1defc11dfa59 LU-16922 kernel: update RHEL 9.2 [5.14.0-284.18.1.el9_2]
      a1d332f613ac LU-8191 mdt: convert functions to static
      094ae18ed8a9 LU-16548 lnet: Fixing missing gnilnd define CURRENT_LND_VERSION
      0d77e94b4793 LU-16723 parser: fix help hanging
      46a9abf4330e LU-16890 obd: OBD_FREE_PRE() to ignore NULL pointers
      9190af53287b LU-16899 gnilnd: Use libcfs_nidstr and fix typo
      35017d0973bb LU-16898 osd-ldiskfs: do not return dr_error from past RPC
      4fc3c208422e LU-16518 obd: fix style and clang error
      acdc2c8bb7aa LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_GT
      c1915c5f0dd8 LU-16846 nrs: Fix console messages
      4ce452292fbe LU-16842 fsx: tolerate delete last non-stale mirror error
      7ea4e0c7c534 LU-12019 build: Recognize Debian Kernel and set KMP dir
      c2f548dacc5f LU-16805 llite: improve readpage debug
      f5a75ea44db3 LU-16697 llite: Set BDI_CAP_* flags for lustre
      b16c9333a008 LU-16691 ldiskfs: limit length of per-inode prealloc list
      bba59b1287c9 LU-16651 llite: hold invalidate_lock when invalidate cache pages
      3ef773db80fc LU-16594 build: get_random_u32_below, get_acl with dentry
      e7cf1fc1f274 LU-13340 lustre: Support large nids in LCFG_ADD_UUID
      7f1aa5b66b24 LU-16518 build: llvm/clang support
      e3e91ea95fd9 LU-13343 gss: no sec flavor on loopback connection
      530a302e10fc LU-12511 build: include firewalld files for native Linux client
      aac625055e50 LU-12511 llite: use mapping_set_error instead of opencoded set_bit
      84bb366642e8 LU-16847 ldiskfs: refactor t10 code.
      9e5040a304a9 LU-16847 ldiskfs: do not copy ldiskfs_chunk_trans_blocks
      

      None of these patches looks directly related to the crash. However, each crash appears to be preceded by an earlier failure due to LU-16954, which comes from the BDI_CAP patch (f5a75ea44db3, LU-16697).

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      conf-sanity test_113 - onyx-65vm4 crashed during conf-sanity test_113

            People

              wc-triage WC Triage
              maloo Maloo