Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-18764

Crash in conf-sanity 123H in interop with 2.16.1

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.17.0
    • Lustre 2.17.0, Lustre 2.16.1
    • None
    • 3
    • 9223372036854775807

    Description

      (first hit Feb 18) we get this hit regularly in interop. the patch that added this test was https://review.whamcloud.com/c/fs/lustre-release/+/56256 for LU-18170

      [27373.539546] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == conf-sanity test 123H: check concurent accesses with \'lctl llog_print ========================================================== 22:07:19 \(1740780439\)
      [27373.780325] Lustre: DEBUG MARKER: == conf-sanity test 123H: check concurent accesses with 'lctl llog_print ========================================================== 22:07:19 (1740780439)
      [27438.099364] Lustre: DEBUG MARKER: /usr/sbin/lctl llog_print -r params
      [27438.542994] Lustre: DEBUG MARKER: seq 1 20 |  xargs -P20 -I{} bash -c '/usr/sbin/lctl llog_print -r params | wc -l' |  sort | uniq -c
      [27438.824080] ------------[ cut here ]------------
      [27438.824088] WARNING: CPU: 0 PID: 969973 at lib/vsprintf.c:2726 vsnprintf+0x4bf/0x570
      [27438.825864] Modules linked in: xfs libcrc32c ofd(OE) ost(OE) loop osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey tls dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i2c_piix4 virtio_balloon pcspkr joydev drm fuse ext4 mbcache jbd2 ata_generic crct10dif_pclmul crc32_pclmul crc32c_intel ata_piix libata virtio_net ghash_clmulni_intel virtio_blk net_failover failover serio_raw [last unloaded: libcfs]
      [27438.832027] CPU: 0 PID: 969973 Comm: llog_process_th Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-427.31.1_lustre.el9.x86_64 #1
      [27438.833428] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [27438.834084] RIP: 0010:vsnprintf+0x4bf/0x570
      [27438.834603] Code: 44 24 10 8b 03 4d 8d 7c 0f 01 31 c9 83 f8 2f 0f 87 50 ff ff ff e9 08 fc ff ff 89 d0 83 c2 08 48 03 43 10 89 13 e9 98 fe ff ff <0f> 0b e9 cc fc ff ff 41 0f b6 04 24 4d 89 e6 e9 c2 fd ff ff 83 fa
      [27438.836542] RSP: 0000:ffffa1ccc2d33c70 EFLAGS: 00010292
      [27438.837140] RAX: 0000000000000000 RBX: ffff92c10181b824 RCX: ffffa1ccc2d33cc8
      [27438.837954] RDX: ffffffffc0b75150 RSI: fffffffffffffffd RDI: ffff92c10181b869
      [27438.838762] RBP: ffffa1ccc2d33d18 R08: 000000000000000a R09: ffff92c20181b861
      [27438.839566] R10: ffffffffffffffff R11: 000000000000000f R12: 0000000000000007
      [27438.840386] R13: ffff92c10181b869 R14: ffff92c1014736bf R15: 0000000000000510
      [27438.841176] FS:  0000000000000000(0000) GS:ffff92c17fc00000(0000) knlGS:0000000000000000
      [27438.842055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [27438.842718] CR2: 0000560664894008 CR3: 0000000033b90001 CR4: 00000000000606f0
      [27438.843521] Call Trace:
      [27438.843847]  <TASK>
      [27438.844136]  ? show_trace_log_lvl+0x1c4/0x2df
      [27438.844708]  ? show_trace_log_lvl+0x1c4/0x2df
      [27438.845241]  ? snprintf+0x49/0x70
      [27438.845662]  ? vsnprintf+0x4bf/0x570
      [27438.846095]  ? __warn+0x81/0x110
      [27438.846558]  ? vsnprintf+0x4bf/0x570
      [27438.846992]  ? report_bug+0x10a/0x140
      [27438.847474]  ? handle_bug+0x3c/0x70
      [27438.847939]  ? exc_invalid_op+0x14/0x70
      [27438.848422]  ? asm_exc_invalid_op+0x16/0x20
      [27438.848945]  ? vsnprintf+0x4bf/0x570
      [27438.849413]  snprintf+0x49/0x70
      [27438.849814]  class_config_yaml_output+0x3b6/0x600 [obdclass]
      [27438.850874]  llog_print_cb+0x5e2/0x670 [obdclass]
      [27438.851516]  llog_process_thread+0xd97/0x1920 [obdclass]
      [27438.852185]  ? __pfx_llog_process_thread_daemonize+0x10/0x10 [obdclass]
      [27438.852977]  llog_process_thread_daemonize+0x99/0xe0 [obdclass]
      [27438.853715]  kthread+0xe0/0x100
      [27438.854127]  ? __pfx_kthread+0x10/0x10
      [27438.854601]  ret_from_fork+0x2c/0x50
      [27438.855082]  </TASK>
      [27438.855421] ---[ end trace 708d17dd91c9a88b ]---
      [27438.988112] LustreError: 970099:0:(llog_ioctl.c:296:llog_print_cb()) not enough space for print log records
      [27439.098846] LustreError: 970208:0:(llog.c:525:llog_process_thread()) ASSERTION( is_power_of_2(chunk_size) ) failed:
      [27439.098852] LustreError: 970208:0:(llog.c:525:llog_process_thread()) LBUG
      [27439.098855] CPU: 0 PID: 970208 Comm: llog_process_th Kdump: loaded Tainted: G        W  OE     -------  ---  5.14.0-427.31.1_lustre.el9.x86_64 #1
      [27439.098858] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [27439.098859] Call Trace:
      [27439.098883]  <TASK>
      [27439.098886]  dump_stack_lvl+0x34/0x48
      [27439.098914]  lbug_with_loc.cold+0x5/0x58 [libcfs]
      [27439.098990]  llog_process_thread+0xa9e/0x1920 [obdclass]
      [27439.099065]  ? __pfx_llog_process_thread_daemonize+0x10/0x10 [obdclass]
      [27439.099120]  ? mgs_key_init+0xa8/0x130 [mgs]
      [27439.099200]  ? __pfx_llog_process_thread_daemonize+0x10/0x10 [obdclass]
      [27439.099248]  llog_process_thread_daemonize+0x99/0xe0 [obdclass]
      [27439.099297]  kthread+0xe0/0x100
      [27439.099301]  ? __pfx_kthread+0x10/0x10
      [27439.099304]  ret_from_fork+0x2c/0x50
      [27439.099309]  </TASK>
      [27445.568983] Kernel panic - not syncing: LBUG
      [27445.568988] CPU: 0 PID: 970208 Comm: llog_process_th Kdump: loaded Tainted: G        W  OE     -------  ---  5.14.0-427.31.1_lustre.el9.x86_64 #1
      [27445.571030] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [27445.571676] Call Trace:
      [27445.571994]  <TASK>
      [27445.572295]  dump_stack_lvl+0x34/0x48
      [27445.572739]  panic+0x107/0x2f7
      [27445.573157]  lbug_with_loc.cold+0x2f/0x58 [libcfs]
      [27445.573725]  llog_process_thread+0xa9e/0x1920 [obdclass]
      [27445.574409]  ? __pfx_llog_process_thread_daemonize+0x10/0x10 [obdclass]
      [27445.575186]  ? mgs_key_init+0xa8/0x130 [mgs]
      [27445.575707]  ? __pfx_llog_process_thread_daemonize+0x10/0x10 [obdclass]
      [27445.576486]  llog_process_thread_daemonize+0x99/0xe0 [obdclass]
      [27445.577216]  kthread+0xe0/0x100
      [27445.577615]  ? __pfx_kthread+0x10/0x10
      [27445.578080]  ret_from_fork+0x2c/0x50
      [27445.578532]  </TASK>

      First hit: https://testing.whamcloud.com/test_sets/3f66b52d-b4d7-4d30-884e-ba82f8340698

      more hits:

      https://testing.whamcloud.com/test_sets/3a04df19-c183-4e94-85d8-c3f00d71aaf0

      https://testing.whamcloud.com/test_sets/db2be02b-8282-4de1-8d1f-a5e5c367695a

      https://testing.whamcloud.com/test_sets/2de30589-9fa9-4047-9763-a19616f19601

      https://testing.whamcloud.com/test_sets/c32f7d41-35cb-4df1-b64f-dd98d009a41a

       

      master githash atthe time of the first hit:

      Attachments

        Issue Links

          Activity

            People

              eaujames Etienne Aujames
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: