Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15501

obd_set_max_rpcs_in_flight store is racy

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.12.8, Lustre 2.15.0
    • None
    • 3
    • 9223372036854775807

    Description

      This is a frequent fixture in boilpot testing:

      [561965.306321] Lustre: DEBUG MARKER: == conf-sanity test 90c: check max_mod_rpcs_in_flight update limits ================================== 13:46:20 (1643395580)
      [561968.515428] Lustre: Lustre: Build Version: 2.12.8_17_ge72a30b
      [561969.230746] LNet: Added LNI 192.168.123.156@tcp [8/512/0/180]
      [561969.231606] LNet: Accept secure, port 988
      [561971.496242] Lustre: 23709:0:(gss_svc_upcall.c:1149:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
      [561971.497876] Key type lgssc registered
      [561973.046482] Lustre: Echo OBD driver; http://www.lustre.org/
      [561977.656992] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      [561977.995492] Lustre: MGS: Connection restored to MGC192.168.123.156@tcp_0 (at 0@lo)
      [561978.674547] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
      [561980.637597] Lustre: DEBUG MARKER: centos-154.localnet: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
      [561981.944367] LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
      [561981.948361] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
      [561983.596774] Lustre: lustre-MDT0000: Connection restored to 192.168.123.156@tcp (at 0@lo)
      [561983.607091] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 60-180
      [561984.606708] Lustre: lustre-OST0000: Connection restored to 192.168.123.156@tcp (at 0@lo)
      [561984.607637] Lustre: Skipped 1 previous similar message
      [561984.613312] Lustre: lustre-OST0000: deleting orphan objects from 0x0:91 to 0x0:129
      [561985.851321] Lustre: DEBUG MARKER: centos-154.localnet: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
      [561986.256018] Lustre: Mounted lustre-client
      [561988.711619] Lustre: Unmounted lustre-client
      [561989.633975] Lustre: lustre-MDT0000: Connection restored to 192.168.123.156@tcp (at 0@lo)
      [561989.634963] Lustre: Skipped 2 previous similar messages
      [561989.751355] Lustre: Mounted lustre-client
      [561990.034221] LustreError: 25021:0:(genops.c:2230:obd_set_max_mod_rpcs_in_flight()) lustre-MDT0000-mdc-ffff8802936d4a88: can't set max_mod_rpcs_in_flight=9 higher than ocd_maxmodrpcs=8 returned by the server at connection
      [561990.131862] Lustre: Modifying parameter lustre-M-MDT0000.mdc.lustre-MDT0000-mdc-*.max_rpcs_in_flight in log params
      [561992.304651] Lustre: Unmounted lustre-client
      [561993.514193] Lustre: Mounted lustre-client
      [561994.530413] Lustre: Modifying parameter lustre-M-MDT0000.mdc.lustre-MDT0000-mdc-*.max_rpcs_in_flight in log params
      [561994.531511] Lustre: Skipped 1 previous similar message
      [561996.421067] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
      [561996.422013] IP: [<ffffffffa03cb915>] obd_set_max_mod_rpcs_in_flight+0x1a5/0x370 [obdclass]
      [561996.422013] PGD aadfb067 PUD aadfa067 PMD 0 
      [561996.422013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [561996.422013] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common i2c_piix4 virtio_balloon pcspkr virtio_console ip_tables rpcsec_gss_krb5 drm_kms_helper ttm ata_generic drm pata_acpi drm_panel_orientation_quirks ata_piix floppy serio_raw virtio_blk libata i2c_core [last unloaded: libcfs]
      [561996.422013] CPU: 3 PID: 25154 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.9-debug #1
      [561996.422013] Hardware name: Red Hat KVM, BIOS 1.13.0-2.module_el8.5.0+746+bbd5d70c 04/01/2014
      [561996.422013] task: ffff8802dc2a5c40 ti: ffff88008ccb8000 task.ti: ffff88008ccb8000
      [561996.422013] RIP: 0010:[<ffffffffa03cb915>]  [<ffffffffa03cb915>] obd_set_max_mod_rpcs_in_flight+0x1a5/0x370 [obdclass]
      [561996.422013] RSP: 0018:ffff88008ccbbe10  EFLAGS: 00010202
      [561996.422013] RAX: 0000000000000006 RBX: ffff8802b4cfc2b8 RCX: 0000000000000007
      [561996.422013] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff8802b4cfc2b8
      [561996.422013] RBP: ffff88008ccbbe38 R08: 0000000000000007 R09: 0000000000000001
      [561996.422013] R10: 000000000000000a R11: f000000000000000 R12: 0000000000000007
      [561996.422013] R13: 0000000000000007 R14: 0000000000000000 R15: ffff88008ccbbf18
      [561996.422013] FS:  00007f3ed9911740(0000) GS:ffff880331ac0000(0000) knlGS:0000000000000000
      [561996.422013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [561996.422013] CR2: 00000000000000b0 CR3: 00000000ab370000 CR4: 00000000000007e0
      [561996.422013] Call Trace:
      [561996.422013]  [<ffffffffa092bce9>] max_mod_rpcs_in_flight_store+0x69/0x80 [mdc]
      [561996.422013]  [<ffffffffa03d5a08>] lustre_attr_store+0x18/0x20 [obdclass]
      [561996.422013]  [<ffffffff812d6e0d>] sysfs_kf_write+0x3d/0x50
      [561996.422013]  [<ffffffff812d68f0>] kernfs_fop_write+0x100/0x180
      [561996.422013]  [<ffffffff81245a16>] vfs_write+0xd6/0x230
      [561996.422013]  [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
      [561996.422013]  [<ffffffff8124685f>] SyS_write+0x7f/0xf0
      [561996.422013]  [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
      [561996.422013]  [<ffffffff817ee00c>] system_call_fastpath+0x1f/0x24
      [561996.422013]  [<ffffffff817edf55>] ? system_call_after_swapgs+0xa2/0x13a
      [561996.422013] Code: 00 00 c7 05 e2 51 0a 00 40 00 00 00 48 83 c2 40 e8 a1 59 e9 ff e9 00 ff ff ff 0f 1f 40 00 f6 05 2d 87 eb ff 20 0f 84 96 fe ff ff <49> 8b 86 b0 00 00 00 45 0f b7 8e 58 03 00 00 0f b7 ce 4d 8b 86 
      [561996.422013] RIP  [<ffffffffa03cb915>] obd_set_max_mod_rpcs_in_flight+0x1a5/0x370 [obdclass] 

      in the debug log the last activity we see is osc being torn down, so I think it's a race where cli->import is set to NULL as we are trying to call into the max_mod_rpcs_in_flight_store, similar issues is in obd_set_max_rpcs_in_flight

      Attachments

        Issue Links

          Activity

            People

              green Oleg Drokin
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: