  1. Lustre
  2. LU-17269

el9.3 crash conf-sanity test_41c Oops in class_setup()

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      [ 3093.416284] Lustre: DEBUG MARKER: == conf-sanity test 41c: concurrent mounts of MDT/OST should all fail but one ========================================================== 19:54:14 (1699300454)
      ...
      [ 3149.141357] LustreError: 187855:0:(libcfs_fail.h:190:cfs_race()) cfs_race id 716 sleeping
      [ 3149.143276] LustreError: 187854:0:(libcfs_fail.h:201:cfs_race()) cfs_fail_race id 716 waking
      [ 3149.143494] LustreError: 187855:0:(libcfs_fail.h:199:cfs_race()) cfs_fail_race id 716 awake: rc=500
      [ 3149.143591] LustreError: 187855:0:(obd_config.c:696:class_setup()) Device 0 setup in progress (type osd-zfs)
      [ 3149.143660] LustreError: 187855:0:(obd_mount.c:213:lustre_start_simple()) lustre-MDT0000-osd setup error -17
      [ 3149.143731] LustreError: 187855:0:(tgt_mount.c:2183:server_fill_super()) Unable to start osd on lustre-mdt1/mdt1: -17
      [ 3149.143804] LustreError: 187855:0:(super25.c:188:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -17
      [ 3149.143896] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 3149.144137] CPU: 0 PID: 187854 Comm: mount.lustre Tainted: G        W  O     --------- -  - 4.18.0 #2
      [ 3149.144266] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc36 04/01/2014
      [ 3149.144445] RIP: 0010:class_setup+0x610/0xad0 [obdclass]
      [ 3149.144519] Code: 05 61 f0 09 00 00 00 00 00 e8 2c 3a ea ff 31 d2 be 2f 02 00 00 48 c7 c7 10 3b 98 c0 e8 49 65 7a e2 e8 b4 ed c6 e2 48 8b 04 24 <48> 8b 40 28 48 83 f8 01 0f 84 8e 03 00 00 48 8b 04 24 48 8b 48 28
      [ 3149.144747] RSP: 0018:ffff9206ab77bae8 EFLAGS: 00010246
      [ 3149.144814] RAX: 6b6b6b6b6b6b6b6b RBX: ffff9206a6cf4600 RCX: 000000000002d000
      [ 3149.144912] RDX: 0000000000000000 RSI: 000000000000022f RDI: ffffffffc0983b10
      [ 3149.145018] RBP: ffff9206b42b0530 R08: ffffffffc07d5000 R09: ffffffffa3e0bbc0
      [ 3149.145117] R10: ffff9206ab77ba20 R11: ffff9206ad3457a3 R12: ffff9206b42b0110
      [ 3149.145233] R13: ffff9206b42b02b8 R14: ffff9206b42b0048 R15: 0000000000000000
      [ 3149.145334] FS:  00007f1f838808c0(0000) GS:ffff9206cfe00000(0000) knlGS:0000000000000000
      [ 3149.145434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3149.145528] CR2: 0000000000667000 CR3: 00000001908f7003 CR4: 0000000000370eb0
      [ 3149.145634] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3149.145736] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3149.145835] Call Trace:
      [ 3149.146062]  ? libcfs_debug_msg+0x9be/0xb00 [libcfs]
      [ 3149.146380]  ? xas_load+0x8/0x80
      [ 3149.146452]  ? xas_find+0x173/0x1b0
      [ 3149.146854]  ? xa_find+0xae/0xe0
      [ 3149.146911]  ? do_raw_spin_unlock+0x44/0xc0
      [ 3149.146973]  ? _raw_spin_unlock+0x1a/0x30
      [ 3149.147061]  class_process_config+0x14fa/0x2e60 [obdclass]
      [ 3149.147154]  ? do_lcfg+0x15a/0x4b0 [obdclass]
      [ 3149.147247]  do_lcfg+0x223/0x4b0 [obdclass]
      [ 3149.147322]  lustre_start_simple+0x72/0x1c0 [obdclass]
      [ 3149.147471]  osd_start+0x565/0x7b0 [ptlrpc]
      [ 3149.147536]  ? kstrtou16+0x1b/0x40
      [ 3149.147607]  ? target_name2index+0x106/0x140 [obdclass]
      [ 3149.147721]  server_fill_super+0x327/0x1100 [ptlrpc]
      [ 3149.147814]  ? obd_zombie_barrier+0x36/0x90 [obdclass]
      [ 3149.147889]  ? debug_mutex_init+0x31/0x40
      [ 3149.147978]  lustre_fill_super+0x390/0x480 [lustre]
      [ 3149.148066]  ? lustre_mount+0x10/0x10 [lustre]
      [ 3149.148141]  mount_nodev+0x41/0x90
      

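      IMO this problem was introduced in c5e5060d950 ("LU-8802 obd: remove MAX_OBD_DEVICES"). The name lookup and the xa_alloc() insertion below appear to be separate steps, so the check is not atomic with the insert: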

      	if (class_name2dev(new_obd->obd_name) == -1) {
      		class_incref(new_obd, "obd_device_list", new_obd);
      		rc = xa_alloc(&obd_devs, &dev_no, new_obd,
      			      xa_limit_31b, GFP_ATOMIC);
      

      Two threads can try to create OBDs with the same name, and both pass the lookup before either one inserts:

      00000020:00000080:0.0:1699293418.519360:0:185838:0:(genops.c:417:class_newdev()) Allocate new device lustre-OST0000-osd (00000000b8694366)
      00000020:00000080:1.0:1699293418.519360:0:185839:0:(genops.c:417:class_newdev()) Allocate new device lustre-OST0000-osd (00000000e7494c1a)
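
      The check-then-insert pattern above can be made safe by doing the lookup and the insert under one lock. The following is only a user-space sketch of that idea, assuming a fixed-size table and a pthread mutex. It is not the Lustre fix, and every name in it (register_dev, name2dev_locked, race_register, dev_lock) is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define MAX_DEVS 16
#define NAME_LEN 64

/* Hypothetical device table; stands in for the real registry. */
static char dev_names[MAX_DEVS][NAME_LEN];
static int dev_count;
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up a name. Caller must hold dev_lock. Returns index or -1. */
static int name2dev_locked(const char *name)
{
	for (int i = 0; i < dev_count; i++)
		if (strcmp(dev_names[i], name) == 0)
			return i;
	return -1;
}

/*
 * Register a device name. The lookup and the insert happen under the
 * same lock, so two callers can never both see "not found" for one name.
 * Returns the new index, or a negative errno value.
 */
static int register_dev(const char *name)
{
	int rc;

	assert(name != NULL);
	if (strlen(name) >= NAME_LEN)
		return -ENAMETOOLONG;

	pthread_mutex_lock(&dev_lock);
	if (name2dev_locked(name) != -1) {
		rc = -EEXIST;
	} else if (dev_count == MAX_DEVS) {
		rc = -ENOSPC;
	} else {
		strcpy(dev_names[dev_count], name);
		rc = dev_count++;
	}
	pthread_mutex_unlock(&dev_lock);
	return rc;
}

struct racer {
	const char *name;
	int rc;
};

static void *racer_main(void *arg)
{
	struct racer *r = arg;

	r->rc = register_dev(r->name);
	return NULL;
}

/* Run nthreads concurrent registrations of one name; return the number that succeeded. */
static int race_register(const char *name, int nthreads)
{
	pthread_t tid[8];
	struct racer r[8];
	int ok = 0;

	assert(nthreads > 0 && nthreads <= 8);
	for (int i = 0; i < nthreads; i++) {
		r[i].name = name;
		r[i].rc = -1;
		int err = pthread_create(&tid[i], NULL, racer_main, &r[i]);
		assert(err == 0);
	}
	for (int i = 0; i < nthreads; i++) {
		pthread_join(tid[i], NULL);
		if (r[i].rc >= 0)
			ok++;
	}
	return ok;
}
```

      Calling race_register("lustre-OST0000-osd", 2) on an empty table should report one success; the losing thread gets -EEXIST instead of a second device with the same name.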
      

            People

              timday Tim Day
              bzzz Alex Zhuravlev