Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • Upstream
    • Lustre 2.11.0

    Description

      Now that the userland side of libcfs has been sorted out, we can simplify the code and use kernel functions that already exist. Good examples are prng.c and the cfs_alloc_array_* functions. Also, the linux directory should contain only the code needed to support the various distros.

          Activity

            [LU-9859] libcfs simplification

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54178
            Subject: LU-9859 libcfs: move libcfs hash to obdclass
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 44901b543321d443a5c83a363fa28c5cc9eee2a1

            gerrit Gerrit Updater added a comment

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54177
            Subject: LU-9859 libcfs: move libcfs hash to obdclass
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2da0309e3481134f25bd72a57c5e15c1e808e8e9

            gerrit Gerrit Updater added a comment

            Yes, I can reproduce it. Also, patch https://review.whamcloud.com/c/fs/lustre-release/+/41664 resolves this.

            simmonsja James A Simmons added a comment

            Thanks. Let me try to reproduce this problem.

            simmonsja James A Simmons added a comment

            Did you change anything in your cfg/local.sh file to expose this bug?

            It seems that something got broken with libcfs module parameters.
            I have libcfs_debug defined in /etc/modprobe.d/lustre.conf:

            [root@vmcentos7-1 ~]# cat /etc/modprobe.d/lustre.conf
            options lnet networks=tcp0(ens33)
            options libcfs libcfs_debug=33965841
            

            Then with

            [root@vmcentos7-1 ~]# modprobe libcfs
            [root@vmcentos7-1 ~]# rmmod libcfs
            

            I get:

            [   50.390835] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
            [   50.393177] IP: [<ffffffffc07933fa>] cfs_trace_lock_tcd+0xa/0x80 [libcfs]
            ...
            [   50.422078] CPU: 1 PID: 2517 Comm: rmmod Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.88.1.el7.x86_64 #4
            ...
            [   50.446143] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
            [   50.447885] Call Trace:
            [   50.448517]  [<ffffffffc0793837>] libcfs_debug_msg+0xd7/0xa30 [libcfs]
            [   50.449904]  [<ffffffff9d5621ed>] ? call_rcu_sched+0x1d/0x30
            [   50.450999]  [<ffffffff9d680af1>] ? mnt_get_count+0x51/0x70
            [   50.452449]  [<ffffffff9d688935>] ? simple_release_fs+0x45/0x50
            [   50.453652]  [<ffffffffc07a1692>] ? libcfs_exit+0x27/0x995 [libcfs]
            [   50.455271]  [<ffffffffc07a1706>] libcfs_exit+0x9b/0x995 [libcfs]
            [   50.456848]  [<ffffffff9d52329e>] SyS_delete_module+0x19e/0x320
            
            vsaveliev Vladimir Saveliev added a comment

            Did you change anything in your cfg/local.sh file to expose this bug?

            It appeared after updating from "New tag 2.15.59" to 'Revert "LU-17131 ldiskfs: el9.2 encdata and filename-encode"'.

            Somehow it eventually disappeared.
            However, it has also been spotted on the automated testing system:

            [21171.870650] Lustre: Lustre: Build Version: 2.15.59_32_g1bb972b
            [21171.928601] LNet: Using FastReg for registration
            [21171.993601] LNet: Added LNI 192.168.102.20@o2ib [8/256/0/180]
            [21172.240786] Lustre: Echo OBD driver; http://www.lustre.org/
            [21186.068097] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
            [21186.993751] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
            [21187.929835] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
            [21188.850020] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
            [21194.566164] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
            [21247.718136] Lustre: DEBUG MARKER: fre0220: executing unload_modules_local
            [21248.017247] LNet: 277562:0:(lib-ptl.c:956:lnet_clear_lazy_portal()) Active lazy portal 0 on exit
            [21249.101958] LNet: Removed LNI 192.168.102.20@o2ib
            [21249.307129] Key type .llcrypt unregistered
            [21249.317131] Key type ._llcrypt unregistered
            [21301.274884] Key type ._llcrypt registered
            [21301.278217] Key type .llcrypt registered
            [30168.281419] BUG: unable to handle kernel NULL pointer dereference at 00000000000003cc
            [30168.286373] PGD 0 P4D 0 
            [30168.287238] Oops: 0000 [#1] SMP PTI
            [30168.288290] CPU: 7 PID: 388777 Comm: modprobe Kdump: loaded Tainted: G           OE    ---------r-  - 4.18.0-348.20.1.el8_5.x86_64 #1
            [30168.291221] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [30168.292748] RIP: 0010:cfs_trace_lock_tcd+0x6/0x70 [libcfs]
            [30168.294199] Code: e9 48 89 e7 49 c7 c0 c0 d8 a5 c0 be 00 00 04 00 e8 5f fd ff ff 48 c7 c7 f0 5d a6 c0 e8 13 3b 4a ee 0f 1f 00 66 66 66 66 90 53 <0f> b7 47 4c 66 83 f8 02 77 37 48 89 fb 74 16 66 83 f8 01 74 20 85
            [30168.298429] RSP: 0018:ffffb19a40ba3b60 EFLAGS: 00010202
            [30168.299802] RAX: 0000000000000007 RBX: 0000000000000380 RCX: 0000000000000000
            [30168.301508] RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000000380
            [30168.303229] RBP: ffffb19a40ba3c90 R08: ffffb19a40ba3ce8 R09: ffff9a6dfffd2000
            [30168.304929] R10: ffffb19a40ba3cb0 R11: fffff5fc84d55c08 R12: ffffffffc0b205c0
            [30168.306741] R13: 0000000000000000 R14: 0000000000000380 R15: 0000000000000000
            [30168.309129] FS:  00007f2de2fe5740(0000) GS:ffff9a6dfbdc0000(0000) knlGS:0000000000000000
            [30168.311474] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [30168.313287] CR2: 00000000000003cc CR3: 0000000130abe000 CR4: 00000000000406e0
            [30168.315421] Call Trace:
            [30168.316564]  libcfs_debug_msg+0x100/0xb00 [libcfs]
            [30168.318195]  ? __wake_up_common_lock+0x89/0xc0
            [30168.319677]  ? netlink_broadcast_filtered+0x145/0x400
            [30168.321298]  ? 0xffffffffc0b4a000
            [30168.322538]  ? lnet_init+0x62/0x1000 [lnet]
            [30168.323959]  lnet_init+0x62/0x1000 [lnet]
            [30168.325348]  do_one_initcall+0x46/0x1d0
            [30168.326447]  ? do_init_module+0x22/0x220
            [30168.327550]  ? kmem_cache_alloc_trace+0x131/0x270
            [30168.328781]  do_init_module+0x5a/0x220
            [30168.329841]  load_module+0x14c5/0x17f0
            [30168.330886]  ? __do_sys_finit_module+0xa8/0x110
            [30168.332065]  __do_sys_finit_module+0xa8/0x110
            [30168.333216]  do_syscall_64+0x5b/0x1a0
            [30168.334263]  entry_SYSCALL_64_after_hwframe+0x65/0xca
            [30168.335533] RIP: 0033:0x7f2de1fb852d
            [30168.336540] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 79 2c 00 f7 d8 64 89 01 48
            [30168.340556] RSP: 002b:00007ffe6a6ca438 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
            [30168.342303] RAX: ffffffffffffffda RBX: 000056352b5e2420 RCX: 00007f2de1fb852d
            [30168.343956] RDX: 0000000000000000 RSI: 000056352b5df8f0 RDI: 0000000000000000
            [30168.345599] RBP: 000056352b5df8f0 R08: 0000000000000000 R09: 0000000000000050
            [30168.347250] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
            [30168.348885] R13: 000056352b5e23c0 R14: 0000000000040000 R15: 0000000000000000
            [30168.350529] Modules linked in: lnet(OE+) libcfs(OE) rpcrdma ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs ib_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul cirrus drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_balloon pcspkr joydev i2c_piix4 sunrpc binfmt_misc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_blk serio_raw e1000 [last unloaded: libcfs]
            [30168.362036] CR2: 00000000000003cc
            
            vsaveliev Vladimir Saveliev added a comment

            I also tested on a single Ubuntu node and didn't see this. Did you change anything in your cfg/local.sh file to expose this bug? It sounds like something is trying to print a debug message before libcfs debugging is set up.

            simmonsja James A Simmons added a comment (edited)

            Subject: LU-9859 libcfs: refactor libcfs initialization.

            With this patch bash /usr/lib64/lustre/tests/llmount.sh (single node setup, vmware virtual machine, CentOS Stream release 8) crashes:

            [   41.240485] libcfs: loading out-of-tree module taints kernel.
            [   41.247560] Key type ._llcrypt registered
            [   41.249128] Key type .llcrypt registered
            [   41.506114] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
            [   41.508996] PGD 8000000004bc1067 P4D 8000000004bc1067 PUD 1988d067 PMD 0
            [   41.511787] Oops: 0000 [#1] SMP PTI
            [   41.513298] CPU: 1 PID: 3744 Comm: lctl Tainted: G           O     --------- -  - 4.18.0-477.15.1.el8.x86_64 #2
            [   41.516934] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
            [   41.521529] RIP: 0010:cfs_trace_lock_tcd+0x6/0x80 [libcfs]
            [   41.523332] Code: e9 48 89 e7 49 c7 c0 c0 86 48 c0 be 00 00 04 00 e8 3f fd ff ff 48 c7 c7 a0 0b 49 c0 e8 7b e1 67 d9 0f 1f 00 0f 1f 44 00 00 53 <0f> b7 47 4c 66 83 f8 02 77 43 48 89 fb 74 1a 66 83 f8 01 74 28 85
            [   41.529423] RSP: 0018:ffffa416c1157c60 EFLAGS: 00010202
            [   41.530802] RAX: 0000000000000001 RBX: 0000000000000080 RCX: 0000000000000000
            [   41.533363] RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000000080
            [   41.536707] RBP: ffffa416c1157d90 R08: 00000000ffffffff R09: 00000000130e0580
            [   41.540222] R10: ffffa416c1157da8 R11: 0000000000000001 R12: ffffffffc0498b00
            [   41.543573] R13: 0000000000000000 R14: 0000000000000080 R15: 0000000000000000
            [   41.546175] FS:  00007f28a90eb740(0000) GS:ffff93e76b040000(0000) knlGS:0000000000000000
            [   41.549481] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [   41.551648] CR2: 00000000000000cc CR3: 00000000235cc002 CR4: 00000000003706e0
            [   41.555378] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            [   41.558481] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
            [   41.561120] Call Trace:
            [   41.562149]  libcfs_debug_msg+0x100/0xb10 [libcfs]
            [   41.564215]  ? terminate_walk+0xd0/0xf0
            [   41.565779]  ? filename_lookup.part.64+0xe0/0x170
            [   41.568254]  ? libcfs_debug_subsys2str+0x30/0x30 [libcfs]
            [   41.571316]  ? libcfs_debug_subsys2str+0x30/0x30 [libcfs]
            [   41.573735]  ? cfs_str2mask+0x332/0x340 [libcfs]
            [   41.575703]  cfs_str2mask+0x332/0x340 [libcfs]
            [   41.577523]  ? libcfs_debug_subsys2str+0x30/0x30 [libcfs]
            [   41.580080]  libcfs_debug_str2mask+0xe8/0x190 [libcfs]
            [   41.582512]  proc_dobitmasks+0x108/0x150 [libcfs]
            [   41.584494]  lnet_debugfs_write+0x3f/0x70 [libcfs]
            [   41.586557]  full_proxy_write+0x53/0x80
            [   41.588260]  vfs_write+0xa5/0x1b0
            [   41.589664]  ksys_write+0x4f/0xb0
            [   41.591081]  do_syscall_64+0x5b/0x1b0
            

            It looks like tcd is NULL in:

            static inline struct cfs_trace_cpu_data *
            cfs_trace_get_tcd(void)
            {
                    struct cfs_trace_cpu_data *tcd =
                            &(*cfs_trace_data[cfs_trace_buf_idx_get()])[get_cpu()].tcd;
            
                    cfs_trace_lock_tcd(tcd, 0);
            
                    return tcd;
            }
            

            All is ok if this patch is reverted.

            vsaveliev Vladimir Saveliev added a comment

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52700/
            Subject: LU-9859 libcfs: refactor libcfs initialization.
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f3494a6e9ffeb82bf1b34e557b4cfda1eaf8ef9d

            gerrit Gerrit Updater added a comment

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52701/
            Subject: LU-9859 libcfs: migrate libcfs_mem.c to lnet/lib-mem.c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 24d515367f44de6b92b453cc9a1c8384e52b5e3f

            gerrit Gerrit Updater added a comment

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52923
            Subject: LU-9859 lnet: move CPT handling to LNet
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 012bbbb81cf9e8c77b2e2eeed611c59d1c4aa919

            gerrit Gerrit Updater added a comment

            People

              simmonsja James A Simmons
              Votes: 0
              Watchers: 10