Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • Upstream
    • Lustre 2.11.0
    • 9223372036854775807

    Description

      Now that libcfs has been sorted out with respect to userland, we can simplify the code and use kernel functions that already exist. Good examples are prng.c and the cfs_alloc_array_* functions. Also, the linux directory should contain only the code that is needed to support the various distros.
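For example, the kernel's array allocators (kmalloc_array(), kcalloc(), kvmalloc_array()) already reject a count * size product that overflows, so a libcfs wrapper around them does not need its own copy of that check. The snippet below is only a userspace analogue of that check, assuming this is the kind of logic the cfs_alloc_array_* helpers duplicate; alloc_array() is a hypothetical name, not a libcfs or kernel function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of the overflow check done by kernel
 * helpers such as kmalloc_array()/kcalloc(): fail the request instead
 * of allocating a buffer that is silently too small.
 */
static void *alloc_array(size_t n, size_t size)
{
	/* n * size would wrap past SIZE_MAX, so refuse the request */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

In kernel code the point is that callers can use the existing helpers directly rather than a private wrapper that re-implements this guard.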

          Activity

            [LU-9859] libcfs simplification

            simmonsja James A Simmons added a comment - Yes, but it is mostly done.
            pjones Peter Jones added a comment -

            James

            Is there still more to come here?

            Peter

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50845/
            Subject: LU-9859 lnet: move expr parsing from libcfs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 84d3585890c4758178cedbef0e4f032bf0eeaa03

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52923/
            Subject: LU-9859 lnet: move CPT handling to LNet
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7f8cde3b77ada95e8b96dee1996f8d40bd17a538

            gerrit Gerrit Updater added a comment -

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54178
            Subject: LU-9859 libcfs: move libcfs hash to obdclass
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 44901b543321d443a5c83a363fa28c5cc9eee2a1

            gerrit Gerrit Updater added a comment -

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54177
            Subject: LU-9859 libcfs: move libcfs hash to obdclass
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2da0309e3481134f25bd72a57c5e15c1e808e8e9

            simmonsja James A Simmons added a comment - Yes, I can reproduce it. Also, patch https://review.whamcloud.com/c/fs/lustre-release/+/41664 resolves this.

            simmonsja James A Simmons added a comment - Thanks. Let me try to reproduce this problem.

            Did you change anything in your cfg/local.sh file to expose this bug?

            It seems that something got broken with libcfs module parameters.
            I have libcfs_debug defined in /etc/modprobe.d/lustre.conf:

            [root@vmcentos7-1 ~]# cat /etc/modprobe.d/lustre.conf
            options lnet networks=tcp0(ens33)
            options libcfs libcfs_debug=33965841
            

            Then with

            [root@vmcentos7-1 ~]# modprobe libcfs
            [root@vmcentos7-1 ~]# rmmod libcfs
            

            I get:

            [   50.390835] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
            [   50.393177] IP: [<ffffffffc07933fa>] cfs_trace_lock_tcd+0xa/0x80 [libcfs]
            ...
            [   50.422078] CPU: 1 PID: 2517 Comm: rmmod Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.88.1.el7.x86_64 #4
            ...
            [   50.446143] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
            [   50.447885] Call Trace:
            [   50.448517]  [<ffffffffc0793837>] libcfs_debug_msg+0xd7/0xa30 [libcfs]
            [   50.449904]  [<ffffffff9d5621ed>] ? call_rcu_sched+0x1d/0x30
            [   50.450999]  [<ffffffff9d680af1>] ? mnt_get_count+0x51/0x70
            [   50.452449]  [<ffffffff9d688935>] ? simple_release_fs+0x45/0x50
            [   50.453652]  [<ffffffffc07a1692>] ? libcfs_exit+0x27/0x995 [libcfs]
            [   50.455271]  [<ffffffffc07a1706>] libcfs_exit+0x9b/0x995 [libcfs]
            [   50.456848]  [<ffffffff9d52329e>] SyS_delete_module+0x19e/0x320
            
            vsaveliev Vladimir Saveliev added the comment above.
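The libcfs_debug value in the modprobe.d snippet above is a bitmask, so one quick way to see which debug categories were enabled is to split it into bits. The helper below is an illustrative sketch; print_set_bits() is a hypothetical name, and mapping each bit index to its D_* name requires the libcfs debug headers, which are not reproduced here.

```c
#include <stdio.h>

/* Print a debug mask in hex followed by the index of every set bit. */
static int print_set_bits(unsigned long mask)
{
	int count = 0;

	printf("0x%08lx:", mask);
	for (int bit = 0; bit < (int)(8 * sizeof(mask)); bit++) {
		if (mask & (1UL << bit)) {
			printf(" %d", bit);
			count++;
		}
	}
	printf("\n");
	return count;	/* number of enabled bits */
}
```

print_set_bits(33965841) prints `0x02064711: 0 4 8 9 10 14 17 18 25`. One plausible reading, not confirmed in the thread, is that a non-default bit lets a debug message emitted from libcfs_exit() pass the mask check and reach libcfs_debug_msg() after the trace buffers are gone; the call trace is at least consistent with that.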

            Did you change anything in your cfg/local.sh file to expose this bug?

            It appeared after updating from "New tag 2.15.59" to "Revert "LU-17131 ldiskfs: el9.2 encdata and filename-encode"".

            Somehow it eventually disappeared.
            However, it has also been spotted on an automated testing system:

            [21171.870650] Lustre: Lustre: Build Version: 2.15.59_32_g1bb972b
            [21171.928601] LNet: Using FastReg for registration
            [21171.993601] LNet: Added LNI 192.168.102.20@o2ib [8/256/0/180]
            [21172.240786] Lustre: Echo OBD driver; http://www.lustre.org/
            [21186.068097] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
            [21186.993751] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
            [21187.929835] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0002-mdc-*.mds_server_uuid
            [21188.850020] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount FULL mdc.lustre-MDT0003-mdc-*.mds_server_uuid
            [21194.566164] Lustre: DEBUG MARKER: fre0220: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
            [21247.718136] Lustre: DEBUG MARKER: fre0220: executing unload_modules_local
            [21248.017247] LNet: 277562:0:(lib-ptl.c:956:lnet_clear_lazy_portal()) Active lazy portal 0 on exit
            [21249.101958] LNet: Removed LNI 192.168.102.20@o2ib
            [21249.307129] Key type .llcrypt unregistered
            [21249.317131] Key type ._llcrypt unregistered
            [21301.274884] Key type ._llcrypt registered
            [21301.278217] Key type .llcrypt registered
            [30168.281419] BUG: unable to handle kernel NULL pointer dereference at 00000000000003cc
            [30168.286373] PGD 0 P4D 0 
            [30168.287238] Oops: 0000 [#1] SMP PTI
            [30168.288290] CPU: 7 PID: 388777 Comm: modprobe Kdump: loaded Tainted: G           OE    ---------r-  - 4.18.0-348.20.1.el8_5.x86_64 #1
            [30168.291221] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [30168.292748] RIP: 0010:cfs_trace_lock_tcd+0x6/0x70 [libcfs]
            [30168.294199] Code: e9 48 89 e7 49 c7 c0 c0 d8 a5 c0 be 00 00 04 00 e8 5f fd ff ff 48 c7 c7 f0 5d a6 c0 e8 13 3b 4a ee 0f 1f 00 66 66 66 66 90 53 <0f> b7 47 4c 66 83 f8 02 77 37 48 89 fb 74 16 66 83 f8 01 74 20 85
            [30168.298429] RSP: 0018:ffffb19a40ba3b60 EFLAGS: 00010202
            [30168.299802] RAX: 0000000000000007 RBX: 0000000000000380 RCX: 0000000000000000
            [30168.301508] RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000000380
            [30168.303229] RBP: ffffb19a40ba3c90 R08: ffffb19a40ba3ce8 R09: ffff9a6dfffd2000
            [30168.304929] R10: ffffb19a40ba3cb0 R11: fffff5fc84d55c08 R12: ffffffffc0b205c0
            [30168.306741] R13: 0000000000000000 R14: 0000000000000380 R15: 0000000000000000
            [30168.309129] FS:  00007f2de2fe5740(0000) GS:ffff9a6dfbdc0000(0000) knlGS:0000000000000000
            [30168.311474] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [30168.313287] CR2: 00000000000003cc CR3: 0000000130abe000 CR4: 00000000000406e0
            [30168.315421] Call Trace:
            [30168.316564]  libcfs_debug_msg+0x100/0xb00 [libcfs]
            [30168.318195]  ? __wake_up_common_lock+0x89/0xc0
            [30168.319677]  ? netlink_broadcast_filtered+0x145/0x400
            [30168.321298]  ? 0xffffffffc0b4a000
            [30168.322538]  ? lnet_init+0x62/0x1000 [lnet]
            [30168.323959]  lnet_init+0x62/0x1000 [lnet]
            [30168.325348]  do_one_initcall+0x46/0x1d0
            [30168.326447]  ? do_init_module+0x22/0x220
            [30168.327550]  ? kmem_cache_alloc_trace+0x131/0x270
            [30168.328781]  do_init_module+0x5a/0x220
            [30168.329841]  load_module+0x14c5/0x17f0
            [30168.330886]  ? __do_sys_finit_module+0xa8/0x110
            [30168.332065]  __do_sys_finit_module+0xa8/0x110
            [30168.333216]  do_syscall_64+0x5b/0x1a0
            [30168.334263]  entry_SYSCALL_64_after_hwframe+0x65/0xca
            [30168.335533] RIP: 0033:0x7f2de1fb852d
            [30168.336540] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 79 2c 00 f7 d8 64 89 01 48
            [30168.340556] RSP: 002b:00007ffe6a6ca438 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
            [30168.342303] RAX: ffffffffffffffda RBX: 000056352b5e2420 RCX: 00007f2de1fb852d
            [30168.343956] RDX: 0000000000000000 RSI: 000056352b5df8f0 RDI: 0000000000000000
            [30168.345599] RBP: 000056352b5df8f0 R08: 0000000000000000 R09: 0000000000000050
            [30168.347250] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
            [30168.348885] R13: 000056352b5e23c0 R14: 0000000000040000 R15: 0000000000000000
            [30168.350529] Modules linked in: lnet(OE+) libcfs(OE) rpcrdma ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs ib_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul cirrus drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_balloon pcspkr joydev i2c_piix4 sunrpc binfmt_misc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_blk serio_raw e1000 [last unloaded: libcfs]
            [30168.362036] CR2: 00000000000003cc
            
            vsaveliev Vladimir Saveliev added the comment above.
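The register dump accounts for the faulting address. The bytes at the `<0f>` marker, `0f b7 47 4c`, decode to `movzwl 0x4c(%rdi),%eax` (a 16-bit load from RDI + 0x4c), and RDI = 0x380, so 0x380 + 0x4c = 0x3cc, exactly CR2. 0x380 is 7 * 0x80 and this oops ran on CPU 7; the earlier rmmod oops faulted at 0xcc = 1 * 0x80 + 0x4c on CPU 1, if the same field offset applies in that build. An inference consistent with both, not confirmed in the thread, is that the per-CPU trace data pointer was computed from a NULL base plus cpu * 0x80. The sketch below only redoes that arithmetic; the 0x80 stride and 0x4c offset are inferred from the dumps, not taken from the libcfs source.

```c
#include <stdint.h>

/* Values inferred from the oops output (assumptions, not libcfs source). */
#define TCD_STRIDE	0x80u	/* inferred size of one per-CPU trace entry */
#define TCD_FIELD_OFF	0x4cu	/* displacement in "movzwl 0x4c(%rdi),%eax" */

/*
 * Address read when the entry for @cpu is indexed from array base @base.
 * Integer arithmetic is used deliberately: doing this with a real NULL
 * pointer would be undefined behaviour in C.
 */
static uintptr_t fault_addr(uintptr_t base, unsigned int cpu)
{
	uintptr_t tcd = base + (uintptr_t)cpu * TCD_STRIDE;	/* seen in RDI */

	return tcd + TCD_FIELD_OFF;				/* seen in CR2 */
}
```

fault_addr(0, 7) gives 0x3cc and fault_addr(0, 1) gives 0xcc, matching CR2 in the two oopses.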
            simmonsja James A Simmons added a comment - edited

            I also tested on a single Ubuntu node and didn't see this. Did you change anything in your cfg/local.sh file to expose this bug? It sounds like some code is trying to print a debug message before libcfs debugging is set up.
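The theory in the comment above (a debug message emitted while the trace buffers do not exist) suggests the kind of guard involved. Below is a single-threaded userspace sketch with hypothetical names (trace_bufs, trace_init(), trace_fini(), debug_msg()) that are not libcfs identifiers; it is not the fix from patch 41664, whose contents are not shown here. In the kernel, such a check would also have to be ordered against module init and exit, which this sketch does not model.

```c
#include <stdio.h>
#include <stdlib.h>

#define NCPUS 8			/* hypothetical CPU count */

struct trace_cpu_data {		/* hypothetical per-CPU trace state */
	unsigned int nmsgs;
};

/* NULL until trace_init() runs, and again after trace_fini(). */
static struct trace_cpu_data *trace_bufs;

static int trace_init(void)
{
	trace_bufs = calloc(NCPUS, sizeof(*trace_bufs));
	return trace_bufs ? 0 : -1;
}

static void trace_fini(void)
{
	free(trace_bufs);
	trace_bufs = NULL;	/* later messages must see "not set up" */
}

/*
 * Record a message for @cpu. Without the NULL check, a call before
 * trace_init() or after trace_fini() would index from a NULL base.
 * Returns 1 if recorded, 0 if it was sent to stderr instead.
 */
static int debug_msg(unsigned int cpu, const char *msg)
{
	if (trace_bufs == NULL || cpu >= NCPUS) {
		fprintf(stderr, "early/late: %s", msg);
		return 0;
	}
	trace_bufs[cpu].nmsgs++;
	return 1;
}
```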

            People

              simmonsja James A Simmons
              simmonsja James A Simmons
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated: