Details
Type: Bug
Resolution: Fixed
Priority: Major
Severity: 3
Description
NULL pointer dereference at the start of obd_statfs().
It looks like a race between these two tasks:
PID: 4360 TASK: ffff94719c7fd140 CPU: 15 COMMAND: "lctl"
#0 [ffff94719c7bb8b0] machine_kexec at ffffffff95a63674
#1 [ffff94719c7bb910] __crash_kexec at ffffffff95b1cf02
#2 [ffff94719c7bb9e0] crash_kexec at ffffffff95b1cff0
#3 [ffff94719c7bb9f8] oops_end at ffffffff9616e758
#4 [ffff94719c7bba20] no_context at ffffffff9615cafe
#5 [ffff94719c7bba70] __bad_area_nosemaphore at ffffffff9615cb95
#6 [ffff94719c7bbac0] bad_area_nosemaphore at ffffffff9615cd06
#7 [ffff94719c7bbad0] __do_page_fault at ffffffff961716b0
#8 [ffff94719c7bbb40] do_page_fault at ffffffff96171915
#9 [ffff94719c7bbb70] page_fault at ffffffff9616d758
[exception RIP: obd_statfs.constprop.43+36]
RIP: ffffffffc1a47d64 RSP: ffff94719c7bbc20 RFLAGS: 00010246
RAX: 0000000000000001 RBX: 000000000000b2c7 RCX: 0000000000000001
RDX: 000000000000b2c7 RSI: ffff94719c7bbd40 RDI: 0000000000000000
RBP: ffff94719c7bbc60 R8: ffff94716feace40 R9: 0000000000000000
R10: 0000000000001000 R11: ffffffff95bd609d R12: 0000000000000000
R13: 000000000000b2c7 R14: ffff94719c7bbd40 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff94719c7bbc68] ll_statfs_internal at ffffffffc1a4fd9d [lustre]
#11 [ffff94719c7bbd38] filesfree_show at ffffffffc1a5df6b [lustre]
#12 [ffff94719c7bbde8] lustre_attr_show at ffffffffc13ffe79 [obdclass]
#13 [ffff94719c7bbdf8] sysfs_kf_seq_show at ffffffff95ccbeaf
#14 [ffff94719c7bbe18] kernfs_seq_show at ffffffff95cca5e6
#15 [ffff94719c7bbe28] seq_read at ffffffff95c68b50
#16 [ffff94719c7bbe98] kernfs_fop_read at ffffffff95ccaf35
#17 [ffff94719c7bbed8] vfs_read at ffffffff95c4118f
#18 [ffff94719c7bbf08] sys_read at ffffffff95c4204f
#19 [ffff94719c7bbf50] system_call_fastpath at ffffffff96176ddb
RIP: 00007f399f7c66e0 RSP: 00007fff98d7e7e0 RFLAGS: 00010206
RAX: 0000000000000000 RBX: 00000000006480c0 RCX: 0000000000648100
RDX: 0000000000001000 RSI: 0000000000648100 RDI: 0000000000000003
RBP: 000000000064810a R8: 00000000006480e0 R9: 0000000000001000
R10: 00007fff98d7e360 R11: 0000000000000246 R12: 0000000000648100
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
and
PID: 4043 TASK: ffff9471ca155140 CPU: 3 COMMAND: "mount.lustre"
#0 [ffff947178f3f7b8] __schedule at ffffffff96169b97
#1 [ffff947178f3f848] schedule at ffffffff9616a099
#2 [ffff947178f3f858] schedule_timeout at ffffffff96167b71
#3 [ffff947178f3f908] wait_for_completion at ffffffff9616a44d
#4 [ffff947178f3f968] llog_process_or_fork at ffffffffc13ddc14 [obdclass]
#5 [ffff947178f3f9d0] llog_process at ffffffffc13ddef4 [obdclass]
#6 [ffff947178f3f9e0] class_config_parse_llog at ffffffffc1411b65 [obdclass]
#7 [ffff947178f3fa28] mgc_process_cfg_log at ffffffffc19a08c8 [mgc]
#8 [ffff947178f3fab0] mgc_process_log at ffffffffc19a1c23 [mgc]
#9 [ffff947178f3fb70] mgc_process_config at ffffffffc19a37f3 [mgc]
#10 [ffff947178f3fbf0] lustre_process_log at ffffffffc141d9b8 [obdclass]
#11 [ffff947178f3fc88] ll_fill_super at ffffffffc1a4dc55 [lustre]
#12 [ffff947178f3fd78] lustre_fill_super at ffffffffc1423b03 [obdclass]
#13 [ffff947178f3fdb0] mount_nodev at ffffffff95c452df
#14 [ffff947178f3fde8] lustre_mount at ffffffffc141b808 [obdclass]
#15 [ffff947178f3fe10] mount_fs at ffffffff95c45e5e
#16 [ffff947178f3fe58] vfs_kern_mount at ffffffff95c63a07
#17 [ffff947178f3fe90] do_mount at ffffffff95c6602f
#18 [ffff947178f3ff18] sys_mount at ffffffff95c66e63
#19 [ffff947178f3ff50] system_call_fastpath at ffffffff96176ddb
RIP: 00007ff8530ed60a RSP: 00007ffc04d9e948 RFLAGS: 00010206
RAX: 00000000000000a5 RBX: 0000000000000000 RCX: 0000000001000000
RDX: 0000000000409e34 RSI: 00007ffc04da4cf8 RDI: 0000000000615010
RBP: 0000000000000000 R8: 0000000000615420 R9: 0000000000000001
R10: 0000000001000000 R11: 0000000000000206 R12: 00007ffc04da4cf8
R13: 00000000fffffff5 R14: 0000000000000301 R15: 0000000000615420
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
exp_obd is filled in by ll_fill_super() -> client_common_fill_super(), but the mount process is still blocked in lustre_process_log() and has not yet reached client_common_fill_super(), so exp_obd is still unset when lctl reads filesfree.
The lctl command was run before the client mount had completed:
crash> ps -a 4360
PID: 4360 TASK: ffff94719c7fd140 CPU: 15 COMMAND: "lctl"
ARG: lctl get_param llite/snx11214-ffff947163641800/filesfree
ENV: SHELL=/bin/bash
USER=admin
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
PWD=/
SHLVL=1
HOME=/home/admin
LOGNAME=admin
_=/usr/sbin/lctl
Solution: check whether exp_obd is initialized, and return an error code to the lctl user if it is not.
Workaround: confirm that the mount has completed before running lctl get_param.
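The guard described in the solution can be illustrated with a small userspace model. This is a sketch, not the landed Lustre patch: the structs below are heavily simplified, hypothetical stand-ins, `md_exp` is a hypothetical field name for the export used by statfs, and the `-ENODEV` error code is an assumption. It shows the shape of the fix: the statfs helper refuses to proceed while the export (or its `exp_obd`) is unset, and the sysfs show routine passes that error back to the reader instead of dereferencing NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for the Lustre structures. */
struct obd_device {
	const char *obd_name;
};

struct obd_export {
	struct obd_device *exp_obd;
};

struct statfs_result {
	unsigned long long os_ffree;
};

/* Hypothetical per-superblock info: in this model md_exp stays NULL until
 * the simulated client_common_fill_super() step has run. */
struct sb_info {
	struct obd_export *md_exp;
};

/* Model of obd_statfs() with the guard: bail out with an error (code is an
 * assumption) while the export or its obd_device is not yet set up. */
static int model_obd_statfs(struct obd_export *exp, struct statfs_result *osfs)
{
	assert(osfs != NULL);
	if (exp == NULL || exp->exp_obd == NULL)
		return -ENODEV;

	osfs->os_ffree = 1000;	/* placeholder value for the model */
	return 0;
}

/* Model of filesfree_show(): propagate the error to the reader (lctl)
 * instead of crashing; on success return the number of bytes written. */
static int model_filesfree_show(struct sb_info *sbi, char *buf, size_t len)
{
	struct statfs_result osfs;
	int rc = model_obd_statfs(sbi->md_exp, &osfs);

	if (rc < 0)
		return rc;
	return snprintf(buf, len, "%llu\n", osfs.os_ffree);
}
```

Checking both the export and `exp_obd` covers the window in which the mount has registered the llite sysfs entry but has not yet connected the export.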
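One way to apply the workaround is to wait until the client mount appears in the mount table before querying llite parameters. The sketch below assumes that, on Linux, a mount is only added to the mount table after the filesystem's fill_super() has returned, and it assumes the caller knows the client mount point. The function names are hypothetical, and the parser uses the standard glibc `setmntent()`/`getmntent()`/`endmntent()` calls.

```c
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `mounts` (a stream in /proc/mounts format) lists a
 * filesystem of type "lustre" mounted on `mountpoint`. The stream is read
 * from its current position; the caller owns and closes it. */
bool lustre_mount_listed(FILE *mounts, const char *mountpoint)
{
	struct mntent *ent;

	assert(mounts != NULL && mountpoint != NULL);
	while ((ent = getmntent(mounts)) != NULL) {
		if (strcmp(ent->mnt_type, "lustre") == 0 &&
		    strcmp(ent->mnt_dir, mountpoint) == 0)
			return true;
	}
	return false;
}

/* Convenience wrapper that checks the live mount table. */
bool lustre_mount_complete(const char *mountpoint)
{
	FILE *mounts = setmntent("/proc/mounts", "r");
	bool found;

	if (mounts == NULL)
		return false;
	found = lustre_mount_listed(mounts, mountpoint);
	endmntent(mounts);
	return found;
}
```

A script or tool would poll `lustre_mount_complete()` for the client mount point and only run `lctl get_param llite/...` once it returns true.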
Landed for 2.15