[LU-4134] obdfilter-survey bugs and panics (ioctl API isn't properly protected over shutdown/setup). Created: 22/Oct/13  Updated: 04/Jan/18  Resolved: 11/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 2.5.0
Fix Version/s: Lustre 2.11.0, Lustre 2.10.3

Type: Bug Priority: Critical
Reporter: Alexey Lyashkov Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: patch
Environment:

Lustre 2.1/2.5 on any OS.


Issue Links:
Duplicate
Related
is related to LU-10209 conf-sanity test 41c crashes Resolved
Severity: 3
Rank (Obsolete): 11207

 Description   

Using the ioctl API isn't safe: nothing protects the result of a name2obd / uuid2obd / num2obd call from being used while the device is being shut down.
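One common way to close this kind of window is to make the lookup itself take a reference under the same lock that shutdown uses, and to refuse new references once shutdown has started. The sketch below shows that pattern in userspace C. It is not the Lustre code: `struct dev`, `dev_register()`, `dev_lookup_get()`, `dev_put()` and `dev_shutdown()` are hypothetical names standing in for an obd device and the name2obd-style lookups, and a pthread mutex stands in for whatever lock the kernel code uses.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for an obd device (not Lustre's struct obd_device). */
struct dev {
	char name[32];
	int refcount;  /* protected by table_lock */
	bool stopping; /* set under table_lock when shutdown begins */
};

static struct dev *dev_table[MAX_DEVS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a device; the table itself holds the first reference. */
static int dev_register(const char *name)
{
	struct dev *d = calloc(1, sizeof(*d));

	if (d == NULL)
		return -ENOMEM;
	snprintf(d->name, sizeof(d->name), "%s", name);
	d->refcount = 1;

	pthread_mutex_lock(&table_lock);
	for (int i = 0; i < MAX_DEVS; i++) {
		if (dev_table[i] == NULL) {
			dev_table[i] = d;
			pthread_mutex_unlock(&table_lock);
			return 0;
		}
	}
	pthread_mutex_unlock(&table_lock);
	free(d);
	return -ENOSPC;
}

/* Name lookup that takes a reference under the lock. A device that is
 * stopping is treated as absent, so an ioctl caller never obtains a
 * pointer that shutdown may free underneath it. */
static struct dev *dev_lookup_get(const char *name)
{
	struct dev *found = NULL;

	pthread_mutex_lock(&table_lock);
	for (int i = 0; i < MAX_DEVS; i++) {
		struct dev *d = dev_table[i];

		if (d != NULL && !d->stopping && strcmp(d->name, name) == 0) {
			d->refcount++;
			found = d;
			break;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return found;
}

/* Drop a reference; the last holder unlinks and frees the device. */
static void dev_put(struct dev *d)
{
	pthread_mutex_lock(&table_lock);
	assert(d->refcount > 0);
	if (--d->refcount > 0) {
		pthread_mutex_unlock(&table_lock);
		return;
	}
	for (int i = 0; i < MAX_DEVS; i++)
		if (dev_table[i] == d)
			dev_table[i] = NULL;
	pthread_mutex_unlock(&table_lock);
	free(d);
}

/* Begin shutdown: block new lookups, then drop the table's reference.
 * Memory is released only after in-flight users call dev_put(). */
static void dev_shutdown(const char *name)
{
	struct dev *victim = NULL;

	pthread_mutex_lock(&table_lock);
	for (int i = 0; i < MAX_DEVS; i++) {
		struct dev *d = dev_table[i];

		if (d != NULL && !d->stopping && strcmp(d->name, name) == 0) {
			d->stopping = true;
			victim = d;
			break;
		}
	}
	pthread_mutex_unlock(&table_lock);
	if (victim != NULL)
		dev_put(victim);
}
```

In this sketch an ioctl handler would bracket its work with `dev_lookup_get()` / `dev_put()` instead of holding a bare pointer returned by a lookup. A shutdown that races with the handler only marks the device as stopping; the memory is freed by whichever side drops the last reference.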

Xyratex bugs: MRP-509, MRP-1396



 Comments   
Comment by Alexey Lyashkov [ 23/Oct/13 ]

The patch tries to make obd device shutdown / setup much clearer, to fix various bugs in the Lustre code.

http://review.whamcloud.com/8045

Comment by Cliff White (Inactive) [ 20/Jan/14 ]

Alexey, the patch needs a rebase. Are you able to re-submit? Please see the Gerrit comments.

Comment by Alexey Lyashkov [ 20/Jan/14 ]

Cliff,

I was busy with LU-4495 and will refresh it soon. A Xyratex inspection found a few bugs, which are also fixed in the new version.

Comment by Cliff White (Inactive) [ 04/Mar/14 ]

Alexey, our latest tests found a few issues with the patch. Would a refresh be possible? Please see the comments on the macros in Gerrit.

Comment by Peter Jones [ 10/Jul/15 ]

Needs rebasing to make any progress.

Comment by Alexey Lyashkov [ 10/Jul/15 ]

Gerrit needs to be fixed so that I can log in via Google.

Comment by James A Simmons [ 18/Sep/15 ]

Alexey, I rebased your patch against the latest master. Please let me know if it is correct.

P.S.
Not the greatest thing, but I moved to Yahoo Mail to continue my Gerrit work.

Comment by Alexander Boyko [ 21/Jun/17 ]

I updated the patch; all tests passed.

Comment by Gerrit Updater [ 24/Oct/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/8045/
Subject: LU-4134 obdclass: obd_device improvement
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 45900a7777ac02130d8bf65724c4b6cffca9d546

Comment by Peter Jones [ 24/Oct/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 24/Oct/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29740
Subject: LU-4134 obdclass: obd_device improvement
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: c82f9686fbe5d5a9998e2738efdc5fc4761eed7c

Comment by Yang Sheng [ 07/Nov/17 ]

This patch causes a double free in the failure path, as shown below:

[23427.124766] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == conf-sanity test 41c: concurrent mounts of MDT\/OST should all fail but one ======================== 16:20:02 \(1509664802\)
[23427.175328] Lustre: DEBUG MARKER: == conf-sanity test 41c: concurrent mounts of MDT/OST should all fail but one ======================== 16:20:02 (1509664802)
[23427.320400] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost1' ' /proc/mounts
[23427.356134] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
[23433.423356] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm7: executing set_default_debug -1 all 4
[23433.459190] Lustre: DEBUG MARKER: trevis-33vm7: executing set_default_debug -1 all 4
[23434.318418] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm3: executing set_default_debug -1 all 4
[23434.356057] Lustre: DEBUG MARKER: trevis-33vm3: executing set_default_debug -1 all 4
[23435.195522] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm7: executing set_default_debug -1 all 4
[23435.238307] Lustre: DEBUG MARKER: trevis-33vm7: executing set_default_debug -1 all 4
[23436.043367] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm3: executing set_default_debug -1 all 4
[23436.078104] Lustre: DEBUG MARKER: trevis-33vm3: executing set_default_debug -1 all 4
[23436.305850] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1
[23436.354028] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
[23436.392706] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P1
[23436.427055] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/lustre-ost1
[23436.493237] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache,nodelalloc
[23436.602541] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
[23436.637692] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
[23436.877956] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm8: executing set_default_debug -1 all 4
[23436.915153] Lustre: DEBUG MARKER: trevis-33vm8: executing set_default_debug -1 all 4
[23436.944609] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
[23436.978188] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
[23437.019478] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_OSS/P1 2>/dev/null
[23439.851173] Lustre: lustre-OST0000: deleting orphan objects from 0x0:3 to 0x0:33
[23443.391660] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost1' ' /proc/mounts
[23443.424213] Lustre: DEBUG MARKER: umount -f /mnt/lustre-ost1
[23449.552602] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
[23473.647115] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
[23473.914217] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm2: executing load_modules_local
[23473.920620] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm7: executing load_modules_local
[23473.927710] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm8: executing load_modules_local
[23473.936093] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm3: executing load_modules_local
[23473.966215] Lustre: DEBUG MARKER: trevis-33vm2: executing load_modules_local
[23473.973080] Lustre: DEBUG MARKER: trevis-33vm7: executing load_modules_local
[23473.996666] Lustre: DEBUG MARKER: trevis-33vm8: executing load_modules_local
[23474.007128] Lustre: DEBUG MARKER: trevis-33vm3: executing load_modules_local
[23475.560023] Lustre: 9927:0:(gss_svc_upcall.c:1186:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
[23475.560034] Lustre: 9927:0:(gss_mech_switch.c:71:lgss_mech_register()) Register gssnull mechanism
[23475.560039] Key type lgssc registered
[23475.624355] Lustre: Echo OBD driver; http://www.lustre.org/
[23476.635995] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm3: executing set_default_debug -1 all 4
[23476.669582] Lustre: DEBUG MARKER: trevis-33vm3: executing set_default_debug -1 all 4
[23477.478297] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm7: executing set_default_debug -1 all 4
[23477.513178] Lustre: DEBUG MARKER: trevis-33vm7: executing set_default_debug -1 all 4
[23478.326013] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm3: executing set_default_debug -1 all 4
[23478.362859] Lustre: DEBUG MARKER: trevis-33vm3: executing set_default_debug -1 all 4
[23478.582138] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
[23478.862393] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm8: executing lsmod
[23478.898933] Lustre: DEBUG MARKER: trevis-33vm8: executing lsmod
[23478.942337] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
[23478.974285] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x80000716
[23479.008341] Lustre: DEBUG MARKER: mount -t lustre /dev/lvm-Role_OSS/P1 /mnt/lustre-ost1
[23479.010416] Lustre: DEBUG MARKER: mount -t lustre /dev/lvm-Role_OSS/P1 /mnt/lustre-ost1
[23479.061423] LustreError: 10335:0:(libcfs_fail.h:165:cfs_race()) cfs_race id 716 sleeping
[23479.063837] LustreError: 10338:0:(libcfs_fail.h:170:cfs_race()) cfs_fail_race id 716 waking
[23479.063860] LustreError: 10335:0:(libcfs_fail.h:168:cfs_race()) cfs_fail_race id 716 awake, rc=0
[23479.063923] LustreError: 10335:0:(genops.c:489:class_register_device()) lustre-OST0000-osd: already exists, won't add
[23479.063932] LustreError: 10335:0:(genops.c:415:class_free_dev()) ASSERTION( obd->obd_magic == OBD_DEVICE_MAGIC ) failed: ffff8800793ae9c8 obd_magic 5a5a5a5a != ab5cd6ef
[23479.075186] LustreError: 10335:0:(genops.c:415:class_free_dev()) LBUG
[23479.075187] Pid: 10335, comm: mount.lustre
[23479.075187] 
Comment by Gerrit Updater [ 07/Nov/17 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/29967
Subject: LU-4134 obdclass: fix double free in failure path
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4fbe59cd035b7656645a9bd2a98a8b215390dd8f
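In the conf-sanity 41c log above, class_register_device() rejects a duplicate lustre-OST0000-osd ("already exists, won't add"), and class_free_dev() then finds obd_magic reading 5a5a5a5a instead of ab5cd6ef, which looks like memory that had already been freed and poisoned. The sketch below illustrates the general single-owner rule for such an error path; it does not reproduce the code of the patch above. `struct dev`, `dev_alloc()`, `dev_free()`, `dev_register()` and `dev_setup()` are hypothetical, only the two magic values come from the log, and locking is omitted for brevity even though the real failure involves concurrent mounts.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEV_MAGIC 0xab5cd6efu /* live-device magic from the LBUG message */
#define POISON    0x5a        /* freed memory reads as 0x5a5a5a5a */
#define MAX_DEVS  8

/* Hypothetical device with a magic cookie, loosely modelled on obd_magic. */
struct dev {
	uint32_t magic;
	char name[32];
};

static struct dev *dev_table[MAX_DEVS];

static struct dev *dev_alloc(const char *name)
{
	struct dev *d = calloc(1, sizeof(*d));

	if (d != NULL) {
		d->magic = DEV_MAGIC;
		snprintf(d->name, sizeof(d->name), "%s", name);
	}
	return d;
}

/* Check the cookie, poison, then free. Calling this twice on the same
 * pointer is a use-after-free; with poisoning it tends to surface as a
 * magic mismatch like the one in the log, but that is not guaranteed. */
static void dev_free(struct dev *d)
{
	assert(d->magic == DEV_MAGIC);
	memset(d, POISON, sizeof(*d));
	free(d);
}

/* Assumed ownership rule: on failure the caller still owns d, so this
 * function must not free it. */
static int dev_register(struct dev *d)
{
	int slot = -1;

	for (int i = 0; i < MAX_DEVS; i++) {
		if (dev_table[i] != NULL && strcmp(dev_table[i]->name, d->name) == 0)
			return -EEXIST; /* "already exists, won't add" */
		if (dev_table[i] == NULL && slot < 0)
			slot = i;
	}
	if (slot < 0)
		return -ENOSPC;
	dev_table[slot] = d;
	return 0;
}

/* Setup path: the error branch frees exactly once, in one place. */
static int dev_setup(const char *name)
{
	struct dev *d = dev_alloc(name);
	int rc;

	if (d == NULL)
		return -ENOMEM;
	rc = dev_register(d);
	if (rc != 0)
		dev_free(d);
	return rc;
}
```

The design choice is simply that exactly one function owns the cleanup on each error path. If dev_register() also freed the device on -EEXIST, dev_setup() would free it a second time, which is the shape of bug the LBUG above points to.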

Comment by Gerrit Updater [ 11/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29967/
Subject: LU-4134 obdclass: fix double free in failure path
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 77d2e604b6bc3c319adefca069b5182c325a43b0

Comment by Yang Sheng [ 11/Dec/17 ]

Patch landed for 2.11.0. Closing this ticket.

Comment by Gerrit Updater [ 12/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30487
Subject: LU-4134 obdclass: fix double free in failure path
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 30db03bdae598353ca803b8f0342258167ebc9b7

Comment by Gerrit Updater [ 04/Jan/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29740/
Subject: LU-4134 obdclass: obd_device improvement
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 09debb54bcf6d04ca9ff257fa721260f80ce8138

Comment by Gerrit Updater [ 04/Jan/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30487/
Subject: LU-4134 obdclass: fix double free in failure path
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 2c2f42ccd55601b410e35f76d89b2691f11250b1

Generated at Sat Feb 10 01:39:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.