[LU-2603] kernel panic in class_handle_ioctl() when deactivating OSC Created: 11/Jan/13  Updated: 05/Aug/14  Resolved: 05/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.5, Lustre 1.8.8
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Hiroya Nozaki
Assignee: Cliff White (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 6072

 Description   

A kernel panic occurred on a client node while I was repeatedly running mount/umount and "lctl --device ${devno} deactivate".
The stack trace of the thread that caused the panic is shown below.

PID: 7160   TASK: ffff88060a370aa0  CPU: 2   COMMAND: "lctl"
 #0 [ffff8805f8ba7b20] machine_kexec at ffffffff8103281b
 #1 [ffff8805f8ba7b80] crash_kexec at ffffffff810ba792
 #2 [ffff8805f8ba7c50] oops_end at ffffffff81501700
 #3 [ffff8805f8ba7c80] die at ffffffff8100f26b
 #4 [ffff8805f8ba7cb0] do_general_protection at ffffffff81501292
 #5 [ffff8805f8ba7ce0] general_protection at ffffffff81500a65
    [exception RIP: class_handle_ioctl+4926]
    RIP: ffffffffa04e776e  RSP: ffff8805f8ba7d98  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: ffff8805f4f0c800  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff88062808b500
    RBP: ffff8805f8ba7e38   R8: 0000000000000000   R9: ffffffff8163abc0
    R10: 0000000000000001  R11: 0000000000000000  R12: 00000000c0086815
    R13: 00007fff38122d70  R14: 5a5a5a5a5a5a5a5a  R15: 0000000000000240
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff8805f8ba7d90] class_handle_ioctl at ffffffffa04e76f4 [obdclass]
 #7 [ffff8805f8ba7e40] obd_class_ioctl at ffffffffa04d12ab [obdclass]
 #8 [ffff8805f8ba7e60] vfs_ioctl at ffffffff8118dff2
 #9 [ffff8805f8ba7ea0] do_vfs_ioctl at ffffffff8118e194
#10 [ffff8805f8ba7f30] sys_ioctl at ffffffff8118e711
#11 [ffff8805f8ba7f80] system_call_fastpath at ffffffff8100b0f2
    RIP: 00000032e3adf7b7  RSP: 00007fff38122d08  RFLAGS: 00010202
    RAX: 0000000000000010  RBX: ffffffff8100b0f2  RCX: 0000000000000018
    RDX: 00007fff38122d70  RSI: 00000000c0086815  RDI: 0000000000000003
    RBP: 0000000000000001   R8: 0000000000000000   R9: 0000000000000240
    R10: 00007fff38122a90  R11: 0000000000000202  R12: 00000000c0086815
    R13: 00007fff38122d70  R14: 0000000000676ae0  R15: 0000000000000003
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

class_handle_ioctl() does have several if-statements that check the obd state, such as obd_stopping, obd_set_up and obd_attached. However, nothing protects the obd_device once class_handle_ioctl() has passed all of those checks. As a result, class_decref() can release an obd_device while class_handle_ioctl() is still referring to it.
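
One possible way to close this window is to take a reference in the same critical section that checks the state flags, so the device cannot be freed until the ioctl path drops that reference. The following is a minimal user-space sketch of that idea, not Lustre code: struct mock_obd, mock_obd_new(), mock_obd_get_checked(), mock_obd_put() and mock_obd_stop() are hypothetical names invented for illustration, and a pthread mutex stands in for whatever lock the kernel code would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct obd_device; not the Lustre definition. */
struct mock_obd {
    pthread_mutex_t lock;   /* protects refcount and the flags below */
    int refcount;
    int attached, set_up, stopping;
};

static int mock_obd_freed;  /* counts frees so the sketch can be checked */

/* Create a device that is attached and set up, holding one reference. */
struct mock_obd *mock_obd_new(void)
{
    struct mock_obd *obd = calloc(1, sizeof(*obd));

    if (obd == NULL)
        return NULL;
    pthread_mutex_init(&obd->lock, NULL);
    obd->refcount = 1;      /* reference owned by the setup path */
    obd->attached = 1;
    obd->set_up = 1;
    return obd;
}

/* Check the state flags and take a reference in one critical section. */
int mock_obd_get_checked(struct mock_obd *obd)
{
    int ok;

    pthread_mutex_lock(&obd->lock);
    ok = obd->attached && obd->set_up && !obd->stopping;
    if (ok)
        obd->refcount++;
    pthread_mutex_unlock(&obd->lock);
    return ok;
}

/* Drop a reference; the last holder frees the device. */
void mock_obd_put(struct mock_obd *obd)
{
    int last;

    pthread_mutex_lock(&obd->lock);
    assert(obd->refcount > 0);
    last = (--obd->refcount == 0);
    pthread_mutex_unlock(&obd->lock);
    if (last) {
        pthread_mutex_destroy(&obd->lock);
        free(obd);
        mock_obd_freed++;
    }
}

/* Cleanup path: mark the device as stopping, then drop the setup reference. */
void mock_obd_stop(struct mock_obd *obd)
{
    pthread_mutex_lock(&obd->lock);
    obd->stopping = 1;
    pthread_mutex_unlock(&obd->lock);
    mock_obd_put(obd);
}
```

In this sketch, an ioctl handler would call mock_obd_get_checked() before touching the device and mock_obd_put() when done. A concurrent mock_obd_stop() then only marks the device as stopping, and the memory is released by whichever side drops the last reference. The sketch assumes the caller obtained the pointer in a way that keeps it valid until mock_obd_get_checked() returns, for example by doing the lookup under the same lock.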

I will upload a patch for this problem soon, and I would appreciate it if someone could review it.

Thank you.



 Comments   
Comment by Hiroya Nozaki [ 14/Jan/13 ]

Patch for master:
http://review.whamcloud.com/5024

Comment by Keith Mannthey (Inactive) [ 09/Aug/13 ]

Has there been any time to work on an improved patch?

Comment by Hiroya Nozaki [ 09/Aug/13 ]

So far I have only been thinking about the problem.

I think we need to add functions similar to class_name2obd() and class_uuid2obd() that also check obd flags such as obd_stopping and obd_set_up,
and then replace calls to class_uuid2obd() and class_name2obd() with the new ones where needed.
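
As a rough illustration of such a checked lookup, here is a minimal user-space sketch, not Lustre code. The names struct mock_obd, mock_devs, mock_devs_lock and mock_name2obd_checked() are hypothetical, and the sketch assumes that a single lock covers the device table, the flags and the reference counts, and that device names are unique.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_DEVS 8

/* Hypothetical stand-in for struct obd_device; not the Lustre definition. */
struct mock_obd {
    const char *name;
    int refcount;
    int set_up, stopping;
};

/* Hypothetical device table; one lock covers slots, flags and refcounts. */
static struct mock_obd *mock_devs[MOCK_MAX_DEVS];
static pthread_mutex_t mock_devs_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Look a device up by name and return it with a reference held, or NULL
 * if it is missing, not set up, or already stopping.
 */
struct mock_obd *mock_name2obd_checked(const char *name)
{
    struct mock_obd *found = NULL;

    assert(name != NULL);
    pthread_mutex_lock(&mock_devs_lock);
    for (size_t i = 0; i < MOCK_MAX_DEVS; i++) {
        struct mock_obd *obd = mock_devs[i];

        if (obd == NULL || strcmp(obd->name, name) != 0)
            continue;
        /* Name matched: hand it out only if it is usable right now. */
        if (obd->set_up && !obd->stopping) {
            obd->refcount++;
            found = obd;
        }
        break;
    }
    pthread_mutex_unlock(&mock_devs_lock);
    return found;
}
```

Because the flag check and the reference increment happen under the same lock as the table walk, a caller either gets a device it is allowed to use, with a reference that keeps it alive, or gets NULL. The caller would be responsible for dropping that reference when it is finished.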

However, I expect that this would take me a great deal of time, so I am hesitant to start on it. Do you have any better ideas?

Comment by Cliff White (Inactive) [ 09/May/14 ]

Has there been time for any further work on this? Is this still an issue?

Comment by Cliff White (Inactive) [ 05/Aug/14 ]

I am going to close this issue due to lack of activity. Please reopen it if needed.

Generated at Sat Feb 10 01:26:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.