Details
- Type: Bug
- Resolution: Won't Fix
- Priority: Major
- Fix Version/s: None
- Affects Version/s: Lustre 2.12.9
- Labels: None
- Severity: 3
Description
This does not affect Lustre versions above 2.15.
Call Trace:
[12029.369503] BUG: unable to handle kernel NULL pointer dereference at (null)
[12029.369605] IP: [<ffffffffc1041a67>] osc_lru_reserve+0x27/0x170 [osc]
[12029.369702] PGD 800000006895f067 PUD 4a719067 PMD 0
[12029.369767] Oops: 0000 [#1] SMP
...
[12029.389698] RIP: 0010:[<ffffffffc1041a67>] [<ffffffffc1041a67>] osc_lru_reserve+0x27/0x170 [osc]
[12029.391039] RSP: 0018:ffff9756a8b83a70 EFLAGS: 00010206
[12029.392217] RAX: ffff97567fb4b000 RBX: ffff9756ba9cab88 RCX: 0000000000000000
[12029.393346] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9756818946c0
[12029.394480] RBP: ffff9756a8b83a88 R08: 0000000000000001 R09: 0000000000000000
[12029.395523] R10: ffff9756a4b6cf28 R11: 000000000000000f R12: ffff9756818946c0
[12029.396445] R13: 0000000000020000 R14: ffff9756ba9cab88 R15: ffff975682b0ba00
[12029.397361] FS: 00007efdb61e4740(0000) GS:ffff9756bfd00000(0000) knlGS:0000000000000000
[12029.398336] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12029.399246] CR2: 0000000000000000 CR3: 000000004cf86000 CR4: 00000000000606e0
[12029.400145] Call Trace:
[12029.400914] [<ffffffffc1047ac6>] osc_io_write_iter_init+0x86/0x1e0 [osc]
[12029.401708] [<ffffffffc0afb1bf>] cl_io_iter_init+0x5f/0x120 [obdclass]
[12029.402488] [<ffffffffc10d4ff3>] lov_io_add_sub.isra.34+0xc3/0x380 [lov]
[12029.403245] [<ffffffffc10d9967>] lov_io_iter_init+0x257/0x720 [lov]
[12029.403974] [<ffffffffc10da38a>] lov_io_rw_iter_init+0x38a/0x520 [lov]
[12029.404717] [<ffffffffc0afb1bf>] cl_io_iter_init+0x5f/0x120 [obdclass]
[12029.405401] [<ffffffffc0afd4d2>] cl_io_loop+0x42/0x1c0 [obdclass]
[12029.406016] [<ffffffffc169a3bb>] ll_file_io_generic+0x63b/0xc90 [lustre]
[12029.406637] [<ffffffffc169aea9>] ll_file_aio_write+0x289/0x660 [lustre]
[12029.407251] [<ffffffffc169b380>] ll_file_write+0x100/0x1c0 [lustre]
[12029.407850] [<ffffffff9d44e590>] vfs_write+0xc0/0x1f0
[12029.408448] [<ffffffff9d9aaed5>] ? system_call_after_swapgs+0xa2/0x13a
[12029.409027] [<ffffffff9d44f36f>] SyS_write+0x7f/0xf0
[12029.409603] [<ffffffff9d9aaed5>] ? system_call_after_swapgs+0xa2/0x13a
[12029.410153] [<ffffffff9d9aaf92>] system_call_fastpath+0x25/0x2a
[12029.410649] [<ffffffff9d9aaed5>] ? system_call_after_swapgs+0xa2/0x13a
Code:
unsigned long osc_lru_reserve(struct client_obd *cli, unsigned long npages)
{
        unsigned long reserved = 0;
        unsigned long max_pages;
        unsigned long c;

        /* reserve a full RPC window at most to avoid that a thread accidentally
         * consumes too many LRU slots */
        max_pages = cli->cl_max_pages_per_rpc * cli->cl_max_rpcs_in_flight;
        if (npages > max_pages)
                npages = max_pages;

        c = atomic_long_read(cli->cl_lru_left); <-------- Crash
        ....
Reproducer:
- client> lfs setstripe -c1 -i0 test
- mgs> lctl conf_param lustrefs-OST0000.osc.active=0
- client> umount /mnt/lustre && mount /mnt/lustre
- client> dd if=/dev/zero of=yep bs=1M count=1 conv=notrunc
==> Crash
The issue appears only after the client is remounted: on remount, the client does not try to connect to the deactivated OST and therefore does not initialize the LRU cache:
lctl dl
...
4 UP osc lustrefs-OST0000-osc-ffff8cb84915e000 ...
crash> p obd_devs[4]->u.cli.cl_lru_left
$3 = (atomic_long_t *) 0x0
Most jobs do not trigger the crash because they usually perform some other operation before the first write (e.g. stat, seek, etc.). Those operations are not affected by this bug and return an EIO error (in osc_io_iter_init()).
This is the same crash as LU-11658 (but its patches did not fully fix the issue).
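For illustration only, the failure mode can be modeled in user space. The sketch below is not the Lustre code and not the LU-11658 patch: struct client_obd_model and lru_reserve_model() are hypothetical names, the -EIO return for an uninitialized cache is an assumption borrowed from the behavior described for osc_io_iter_init(), and the reservation step stands in for the part of osc_lru_reserve() that is elided above. It only shows that checking cl_lru_left before reading it turns the NULL dereference into an error return.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct client_obd; only the fields
 * used by the snippet in this ticket are modeled. */
struct client_obd_model {
        unsigned long cl_max_pages_per_rpc;
        unsigned long cl_max_rpcs_in_flight;
        atomic_long *cl_lru_left; /* NULL if the LRU cache was never set up */
};

/* Returns the number of slots reserved (possibly 0), or -EIO when the LRU
 * cache is uninitialized. The error value is an assumption of this sketch. */
long lru_reserve_model(struct client_obd_model *cli, unsigned long npages)
{
        unsigned long max_pages;
        long left;

        assert(cli != NULL);

        /* The guard: without it, the atomic read below dereferences NULL. */
        if (cli->cl_lru_left == NULL)
                return -EIO;

        /* Same clamp as in the original snippet: one full RPC window at most. */
        max_pages = cli->cl_max_pages_per_rpc * cli->cl_max_rpcs_in_flight;
        if (npages > max_pages)
                npages = max_pages;

        /* Simplified reservation (assumed, the original is elided): take up
         * to npages slots with a compare-and-swap retry loop. */
        left = atomic_load(cli->cl_lru_left);
        while (left > 0) {
                long take = (unsigned long)left < npages ? left : (long)npages;

                if (atomic_compare_exchange_weak(cli->cl_lru_left, &left,
                                                 left - take))
                        return take;
                /* A failed exchange reloaded 'left'; retry with the new value. */
        }
        return 0;
}
```

With cl_lru_left left NULL, as in the crash dump above, lru_reserve_model() returns -EIO instead of faulting. Once the pointer refers to a counter, requests are clamped to the RPC window and to the slots still available.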
Issue Links
- is related to: LU-11658 The cl_cache may be uninitialized while osc is activating (Resolved)