[Eisfair] Call for testing: instability of eiskernel 2.18.0

Thomas Bork tom at eisfair.org
Mon Feb 15 20:20:55 CET 2016


Hi @all,

since this is apparently too well hidden in the thread "Absturz,
Log-Datei vor dem Absturz" ("Crash, log file before the crash"):

Some users have problems with kernel 2.18.0. These problems show up as
messages in /var/log/messages such as

[    0.000000] INFO: rcu_bh detected stall on CPU 1 (t=0 jiffies)
[    0.000000] Pid: 0, comm: swapper/1 Not tainted 3.2.71-eisfair-1-SMP #1
[    0.000000] Call Trace:
[    0.000000]  [<c1057f52>] __rcu_pending+0x64/0x28f
[    0.000000]  [<c10585c9>] rcu_check_callbacks+0x87/0x98
[    0.000000]  [<c1031771>] update_process_times+0x2d/0x58
[    0.000000]  [<c1047207>] tick_sched_timer+0x13f/0x166
[    0.000000]  [<c103e194>] __run_hrtimer.isra.27+0x3d/0x91
[    0.000000]  [<c103e801>] hrtimer_interrupt+0xe2/0x1cb
[    0.000000]  [<c1014e67>] smp_apic_timer_interrupt+0x67/0x7a
[    0.000000]  [<c12fa3fa>] apic_timer_interrupt+0x2a/0x30
[    0.000000]  [<c10400d8>] ? __lowest_in_progress+0x34/0x53
[    0.000000]  [<c11ebfa4>] ? acpi_idle_enter_simple+0x102/0x13b
[    0.000000]  [<c1269431>] cpuidle_idle_call+0x5a/0xa5
[    0.000000]  [<c100159a>] cpu_idle+0x3d/0x5c
[    0.000000]  [<c12f0b2e>] start_secondary+0x190/0x195

- see the thread "Fehlermeldung in dmesg" ("Error message in dmesg")

or

Feb  8 16:33:56 server kernel: INFO: rcu_sched detected stall on CPU 3 (t=150093 jiffies)
Feb  8 16:33:56 server kernel: Pid: 2544, comm: smbd Tainted: P   O 3.2.75-eisfair-1-SMP #1
Feb  8 16:33:56 server kernel: Call Trace:
Feb  8 16:33:56 server kernel:  [__rcu_pending+0x64/0x28f] __rcu_pending+0x64/0x28f
Feb  8 16:33:56 server kernel:  [rcu_check_callbacks+0x6d/0x98] rcu_check_callbacks+0x6d/0x98
Feb  8 16:33:56 server kernel:  [update_process_times+0x2d/0x58] update_process_times+0x2d/0x58
Feb  8 16:33:56 server kernel:  [tick_sched_timer+0x13f/0x166] tick_sched_timer+0x13f/0x166
Feb  8 16:33:56 server kernel:  [__run_hrtimer.isra.27+0x3d/0x91] __run_hrtimer.isra.27+0x3d/0x91
Feb  8 16:33:56 server kernel:  [hrtimer_interrupt+0xe2/0x1cb] hrtimer_interrupt+0xe2/0x1cb
Feb  8 16:33:56 server kernel:  [smp_apic_timer_interrupt+0x67/0x7a] smp_apic_timer_interrupt+0x67/0x7a
Feb  8 16:33:56 server kernel:  [apic_timer_interrupt+0x2a/0x30] apic_timer_interrupt+0x2a/0x30
Feb  8 16:33:56 server kernel:  [any_slab_objects+0x15/0x1b] ? any_slab_objects+0x15/0x1b
Feb  8 16:33:56 server kernel:  [_raw_spin_lock+0x10/0x1c] ? _raw_spin_lock+0x10/0x1c
Feb  8 16:33:56 server kernel:  [unix_state_double_lock+0x3d/0x41] unix_state_double_lock+0x3d/0x41
Feb  8 16:33:56 server kernel:  [unix_dgram_connect+0x83/0x153] unix_dgram_connect+0x83/0x153
Feb  8 16:33:56 server kernel:  [sys_connect+0x63/0x88] sys_connect+0x63/0x88
Feb  8 16:33:56 server kernel:  [sys_socketcall+0x76/0x192] sys_socketcall+0x76/0x192
Feb  8 16:33:57 server kernel:  [syscall_after_call+0x0/0x04] syscall_call+0x7/0x7
Feb  8 16:33:57 server kernel:  [mcheck_cpu_init+0x137/0x2d2] ? mcheck_cpu_init+0x137/0x2d2

see the thread "E1 friert ein wg. Speicherleck" ("E1 freezes due to
memory leak")

or

Feb 12 13:27:36 myeis kernel: INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
Feb 12 13:27:36 myeis kernel: Pid: 3652, comm: smbd Tainted: G  O 3.2.75-eisfair-1-VIRT #1
Feb 12 13:27:36 myeis kernel: Call Trace:
Feb 12 13:27:36 myeis kernel:  [__rcu_pending+0x64/0x28f] __rcu_pending+0x64/0x28f
Feb 12 13:27:36 myeis kernel:  [account_process_tick+0x104/0x15a] ? account_process_tick+0x104/0x15a
Feb 12 13:27:36 myeis kernel:  [rcu_check_callbacks+0x6d/0x98] rcu_check_callbacks+0x6d/0x98
Feb 12 13:27:36 myeis kernel:  [update_process_times+0x2d/0x58] update_process_times+0x2d/0x58
Feb 12 13:27:36 myeis kernel:  [tick_sched_timer+0x0/0x16b] ? tick_init_highres+0x11/0x11
Feb 12 13:27:36 myeis kernel:  [tick_sched_timer+0x144/0x16b] tick_sched_timer+0x144/0x16b
Feb 12 13:27:36 myeis kernel:  [tick_sched_timer+0x0/0x16b] ? tick_init_highres+0x11/0x11
Feb 12 13:27:36 myeis kernel:  [__run_hrtimer.isra.27+0x4d/0x9c] __run_hrtimer.isra.27+0x4d/0x9c
Feb 12 13:27:36 myeis kernel:  [hrtimer_interrupt+0xe2/0x1dd] hrtimer_interrupt+0xe2/0x1dd
Feb 12 13:27:36 myeis kernel:  [smp_apic_timer_interrupt+0x67/0x7a] smp_apic_timer_interrupt+0x67/0x7a
Feb 12 13:27:36 myeis kernel:  [apic_timer_interrupt+0x2a/0x30] apic_timer_interrupt+0x2a/0x30
Feb 12 13:27:36 myeis kernel:  [link_path_walk+0xfb/0x61b] ? link_path_walk+0xfb/0x61b
Feb 12 13:27:36 myeis kernel:  [try_to_merge_with_ksm_page+0x2d0/0x451] ? try_to_merge_with_ksm_page+0x2d0/0x451
Feb 12 13:27:36 myeis kernel:  [__ticket_spin_lock+0x16/0x1c] ? __ticket_spin_lock+0x16/0x1c
Feb 12 13:27:36 myeis kernel:  [_raw_spin_lock+0x8/0x0b] _raw_spin_lock+0x8/0xb
Feb 12 13:27:36 myeis kernel:  [unix_state_double_lock+0x3d/0x41] unix_state_double_lock+0x3d/0x41
Feb 12 13:27:36 myeis kernel:  [unix_dgram_connect+0x83/0x153] unix_dgram_connect+0x83/0x153
Feb 12 13:27:36 myeis kernel:  [sys_connect+0x63/0x88] sys_connect+0x63/0x88
Feb 12 13:27:36 myeis kernel:  [sys_socketcall+0x76/0x192] sys_socketcall+0x76/0x192
Feb 12 13:27:36 myeis kernel:  [syscall_after_call+0x0/0x04] syscall_call+0x7/0x7
Feb 12 13:27:36 myeis kernel:  [get_cpu_leaves+0x1dd/0x28a] ? get_cpu_leaves+0x1dd/0x28a

in the thread "Absturz, Log-Datei vor dem Absturz" ("Crash, log file
before the crash").
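
To check quickly whether a machine is affected, the log can be searched
for the stall header shown in the excerpts above. The following is only
a minimal sketch in Python, not an official eisfair tool: the names
find_stalls, scan_file and the "af_unix" key are made up for this
illustration, and the patterns assume that the syslog lines look like
the excerpts quoted here. It also records whether the trace mentions
unix_dgram_connect or unix_state_double_lock, the two Unix socket
frames visible in the smbd traces above.

```python
import re

# Stall header as seen in the excerpts, e.g.
# "INFO: rcu_sched detected stall on CPU 3 (t=150093 jiffies)".
# The jiffies part is optional in case a line was wrapped or truncated.
STALL_RE = re.compile(
    r"INFO: (?P<flavor>rcu_\w+) detected stall on CPU (?P<cpu>\d+)"
    r"(?: \(t=(?P<jiffies>\d+) jiffies\))?"
)

# "Pid: 2544, comm: smbd ..." line that follows the header.
PID_RE = re.compile(r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+)")

# Unix socket frames that appear in the smbd traces quoted above.
UNIX_FRAMES = ("unix_dgram_connect", "unix_state_double_lock")


def find_stalls(lines):
    """Return one dict per stall header found in an iterable of log lines.

    Assumption of this sketch: every line after a header belongs to that
    report until the next header appears. Unrelated lines in between are
    therefore also inspected.
    """
    reports = []
    current = None
    for line in lines:
        header = STALL_RE.search(line)
        if header:
            jiffies = header.group("jiffies")
            current = {
                "flavor": header.group("flavor"),
                "cpu": int(header.group("cpu")),
                "jiffies": int(jiffies) if jiffies is not None else None,
                "comm": None,
                "af_unix": False,
            }
            reports.append(current)
            continue
        if current is None:
            continue
        pid_line = PID_RE.search(line)
        if pid_line and current["comm"] is None:
            current["comm"] = pid_line.group("comm")
        if any(frame in line for frame in UNIX_FRAMES):
            current["af_unix"] = True
    return reports


def scan_file(path="/var/log/messages"):
    """Read a log file and return the stall reports found in it."""
    with open(path, errors="replace") as handle:
        return find_stalls(handle)
```

Calling scan_file() without arguments reads /var/log/messages; an empty
result only means that no stall header was found in that file, not that
the machine is unaffected.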

To narrow down whether the problem is caused by a particular patch that
went into the longterm kernel 3.2.y, and therefore into ours, I have
removed this patch in a test kernel for the users with the problems
described above.

I hereby ask all users with the above problem to install this kernel so
that we can determine whether removing the patch solves the problem.


The corresponding versions are available at

http://download.eisfair.org/tombork/test/crash/

Copy them into an empty directory and install them with

/var/install/bin/install-local-package directory

These kernel packages do not remove 3.2.71. kernel-dev, however, still
removes /usr/src/linux-3.2.71-eisfair-1. To find out whether the problem
still exists with this kernel, it is not necessary to install
kernel-dev. kernel-dev is only included so that you can look at the
complete patch that is applied in eisfair, which now differs from the
patch in the regular eiskernel 2.18.0.

The version numbers have not changed, so the version is still 2.18.0.


For information, here is a reply from Ben Hutchings, the maintainer of
longterm 3.2.y:

#####################################################
On Sun, 2016-02-14 at 11:51 +0100, Thomas Bork wrote:
 > On 10.02.2016 at 12:18, Karolin Seeger wrote:
 >
 >> this is a heads-up that we have seen some system crashes after updating
 >> to Ubuntu LTS kernel 3.13.0-77 on systems running Samba.
 >>
 >> It looks like a kernel bug triggered by Samba calls.
 >> A bug report has been created [1].
 >>
 >> Downgrading to kernel 3.13.0-76 solves the problem.
 >>
 >> [1] https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1543980
 >
 > I want to let you know that some of our samba users have a similar
 > problem after switching the kernel from 3.2.74 to 3.2.76:
[...]
 > These stalls later seem to lead to a memory leak until the
 > oom-killer kills processes and the machines crash.
 >
 > Downgrading to kernel 3.2.74 solves the problem.
 >
 > After reading
 >
 > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980
 > https://forge.univention.org/bugzilla/show_bug.cgi?id=40558
 > https://patchwork.ozlabs.org/patch/582017/
 >
 > I think the patch
 >
 >> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/net/unix/af_unix.c?id=a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
 >
 > in 3.2.75 is the problem.

I think it's fixed by this kernel patch:
http://mid.gmane.org/87r3gj11jc.fsf_-_@doppelsaurus.mobileactivedefense.com

Assuming it's applied upstream, it will get into stable updates in due
course.  I've also queued this up for inclusion in Debian security
updates.

Ben.
#####################################################

-- 
der tom
[eisfair-team]

