Summary: | Linux ARM freezes (Java, EGL/ES, JOGL) | ||
---|---|---|---|
Product: | [JogAmp] Jogl | Reporter: | Sven Gothel <sgothel> |
Component: | embedded | Assignee: | Sven Gothel <sgothel> |
Status: | RESOLVED WORKSFORME | ||
Severity: | blocker | CC: | xerxes |
Priority: | P1 | ||
Version: | 2 | ||
Hardware: | embedded_arm | ||
OS: | linux | ||
Type: | --- | SCM Refs: | |
Workaround: | --- |
Description
Sven Gothel
2012-03-05 18:26:08 CET
Similar experiences:
- <https://bugs.launchpad.net/ubuntu/+source/openjdk-6/+bug/845158>
- <http://forums.debian.net/viewtopic.php?f=5&t=49368>

+++

I followed <http://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/> and raised `/proc/sys/kernel/hung_task_timeout_secs` from 120 (2 min) to 360 (6 min).

`/etc/sysctl.conf`:
- `vm.min_free_kbytes = 32000`
- `kernel.hung_task_timeout_secs = 360`

Results on platform-1a:
- passed 2 consecutive remote NEWT JUnit test runs (no AWT)
- freezes when running all tests (before the hang timeout), somewhere within the AWT tests

Results on platform-2:
- freezes when running all tests (before the hang timeout), somewhere within the AWT tests

So this is inconclusive; I reset the timeout value back to 120.

+++

The freeze could also be reproduced when running all tests locally.
- platform-1a: remote ssh (NEWT only), freeze @ 6th run
- platform-2: remote ssh (NEWT only), freeze @ 4th run

We have determined that the root cause is not within JOGL itself but probably within the Linux kernel version we use on the PandaBoard and AC100. The bug stays open to track the outcome: Xerxes is currently looking for a Linux kernel remedy in the VM page-fault area. This 'external' bug is a P1 blocker that prevents us from running our unit tests on linux-arm.

While handling a page fault from the Java JVM, the Linux kernel (`arch/arm/mm/fault.c`, `do_page_fault`, around line 300) tries to grab the `&mm->mmap_sem` kernel semaphore:
`down_read(&mm->mmap_sem);`

This can be seen in the kernel dmesg dump:

`(__schedule+0x4f0/0x5cc) from [<c0578d1c>] (__down_read+0xc0/0xd8)`
`Mar 5 17:27:34 panda01 kernel: [ 372.084716] [<c0578d1c>] (__down_read+0xc0/0xd8) from [<c057b0e8>] (do_page_fault.part.2+0x90/0x1f8)`
`Mar 5 17:27:34 panda01 kernel: [ 372.085205] [<c057b0e8>]`

Basically, some other part of the kernel is already holding the `mm->mmap_sem` lock and has failed to release it, so the Java process is stuck in the kernel waiting for this lock to be released.

See also: <https://bugs.launchpad.net/ubuntu-leb/+source/linux-ti-omap4/+bug/845158>

It is confirmed that this bug no longer appears on the Ubuntu 12.* armhf build running on the PandaBoard ES.