Applies to:
Linux OS - Version Oracle Linux 4.4 to Oracle Linux 6.2 with Unbreakable Enterprise Kernel [2.6.39] [Release OL4U4 to OL6U2]
Oracle VM - Version 2.1.1 to 3.1.1 [Release OVM211 to OVM31]
Information in this document applies to any platform.
Description
The problem is caused by a livelock that is triggered when ntpd calls adjtimex(2) to tell the kernel to insert a leap second. See the lkml posting: http://lkml.indiana.edu/hypermail/linux/kernel/1203.1/04598.html
The issue can appear before the leap second itself is scheduled to occur because ntpd lets the kernel handle the leap second at midnight, but has to alert the kernel to insert it before midnight. ntpd therefore calls adjtimex(2) at some point during the day of the leap second, and that call is what triggers the bug.
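To check whether ntpd has been told about an upcoming leap second (and is therefore likely to pass it on to the kernel through adjtimex(2)), the leap indicator reported by ntpd can be inspected. The following is a minimal sketch, assuming the ntpq utility is installed and that its output contains a field of the form leap=NN; the function name leap_state and its result strings are illustrative and not part of the original alert.

```shell
#!/bin/sh
# leap_state: classify the two-bit NTP leap indicator found in a status string.
# Hypothetical helper; it only parses text and does not touch the clock.
leap_state() {
    case "$1" in
        *leap=00*) echo "none" ;;            # no leap second announced
        *leap=01*) echo "insert-pending" ;;  # last minute of the day has 61 seconds
        *leap=10*) echo "delete-pending" ;;  # last minute of the day has 59 seconds
        *leap=11*) echo "unsynchronized" ;;  # alarm condition, clock not synchronized
        *)         echo "unknown" ;;
    esac
}

# Usage (requires a running ntpd and the ntpq utility):
#   leap_state "$(ntpq -c 'rv 0 leap')"
```

A result of insert-pending suggests the system may be exposed until the workaround or a fixed kernel is applied.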
Occurrence
This can occur on Oracle Linux, RHEL, and Oracle VM (including UEK and Red Hat compatible kernels), on bare-metal machines or in Oracle VM environments.
Symptoms
Servers will become unresponsive and the following can be seen in system logs, console, netconsole or vmcore dump analysis outputs:
INFO: task kjournald:1119 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880028087f00 0 1119 2 0x00000000
ffff8807ac15dc40 0000000000000246 ffffffff8100e6a1 ffffffffb053069f
ffff8807ac22e140 ffff8807ada96080 ffff8807ac22e510 ffff880028073000
ffff8807ac15dcd0 ffff88002802ea60 ffff8807ac15dc20 ffff8807ac22e140
Alternatively, Java applications may suddenly start using 100% CPU (the leap second insertion causes futex calls to time out repeatedly).
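The hung-task messages shown above can be searched for in a saved log. The sketch below is one possible way to do that; the function names count_hung_tasks and hung_task_names are hypothetical, and the log path in the usage note (/var/log/messages) is an assumption that may differ per system.

```shell
#!/bin/sh
# count_hung_tasks: print how many "blocked for more than N seconds" lines
# appear in the log file given as $1.
count_hung_tasks() {
    grep -c 'blocked for more than [0-9]* seconds' "$1"
}

# hung_task_names: print the unique names of the tasks reported as blocked,
# taken from lines of the form "INFO: task NAME:PID blocked for more than ...".
hung_task_names() {
    sed -n 's/.*INFO: task \([^:]*\):[0-9]* blocked for more than.*/\1/p' "$1" | sort -u
}

# Usage:
#   count_hung_tasks /var/log/messages
#   hung_task_names /var/log/messages
```

Hung-task messages alone do not prove the leap-second bug; they only indicate that the symptom described here is present.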
See also:
- Relevant lwn.net article
- Information on serverfault.com
- Fix leap-second hrtimer livelock on kernel git
Workaround
# /etc/init.d/ntpd stop
# date -s "`date`" (reset the system clock)
# /etc/init.d/ntpd start
Or reboot the server.
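When the workaround has to be applied on several servers, the three commands can be wrapped in a small script. This is only a sketch under stated assumptions: the run and apply_leap_workaround helpers and the DRY_RUN variable are hypothetical names introduced here, and the script defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# apply_leap_workaround: stop ntpd, reset the system clock to its current
# value, then start ntpd again, matching the workaround steps in this alert.
apply_leap_workaround() {
    run /etc/init.d/ntpd stop
    run date -s "$(date)"
    run /etc/init.d/ntpd start
}

# Usage (as root, after reviewing the dry-run output):
#   DRY_RUN=0; apply_leap_workaround
```

Running it first with the default dry-run setting shows exactly which commands would be issued.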
History
[02-JUL-2012] - Alert Created
[03-JUL-2012] - Alert Published