2012-07-04

Leap second in Oracle Linux

Leap Second Hang - CPU Can Be Seen at 100% [ID 1472421.1]

Applies to:

Linux OS - Version Oracle Linux 4.4 to Oracle Linux 6.2 with Unbreakable Enterprise Kernel [2.6.39] [Release OL4U4 to OL6U2]
Oracle VM - Version 2.1.1 to 3.1.1 [Release OVM211 to OVM31]
Information in this document applies to any platform.

Description

The hang is caused by a livelock that is triggered when ntpd calls adjtimex(2) to tell the kernel to insert a leap second. See the lkml posting: http://lkml.indiana.edu/hypermail/linux/kernel/1203.1/04598.html
The problem can appear before the leap second is actually due because ntpd lets the kernel handle the leap second at midnight, but has to tell the kernel in advance to insert it. ntpd therefore calls adjtimex(2) at some point during the day of the leap second, and that call triggers the bug.
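
Whether the kernel has already been told to insert a leap second can be read from the status word that adjtimex(2) returns. The following is a minimal sketch, not part of the original note: it assumes the status value has been obtained separately (for example from the "status:" line printed by the adjtimex utility, if it is installed), and the helper name leap_flag is hypothetical. The bit values for STA_INS and STA_DEL are those defined in linux/timex.h.

```shell
#!/bin/sh
# Decode the leap-second bits of an adjtimex(2) status word.
# STA_INS and STA_DEL carry the values defined in linux/timex.h.
STA_INS=16   # 0x0010: insert a leap second at the end of the day
STA_DEL=32   # 0x0020: delete a leap second at the end of the day

# leap_flag STATUS
# Hypothetical helper. STATUS is a decimal or 0x-prefixed integer.
# Prints "insert", "delete" or "none".
leap_flag() {
    status=$(( $1 ))
    if [ $(( $status & $STA_INS )) -ne 0 ]; then
        echo "insert"
    elif [ $(( $status & $STA_DEL )) -ne 0 ]; then
        echo "delete"
    else
        echo "none"
    fi
}
```

For example, "leap_flag 17" reports insert (17 is STA_PLL plus STA_INS), while "leap_flag 64" (STA_UNSYNC only) reports none.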

Occurrence

Can occur on Oracle Linux, RHEL and Oracle VM (including UEK and Red Hat compatible kernels), on bare-metal machines or in Oracle VM environments.

Symptoms

Servers become unresponsive, and the following can be seen in the system logs, console, netconsole or vmcore dump analysis output:
INFO: task kjournald:1119 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  kjournald     D ffff880028087f00     0  1119      2 0x00000000
  ffff8807ac15dc40 0000000000000246 ffffffff8100e6a1 ffffffffb053069f
  ffff8807ac22e140 ffff8807ada96080 ffff8807ac22e510 ffff880028073000
  ffff8807ac15dcd0 ffff88002802ea60 ffff8807ac15dc20 ffff8807ac22e140
  
Alternatively, Java applications suddenly start using 100% CPU (the leap second insertion causes futex calls to time out repeatedly).
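
To look for the hung-task signature shown above in saved logs, a small filter along these lines may help. It is only a sketch: the helper name hung_tasks is hypothetical, and the log source (dmesg output or /var/log/messages) is an assumption about the local setup.

```shell
#!/bin/sh
# hung_tasks (hypothetical helper): read kernel log text on stdin and print
# the name:pid of every task that the hung-task detector reported as blocked.
# Each task is listed once, however often it was reported.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than [0-9][0-9]* seconds.*/\1/p' | sort -u
}
```

Typical use would be "dmesg | hung_tasks" or "hung_tasks < /var/log/messages". An empty result only means that this particular message was not logged; it does not rule out the bug.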

Workaround

# /etc/init.d/ntpd stop
# date -s "`date`"    (reset the system clock)
# /etc/init.d/ntpd start
Or reboot the server.
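
The three commands can be wrapped in a small script so that they are applied the same way on several servers. This is a sketch under assumptions rather than an official procedure: the function names and the DRY_RUN switch are hypothetical, and the init script path is the one used in the note above.

```shell
#!/bin/sh
# Sketch of the workaround as a single function. Needs root to take effect.
# DRY_RUN is a hypothetical switch: when set to 1, commands are only printed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

clear_leap_hang() {
    run /etc/init.d/ntpd stop || return 1
    # Set the system clock to its current value, as in the workaround above.
    run date -s "$(date)" || return 1
    run /etc/init.d/ntpd start
}
```

Running "DRY_RUN=1 clear_leap_hang" first prints the three commands without executing them. Calling clear_leap_hang as root then applies the workaround.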

History

[02-JUL-2012] - Alert Created
[03-JUL-2012] - Alert Published

References

BUG:14264454 - KERNEL HANG DUE TO LEAP SECOND
