Leap Second Smearing with NTP
-----------------------------

By Martin Burnicki
with some edits by Harlan Stenn

The NTP protocol and its reference implementation, ntpd, were
originally designed to distribute UTC time over a network as accurately as
possible.

Unfortunately, leap seconds are scheduled to be inserted into or deleted
from the UTC time scale at irregular intervals to keep the UTC time scale
synchronized with the Earth's rotation.  Deletions haven't happened yet, but
insertions have happened more than 25 times.

The problem is that POSIX requires 86400 seconds in a day, and there is no
prescribed way to handle leap seconds in POSIX.
Whenever a leap second is to be handled ntpd either:

- passes the leap second announcement down to the OS kernel (if the OS
supports this) and the kernel handles the leap second automatically, or

- applies the leap second correction itself.

NTP servers also pass a leap second warning flag down to their clients via
the normal NTP packet exchange, so clients also become aware of an
approaching leap second, and can handle the leap second appropriately.


The Problem on Unix-like Systems
--------------------------------
If a leap second is to be inserted then in most Unix-like systems the OS
kernel just steps the time back by 1 second at the beginning of the leap
second, so the last second of the UTC day is repeated and thus duplicate
timestamps can occur.
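
The effect of the back step can be illustrated with a small simulation.
This is only a sketch: stepped_clock() is a hypothetical model of a kernel
that steps back by 1 s when the leap second begins, using the leap second at
the end of June 30, 2015 as an example.  It does not read any real clock.

```python
import calendar

# POSIX timestamp of 2015-07-01 00:00:00 UTC, the midnight right after the
# leap second inserted at the end of June 30, 2015.
MIDNIGHT = calendar.timegm((2015, 7, 1, 0, 0, 0))


def stepped_clock(elapsed):
    """Model the system time `elapsed` true seconds after 23:59:58 UTC.

    Hypothetical model: the clock runs normally for 2 s and is stepped
    back by 1 s at the moment the leap second begins.
    """
    start = MIDNIGHT - 2          # 23:59:58
    if elapsed < 2:               # before the leap second begins
        return start + elapsed
    return start + elapsed - 1    # after the 1 s back step


# One reading per true second: the timestamp for 23:59:59 appears twice.
readings = [stepped_clock(s) for s in range(4)]
print(readings)

# A reading taken later in real time can be smaller than an earlier one.
print(stepped_clock(2.1) < stepped_clock(1.9))  # True
```

An application that assumes timestamps never repeat or decrease would see
both effects in this model.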

Unfortunately there are lots of applications which get confused if the
system time is stepped back, e.g. due to a leap second insertion.  Thus,
many users have been looking for ways to avoid this, and tried to introduce
workarounds which may or may not work properly.

So even though these Unix kernels normally can handle leap seconds, the way
they do this is not optimal for applications.

One good way to handle the leap second is to use ntp_gettime() instead of
the usual calls, because ntp_gettime() includes a "clock state" variable
that will actually tell you if the time you are receiving is OK or not, and,
if it is OK, whether the current second is an in-progress leap second.  But
even though this mechanism has been available for about 20 years, almost
nobody uses it.
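
As a rough illustration of the clock state idea, the sketch below calls
ntp_gettime() from Python through ctypes.  It assumes a 64-bit Linux system
with glibc; the NtpTimeval layout is copied by hand from <sys/timex.h> and
may differ on other platforms, and the helper names read_clock_state() and
describe_state() are made up for this example.

```python
import ctypes

# Return values of ntp_gettime(), as defined in <sys/timex.h>.
CLOCK_STATES = {
    0: "TIME_OK",     # clock synchronized, no leap second pending
    1: "TIME_INS",    # a leap second will be inserted
    2: "TIME_DEL",    # a leap second will be deleted
    3: "TIME_OOP",    # leap second insertion in progress
    4: "TIME_WAIT",   # leap second has occurred
    5: "TIME_ERROR",  # clock not synchronized
}


class Timeval(ctypes.Structure):
    # Assumption: time_t and suseconds_t are both C longs (64-bit Linux).
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


class NtpTimeval(ctypes.Structure):
    # Assumed layout of glibc's struct ntptimeval.
    _fields_ = [
        ("time", Timeval),
        ("maxerror", ctypes.c_long),
        ("esterror", ctypes.c_long),
        ("tai", ctypes.c_long),
        ("reserved", ctypes.c_long * 4),
    ]


def describe_state(state):
    """Map an ntp_gettime() return value to its symbolic name."""
    return CLOCK_STATES.get(state, "unknown")


def read_clock_state():
    """Call ntp_gettime() and return (state name, whole seconds)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.ntp_gettime.argtypes = [ctypes.POINTER(NtpTimeval)]
    ntv = NtpTimeval()
    state = libc.ntp_gettime(ctypes.byref(ntv))
    return describe_state(state), ntv.time.tv_sec
```

A caller would treat "TIME_ERROR" as "do not trust this time" and "TIME_OOP"
as "this second is the inserted leap second".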


NTP Client for Windows Contains a Workaround
--------------------------------------------
The Windows system time knows nothing about leap seconds, so for many years
the Windows port of ntpd has provided a workaround where the system time is
slewed by the client to compensate for the leap second.

Thus it is not required to use a smearing NTP server for Windows clients,
but of course the smearing server approach also works.


The Leap Smear Approach
-----------------------
For the reasons mentioned above, some support for leap smearing has
recently been implemented in ntpd.  This means that to insert a leap second
an NTP server adds a certain increasing "smear" offset to the real UTC time
sent to its clients, so that after some predefined interval the leap second
offset is compensated.  The smear interval should be long enough,
e.g. several hours, so that NTP clients can easily follow the clock drift
caused by the smeared time.
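
To make the idea of an increasing offset concrete, here is a minimal sketch
that assumes a simple linear ramp.  The linear shape and the function name
smear_offset() are assumptions for illustration only; the shape actually
used depends on the smear algorithm.

```python
def smear_offset(elapsed, interval, leap=1.0):
    """Offset in seconds applied to UTC, `elapsed` s into the smear interval.

    Assumes a linear ramp from 0 to -leap seconds for an inserted leap
    second, and no offset outside the smear interval.
    """
    if elapsed < 0 or elapsed > interval:
        return 0.0
    return -leap * elapsed / interval


# With a 24 hour interval the offset grows by 0.25 s every 6 hours.
for hours in (6, 12, 18, 24):
    print(hours, smear_offset(hours * 3600, 86400))
```

The time sent to a client would then be the real UTC time plus this offset.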

During the period while the leap smear is being performed, ntpd will include
a specially-formatted 'refid' in time packets that contain "smeared" time.
This refid is of the form 254.x.y.z, where x.y.z are 24 encoded bits of the
smear value.

With this approach the time an NTP server sends to its clients still matches
UTC before the leap second, up to the beginning of the smear interval, and
again corresponds to UTC after the insertion of the leap second has
finished, at the end of the smear interval.  By examining the first byte of
the refid, one can also determine if the server is offering smeared time or
not.
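
The refid check can be sketched as follows.  The first-byte test follows
directly from the 254.x.y.z form.  The decoding of x.y.z is an assumption:
it treats the 24 bits as a signed fixed-point number with 22 fractional
bits, which reproduces the example pair given later in this document
(254.196.88.176 for an offset of -0.932087 s).  Both function names are
made up for this example.

```python
import ipaddress


def is_smeared(refid):
    """True if a dotted-quad refid marks smeared time (first byte 254)."""
    return ipaddress.IPv4Address(refid).packed[0] == 254


def smear_from_refid(refid):
    """Decode the smear offset in seconds from a 254.x.y.z refid.

    Assumption: x.y.z is a signed 24-bit value with 22 fractional bits.
    """
    raw = int.from_bytes(ipaddress.IPv4Address(refid).packed[1:], "big")
    if raw >= 1 << 23:        # sign bit set: two's complement negative
        raw -= 1 << 24
    return raw / (1 << 22)


print(is_smeared("254.196.88.176"))                  # True
print(round(smear_from_refid("254.196.88.176"), 6))  # -0.932087
```

Refids that are not dotted quads, such as the name of a reference clock,
are not handled by this sketch.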

Of course, clients which receive the "smeared" time from an NTP server don't
have to (and in fact must not) care about the leap second anymore.  Smearing
is transparent to the clients, and the clients don't even notice there's a
leap second.


Pros and Cons of the Smearing Approach
--------------------------------------
The disadvantages of this approach are:

- During the smear interval the time provided by smearing NTP servers
differs significantly from UTC, and thus from the time provided by normal,
non-smearing NTP servers.  The difference can be up to 1 second, depending
on the smear algorithm.

- Since smeared time differs from true UTC, and many applications require
correct legal time (UTC), there may be legal consequences to using smeared
time.  Make sure you check whether this requirement affects you.

However, for applications where it's only important that all computers have
the same time and a temporary offset of up to 1 s to UTC is acceptable, a
better approach may be to slew the time in a well-defined way, over a
certain interval, which is what we call smearing the leap second.


The Motivation to Implement Leap Smearing
-----------------------------------------
Here is some historical background for ntpd, related to smearing/slewing
time.

Up to ntpd 4.2.4, if kernel support for leap seconds was either not
available or was not enabled, ntpd didn't care about the leap second at all.
So if ntpd was run with -x and thus kernel support wasn't used, ntpd saw a
sudden 1 s offset after the leap second and normally would have stepped the
time by -1 s a few minutes later.  However, 'ntpd -x' does not step the time
but "slews" the 1-second correction, which takes 33 minutes and 20 seconds
to complete.  This could be considered a bug, but in any case it was only
accidental behavior.
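
The 33 minutes and 20 seconds follow from simple arithmetic if the clock is
slewed at 500 ppm (parts per million).  The 500 ppm rate is an assumption
here; it is the rate implied by the figures above.

```python
MAX_SLEW_PPM = 500  # assumed slew rate, in parts per million


def slew_duration(offset_s, rate_ppm=MAX_SLEW_PPM):
    """Seconds needed to slew away `offset_s` at `rate_ppm`."""
    return abs(offset_s) * 1e6 / rate_ppm


seconds = slew_duration(1.0)
minutes, rest = divmod(round(seconds), 60)
print(minutes, rest)  # 33 20
```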

However, as we learned in the discussion in http://bugs.ntp.org/2745, this
behavior was very much appreciated since the time was never stepped back,
even though the start of the slewing was somewhat undefined and depended on
the poll interval: the system time was off by 1 second for several minutes
before slewing even started.

In ntpd 4.2.6 some code was added which let ntpd step the time at UTC
midnight to insert a leap second, if kernel support was not used.
Unfortunately this also happened if ntpd was started with -x, so the folks
who expected that the time was never stepped when ntpd was run with -x found
this wasn't true anymore, and again from the discussion in NTP bug 2745 we
learn that there were even some folks who patched ntpd to get the 4.2.4
behavior back.

In 4.2.8 the leap second code was rewritten and some enhancements were
introduced, but the resulting code still showed the behavior of 4.2.6,
i.e. ntpd with -x would still step the time.  This has only recently been
fixed in the current ntpd stable code, but this fix is only available with a
certain patch level of ntpd 4.2.8.

So a possible solution for users who were looking for a way to get past the
leap second without the time being stepped could have been to check the
version of ntpd installed on each of their systems.  If it's still 4.2.4, be
sure to start the client ntpd with -x.  If it's 4.2.6 or 4.2.8, it won't
work unless you have a patched ntpd version instead of the original version.
So you'd need to upgrade to the current -stable code to be able to run ntpd
with -x and get the desired result, and you'd still have to
check/update/configure every single machine in your network that runs ntpd.

Google's leap smear approach is a very efficient solution for this, for
sites that do not require correct timestamps for legal purposes.  You just
have to take care that your NTP servers support leap smearing and configure
those few servers accordingly.  If the smear interval is long enough so that
NTP clients can follow the smeared time it doesn't matter at all which
version of ntpd is installed on a client machine: it just works, and it even
works around kernel bugs due to the leap second.

Since all clients follow the same smeared time the time difference between
the clients during the smear interval is as small as possible, compared to
the -x approach.  The current leap second code in ntpd determines the point
in system time when the leap second is to be inserted, and given a
particular smear interval it's easy to determine the start point of the
smearing, and the smearing is finished when the leap second ends, i.e. the
next UTC day begins.
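
The start point calculation described above can be sketched like this.  The
function name smear_window() is made up for this example, and the sketch
works purely with POSIX timestamps, which do not represent the leap second
itself.

```python
import calendar
import time


def smear_window(next_day, interval):
    """Return (start, end) of the smear interval as POSIX timestamps.

    `next_day` is the UTC date and time at which the next day begins,
    i.e. the moment the leap second ends, as a (Y, M, D, h, m, s) tuple.
    """
    end = calendar.timegm(next_day)
    return end - interval, end


# Leap second at the end of June 30, 2015, smeared over 86400 s.
start, end = smear_window((2015, 7, 1, 0, 0, 0), 86400)
fmt = "%Y-%m-%d %H:%M:%S"
print(time.strftime(fmt, time.gmtime(start)))  # 2015-06-30 00:00:00
print(time.strftime(fmt, time.gmtime(end)))    # 2015-07-01 00:00:00
```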

The maximum error doesn't exceed what you'd get with the old smearing caused
by -x in ntpd 4.2.4, so users who could accept the old behavior should also
be able to accept smearing at the server side.

In order to affect the local timekeeping as little as possible the leap
smear support currently implemented in ntpd does not affect the internal
system time at all.  Only the timestamps and refid in outgoing reply packets
*to clients* are modified by the smear offset, so this makes sure the basic
functionality of ntpd is not accidentally broken.  Also peer packets
exchanged with other NTP servers are based on the real UTC system time and
the normal refid, as usual.

The leap smear implementation is optionally available in ntp-4.2.8p3 and
later, and the changes can be tracked via http://bugs.ntp.org/2855.


Using NTP's Leap Second Smearing
--------------------------------
- Leap Second Smearing MUST NOT be used for public servers, e.g. servers
provided by metrology institutes, or servers participating in the NTP pool
project.  There would be a high risk that NTP clients get the time from a
mixture of smearing and non-smearing NTP servers which could result in
undefined client behavior.  Instead, leap second smearing should only be
configured on time servers providing dedicated clients with time, if all
those clients can accept smeared time.

- Leap Second Smearing is NOT configured by default.  The only way to get
this behavior is to invoke the ./configure script from the NTP source code
package with the --enable-leap-smear parameter before the executables are
built.

- Even if ntpd has been compiled to enable leap smearing support, leap
smearing is only done if explicitly configured.

- The leap smear interval should be at least several hours long, and up to
1 day (86400s).  If the interval is too short then the smear offset is
applied too quickly for clients to follow.  86400s (1 day) is a good
choice.

- If several NTP servers are set up for leap smearing then the *same* smear
interval should be configured on each server.

- Smearing NTP servers DO NOT send a leap second warning flag in response to
client time requests.  Since the leap second is applied gradually the
clients don't even notice there's a leap second being inserted, and thus no
log message or similar indication related to the leap second will be visible
on the clients.

- Since clients don't (and must not) become aware of the leap second at all,
clients getting the time from a smearing NTP server MUST NOT be configured
to use a leap second file.  If they had a leap second file they would apply
the leap second twice: the smeared one from the server, plus another one
inserted by themselves due to the leap second file.  As a result, the
additional correction would soon be detected and corrected/adjusted.

- Clients MUST NOT be configured to poll both smearing and non-smearing NTP
servers at the same time.  During the smear interval they would get
different times from different servers and wouldn't know which server(s) to
accept.


Setting Up A Smearing NTP Server
--------------------------------
If an NTP server should perform leap smearing then the leap smear interval
(in seconds) needs to be specified in the NTP configuration file ntp.conf,
e.g.:

 leapsmearinterval 86400

Please keep in mind the leap smear interval should be between several and 24
hours long.  With shorter values clients may not be able to follow the
drift caused by the smeared time, and with longer values the discrepancy
between system time and UTC will cause more problems when reconciling
timestamp differences.
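
To get a feeling for how the interval length translates into the drift that
clients have to follow, the sketch below computes the average frequency
offset for a few intervals.  It assumes the 1 s offset is spread evenly over
the interval; the function name smear_rate_ppm() is made up for this
example.

```python
def smear_rate_ppm(interval_s, leap_s=1.0):
    """Average frequency offset in ppm needed to smear `leap_s` seconds."""
    return leap_s / interval_s * 1e6


# Shorter intervals demand a much larger apparent clock drift.
for hours in (2, 6, 24):
    print(hours, round(smear_rate_ppm(hours * 3600), 1))
```

For a 24 hour interval this is about 11.6 ppm, compared to about 138.9 ppm
for a 2 hour interval.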

When ntpd starts and a smear interval has been specified then a log message
is generated, e.g.:

 ntpd[31120]: config: leap smear interval 86400 s

While ntpd is running with a leap smear interval specified the command:

 ntpq -c rv

reports the smear status, e.g.:

# ntpq -c rv
associd=0 status=4419 leap_add_sec, sync_uhf_radio, 1 event, leap_armed,
version="ntpd 4.2.8p3-RC1@1.3349-o Mon Jun 22 14:24:09 UTC 2015 (26)",
processor="i586", system="Linux/3.7.1", leap=01, stratum=1,
precision=-18, rootdelay=0.000, rootdisp=1.075, refid=MRS,
reftime=d93dab96.09666671 Tue, Jun 30 2015 23:58:14.036,
clock=d93dab9b.3386a8d5 Tue, Jun 30 2015 23:58:19.201, peer=2335,
tc=3, mintc=3, offset=-0.097015, frequency=44.627, sys_jitter=0.003815,
clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,
expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087

In the example above 'leapsmearinterval' reports the configured leap smear
interval all the time, while the 'leapsmearoffset' value is 0 outside the
interval and changes from 0 to -1000 ms over the interval.  So this can be
used to monitor if and how the time sent to clients is smeared.  With a
leapsmearoffset of -932.087 ms (-0.932087 s), the refid reported in smeared
packets would be 254.196.88.176.
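
For monitoring, the two smear variables can be picked out of the 'ntpq -c
rv' output with a few lines of code.  This is a minimal sketch: the function
names parse_rv() and smear_status() are made up for this example, and SAMPLE
simply repeats the last two lines of the output shown above.

```python
import re

SAMPLE = (
    "clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,\n"
    "expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087\n"
)


def parse_rv(text):
    """Collect name=value pairs from `ntpq -c rv` output into a dict."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}


def smear_status(text):
    """Return (interval in s, offset in s), or None if not reported."""
    variables = parse_rv(text)
    if "leapsmearinterval" not in variables:
        return None
    interval = int(variables["leapsmearinterval"])
    offset_ms = float(variables.get("leapsmearoffset", "0"))
    return interval, offset_ms / 1000.0


print(smear_status(SAMPLE))
```

On a running server the text could be obtained with subprocess.run() on the
ntpq command instead of the SAMPLE string.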