NTP

Latest revision as of 13:31, 23 November 2009

All servers are time-synchronised using the NTP protocol. In pmtpa, this is managed by Puppet, with the ntp::client class in ntp.pp. The NTP servers are dobson and linne.

Testing

Testing is very important, as the November 2005 event demonstrated. It's easy to test whether ntpd is working; run this on the server in question:

/usr/sbin/ntpq -p

Here is the output from a happy server:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*vl-2-0.csw1-pmt albert.pmtpa.wm  3 u   43   64  177    0.486    1.155   1.135

Note the asterisk in the first column, which tells you it's happy: the server has selected that peer and is synchronised to it. The peer is csw1, which is on stratum (st) 3, and the refid gives the stratum 2 server that csw1 itself is synchronised to (albert). The other important columns are:

  • when: this tells you how long ago it received a response from the server, in this case 43 seconds.
  • offset: this tells you how far off the clock is, in milliseconds, in this case about 1.2 ms.
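
To check these columns from a script rather than by eye, the peer line can be split into named fields. This is only a minimal sketch: the Peer type and parse_peer_line function are hypothetical names, and it assumes the line is passed exactly as ntpq printed it, so that the first character is the status mark (an asterisk or a space in the examples on this page).

```python
from typing import NamedTuple


class Peer(NamedTuple):
    """One row of `ntpq -p` output (hypothetical helper type)."""
    tally: str     # first character of the line: '*', '+', '-', ' ', ...
    remote: str
    refid: str
    stratum: int
    type: str
    when: str      # kept as text: ntpq may print '-' or values such as '18d'
    poll: int
    reach: str     # kept as text: ntpq prints this column in octal
    delay: float   # milliseconds
    offset: float  # milliseconds
    jitter: float  # milliseconds


def parse_peer_line(line: str) -> Peer:
    """Split one unmodified peer line into its columns."""
    tally, rest = line[0], line[1:]
    f = rest.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 columns, got {len(f)}: {line!r}")
    return Peer(tally, f[0], f[1], int(f[2]), f[3], f[4],
                int(f[5]), f[6], float(f[7]), float(f[8]), float(f[9]))


happy = "*vl-2-0.csw1-pmt albert.pmtpa.wm  3 u   43   64  177    0.486    1.155   1.135"
peer = parse_peer_line(happy)
print(peer.tally, peer.stratum, peer.when, peer.offset)
```

The header and separator lines must be skipped before calling the function, since it expects exactly ten whitespace-separated columns after the status mark.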

Here is the output from a server which is on its way to synchronisation:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 vl-2-0.csw1-pmt albert.pmtpa.wm  3 u   81 1024    7    0.573  -203.77   1.635

There's no asterisk, which means it hasn't synchronised yet. The offset is substantial (about 204 ms), so it will take a while to get into sync. The fact that the remote, refid, st and when columns are reasonable tells you that it is actually working. If we check back later, the offset should be smaller.
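
The reach column is another quick sign of progress. ntpq prints it in octal, and each bit records whether one of the last eight polls was answered. The sketch below (successful_polls is a hypothetical helper name) counts the answered polls for the values seen on this page.

```python
def successful_polls(reach: str) -> int:
    """Count how many of the last eight polls were answered.

    `reach` is the text from the reach column, which ntpq prints in octal.
    """
    register = int(reach, 8) & 0xFF  # keep only the eight-bit history
    return bin(register).count("1")


print(successful_polls("177"))  # happy server above: 7 of the last 8
print(successful_polls("7"))    # server still synchronising: 3
print(successful_polls("377"))  # fully populated register: 8
```

A value that climbs towards 377 between checks means replies keep arriving; a value stuck at 0 means none are.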

Here is the output from a completely broken server:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 vl-2-0.csw1-pmt .B▒▒.          16 u   16   64    0    0.000    0.000 4000.00

It seems to know what server it's meant to be reading from, but the other columns are just silly. Stratum 16 is not a real stratum; it is the value NTP reports for an unsynchronised source. The reach is 0, meaning no recent poll was answered, and I'm quite sure the network delay is meant to be more than zero. If you see something like this, you need to fix it.
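
The three cases above can be told apart mechanically. The following is a rough sketch, not an official rule: the classify function and its three labels are made up for illustration, and the thresholds are simply read off the three example outputs on this page.

```python
def classify(line: str) -> str:
    """Label one unmodified `ntpq -p` peer line, using this page's examples.

    Assumption: the first character of the line is the status mark.
    """
    tally = line[0]
    fields = line[1:].split()
    stratum = int(fields[2])
    reach = int(fields[6], 8)  # the reach column is printed in octal

    if stratum >= 16 or reach == 0:
        return "broken"
    if tally == "*":
        return "synchronised"
    return "working, not yet synchronised"


happy = "*vl-2-0.csw1-pmt albert.pmtpa.wm  3 u   43   64  177    0.486    1.155   1.135"
print(classify(happy))
```

This looks at a single peer line only. On a server with several peers, such as albert, one broken line does not mean the whole server is broken, so the result should be read per peer.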

ntpq can be run remotely. The output of ntpq -c peers csw1-pmtpa currently shows:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 207.142.131.255 0.0.0.0         16 u    -   64    0    0.000    0.000 16000.0
 10.0.255.255    0.0.0.0         16 u    -   64    0    0.000    0.000 16000.0
 clock2.redhat.c .CDMA.           1 -  18d   64    0   73.200    1.322 16000.0
 ntp-s1.cise.ufl 85.83.78.79     16 -  18d 1024    0   18.140    0.673 16000.0
 raptor.tera-byt 0.0.0.0         16 -    - 1024    0    0.000    0.000 16000.0
*albert.pmtpa.wm ntp-s1.cise.ufl  2 u   45   64  377    0.790   -3.597   0.400

That is two broadcast addresses, three broken external servers, and albert, which is a working stratum 2 server. Finally, albert gives:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp-s1.cise.ufl .USNO.           1 u   86  128  377   22.090   12.095   5.867
+ip-207-145-113- .GPS.            1 u   89  128  377   76.639   20.227   5.468
+solarnet.ru     hora.cs.tu-berl  2 u   92  128  377  180.877   11.987   3.481
-blah.jabber.dk  ntp2.sth.netnod  2 u   97  128  377  135.076   22.336   5.524
 LOCAL(0)        LOCAL(0)        10 l   12   64  377    0.000    0.000   0.001

A nearby stratum 1 server at the University of Florida is selected as the reference (marked *), but I've configured three other servers from pool.ntp.org in case that one goes down. Two of them (marked +) are contributing to the averaging process; the third (marked -) is ignored because its clock doesn't agree with the others. If all four are unreachable, the local clock will be used. It is not preferred while any real server is reachable, because it has been declared stratum 10.
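
The marks in the first column of albert's output can be turned into a short summary. This is an illustrative sketch only: the TALLY table and roles function are hypothetical names, and the table covers just the four marks that appear on this page (ntpq has others).

```python
# Meaning of the first-column marks seen in albert's output above.
TALLY = {
    "*": "selected as the reference",
    "+": "contributing to the averaging",
    "-": "ignored as an outlier",
    " ": "not in use",
}


def roles(peer_lines):
    """Map each remote name to the role implied by its first-column mark."""
    result = {}
    for line in peer_lines:
        remote = line[1:].split()[0]
        result[remote] = TALLY.get(line[0], "other")
    return result


albert = [
    "*ntp-s1.cise.ufl .USNO.           1 u   86  128  377   22.090   12.095   5.867",
    "+ip-207-145-113- .GPS.            1 u   89  128  377   76.639   20.227   5.468",
    "+solarnet.ru     hora.cs.tu-berl  2 u   92  128  377  180.877   11.987   3.481",
    "-blah.jabber.dk  ntp2.sth.netnod  2 u   97  128  377  135.076   22.336   5.524",
    " LOCAL(0)        LOCAL(0)        10 l   12   64  377    0.000    0.000   0.001",
]
for remote, role in roles(albert).items():
    print(f"{remote:16} {role}")
```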
