CM10135 / Programming II:   Lecture 12


Protocols & the Internet


I. Protocols

  1. A protocol is a system / procedure for communicating between two computers.
  2. As with military & diplomatic protocols, it's designed to make certain that mistakes or misunderstandings don't happen.
    1. Computers are even stupider than people, so computer protocols have to be followed very precisely.
    2. For both humans and computers, failure to follow correct protocol is one way to spot intruders.
      1. Example from "From Russia with Love"
      2. James Bond: Pardon me, do you have a match?
        Agent: I use a lighter.
        James Bond: Better still.
        Agent: Until they go wrong.
        James Bond: Exactly.
      3. Actually, in the movie, this protocol is run several times, with Bond taking the part of either agent. 
        1. The purpose of the protocol is for two agents who don't know each other to make certain they are talking to the right person before they reveal anything.
        2. It doesn't matter which agent goes first; what matters is
          1. that the conversation could be started with anyone, but
          2. it would be unlikely to be completed correctly except by another program / agent that is running the same protocol.
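The point of the Bond exchange can be sketched in a few lines of Java: anyone can open the conversation, but only someone who knows the whole script can finish it, and a wrong line means you stop talking. The class and method names (`RecognitionProtocol`, `reply`, `verify`) are made up for illustration. This is a toy, not a real authentication scheme: a fixed script can be overheard and replayed.

```java
import java.util.List;

/** A toy version of the "From Russia with Love" recognition protocol. */
class RecognitionProtocol {
    // The agreed script; lines alternate between the two agents.
    static final List<String> SCRIPT = List.of(
        "Pardon me, do you have a match?",
        "I use a lighter.",
        "Better still.",
        "Until they go wrong.",
        "Exactly.");

    /**
     * We just heard the line at position i of the script.
     * Returns our next line, or null if the line was wrong (or the script is over),
     * in which case we reveal nothing more.
     */
    static String reply(int i, String heard) {
        if (i < 0 || i + 1 >= SCRIPT.size() || !SCRIPT.get(i).equals(heard)) {
            return null;
        }
        return SCRIPT.get(i + 1);
    }

    /** True only if a whole conversation followed the script exactly, in order. */
    static boolean verify(List<String> conversation) {
        return conversation.equals(SCRIPT);
    }

    public static void main(String[] args) {
        System.out.println(reply(0, "Pardon me, do you have a match?")); // I use a lighter.
        System.out.println(reply(0, "Got a light?"));                      // null
    }
}
```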
  3. In computers, protocols happen at many different levels simultaneously
  4. From freesoft.org's tutorials:

    The seven layers of the OSI Basic Reference Model are (from bottom to top):

    1. The Physical Layer describes the physical properties of the various communications media, as well as the electrical properties and interpretation of the exchanged signals. Ex: this layer defines the size of Ethernet coaxial cable, the type of BNC connector used, and the termination method.

    2. The Data Link Layer describes the logical organization of data bits transmitted on a particular medium. Ex: this layer defines the framing, addressing and checksumming of Ethernet packets.

    3. The Network Layer describes how a series of exchanges over various data links can deliver data between any two nodes in a network. Ex: this layer defines the addressing and routing structure of the Internet.

    4. The Transport Layer describes the quality and nature of the data delivery. Ex: this layer defines if and how retransmissions will be used to ensure data delivery.

    5. The Session Layer describes the organization of data sequences larger than the packets handled by lower layers. Ex: this layer describes how request and reply packets are paired in a remote procedure call.

    6. The Presentation Layer describes the syntax of data being transferred. Ex: this layer describes how floating point numbers can be exchanged between hosts with different math formats.

    7. The Application Layer describes how real work actually gets done. Ex: this layer would implement file system operations.

    The original Internet protocol specifications defined a four-level model, and protocols designed around it (like TCP) have difficulty fitting neatly into the seven-layer model. Most newer designs use the seven-layer model.

  5. Actually having this many layers of protocol is controversial, because it may slow things down, and it's unclear that it brings much advantage. 
    1. The five-layer TCP/IP model skips the presentation & session layers above.
  6. But it is useful to have at least a few layers, e.g.
    1. The machine / hardware layer (get the bits into the right format for the wires / computer).
    2. The communication / networking layer, on the internet typically TCP/IP (check & route the packets).
    3. The application layer.
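One way to see what "layers" means in practice is encapsulation: on the way down, each layer wraps whatever it was handed in its own header; on the way up, each layer strips only its own header and passes the rest up, without needing to understand it. The Java sketch below assumes the networking layer is TCP over IP and the hardware layer is Ethernet; the bracketed header strings are invented stand-ins for what are really binary structures.

```java
import java.util.List;

class Layering {
    // Headers added below the application, from the top of the stack down.
    static final List<String> HEADERS = List.of("[TCP]", "[IP]", "[ETH]");

    /** Sending: each lower layer wraps what it was handed in its own header. */
    static String encapsulate(String appData) {
        String unit = appData;
        for (String header : HEADERS) {
            unit = header + unit; // TCP wraps the data, IP wraps TCP, Ethernet wraps IP
        }
        return unit;
    }

    /** Receiving: each layer strips only its own header and hands the rest up. */
    static String decapsulate(String frame) {
        String unit = frame;
        for (int i = HEADERS.size() - 1; i >= 0; i--) {
            String header = HEADERS.get(i);
            if (!unit.startsWith(header)) {
                throw new IllegalArgumentException("expected " + header + " at start of " + unit);
            }
            unit = unit.substring(header.length());
        }
        return unit;
    }

    public static void main(String[] args) {
        String frame = encapsulate("HELO jjb.cs.bath.ac.uk");
        System.out.println(frame);              // [ETH][IP][TCP]HELO jjb.cs.bath.ac.uk
        System.out.println(decapsulate(frame)); // HELO jjb.cs.bath.ac.uk
    }
}
```

The application only ever sees its own text; swapping Ethernet for another hardware layer would change one header and nothing above it.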
  7. Application protocols are often human-readable so that people can debug them, but lower-level protocols are usually meant only for machines.
  8. For example, you can sometimes see parts of the Simple Mail Transfer Protocol (SMTP) in messages that have bounced.
  9. You can also use it through telnet (see the last networking lecture for more on telnet!)
  10. Here's an example of me forging mail to myself, from my other (older) self.
    1. I start out trying to be Elvis, but the mail server is on to this trick these days!
      [jjb@jjb op.papers]$  telnet XXXXX 25
      Trying 138.38.108.3...
      Connected to XXXXXXXXX.ac.uk (138.38.XXX.XXX).
      Escape character is '^]'.
      220 XXXXXXXXX.ac.uk ESMTP Exim 4.30 Tue, 16 Mar 2004 08:00:12 +0000
      HELP
      214-Commands supported:
      214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP
      RCPT TO: jjb@cs.bath.ac.uk
      503 sender not yet given
      MAIL FROM: elvis@graceland.com
      250 OK
      RCPT TO: jjb@cs.bath.ac.uk
      550 RFCs mandate HELO/EHLO before mail may be sent
      HELO
      501 Syntactically invalid HELO argument(s)
      HELP HELO
      214-Commands supported:
      214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP
      HELO jjb.cs.bath.ac.uk
      250 air.cs.bath.ac.uk Hello jjb.cs.bath.ac.uk [138.38.108.1]
      RCPT TO: jjb@cs.bath.ac.uk
      503 sender not yet given
      MAIL FROM: elvis@kingdom.co
      250 OK
      RCPT TO: jjb@cs.bath.ac.uk
      550-Verification failed for
      550-Unrouteable address
      550 Sender verify failed
      RCPT FROM: joanna@ai.mit.edu
      500-unrecognized command
      500 Too many syntax or protocol errors
      Connection closed by foreign host.
      [jjb@jjb op.papers]$ telnet XXXXXX 25
      Trying 138.38.108.3...
      Connected to XXXXXXXXX.ac.uk (138.38.XXXXXX).
      Escape character is '^]'.
      220 air.cs.bath.ac.uk ESMTP Exim 4.30 Tue, 16 Mar 2004 08:04:01 +0000
      HELO jjb.cs.bath.ac.uk
      250 air.cs.bath.ac.uk Hello jjb.cs.bath.ac.uk [138.38.108.1]
      RCPT TO: jjb@cs.bath.ac.uk
      503 sender not yet given
      MAIL FROM: joanna@ai.mit.edu
      250 OK
      DATA
      503 valid RCPT command must precede DATA
      RCPT TO: jjb@cs.bath.ac.uk
      250 Accepted
      DATA
      354 Enter message, ending with "." on a line by itself
      Hi Dr. Bryson, this is me pretending to be myself at MIT.
      .
      250 OK id=1B39ZG-0004yz-DU
      quit
      221 XXXXXXXXXX.ac.uk closing connection
      Connection closed by foreign host.

      (and then in my mail spool I got...)

      Return-Path:
      Received: from air ([unix socket])
      by air (Cyrus v2.1.15) with LMTP; Tue, 16 Mar 2004 08:05:15 +0000
      X-Sieve: CMU Sieve 2.2
      Return-path:
      Envelope-to: jjb@cs.bath.ac.uk
      Delivery-date: Tue, 16 Mar 2004 08:05:15 +0000
      Received: from [138.38.108.1] (helo=jjb.cs.bath.ac.uk)
      by air.cs.bath.ac.uk with smtp (Exim 4.30)
      id 1B39ZG-0004yz-DU
      for jjb@cs.bath.ac.uk; Tue, 16 Mar 2004 08:05:15 +0000
      X-Spam-Score: 2.9 (++)

      Hi Dr. Bryson, this is me pretending to be myself at MIT.
    2. Notice that this doesn't actually mention joanna@ai.mit.edu: although I needed a valid address to send the mail, it didn't get put into the Return-Path.
      1. I probably could have put it in with a few more arguments buried in the DATA.
      2. But that wouldn't have generated a full path through the internet.
      3. This is something SpamAssassin looks for! Notice that I got a high spam score without saying any bad words.
    3. This example was run in 2004 -- in lecture I'll show you that it's not much different in 2010.
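The error messages in the transcript come from the server enforcing the order of the conversation: a greeting, then MAIL FROM, then RCPT TO, then DATA. Below is a much-simplified Java sketch of those checks. The reply texts are copied from the transcript, but the rules are inferred from it, not taken from Exim's source; `MiniSmtpServer` is a hypothetical name, and sender verification, HELP, message bodies and most other commands are left out.

```java
import java.util.Locale;

class MiniSmtpServer {
    private boolean greeted = false;       // has the client said HELO/EHLO yet?
    private boolean haveSender = false;    // has MAIL FROM been accepted?
    private boolean haveRecipient = false; // has at least one RCPT TO been accepted?

    /** Handles one command line and returns the one-line reply. */
    String handle(String line) {
        String cmd = line.trim();
        String upper = cmd.toUpperCase(Locale.ROOT);
        if (upper.startsWith("HELO") || upper.startsWith("EHLO")) {
            String arg = cmd.substring(4).trim();
            if (arg.isEmpty()) {
                return "501 Syntactically invalid HELO argument(s)";
            }
            greeted = true;
            haveSender = false;    // a greeting also throws away any half-built message,
            haveRecipient = false; // as the transcript shows right after HELO
            return "250 Hello " + arg;
        }
        if (upper.startsWith("MAIL FROM:")) {
            haveSender = true;     // no sender verification in this sketch
            haveRecipient = false;
            return "250 OK";
        }
        if (upper.startsWith("RCPT TO:")) {
            if (!haveSender) return "503 sender not yet given";
            if (!greeted) return "550 RFCs mandate HELO/EHLO before mail may be sent";
            haveRecipient = true;
            return "250 Accepted";
        }
        if (upper.equals("DATA")) {
            if (!haveRecipient) return "503 valid RCPT command must precede DATA";
            return "354 Enter message, ending with \".\" on a line by itself";
        }
        if (upper.equals("QUIT")) {
            return "221 closing connection";
        }
        return "500 unrecognized command";
    }

    public static void main(String[] args) {
        MiniSmtpServer server = new MiniSmtpServer();
        String[] session = {"HELO jjb.cs.bath.ac.uk", "RCPT TO: jjb@cs.bath.ac.uk",
            "MAIL FROM: joanna@ai.mit.edu", "DATA", "RCPT TO: jjb@cs.bath.ac.uk", "DATA", "QUIT"};
        for (String cmd : session) {
            System.out.println(cmd + "  ->  " + server.handle(cmd));
        }
    }
}
```

Replaying the second telnet session through it gives the same sequence of reply codes as the real server did.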
  11. The interesting thing is that the SMTP protocol consists only of the commands (letters, like HELO and MAIL) and the reply codes (numbers, like 250 and 503), plus their arguments.
    1. You know there can be many different mail clients that use this protocol; you've probably used a few different ones (Pine, Gmail, Outlook, Mail).
    2. People can also write different servers! For example, here's the mail server that's built into my Linux laptop (Red Hat 9):
      [joanna@sydney CM10135]$ telnet localhost 25
      Trying 127.0.0.1...
      Connected to localhost.
      Escape character is '^]'.
      220 localhost.localdomain ESMTP Sendmail 8.12.8/8.12.8; Thu, 18 Mar 2004 14:08:45 GMT
      HELO cs.bath.ac.uk
      250 localhost.localdomain Hello localhost.localdomain [127.0.0.1], pleased to meet you
    3. The geeks who wrote this program chose to add ", pleased to meet you" after the formal part of the reply, because the spec lets the text after the reply code be anything.
      1. Or possibly the geeks that wrote the other mailer chose to drop that part off, so it would look more professional, knowing that no one bothered to parse that bit.
      2. Either way, the point is that the protocol is what matters, the clients & servers can change as long as they observe it.
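Here is what "the protocol is what matters" looks like from a client's side. A minimal Java sketch, assuming the standard SMTP reply format: every reply line starts with a three-digit code, a `-` after the code means more lines follow (as in the `214-` help reply above), and a space marks the last line. A client acts only on the code, so Exim's and Sendmail's different greetings mean exactly the same thing. `SmtpReply` is a hypothetical class, not a library API.

```java
import java.util.List;

class SmtpReply {
    final int code;     // the part programs act on
    final String text;  // the part humans read; servers may say anything here

    SmtpReply(int code, String text) {
        this.code = code;
        this.text = text;
    }

    /** Parses one (possibly multi-line) reply from the given lines. */
    static SmtpReply parse(List<String> lines) {
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            if (line.length() < 3) {
                throw new IllegalArgumentException("too short to hold a reply code: " + line);
            }
            int code = Integer.parseInt(line.substring(0, 3));
            boolean more = line.length() > 3 && line.charAt(3) == '-'; // "214-" means more lines follow
            if (text.length() > 0) {
                text.append('\n');
            }
            text.append(line.length() > 4 ? line.substring(4) : "");
            if (!more) {
                return new SmtpReply(code, text.toString());
            }
        }
        throw new IllegalArgumentException("reply has no final line");
    }

    /** 2xx = done, 3xx = go ahead (e.g. 354); 4xx and 5xx are failures. */
    boolean isSuccess() {
        return code >= 200 && code < 400;
    }

    public static void main(String[] args) {
        // Two different servers, two different greetings, one meaning: code 250.
        SmtpReply exim = parse(List.of("250 air.cs.bath.ac.uk Hello jjb.cs.bath.ac.uk [138.38.108.1]"));
        SmtpReply sendmail = parse(List.of(
            "250 localhost.localdomain Hello localhost.localdomain [127.0.0.1], pleased to meet you"));
        System.out.println(exim.code == sendmail.code); // true
    }
}
```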
  12. You may also want to look at this lecture on internet applications from Dave Hollinger
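One more detail of the DATA step in the transcript: the server asks for the message to end with "." on a line by itself. That raises the question of what happens if the message itself contains such a line. SMTP's answer (the "transparency" rule in RFC 5321, section 4.5.2) is that the client adds an extra "." to any line starting with "." and the server strips it again. The Java sketch below also shows where the "few more arguments buried in the DATA" would go: header lines such as From: or Subject: sit at the top of the message text, separated from the body by a blank line. `SmtpData` and the header values are hypothetical.

```java
import java.util.List;

class SmtpData {
    /** Returns exactly what a client would send after the server's 354 reply. */
    static String format(List<String> headers, List<String> bodyLines) {
        StringBuilder out = new StringBuilder();
        for (String header : headers) {
            out.append(header).append("\r\n"); // SMTP lines end with CR LF
        }
        out.append("\r\n"); // a blank line separates the headers from the body
        for (String line : bodyLines) {
            if (line.startsWith(".")) {
                out.append('.'); // dot-stuffing: the server removes this extra dot again
            }
            out.append(line).append("\r\n");
        }
        out.append(".\r\n"); // "." on a line by itself ends the message
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(
            List.of("From: someone@example.com", "Subject: test"), // hypothetical header values
            List.of("Hi Dr. Bryson, this is me pretending to be myself at MIT.")));
    }
}
```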

II. How Internet Addressing Works

  1. See the Wikipedia entry on IP addresses.
  2. Here are some great lecture notes on IP Addresses & DNS for Java (in PDF), and some decent ones on the Architecture of the Internet (in HTML).
  3. The Internet was invented by Al Gore... no, seriously, by Larry Roberts & Thomas Merrill (see the Brief History of the Internet),
    1. who in 1965 connected the TX-2 computer at MIT's Lincoln Laboratory with the Q-32 in California over a dial-up telephone line;
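    2. their experiment showed that the circuit-switched telephone system was inadequate for connecting computers, confirming the idea (championed by Leonard Kleinrock) of breaking data into packets & resending ones that got lost.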
    3. Most research was funded by the US Military, "ARPANET" 
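      1. supposedly so the government could survive nuclear war, although the Brief History of the Internet calls this a false rumor: only a separate RAND study was about surviving nuclear attack.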
      2. very decentralized: any computer that gets a packet knows how to pass it on to another computer that can get it closer to its destination.
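    4. Ray Tomlinson (at BBN) wrote the first network email software in 1972, and later that year Larry Roberts wrote the first program to list, read, file, forward & reply to email.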
    5. MILNET (for the US military) & ARPANET split in 1983, the same year TCP/IP was adopted as the main protocol.
      1. 1983 is also when I got my first email account, coincidentally.
      2. When we used email in the 1980s, a message took a day to get around the world.
        1. We thought this was amazing!
        2. Links in the internet phoned each other maybe once every hour or two to see if there were any bits to send.
    6. Al Gore did help get a lot of US taxpayer money into developing/expanding the Internet for commercial use --- politics does matter.
  4. (IPv4) Internet addresses are currently four 8-bit numbers (each can take 2^8 = 256 values, 0 to 255), so there can be 2^32 = 4,294,967,296 unique IP addresses; probably not enough for 7,000,000,000 people!
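The dotted form is just a way of writing one 32-bit number, eight bits at a time. A small Java sketch of the conversion, using the Bath address from the traceroute output; `Ipv4` is a hypothetical helper class, and it uses `long` because Java's `int` is signed and would turn addresses above 127.255.255.255 negative.

```java
class Ipv4 {
    /** "138.38.108.2" -> the 32-bit number it stands for. */
    static long toLong(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("need 4 octets: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet; // shift the earlier octets left, append this one
        }
        return value;
    }

    /** The reverse: pull out 8 bits at a time, most significant first. */
    static String toDotted(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("not a 32-bit address: " + value);
        }
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
            + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(toLong("138.38.108.2")); // 2317773826
        System.out.println(1L << 32);               // 4294967296 possible addresses
    }
}
```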
  5. Getting a packet to the right place:
    1. first get it to a machine that matches the first number,
    2. that machine should be able to get it to a machine that matches the second number...
    3. so, roughly speaking, routers only need to know how to reach 4*256 (1,024) places, 256 choices at each of the 4 levels, not 4 billion.
    4. logarithmic!  divide & conquer!
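The step-by-step description above can be sketched as a routing table keyed by leading octets, where a lookup uses the most specific route that matches. This is a toy: real routers do the same kind of longest-prefix match, but on prefixes of any bit length (CIDR), not just whole octets. `OctetRouter` and the next-hop names (`campus-core`, `uk-link`, `upstream`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class OctetRouter {
    private final Map<String, String> routes = new HashMap<>();
    private final String defaultRoute; // where to send anything we know nothing about

    OctetRouter(String defaultRoute) {
        this.defaultRoute = defaultRoute;
    }

    /** prefix is one to four leading octets, e.g. "18" or "138.38". */
    void addRoute(String prefix, String nextHop) {
        routes.put(prefix, nextHop);
    }

    /** Tries the whole address, then drops octets from the right until a route matches. */
    String nextHop(String address) {
        String prefix = address;
        while (true) {
            String hop = routes.get(prefix);
            if (hop != null) {
                return hop; // the most specific match wins
            }
            int lastDot = prefix.lastIndexOf('.');
            if (lastDot < 0) {
                return defaultRoute; // not even the first octet matched
            }
            prefix = prefix.substring(0, lastDot);
        }
    }

    public static void main(String[] args) {
        OctetRouter r = new OctetRouter("upstream");
        r.addRoute("18", "campus-core");   // anything in 18.x.x.x stays local
        r.addRoute("138.38", "uk-link");   // hypothetical route toward 138.38.x.x
        System.out.println(r.nextHop("18.24.10.3"));   // campus-core
        System.out.println(r.nextHop("138.38.108.2")); // uk-link
        System.out.println(r.nextHop("158.182.4.1"));  // upstream
    }
}
```

Each router only needs entries for the prefixes it cares about plus a default; everything else gets handed to a router that knows more.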
  6. Traceroute shows you the path your packets take. Again, that's if you can find a machine that still lets you run it (I had to go to MIT... in 2004!)
    /home/ai/joanna % traceroute cs.bath.ac.uk
    traceroute to cs.bath.ac.uk (138.38.108.2), 30 hops max, 40 byte packets
    1 net-chex (128.52.37.10) 1 ms 1 ms 1 ms
    2 anacreon (128.52.0.10) 2 ms 3 ms 1 ms
    3 radole (18.24.10.3) 2 ms 76 ms 74 ms
    4 B24-RTR-2-LCS.MIT.EDU (18.201.1.1) 70 ms 109 ms 72 ms
    5 EXTERNAL-RTR-2-BACKBONE.MIT.EDU (18.168.0.27) 89 ms 105 ms 87 ms
    6 MIT-GIGAPOPNE.nox.org (192.5.89.89) 95 ms 84 ms 80 ms
    7 192.5.89.10 (192.5.89.10) 91 ms 99 ms 98 ms
    8 198.32.11.62 (198.32.11.62) 73 ms 118 ms 115 ms
    9 ny.uk1.uk.geant.net (62.40.96.170) 175 ms 109 ms 133 ms
    10 janet-gw.uk1.uk.geant.net (62.40.103.150) 129 ms 143 ms 148 ms
    11 po3-0.lond-scr3.ja.net (146.97.35.133) 150 ms 134 ms 143 ms
    12 po6-0.read-scr.ja.net (146.97.33.13) 173 ms 153 ms 163 ms
    13 po2-0.bris-scr.ja.net (146.97.33.49) 141 ms 191 ms 201 ms
    14 gi0-1.frenchay-bar.ja.net (146.97.35.82) 191 ms 186 ms *
    15 146.97.40.198 (146.97.40.198) 156 ms 167 ms 138 ms
    16 bath-1-brisf-1-r1.swern.net.uk (194.82.125.50) 138 ms 158 ms 161 ms
    17 bath-gw-1-bath-1.swern.net.uk (194.82.125.198) 143 ms 193 ms *
    18 earth.cs.bath.ac.uk (138.38.108.2) 174 ms 153 ms 155 ms
    19 earth.cs.bath.ac.uk (138.38.108.2) 166 ms 168 ms 171 ms
    20 earth.cs.bath.ac.uk (138.38.108.2) 209 ms 184 ms 184 ms
    1. ja.net is JANET (the Joint Academic Network), the UK's main academic network.
    2. It used to be really annoying, because JANET wrote all the addresses backwards, e.g. uk.ac.bath.midge instead of midge.bath.ac.uk
    3. lots of clients & servers would get confused translating, even though they should have known better
      1. Mail to ai.mit.edu would wind up in Anguilla (whose country code is .ai)!
    4. Moral:  Even when there's a clear protocol, making things too tricky confuses programmers & things break.
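A hypothetical sketch of how the ai.mit.edu mix-up can happen: a gateway that guesses which order a name is in by checking whether its first label is a top-level domain. That works for uk.ac.bath.midge, but "ai" is also a country code, so ai.mit.edu gets flipped to edu.mit.ai. `JanetNames` and its tiny domain list are made up for illustration; this is not the code any real gateway ran.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class JanetNames {
    // A tiny hand-picked set of top-level domains, for illustration only.
    static final Set<String> TLDS = Set.of("uk", "edu", "com", "ai");

    /** Reverses the order of the labels: uk.ac.bath.midge <-> midge.bath.ac.uk */
    static String reverse(String name) {
        List<String> labels = Arrays.asList(name.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    /** Naive guess: a name that starts with a TLD must be in JANET order, so flip it. */
    static String toInternetOrder(String name) {
        String first = name.split("\\.")[0];
        return TLDS.contains(first) ? reverse(name) : name;
    }

    public static void main(String[] args) {
        System.out.println(toInternetOrder("uk.ac.bath.midge")); // midge.bath.ac.uk (right)
        System.out.println(toInternetOrder("midge.bath.ac.uk")); // midge.bath.ac.uk (right)
        System.out.println(toInternetOrder("ai.mit.edu"));       // edu.mit.ai (wrong!)
    }
}
```

The heuristic is right most of the time, which is exactly what makes this kind of bug hard to spot.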
  7. What about going to Hong Kong?
    /home/ai/joanna % traceroute www.hkbu.edu.hk
    traceroute to net1.hkbu.edu.hk (158.182.4.1), 30 hops max, 40 byte packets
    1 net-chex (128.52.37.10) 1 ms 1 ms 1 ms
    2 anacreon (128.52.0.10) 2 ms 1 ms 1 ms
    3 radole (18.24.10.3) 2 ms 111 ms 109 ms
    4 B24-RTR-2-LCS.MIT.EDU (18.201.1.1) 114 ms 90 ms 118 ms
    5 EXTERNAL-RTR-2-BACKBONE.MIT.EDU (18.168.0.27) 118 ms 103 ms 84 ms
    6 MIT-GIGAPOPNE.nox.org (192.5.89.89) 91 ms 104 ms 107 ms
    7 192.5.89.10 (192.5.89.10) 130 ms 112 ms 126 ms
    8 chinng-nycmng.abilene.ucaid.edu (198.32.8.82) 148 ms 162 ms 121 ms
    9 iplsng-chinng.abilene.ucaid.edu (198.32.8.77) 123 ms 134 ms 135 ms
    10 kscyng-iplsng.abilene.ucaid.edu (198.32.8.81) 146 ms 149 ms 141 ms
    11 dnvrng-kscyng.abilene.ucaid.edu (198.32.8.13) 122 ms 150 ms 126 ms
    12 snvang-dnvrng.abilene.ucaid.edu (198.32.8.1) 175 ms 137 ms 163 ms
    13 losang-snvang.abilene.ucaid.edu (198.32.8.94) 152 ms 153 ms 166 ms
    14 tpr2-transpac-la.jp.apan.net (203.181.248.130) 269 ms 276 ms 258 ms
    15 taiwan-tpr2.jp.apan.net (203.181.248.153) 290 ms 320 ms 334 ms
    16 m160-1-0-0-OC3.tw.ascc.net (140.109.251.42) 308 ms 318 ms 330 ms
    17 m20-1-1-0-OC3.hk.ascc.net (140.109.251.45) 360 ms 347 ms 360 ms
    18 192.245.196.249 (192.245.196.249) 335 ms 364 ms 383 ms
    19 202.40.217.90 (202.40.217.90) 433 ms 457 ms 418 ms
    20 202.125.249.5 (202.125.249.5) 385 ms 391 ms 373 ms
    21 202.125.249.21 (202.125.249.21) 483 ms 402 ms 453 ms
    22 202.125.249.34 (202.125.249.34) 395 ms 382 ms 342 ms
    23 158.182.118.73 (158.182.118.73) 478 ms 556 ms 588 ms
    24 158.182.118.82 (158.182.118.82) 529 ms 588 ms 620 ms
    25 * * *
    26 * * *
    27 * * *
    ^C
    /home/ai/joanna % 
  8. The path out of MIT is the same, but then we spend some time on a big internet backbone (Abilene) crossing the USA before hopping to Japan, then Taiwan, then Hong Kong. DNS doesn't have names for the last few machines!
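Why does each line of output name one more machine, and why the "* * *" lines? Traceroute sends probes with a TTL ("time to live") of 1, 2, 3, ...; every router subtracts one, and the router that takes it to zero drops the probe and reports back, so probe n reveals hop n. Machines that don't report show up as "*". The Java sketch below only simulates that logic offline (real traceroute sends UDP or ICMP packets and listens for ICMP "time exceeded" replies, which needs raw-socket access); `TracerouteSim` is a hypothetical name and the timings are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TracerouteSim {
    /**
     * path: the routers from the first hop to the destination (last entry).
     * A null entry is a machine that never sends a reply.
     * Returns one line per probe, like traceroute's output without the timings.
     */
    static List<String> trace(List<String> path, int maxHops) {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("path needs at least a destination");
        }
        List<String> lines = new ArrayList<>();
        int destination = path.size(); // hop number of the destination
        for (int ttl = 1; ttl <= maxHops; ttl++) {
            // Each router subtracts 1 from the TTL, so the probe dies at hop number ttl,
            // unless it reaches the destination first.
            int hop = Math.min(ttl, destination);
            String replier = path.get(hop - 1);
            lines.add(ttl + " " + (replier == null ? "* * *" : replier));
            if (hop == destination && replier != null) {
                break; // the destination answered: we're done
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // A shortened, made-up path where the last machine never answers.
        List<String> path = Arrays.asList("158.182.118.73", "158.182.118.82", null);
        trace(path, 5).forEach(System.out::println);
    }
}
```

A silent destination is why the Hong Kong trace keeps printing "* * *" until someone presses ^C.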
  9. That was from 2004 or so; in 2010 we go directly to Hong Kong from Asia Netcom.
    1. Also, Bath is giving out a lot less DNS information!
    2. google.cn just moved to Hong Kong today; note that their traceroute is *way* more efficient (I wonder how they do that?)
      1. 18 very quick hops...
        Thea:~ joanna$ traceroute google.cn
        traceroute to google.cn (74.125.95.160), 64 hops max, 40 byte packets
         1  api (192.168.1.254)  207.080 ms  1.288 ms  1.407 ms
         2  217.32.141.0 (217.32.141.0)  21.465 ms  21.697 ms  22.224 ms
         3  217.32.140.222 (217.32.140.222)  21.994 ms  21.916 ms  22.787 ms
         4  213.120.161.34 (213.120.161.34)  27.092 ms  27.571 ms  27.340 ms
         5  217.41.222.10 (217.41.222.10)  29.634 ms  28.298 ms  28.629 ms
         6  217.41.222.178 (217.41.222.178)  27.234 ms  29.000 ms  27.248 ms
         7  217.41.222.121 (217.41.222.121)  28.240 ms  27.720 ms  27.585 ms
         8  core2-gig4-0-0.birmingham.ukcore.bt.net (217.32.170.73)  29.601 ms  29.640 ms  30.760 ms
         9  core2-pos0-6-4-0.ilford.ukcore.bt.net (62.6.204.62)  33.862 ms  32.878 ms  32.726 ms
        10  core4te-0-7-1-0.telehouse.ukcore.bt.net (62.172.102.29)  34.553 ms  34.498 ms  33.860 ms
        11  195.99.125.82 (195.99.125.82)  32.650 ms  32.983 ms  31.632 ms
        12  209.85.255.175 (209.85.255.175)  34.109 ms  33.490 ms 209.85.252.76 (209.85.252.76)  32.848 ms
        13  216.239.43.192 (216.239.43.192)  107.529 ms 209.85.250.54 (209.85.250.54)  133.474 ms  101.692 ms
        14  209.85.251.233 (209.85.251.233)  123.811 ms  123.294 ms  123.764 ms
        15  72.14.232.141 (72.14.232.141)  139.370 ms  135.284 ms  133.270 ms
        16  209.85.241.27 (209.85.241.27)  163.011 ms  143.349 ms 209.85.241.35 (209.85.241.35)  142.509 ms
        17  209.85.240.49 (209.85.240.49)  142.248 ms 72.14.239.189 (72.14.239.189)  139.446 ms 209.85.240.49 (209.85.240.49)  141.200 ms
        18  iw-in-f160.1e100.net (74.125.95.160)  140.527 ms  145.639 ms  136.314 ms
        Thea:~ joanna$ 
      2. Oh, that's actually through BT -- here it is from the CS department:
         1  fire-private (172.16.0.1)  0.917 ms  0.284 ms  0.318 ms
         2  gw (138.38.108.254)  0.814 ms  0.620 ms  0.707 ms
         3  swan-wren-10g1.bath.ac.uk (138.38.255.1)  0.852 ms  0.606 ms  0.485 ms
         4  4948-bath-bath.swern.net.uk (194.82.120.1)  1.473 ms  1.929 ms  1.464 ms
         5  fren-bath-ph.swern.net.uk (194.83.94.64)  2.499 ms  2.211 ms  2.227 ms
         6  so-2-1-0.read-sbr1.ja.net (146.97.42.185)  7.771 ms  7.933 ms  8.680 ms
         7  so-6-0-0.lond-sbr3.ja.net (146.97.33.166)  9.543 ms  9.148 ms  9.381 ms
         8  po1.lond-ban3.ja.net (146.97.35.106)  9.335 ms  9.245 ms  9.193 ms
         9  google.lond-ban3.ja.net (193.62.157.30)  9.542 ms  9.740 ms  9.607 ms
        10  209.85.252.76 (209.85.252.76)  9.962 ms 209.85.255.175 (209.85.255.175)  9.952 ms 209.85.252.76 (209.85.252.76)  9.870 ms
        11  216.239.43.192 (216.239.43.192)  84.140 ms  84.225 ms  84.322 ms
        12  209.85.251.233 (209.85.251.233)  98.936 ms 216.239.46.215 (216.239.46.215)  105.018 ms 216.239.46.14 (216.239.46.14)  108.268 ms
        13  209.85.241.22 (209.85.241.22)  119.369 ms  121.651 ms 72.14.232.141 (72.14.232.141)  115.884 ms
        14  209.85.241.37 (209.85.241.37)  121.752 ms  115.770 ms 209.85.241.29 (209.85.241.29)  117.308 ms
        15  209.85.240.45 (209.85.240.45)  120.919 ms 72.14.239.189 (72.14.239.189)  130.899 ms 209.85.240.49 (209.85.240.49)  118.196 ms
        16  iw-in-f160.1e100.net (74.125.95.160)  122.741 ms  116.697 ms  122.691 ms

  10. Of course, there's loads more to this story 
    1. For example, look at the notes I linked to above for pictures of packets.
    2. Or take networking in your final year!

III.  Summary

  1. I told you about protocols & their layers, & gave an example of SMTP.
  2. I also told you a little bit about internet addressing.

page author: Joanna Bryson
23 March 2010