CM10135 / Programming II:   Lecture 16


More Things the Net Might Throw At You


I. House Clearing

  1. Last week of lectures from me.
  2. The exam: There will be four questions total --- two from me & two from Dr. Paddon.  You will have to answer three of the four questions (so you can only skip one).
    1. Last year, Dr. Paddon & I gave almost identical marks (I checked) so don't bother guessing who marks easier. 
    2. Remember that we don't just want to see the right answer.  We want to see that you understand the answer too.  Be sure to explain your answer.  
    3. Even if you are not sure what the answer is, write as much as you know about the topic.  You can get points for being partially right! 
    4. (This advice is good for all exams, not just this one!)
  3. Here are the topics from my half of the unit:
    1. Programming, how it really works, where memory is stored, the difference between programming and programming languages.  * Only one dedicated lecture, but also a recurring theme throughout.
    2. Algorithms & Complexity   (Searching & Sorting were examples)  *** the biggest section!   6 lectures.
    3. Non-linear control:  Exceptions, Concurrency & Threading.   ** second biggest -- 4 lectures & mentioned in GUIs and Networking.
    4. GUIs
    5. Networking
  4. Given that you'll be getting whole courses in GUIs & Networking in the future, my questions have been drawn from the first three areas.  
    1. Important:  My questions don't cover just one lecture, they will cover concepts, and the concepts are intertwined between lectures.
      1. You need to study well.
      2. That's why I'm telling you what to study.
    2. Nothing on coursework 2 will be on the exam.
    3. Nothing taught this week will be on the exam.
  5. The next two lectures are about artificial intelligence, which is really in many cases a type of search, so they will help you revise what you know about search and complexity.
  6. Today's lecture is an introduction to internet networking & protocols.
  7. Coursework Marking:
    1. I'm glad many students have gotten programs working for the first time here.
      1. Unfortunately, we can't give you extra marks for that, this is programming II.
    2. Last year many students also first understood programming in the second term.
      1. Their marks got better & better across the three courseworks.
      2. This is partly why CW1 is only worth 10 points instead of 13.333333...
    3. Appeal procedure on marking:
      1. Go to the TAs.
      2. If it can't be resolved, sign up for my office hours using this link (only accessible from campus).
      3. I still won't set a new mark without checking again with the TAs so that everything is fair.

II. Protocols

  1. A protocol is a system / procedure for communicating between two computers.
  2. As with military & diplomatic protocols, it's designed to make certain that mistakes or misunderstandings don't happen.
    1. Computers are even stupider than people, so computer protocols have to be followed very precisely.
    2. For both humans and computers, failure to follow correct protocol is one way to spot intruders.
      1. Example from "From Russia with Love"
      2. James Bond: Pardon me, do you have a match?
        Agent: I use a lighter.
        James Bond: Better still.
        Agent: Until they go wrong.
        James Bond: Exactly.
      3. Actually, in the movie, this protocol is run several times, with Bond taking the part of either agent. 
        1. The purpose of the protocol is for two agents who don't know each other to make certain they are talking to the right person before they reveal anything.
        2. It doesn't matter which agent goes first, what matters is
          1. that the conversation could be started with anyone, but
          2. it would be unlikely to be completed correctly except by another program / agent that is running the same protocol.
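
The Bond exchange above can be sketched as a tiny program. This is only an illustration: the class names `Agent` and `Stranger` and the function `converse` are made up for this sketch, and the "protocol" is nothing more than the five lines of the script. Either agent may open, and both end up trusting each other only if every line arrives exactly as scripted.

```python
# The script from the film: speakers alternate, starting with whoever opens.
SCRIPT = [
    "Pardon me, do you have a match?",
    "I use a lighter.",
    "Better still.",
    "Until they go wrong.",
    "Exactly.",
]


class Agent:
    """A hypothetical agent that follows SCRIPT exactly and goes silent on any error."""

    def __init__(self, script=SCRIPT):
        self.script = script
        self.position = 0      # index of the next script line to be said or heard
        self.failed = False

    def open(self):
        """Start the conversation with the first line of the script."""
        self.position = 1
        return self.script[0]

    def hear(self, line):
        """Check the line just heard, then return our next line (or None)."""
        if (self.failed or self.position >= len(self.script)
                or line != self.script[self.position]):
            self.failed = True  # wrong line: reveal nothing further
            return None
        self.position += 1      # the other side's line was correct
        if self.position < len(self.script):
            reply = self.script[self.position]
            self.position += 1
            return reply
        return None

    def trusts_peer(self):
        """True only if the whole script was completed without a mistake."""
        return not self.failed and self.position == len(self.script)


class Stranger(Agent):
    """Someone who does not know the protocol and just answers politely."""

    def hear(self, line):
        self.failed = True
        return "Sorry, I don't smoke."


def converse(opener, replier):
    """Pass lines back and forth until someone goes silent."""
    line = opener.open()
    speaker, listener = opener, replier
    while line is not None:
        line = listener.hear(line)
        speaker, listener = listener, speaker
    return opener.trusts_peer() and replier.trusts_peer()
```

Note that the opening line can safely be said to anyone; it is only the complete, error-free exchange that identifies the other party.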
  3. In computers, protocols happen at many different levels simultaneously.
  4. From freesoft.org's tutorials:

    a picture of 7 layers, you're not missing anything!

    The seven layers of the OSI Basic Reference Model are (from bottom to top):

    1. The Physical Layer describes the physical properties of the various communications media, as well as the electrical properties and interpretation of the exchanged signals. Ex: this layer defines the size of Ethernet coaxial cable, the type of BNC connector used, and the termination method.

    2. The Data Link Layer describes the logical organization of data bits transmitted on a particular medium. Ex: this layer defines the framing, addressing and checksumming of Ethernet packets.

    3. The Network Layer describes how a series of exchanges over various data links can deliver data between any two nodes in a network. Ex: this layer defines the addressing and routing structure of the Internet.

    4. The Transport Layer describes the quality and nature of the data delivery. Ex: this layer defines if and how retransmissions will be used to ensure data delivery.

    5. The Session Layer describes the organization of data sequences larger than the packets handled by lower layers. Ex: this layer describes how request and reply packets are paired in a remote procedure call.

    6. The Presentation Layer describes the syntax of data being transferred. Ex: this layer describes how floating point numbers can be exchanged between hosts with different math formats.

    7. The Application Layer describes how real work actually gets done. Ex: this layer would implement file system operations.

    The original Internet protocol specifications defined a four-level model, and protocols designed around it (like TCP) have difficulty fitting neatly into the seven-layer model. Most newer designs use the seven-layer model.

  5. Actually having this many layers of protocol is controversial, because it may slow things down, and it's unclear that it brings much advantage.  But it is useful to have at least a few layers, e.g.
    1. The machine / hardware layer (get the bits into the right format for the wires / computer).
    2. The communication / networking layer, on the internet typically TCP/IP (check & route the packets).
    3. The application layer.
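
The three layers just listed can be sketched as functions that each wrap or unwrap the work of the layer above. This is a toy, assuming a made-up packet format (destination, checksum and payload separated by `|`); it is not the real TCP/IP or Ethernet format, and all the function names are invented for the illustration.

```python
import zlib


def application_send(text):
    """Application layer: turn a human-readable message into bytes."""
    return text.encode("ascii")


def network_send(payload, destination):
    """Networking layer (toy): prepend a destination and a checksum."""
    header = f"{destination}|{zlib.crc32(payload)}|".encode("ascii")
    return header + payload


def hardware_send(packet):
    """Hardware layer (toy): turn bytes into the string of bits put on the wire."""
    return "".join(f"{byte:08b}" for byte in packet)


def hardware_receive(bits):
    """Hardware layer (toy): turn the bits from the wire back into bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def network_receive(packet):
    """Networking layer (toy): strip the header and check the checksum."""
    destination, checksum, payload = packet.split(b"|", 2)
    if zlib.crc32(payload) != int(checksum.decode("ascii")):
        raise ValueError("corrupted packet")
    return destination.decode("ascii"), payload


def application_receive(payload):
    """Application layer: turn the bytes back into a message."""
    return payload.decode("ascii")
```

Each layer only needs to understand its own wrapping: the application never sees the checksum, and the hardware layer never looks inside the packet.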
  6. Application protocols are often human readable so that they can be debugged.  But lower level protocols are only for machines.
  7. For example, you can sometimes see parts of the Simple Mail Transfer Protocol (SMTP) in messages that have bounced.
  8. You can also use it through telnet (see the last networking lecture for more on telnet!)
  9. Here's an example of me forging mail to myself, from my other (older) self.
    1. I start out trying to be Elvis, but the mail server is on to this trick these days!
      [jjb@jjb op.papers]$  telnet XXXXX 25
      Trying 138.38.108.3...
      Connected to XXXXXXXXX.ac.uk (138.38.XXX.XXX).
      Escape character is '^]'.
      220 XXXXXXXXX.ac.uk ESMTP Exim 4.30 Tue, 16 Mar 2004 08:00:12 +0000
      HELP
      214-Commands supported:
      214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP
      RCPT TO: jjb@cs.bath.ac.uk
      503 sender not yet given
      MAIL FROM: elvis@graceland.com
      250 OK
      RCPT TO: jjb@cs.bath.ac.uk
      550 RFCs mandate HELO/EHLO before mail may be sent
      HELO
      501 Syntactically invalid HELO argument(s)
      HELP HELO
      214-Commands supported:
      214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP
      HELO jjb.cs.bath.ac.uk
      250 air.cs.bath.ac.uk Hello jjb.cs.bath.ac.uk [138.38.108.1]
      RCPT TO: jjb@cs.bath.ac.uk
      503 sender not yet given
      MAIL FROM: elvis@kingdom.co
      250 OK
      RCPT TO: jjb@cs.bath.ac.uk
      550-Verification failed for
      550-Unrouteable address
      550 Sender verify failed
      RCPT FROM: joanna@ai.mit.edu
      500-unrecognized command
      500 Too many syntax or protocol errors
      Connection closed by foreign host.
      [jjb@jjb op.papers]$ telnet XXXXXX 25
      Trying 138.38.108.3...
      Connected to XXXXXXXXX.ac.uk (138.38.XXXXXX).
      Escape character is '^]'.
      220 air.cs.bath.ac.uk ESMTP Exim 4.30 Tue, 16 Mar 2004 08:04:01 +0000
      HELO jjb.cs.bath.ac.uk
      250 air.cs.bath.ac.uk Hello jjb.cs.bath.ac.uk [138.38.108.1]
      RCPT TO: jjb@cs.bath.ac.uk
      503 sender not yet given
      MAIL FROM: joanna@ai.mit.edu
      250 OK
      DATA
      503 valid RCPT command must precede DATA
      RCPT TO: jjb@cs.bath.ac.uk
      250 Accepted
      DATA
      354 Enter message, ending with "." on a line by itself
      Hi Dr. Bryson, this is me pretending to be myself at MIT.
      .
      250 OK id=1B39ZG-0004yz-DU
      quit
      221 XXXXXXXXXX.ac.uk closing connection
      Connection closed by foreign host.

      (and then in my mail spool I got...)

      Return-Path:
      Received: from air ([unix socket])
      by air (Cyrus v2.1.15) with LMTP; Tue, 16 Mar 2004 08:05:15 +0000
      X-Sieve: CMU Sieve 2.2
      Return-path:
      Envelope-to: jjb@cs.bath.ac.uk
      Delivery-date: Tue, 16 Mar 2004 08:05:15 +0000
      Received: from [138.38.108.1] (helo=jjb.cs.bath.ac.uk)
      by air.cs.bath.ac.uk with smtp (Exim 4.30)
      id 1B39ZG-0004yz-DU
      for jjb@cs.bath.ac.uk; Tue, 16 Mar 2004 08:05:15 +0000
      X-Spam-Score: 2.9 (++)

      Hi Dr. Bryson, this is me pretending to be myself at MIT.
    2. Notice this doesn't actually mention joanna@ai.mit.edu -- although I needed a valid address to send mail, it didn't get stuck into the Return-Path 
      1. I probably could have put it in with a few more arguments buried in the DATA.
      2. But that wouldn't have generated a full path through the internet.
      3. This is something SpamAssassin looks for!  Notice I got a spam score without saying any bad words.
    3. This example was run in 2004 -- it may be that things have gotten even smarter now...
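
The ordering rules visible in the transcript above (a sender before a recipient, HELO before mail is accepted, a recipient before DATA) can be sketched as a small state machine. This is a simplified illustration written to mimic the replies shown above, not how Exim is actually implemented: the class name `ToySmtpSession` is invented, and sender verification, HELP and the message body are left out.

```python
class ToySmtpSession:
    """A toy server-side session that enforces SMTP command ordering."""

    def __init__(self):
        self.greeted = False    # has the client said HELO/EHLO?
        self.sender = None      # set by MAIL FROM:
        self.recipients = []    # filled by RCPT TO:

    def command(self, line):
        """Take one command line from the client and return the reply line."""
        verb, _, argument = line.strip().partition(" ")
        verb = verb.upper()
        if verb in ("HELO", "EHLO"):
            if not argument:
                return "501 Syntactically invalid HELO argument(s)"
            # As in the transcript, a new HELO resets any sender given so far.
            self.greeted = True
            self.sender = None
            self.recipients = []
            return "250 Hello " + argument
        if verb == "MAIL" and argument.upper().startswith("FROM:"):
            self.sender = argument[5:].strip()
            return "250 OK"
        if verb == "RCPT" and argument.upper().startswith("TO:"):
            if self.sender is None:
                return "503 sender not yet given"
            if not self.greeted:
                return "550 RFCs mandate HELO/EHLO before mail may be sent"
            self.recipients.append(argument[3:].strip())
            return "250 Accepted"
        if verb == "DATA":
            if not self.recipients:
                return "503 valid RCPT command must precede DATA"
            return '354 Enter message, ending with "." on a line by itself'
        if verb == "QUIT":
            return "221 closing connection"
        return "500 unrecognized command"
```

Feeding it the commands from the first telnet session, in the same order, produces the same sequence of 503, 250, 550 and 501 replies.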
  10. The interesting thing is that the SMTP protocol is only the command words & numeric reply codes + their arguments.  
    1. You know there can be many different mail clients that use this protocol, you've probably used a few different ones (pine, netscape, outlook, mail).
    2. People can also write different servers!  For example, here's the mail server that's built into my Linux laptop (Redhat 9)
      [joanna@sydney CM10135]$ telnet localhost 25
      Trying 127.0.0.1...
      Connected to localhost.
      Escape character is '^]'.
      220 localhost.localdomain ESMTP Sendmail 8.12.8/8.12.8; Thu, 18 Mar 2004 14:08:45 GMT
      HELO cs.bath.ac.uk
      250 localhost.localdomain Hello localhost.localdomain [127.0.0.1], pleased to meet you
    3. The geeks who wrote this program chose to add ", pleased to meet you" after the formal protocol, because the spec told them it didn't matter what they said.  
      1. Or possibly the geeks that wrote the other mailer chose to drop that part off, so it would look more professional, knowing that no one bothered to parse that bit.
      2. Either way, the point is that the protocol is what matters, the clients & servers can change as long as they observe it.
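
Here is a minimal sketch of why ", pleased to meet you" doesn't matter: a client only needs the three-digit code at the start of each reply line, plus the character after it ("-" means more lines follow, a space means this is the last line). The function names `parse_reply` and `is_error` are made up for this sketch.

```python
def parse_reply(lines):
    """Read one SMTP reply (possibly several lines) and return (code, texts)."""
    code = None
    texts = []
    for line in lines:
        this_code = int(line[:3])   # the part a program actually acts on
        separator = line[3:4]       # "-" = continued, " " (or nothing) = last line
        if code is None:
            code = this_code
        elif this_code != code:
            raise ValueError("reply code changed in the middle of a reply")
        texts.append(line[4:])      # free text, meant for humans
        if separator != "-":
            break
    return code, texts


def is_error(code):
    """4xx and 5xx codes are failures; 2xx and 3xx mean carry on."""
    return code >= 400
```

Both greetings shown above parse to the same code, 250, so a client treats the two servers identically whatever they say afterwards.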
  11. You may also want to look at this lecture on internet applications from Dave Hollinger

III. How Internet Addressing Works

  1. See the Wikipedia entry on IP Addresses.
  2. Here are some great lecture notes on IP Addresses & DNS for Java (in PDF), and some decent ones on the Architecture of the Internet (in HTML).
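  3. The Internet was invented by Al Gore. No, seriously: it grew out of work by Larry Roberts & Tom Merrill (see the Brief History of the Internet)
    1. who in 1965 connected the TX-2 computer at MIT with the Q-32 in California (at System Development Corporation in Santa Monica) over a dial-up telephone line. 
    2. The experiment showed that ordinary circuit-switched phone lines weren't up to the job, which motivated the idea of breaking data into packets & resending ones that got lost.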
    3. Most research was funded by the US Military, "ARPANET" 
      1. so the government could survive nuclear war.
      2. very decentralized -- any computer that gets a packet knows how to send it to another computer that can get it closer to its destination.
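    4. Email arrived in 1972: Ray Tomlinson wrote the first program for sending mail across the network, and Larry Roberts wrote the first utility for listing, reading, filing, forwarding & replying to messages.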
    5. MILNET (for the US military) & ARPANET split in 1983, same time as TCP/IP was adopted as main protocol 
      1. 1983 is also when I got my first email account, coincidentally.
      2. When we used email in the 80s, a message took a day to get around the world.
        1. We thought this was amazing!
        2. Links in the internet phoned each other maybe once every hour or two to see if there were any bits to send.
    6. Al Gore did help get a lot of US taxpayer money into developing the Internet for commercial use --- politics does matter.
  4. Internet addresses currently are four 8-bit numbers (so each has 2^8 = 256 possible values, 0 to 255), so there can be 2^32 = 4,294,967,296 unique IP addresses -- probably not enough for 6,000,000,000 people!
  5. Getting a packet to the right place:
    1. first get it to a machine that matches the first number,
    2. that machine should be able to get it to a machine that matches the second number...
    3. so the routers along the way only need to know how to get to 4*256 = 1024 machines between them, not 4 billion.
    4. logarithmic!  divide & conquer!
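
The arithmetic above can be checked with a short sketch. It follows the simplified one-number-per-hop picture from this lecture; the function names are made up for the illustration.

```python
def parse_ipv4(text):
    """Split a dotted address like '138.38.108.3' into four numbers, each 0..255."""
    parts = [int(piece) for piece in text.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError("not a valid IPv4 address: " + text)
    return parts


def to_integer(parts):
    """Pack the four 8-bit numbers into one 32-bit number."""
    value = 0
    for part in parts:
        value = value * 256 + part
    return value


def route(parts):
    """List the successively longer matches a packet is handed along to.

    Each step only has to choose among 256 possibilities for the next number.
    """
    return [".".join(str(part) for part in parts[:n]) for n in range(1, 5)]


ADDRESS_SPACE = 256 ** 4        # every possible address
TABLE_ENTRIES = 4 * 256         # what the four steps need to know in total
```

So `route(parse_ipv4("138.38.108.3"))` narrows the search in four steps, each a choice among 256, instead of one choice among `ADDRESS_SPACE` machines.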
  6. Traceroute will show you how packets have to travel --- again, if you can find a machine that still runs it (I had to go to MIT...)
    /home/ai/joanna % traceroute cs.bath.ac.uk
    traceroute to cs.bath.ac.uk (138.38.108.2), 30 hops max, 40 byte packets
    1 net-chex (128.52.37.10) 1 ms 1 ms 1 ms
    2 anacreon (128.52.0.10) 2 ms 3 ms 1 ms
    3 radole (18.24.10.3) 2 ms 76 ms 74 ms
    4 B24-RTR-2-LCS.MIT.EDU (18.201.1.1) 70 ms 109 ms 72 ms
    5 EXTERNAL-RTR-2-BACKBONE.MIT.EDU (18.168.0.27) 89 ms 105 ms 87 ms
    6 MIT-GIGAPOPNE.nox.org (192.5.89.89) 95 ms 84 ms 80 ms
    7 192.5.89.10 (192.5.89.10) 91 ms 99 ms 98 ms
    8 198.32.11.62 (198.32.11.62) 73 ms 118 ms 115 ms
    9 ny.uk1.uk.geant.net (62.40.96.170) 175 ms 109 ms 133 ms
    10 janet-gw.uk1.uk.geant.net (62.40.103.150) 129 ms 143 ms 148 ms
    11 po3-0.lond-scr3.ja.net (146.97.35.133) 150 ms 134 ms 143 ms
    12 po6-0.read-scr.ja.net (146.97.33.13) 173 ms 153 ms 163 ms
    13 po2-0.bris-scr.ja.net (146.97.33.49) 141 ms 191 ms 201 ms
    14 gi0-1.frenchay-bar.ja.net (146.97.35.82) 191 ms 186 ms *
    15 146.97.40.198 (146.97.40.198) 156 ms 167 ms 138 ms
    16 bath-1-brisf-1-r1.swern.net.uk (194.82.125.50) 138 ms 158 ms 161 ms
    17 bath-gw-1-bath-1.swern.net.uk (194.82.125.198) 143 ms 193 ms *
    18 earth.cs.bath.ac.uk (138.38.108.2) 174 ms 153 ms 155 ms
    19 earth.cs.bath.ac.uk (138.38.108.2) 166 ms 168 ms 171 ms
    20 earth.cs.bath.ac.uk (138.38.108.2) 209 ms 184 ms 184 ms
    1. ja.net is JANET, the UK's main academic network.
    2. It used to be really annoying, because it made all the addresses backwards, e.g. uk.ac.bath.midge
    3. lots of clients & servers would get confused translating, even though they should have known better
      1. Mail to ai.mit.edu would get read backwards, as if "ai" were the country, and head off towards Anguilla (the .ai country code)!
    4. Moral:  Even when there's a clear protocol, making things too tricky confuses programmers & things break.
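
Here is a sketch of how that confusion can arise. The heuristic below is hypothetical (I don't know what any particular mailer actually did), and `TOP_LEVEL` is a tiny made-up sample: a program that tries to guess whether a name is in backwards order by looking at its first label gets fooled whenever a machine name happens to look like a top-level domain, and .ai is a country-code top-level domain.

```python
# A tiny, made-up sample of top-level domains for this illustration.
TOP_LEVEL = {"uk", "edu", "com", "ai"}


def reverse_labels(name):
    """Flip the order of the dot-separated parts of a name."""
    return ".".join(reversed(name.split(".")))


def guess_internet_order(name):
    """Naive heuristic: if the first label looks like a top-level domain,
    assume the name is written backwards and flip it."""
    labels = name.split(".")
    if labels[0] in TOP_LEVEL:
        return reverse_labels(name)
    return name
```

The heuristic correctly turns uk.ac.bath.midge into midge.bath.ac.uk, but it also "corrects" ai.mit.edu into edu.mit.ai, which is a machine in an entirely different part of the world.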
  7. What about going to Hong Kong?
    /home/ai/joanna % traceroute www.hkbu.edu.hk
    traceroute to net1.hkbu.edu.hk (158.182.4.1), 30 hops max, 40 byte packets
    1 net-chex (128.52.37.10) 1 ms 1 ms 1 ms
    2 anacreon (128.52.0.10) 2 ms 1 ms 1 ms
    3 radole (18.24.10.3) 2 ms 111 ms 109 ms
    4 B24-RTR-2-LCS.MIT.EDU (18.201.1.1) 114 ms 90 ms 118 ms
    5 EXTERNAL-RTR-2-BACKBONE.MIT.EDU (18.168.0.27) 118 ms 103 ms 84 ms
    6 MIT-GIGAPOPNE.nox.org (192.5.89.89) 91 ms 104 ms 107 ms
    7 192.5.89.10 (192.5.89.10) 130 ms 112 ms 126 ms
    8 chinng-nycmng.abilene.ucaid.edu (198.32.8.82) 148 ms 162 ms 121 ms
    9 iplsng-chinng.abilene.ucaid.edu (198.32.8.77) 123 ms 134 ms 135 ms
    10 kscyng-iplsng.abilene.ucaid.edu (198.32.8.81) 146 ms 149 ms 141 ms
    11 dnvrng-kscyng.abilene.ucaid.edu (198.32.8.13) 122 ms 150 ms 126 ms
    12 snvang-dnvrng.abilene.ucaid.edu (198.32.8.1) 175 ms 137 ms 163 ms
    13 losang-snvang.abilene.ucaid.edu (198.32.8.94) 152 ms 153 ms 166 ms
    14 tpr2-transpac-la.jp.apan.net (203.181.248.130) 269 ms 276 ms 258 ms
    15 taiwan-tpr2.jp.apan.net (203.181.248.153) 290 ms 320 ms 334 ms
    16 m160-1-0-0-OC3.tw.ascc.net (140.109.251.42) 308 ms 318 ms 330 ms
    17 m20-1-1-0-OC3.hk.ascc.net (140.109.251.45) 360 ms 347 ms 360 ms
    18 192.245.196.249 (192.245.196.249) 335 ms 364 ms 383 ms
    19 202.40.217.90 (202.40.217.90) 433 ms 457 ms 418 ms
    20 202.125.249.5 (202.125.249.5) 385 ms 391 ms 373 ms
    21 202.125.249.21 (202.125.249.21) 483 ms 402 ms 453 ms
    22 202.125.249.34 (202.125.249.34) 395 ms 382 ms 342 ms
    23 158.182.118.73 (158.182.118.73) 478 ms 556 ms 588 ms
    24 158.182.118.82 (158.182.118.82) 529 ms 588 ms 620 ms
    25 * * *
    26 * * *
    27 * * *
    ^C
    /home/ai/joanna % 
  8. The path out of MIT is the same, but we spend some time at a big internet hub (ucaid.edu) before hopping to Japan, then Taiwan, then Hong Kong.  DNS doesn't have names for the last few machines!
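
If you want to compare routes like the two above, the output is regular enough to pick apart with a few lines of code. This is a minimal sketch that assumes the line layout shown in these transcripts (hop number, name, address in brackets, then times in ms or `*` for no answer); other versions of traceroute may print things differently, and `parse_hop` is a made-up name.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
HOST_PART = re.compile(r"(\S+) \((\d+\.\d+\.\d+\.\d+)\)")
TIME_PART = re.compile(r"(\d+(?:\.\d+)?) ms")


def parse_hop(line):
    """Return a dict describing one hop, or None if the line is not a hop line."""
    match = HOP_LINE.match(line)
    if match is None:
        return None                      # e.g. the "traceroute to ..." header
    rest = match.group(2)
    host = HOST_PART.match(rest)
    if host is not None:
        name, address = host.group(1), host.group(2)
        rest = rest[host.end():]         # only look for times after the address
    else:
        name, address = None, None       # e.g. "* * *": nobody answered
    times = [float(t) for t in TIME_PART.findall(rest)]
    return {"hop": int(match.group(1)), "host": name,
            "address": address, "times_ms": times}
```

Taking the smallest time at each hop and looking for the biggest jump between neighbouring hops is one rough way to spot where a packet crosses an ocean.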
  9. Of course, there's loads more to this story 
    1. For example, look at the notes I linked to above for pictures of packets.
    2. Or take networking in your final year!

IV. Finalé

  1. I summarized the course & told you about the exam.
  2. I also told you about protocols & their levels, & gave an example of SMTP.
  3. I also told you a little bit about internet addressing if there was time.
  4. Next two lectures AI

page author: Joanna Bryson
25 April  2005