CM10228 / Programming II: Lecture 3

Trees & Logs

"In teoria, non c'e' differenza tra teoria e pratica. Ma in pratica c'e'."
(In theory, there is no difference between theory and practice. But, in practice, there is.)
 -- Jan L.A. van de Snepscheut
-I. Housekeeping:
  - How are the tutorials going?
  - If you get tired of using The Lovelace Lab, don't forget you can also use the main university computer labs.
  - No required text for this course.
  - There may be no Java books that are also good CS texts.
  - The ones Dr. De Vos suggested are fine.
  - The on-line notes may link to more resources.
  - Books for you guys if you really want:
    - Books only cover GUIs, threads, events, applets, networking (hard book), basic programming (easy book).
    - Books don't cover algorithms, complexity, or data structures, but see the front web page for excellent notes on data structures, sorting (with animations!), searching and complexity.
    - Books:
      - easier:
        - Java Programming Today, by Barbara Johnston. It's hard to buy anymore, but it's in the library & it's a good book for basic learning.
        - Thinking in Java, by Bruce Eckel. Notice that he has the old version of the book on line, and you can buy the newer version (2006).
      - harder:
        - Learning Java, by Niemeyer & Knudsen (O'Reilly, the one with the tiger mum & cubs on the cover). Note that all O'Reilly Java books have tigers on the cover, including the general reference book, but this is the one about learning Java. This is available on-line for free if you are on campus or VPNing to Bath.
        - May also want to look at Object-Oriented Programming with Java, by David Barnes. It has nice networking examples too.
        - An Introduction to Network Programming with Java, by Jan Graba (new Sept 2006), also looks good. It doesn't only talk about networking; it talks about file handling, threads, servlets, CORBA (middleware) etc. I haven't really read it thoroughly, but it looked good enough that I've ordered a few copies (bookstore & library). Notice that now that it's on Springer, you should be able to get free access to it online through the library, I think. This is available on-line for free if you are on campus or on VPN to Bath.
-II. Reverse

(defun reverse-help (oldlist newlist)
  (cond ((null oldlist) newlist)       ; empty: the accumulator is the answer
        (t (reverse-help (cdr oldlist) ; otherwise recurse on the tail,
                         (cons (car oldlist) newlist))))) ; pushing the head onto the accumulator

(defun my-reverse (list)
  (reverse-help list nil))
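The same helper-with-accumulator pattern can be sketched in Java, the course language (this translation is my own; the class and method names are hypothetical, not course-provided code):

```java
import java.util.LinkedList;
import java.util.List;

public class MyReverse {
    // Mirrors reverse-help: move the head of oldList onto the front
    // of newList until oldList is empty (the accumulator pattern).
    private static <T> List<T> reverseHelp(List<T> oldList, LinkedList<T> newList) {
        if (oldList.isEmpty()) {           // (null oldlist) -> newlist
            return newList;
        }
        newList.addFirst(oldList.get(0));  // (cons (car oldlist) newlist)
        return reverseHelp(oldList.subList(1, oldList.size()), newList); // (cdr oldlist)
    }

    public static <T> List<T> myReverse(List<T> list) {
        return reverseHelp(list, new LinkedList<>());
    }
}
```

As in the Lisp version, the original list is never modified; the reversed copy is built up one head at a time.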
- draw memory for them too -- leave up for tree stuff below.
- This week's lab.
I. Criteria for Evaluating an Algorithm (Revision, more or less)
  - Main Criteria
    - Speed
    - Size
    - Risk of failure.
    - Ease of maintenance.
  - Speed is the main criterion that we'll talk about.
  - Speed and Size are the two things Computer Scientists might be talking about when they talk about complexity formally.
  - Of course, the conventional meaning of complexity (how complicated it is to understand the algorithm) affects both risk and maintenance.
    - Often worth going with something slightly slower if it will be easier to maintain.
    - Software development is often more of a bottleneck than processor speed.
    - Good programmers are more expensive than fast computers.
    - Brooks' Law: Adding manpower to a late software project makes it later.
      - This law is so old it's not even gender neutral (in fact, from "The Mythical Man-Month", 1975), but it's still true.
      - See the Slashdot review of The Mythical Man-Month 20th Anniversary Edition.
  - But sometimes, time really matters.
    - Graphics, games engines.
    - Database engines.
    - Simulations:
      - Weather simulations.
      - Social or political simulations of millions of agents.
      - Science: the evolution of life, evolution of culture, the big bang, hydrogen atoms, brain cells etc.
      - If you are a good programmer with spare time & interested in modelling the evolution of culture, come visit me during my office hours (AmonI).
    - Right after the first time I gave this lecture (2004) I got a talk announcement on the importance of algorithms for molecular biology / drug discovery from Bruce R. Donald. Whether you care about helping humanity or making money (not an xor), that's an important research field.
  - How do you measure Speed?
    - A stop watch usually isn't recommended (though see the quote above!)
    - Speed of one instance of an algorithm depends on:
      - the processor it's run on
      - other components (e.g. graphics card, bus)
      - what else is happening on the computer
      - amount of RAM available
        - This is only true because read & write operations take time, even to memory.
        - But they take more time if they have to go to disk.
        - If a computer runs out of space in RAM, it swaps memory onto disk.
        - Very bad thing: if most time is spent swapping, little is spent on the computation.
        - Can happen if working with very large data sets and not processing the data efficiently.
    - But this isn't what most computer scientists are talking about when they talk about time complexity.
  - Algorithms are normally analyzed in terms of:
    - The number of operations they perform.
    - The types of operations they perform.
    - How the number of operations they perform changes if parameters change.  The key point!!
  - These criteria are the same for both time and space.
    - Usually ignore most of the operations and focus on a few that are most significant for time or space (whichever is being measured).
      - e.g. for time: disk reads, hard arithmetic
      - e.g. for space: `new' commands (things that allocate more memory)
  - How the number of operations they perform changes if parameters change?
    - This question is referred to as scaling.
    - Scaling happens with respect to some parameter.
    - Example: As an animal gets taller, its weight typically scales as height^3.
      - This is because weight is directly related to volume, not height.
      - volume = height x width x depth.  If you assume width & depth are also correlated to height, then volume is correlated to height^3.
      - Bone strength can grow as height^2 but exoskeletons can't, so vertebrates can grow bigger than insects.
    - Example in algorithms: finding the length of an array.  How does this scale with the number of items in a collection?
      - Just look up a variable in the collection object that tells you its length.
        - Always takes the same number of steps however many items there are.
        - This is called a constant time algorithm.
      - Start at the beginning and count until you reach the last item (which must be marked somehow, like in the lists).
        - The number of steps is dependent on the number of objects.
        - This is a linear algorithm.
      - If you are checking how many unique items are in the collection, then for each item of the list you will have to check if any of the other items are the same, so you go through the list once for each item in the list.
        - The number of steps is the square of the number of items in the collection.
        - This is said to scale quadratically.
      - Notice that an algorithm may look very good at a low N, but then turn out to be a nightmare at higher N!  On the other hand, if you know for certain that you will only have low N for an application, you may still want to consider that algorithm.
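The three cases above can be sketched as Java methods whose loop structure makes the scaling visible (a hypothetical illustration; the class and method names are my own):

```java
public class Scaling {
    // Constant time: an array stores its length, so one lookup
    // suffices however many items there are.
    static int constantLength(int[] a) {
        return a.length; // one step
    }

    // Linear: count the items one at a time, as you must with a
    // list whose end is only marked, not counted.
    static int linearCount(int[] a) {
        int steps = 0;
        for (int i = 0; i < a.length; i++) steps++;
        return steps; // N steps for N items
    }

    // Quadratic: for each item, scan the other items for a match,
    // so the number of comparisons grows as N * N.
    static int uniqueCount(int[] a) {
        int unique = 0;
        for (int i = 0; i < a.length; i++) {
            boolean seenBefore = false;
            for (int j = 0; j < i; j++) {
                if (a[j] == a[i]) seenBefore = true;
            }
            if (!seenBefore) unique++;
        }
        return unique;
    }
}
```

Doubling N doubles the work in linearCount but quadruples it in uniqueCount; constantLength doesn't change at all.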
II. Logarithms & Logarithmic Complexity
  - You need to remember what logarithms are to understand complexity analysis.
    - More review about logarithms.
    - A slightly heavier logarithm page.
    - But don't worry -- if you understand this page, that's all you need for this unit.
  - The default base for log is 10.  So log(100) = 2 is just another way to say 10^2 = 100.
  - In computer science, we normally use base 2.  E.g. log2(8) = 3.
  - We like base 2 because of bits.  For example, the number of bits you need to represent an integer is logarithmic (base 2) in its value:

    binary:       0 |  1  | 10  | 11 | 100 | 101 | 110 | 111 | 1000
    power of 2:     | 2^0 | 2^1 |    | 2^2 |     |     |     | 2^3
    value:        0 |  1  |  2  |  3 |  4  |  5  |  6  |  7  |  8
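This relationship can be checked with a short sketch (a hypothetical helper, my own naming): counting how many times you can halve a positive integer before it reaches zero gives the number of binary digits, i.e. floor(log2(n)) + 1.

```java
public class Bits {
    // Number of binary digits needed to write n (for n >= 1):
    // each halving drops one binary digit.
    static int bitsNeeded(int n) {
        int bits = 0;
        while (n > 0) {
            n = n / 2;
            bits++;
        }
        return bits;
    }
}
```

So 7 ("111") needs 3 bits, while 8 ("1000") needs 4, matching the table.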
  - If you think about it, a logarithmic complexity would be a good thing for an algorithm -- not as good as constant time, but much better than linear.
    - In fact, a log2 complexity would be exactly as much better than linear as quadratic is worse. [draw on graph]
  - But can we write an algorithm of logarithmic complexity?
III. Trees for Sorting and Searching
  - A tree is like a list, except that instead of a next reference, each node has two references, left and right.
    - The objects at left & right are called children.
    - The object that points directly to another object is called its parent.
    - The top parent is called the root (back to the tree metaphor).
  - An algorithm for sorting a list using a tree:
    - Take the first number in the list, make it the root.
    - For each number in the list, start at the root object:
      1. If the new number is less than the current object, go to its left child.
      2. If the new number is larger, go to its right child.
      3. (Assume you can ignore duplicates.)
      4. If the child you've gone to is empty, insert your number in a new object.
      5. Otherwise, go back to step 1 with that child as the current object.
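The steps above can be sketched as a Java binary search tree insert; walking the finished tree left-to-right then reads the numbers out in sorted order (a minimal sketch; the class and method names are my own):

```java
import java.util.ArrayList;
import java.util.List;

public class TreeSort {
    static class Node {
        int value;
        Node left, right; // the two children
        Node(int v) { value = v; }
    }

    // Smaller values go left, larger go right, duplicates are
    // ignored; a new object is made at the first empty child.
    static Node insert(Node node, int value) {
        if (node == null) return new Node(value); // empty child: new object
        if (value < node.value)      node.left  = insert(node.left, value);
        else if (value > node.value) node.right = insert(node.right, value);
        // equal: ignore duplicates
        return node;
    }

    // In-order traversal: everything left of a node is smaller,
    // so visiting left, self, right yields sorted order.
    static void inOrder(Node node, List<Integer> out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.add(node.value);
        inOrder(node.right, out);
    }

    static List<Integer> sort(int[] values) {
        Node root = null; // first value becomes the root
        for (int v : values) root = insert(root, v);
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }
}
```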
  - Someone asked in class one year about the duplicate case (I'd actually skipped step 3 above on the board).  If you are only looking for the existence of an object (see our search below), duplicates can be ignored / skipped.  If you do care about how many items you have of each type, you can have your node objects contain a variable that counts how many instances you found of that object.  You could even imagine keeping a list of objects at each node in the tree if you like!  You'll learn more about complex data structures in Programming II, but if you are keen now, have a look at b-trees.
  - Example: 15, 23, 14, 27, 3, 18, 5 gives:

            15
           /  \
         14    23
        /     /  \
       3    18    27
        \
         5

    - The depth of this tree grows roughly as log2 of the number of items.
    - Notice that no number will ever come to the right of the 14 (& not just because I ran out of space).
      - So log2 is the best case; it's unlikely to be the actual case.
    - There are algorithms for rebalancing trees though, so if the number of items you have happens to be exactly one less than a power of 2 (a perfectly balanced tree has 2^k - 1 nodes), you could get the optimal case.
  - To find out whether a number is in the tree, almost the same algorithm works:
    1. Start at the root.
    2. If you find your number, return #true; if you find a null node, return #false.
    3. Assuming you didn't just finish: if the number you are looking for is less than your current node, move to the left, otherwise to the right.
    4. Go back to step 2.
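The search steps can be sketched iteratively in Java (again a hypothetical sketch with my own names; it includes a small insert to build the tree):

```java
public class TreeSearch {
    static class Node {
        int value;
        Node left, right;
        Node(int v) { value = v; }
    }

    // Build a search tree: smaller left, larger right, duplicates ignored.
    static Node insert(Node node, int v) {
        if (node == null) return new Node(v);
        if (v < node.value)      node.left  = insert(node.left, v);
        else if (v > node.value) node.right = insert(node.right, v);
        return node;
    }

    static Node build(int[] values) {
        Node root = null;
        for (int v : values) root = insert(root, v);
        return root;
    }

    // The search steps, iteratively: found -> true; fell off a
    // null node -> false; otherwise move left or right and repeat.
    static boolean contains(Node node, int target) {
        while (node != null) {
            if (target == node.value) return true; // #true
            node = (target < node.value) ? node.left : node.right;
        }
        return false; // #false
    }
}
```

Each comparison discards one whole subtree, which is why a balanced tree needs only about log2 N comparisons.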
  - If the tree is perfectly balanced, then you will only have to look at a maximum of log2 N items (where N is the length of the original list, the Number of items).
  - Obviously the base of the logarithm here depends on the number of children, so you could do even better if you had more children per node.
    - But the algorithm gets more complicated if you can't just use < (go left) vs. > (go right).
    - You'll learn such an algorithm next year in association with databases, e.g. b-trees.
  - Another reason computer scientists like log2.
IV. What have we learned?
  - Algorithmic complexity.
  - Logarithms.
  - Taught Trees.
  - Next lecture will actually be longer than an hour.

page author: Joanna Bryson
13 February 2012