OVERLOAD
CONTENTS
OVERLOAD 94
Overload is a publication of ACCU
For details of ACCU, our publications
and activities, visit the ACCU website:
www.accu.org
December 2009
ISSN 1354-3172
Editor
Ric Parkin
overload@accu.org
Advisors
Phil Bass
phil@stoneymanor.demon.co.uk
Richard Blundell
richard.blundell@gmail.com
Simon Farnsworth
simon@farnz.co.uk
Matthew Jones
m@badcrumble.net
Alistair McDonald
alistair@inrevo.com
Roger Orr
rogero@howzatt.demon.co.uk
Simon Sebright
simon.sebright@ubs.com
Paul Thomas
pthomas@spongelava.com
Anthony Williams
anthony.ajw@gmail.com
4
Shadow Data Types
Jon Jagger looks at hiding implementation details in C.
7
Creating a Framework for the iPhone
Pete Goodliffe tries to share code between iPhone
applications.
11
The Model Student: A Primal Skyline (Part 3)
Richard Harris concludes his investigation of the prime
factors of integers.
17
Project-Specific Language Dialects
Yaakov Belch, Sergey Ignatchenko and Dmytro
Ivanchykhin present a flexible approach to project-specific language dialects.
26
Quality Matters: A Case Study in Quality
Matthew Wilson assesses the quality of a simple library.
Advertising enquiries
ads@accu.org
Cover art and design
Pete Goodliffe
pete@goodliffe.net
Copy deadlines
All articles intended for publication in
Overload 95 should be submitted by
1st January 2010 and for
Overload 96 by 1st March 2010.
Copyrights and Trade Marks
Some articles and other contributions use terms that are either registered trade marks or claimed
as such. The use of such terms is not intended to support or disparage any trade mark claim.
On request we will withdraw all references to a specific trade mark and its owner.
ACCU
ACCU is an organisation of programmers
who care about professionalism in
programming. That is, we care about
writing good code, and about writing it in
a good way. We are dedicated to raising
the standard of programming.
The articles in this magazine have all
been written by ACCU members - by
programmers, for programmers - and
have been contributed free of charge.
By default, the copyright of all material published by ACCU is the exclusive property of the author.
By submitting material to ACCU for publication, an author is, by default, assumed to have granted
ACCU the right to publish and republish that material in any medium as they see fit. An author
of an article or column (not a letter or a review of software or a book) may explicitly offer single
(first serial) publication rights and thereby retain all other rights.
Except for licences granted to 1) Corporate Members to copy solely for internal distribution 2)
members to copy source code for use on their own computers, no material can be copied from
Overload without written permission from the copyright holder.
EDITORIAL
RIC PARKIN
A Crack in Time
Encoding messages has a long history.
Ric Parkin looks back at how this
affected computing.
A couple of weeks ago, I attended the new ACCU
autumn conference held at Bletchley Park – home of
the Enigma Code crackers. I won’t go into much detail
– there should be some writeups in the next CVu – but
suffice to say it was a splendid day, full of fascinating
details of the history of computing, held at what could
be described as its birthplace.
One remarkable part of the day was going around the National Museum
of Computing, which is based in some of the maze of sprawling huts that
housed much of the original code breaking activity. You can imagine that
wandering around with around 90 fellow computer geeks made for some
interesting anecdotes, especially once you got to the rooms holding the
earliest home and personal computers – coos and sighs of “Ah I had one
of them” abounded. But one thing that really stuck in the mind was how
limited those computers were compared to now, and yet how the designers
and users were so ingenious at getting around those limitations. My
personal favourite was on the WITCH [WITCH] – it could use punched
paper tapes to load a program to run from its central store, but as that store
was very small they also used loops of tape to hold subroutines that could
be run directly from the paper (this had the side effect that you had to
write your program so that a subroutine didn’t get called too many times
– after about 500 calls the paper would break!)
But the main thrust of the day was all about codes and code breaking, and
it is significant that the first computers were created to make and break
codes. At the simplest level, it is because of the sheer number of
repetitive actions involved, which have to be performed accurately. But I
think there’s something deeper going on here: codes and cyphers are all
about the manipulation of symbolic representations, which, when you think
about it, is all that a computer ever does – even the simplest number type
is actually stored as a pattern of bits, and it’s the interpretation of that
pattern that makes that pattern ‘be’ the number. When you tell the computer
to add two ‘numbers’, it actually manipulates the symbols so that the new
interpreted pattern is as if it had added the two interpreted input numbers.
(This idea is not exactly new – see the sidebar.) How it achieves this can
be very simple, or very
complex. An example would be ‘multiply by two’ – a compiler could use
the knowledge that numbers are stored as binary to turn that into a bitshift
operation, or invoke a large multiplication routine.
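As a tiny illustration (the values below are invented, not from the editorial), the same bit pattern means different things under different interpretations, and a compiler that knows numbers are stored in binary is free to turn a multiply-by-two into a shift:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
  /* One 32-bit pattern, two interpretations (assumes float is
     32-bit IEEE 754, as on most current platforms). */
  uint32_t bits = 0x42280000u;
  float as_float;
  memcpy(&as_float, &bits, sizeof as_float);
  printf("as an integer: %lu\n", (unsigned long)bits);   /* 1109917696 */
  printf("as a float:    %g\n", as_float);                /* 42         */

  /* 'Multiply by two' can become a bit shift because the
     representation is binary. */
  uint32_t n = 21;
  printf("%lu * 2 == %lu\n", (unsigned long)n, (unsigned long)(n << 1));
  return 0;
}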
This sort of thinking can be very useful at times. For example, when I was
trying to get my head around the various flavours of Unicode, what really
made things fall into place was realising that a
‘Character’ is really just an interpretation of one or more ‘Values’ –
years of generally only using 7-bit US ASCII had lulled me into conflating
the two. Realising that the interpretation is just as important as the
actual values involved suddenly made everything make sense. This then also
led me to ‘get’ a lot of the 8-bit extended ASCII issues.

Symbolic Computing and Alan Turing

Precursors go all the way back to Babbage, but the purest form is surely
that of Alan Turing (who of course is the most famous of the code breakers
at Bletchley), in his paper ‘On Computable Numbers, With an Application to
the Entscheidungsproblem’ [Turing]. In it he proposed a thought experiment
of a machine that worked on an infinitely long paper tape divided into
cells that could either be empty or be filled with a symbol. It had a
read/write head that would be positioned on a single cell, read the
contents, and could write or erase a symbol. It could also move the tape
left or right. A program was a simple table of rules. Each rule consisted
of what actions to take depending on the symbol at the current position.
Each action would erase or write a symbol at the current location, then
move the tape left, right or stay still, and then change which rule to
apply next.

Despite its extreme simplicity, Turing and Alonzo Church suggested that any
computable algorithm could be written on a Turing machine (while not
completely proven, this is now pretty much accepted as true). Turing went
further and wrote a set of rules whose first act would be to read in from a
tape the description of a second set of rules, and then process the rest of
the tape as if it were the machine described by that input. In other words,
a programmable computer.
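As a concrete (and entirely invented) sketch of such a rule table, here is a minimal C simulator for a one-rule machine that walks right over a block of 1s and appends another one – unary increment – on a bounded tape:

#include <stdio.h>
#include <string.h>

/* One rule per (state, symbol): what to write, which way to move,
   and which state to enter next. */
typedef struct
{
  char write;   /* symbol to write back            */
  int  move;    /* -1 = left, +1 = right, 0 = stay */
  int  next;    /* next state, -1 = halt           */
} rule;

int main(void)
{
  /* State 0: on '1' keep moving right; on '0' write a '1' and halt. */
  rule table[1][2] = {
    { { '1', 0, -1 },     /* current cell holds '0' */
      { '1', +1, 0 } },   /* current cell holds '1' */
  };

  char tape[32] = "111";                   /* unary 3              */
  memset(tape + 3, '0', sizeof tape - 4);  /* blank the rest of it */
  int pos = 0, state = 0;

  while (state != -1)
  {
    rule r = table[state][tape[pos] == '1'];
    tape[pos] = r.write;
    pos += r.move;
    state = r.next;
  }
  printf("%s\n", tape);   /* 1111 followed by blanks: unary 4 */
  return 0;
}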
And it goes further. How we choose to represent information can have a
significant effect on what we can do with it. Choose the right data structure
and a program can perform quickly in a small memory footprint.
Conversely, a poor choice can mean it runs slowly, consumes excessive
resources, and perhaps doesn’t run at all – a totally unsuitable
choice may make implementing an algorithm too difficult, such that it’ll
either never be completed or be too buggy to use. Take as an example the
problem of organising 24-hour support cover. Everyone in your team
volunteers the hours they can cover, and you need to check that someone
will be on call at any time. One approach would be to make a list for each
person of the blocks of hours they can do, then iterate over them,
removing each hour from a list of uncovered hours (taking into account that
the hour might already be covered). That’s rather complex, will probably
have lots of memory allocations/deallocations, and will have some slow
lookups. Might even contain some bugs. An alternative is to represent the
hours a person can cover as a bitmask; then checking that all hours are
covered is just a matter of ORing together all the masks, and checking all
the bits are set.
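A minimal sketch of that bitmask approach (the rota data is made up purely for illustration): one bit per hour of the day, OR the volunteers' masks together, and compare against a mask with all 24 bits set.

#include <stdio.h>
#include <stdint.h>

#define ALL_HOURS 0xFFFFFFu   /* bits 0..23, one bit per hour */

int main(void)
{
  /* Each volunteer's availability as a 24-bit mask (invented data). */
  uint32_t alice = 0x0000FFu;  /* hours 0-7   */
  uint32_t bob   = 0x00FF00u;  /* hours 8-15  */
  uint32_t carol = 0xFF0000u;  /* hours 16-23 */

  uint32_t covered = alice | bob | carol;   /* OR all the masks together */

  if ((covered & ALL_HOURS) == ALL_HOURS)
    puts("every hour is covered");
  else
    printf("uncovered hours: 0x%06lX\n",
           (unsigned long)(~covered & ALL_HOURS));
  return 0;
}

No allocations, no lists to maintain, and the check itself is a handful of machine instructions.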
Ric Parkin has been programming professionally for around 20 years, mostly in C++, for a range of
companies from tiny startups to international corporations. Since joining ACCU in 2000, he’s left a trail
of new members behind him. He can be contacted at ric.parkin@gmail.com.
So in a sense, everything in a computer is already in code; it’s just that
we know how to interpret it. Encryption is only slightly different in that
the author is now actively trying to hide the method of interpretation.
(Although technically what’s usually being hidden is the key used to drive
an encryption algorithm, as it’s easy to create a new key but very, very
hard to create a new algorithm.) How hard they try to hide it depends on
how much it costs to do the encryption, and how valuable the information
is. Let’s quickly look at those two variables.

The cost of encryption is usually due to the difficulty of knowing what to
do, the inconvenience of doing it, and the cost of actually doing so, in
time and opportunity cost (that is, the value of what else you could have
been doing instead). Many of these costs have plummeted over time, as
encryption algorithms have been publicised and automated, and computing
power has increased so it doesn’t take too long to encrypt to a fairly
strong standard.

The value of the information can vary widely and depends on many things.
Simple monetary value is obvious. From Bletchley we learn that government
and military information can be very valuable, as knowing where a U-boat
pack is can mean the difference between a convoy getting through with vital
equipment, or being sunk. This also hints at the temporal aspect of
information’s value – it’s not much use knowing where those subs were a
week after they’ve sunk your ships. So the value of information often
decreases over time, sometimes very rapidly. This is a very useful guide
for deciding how strongly to encrypt something – you want the information
to be worthless by the time it is cracked. There’s also a rather subtle
value in the mere existence of a message. This is exploited by traffic
analysis, which tries to gather information from the patterns of messages
over time and where they are sent, so you don’t even need to decrypt the
messages to get information. For example, using mobile phone records to
track a gang planning a robbery – from who calls whom you can often learn
about the command structure, and from the number and timing of the calls
you can spot that something is about to happen. You can defeat traffic
analysis by hiding the patterns in noise, such as always sending a message
at the same time each day even if it only contains the equivalent of ‘This
page left intentionally blank’, but this increases the cost of hiding your
information.

Putting these two costs together, you should choose to encrypt your
messages such that the cost to you of encryption is less than the expected
cost of that information being disclosed at the time it can be deciphered.

So if the costs are so low, why aren’t we all encrypting as a matter of
course? Well, for several decades, governments have been trying to control
access to cryptography on the grounds that they need to be able to read
your messages in the name of law enforcement. In the 90s the US government
tried to impose the Clipper chip [Clipper] to give them a backdoor to voice
communications. There was also the classification of encryption as a
munition, and so liable to strict export controls, which led to the
absurdity of the T-shirt that would land you in prison for wearing it as
you left the US, as it had a four-line implementation of the RSA algorithm
printed on it [Back]. And now the full Regulation of Investigatory Powers
Act has recently come into force, which makes it a serious offence not to
disclose your encryption keys. And which dangerous criminal or terrorist
was the first to be jailed for this? A rocket hobbyist who distrusted the
police [RIPA]. Best not forget those passwords.

Figure 1: One of the few pictures of Alan Turing as an adult, running in 1946.
References
[Clipper] http://en.wikipedia.org/wiki/Clipper_chip
[RIPA] http://www.theregister.co.uk/2009/11/24/ripa_jfl/
[Turing] http://www.thocp.net/biographies/papers/turing_oncomputablenumbers_1936.pdf
FEATURE
JON JAGGER
Shadow Data Types
Hiding implementation details is a good idea.
Jon Jagger shows a technique in C that avoids
some of the common problems.
Suppose we have a type called wibble defined as a concrete data type
(that is, a type whose representation is fully visible) as shown in
Listing 1.

#include "grommet.h"
#include "flange.h"

typedef struct
{
  grommet g;
  flange f;
} wibble;

void wopen(wibble * w, int i);
...
void wclose(wibble * w);

Listing 1

The definition of wibble exposes the types involved in its representation
(grommet and flange in this made-up example), and hence requires a
#include for those types in its header file. This exposure has a price. One
cost is that any change to the grommet or flange header files, or any
header files they in turn #include, at any depth, will require a
recompilation of any source file that #includes wibble.h (either directly
or transitively). Another cost is that client code can easily become
reliant on the exposed representation rather than relying solely on the
functional API. Note that in C++ you can avoid this latter problem by
declaring your data members private.

These costs are sufficiently high that software developers have invented
techniques to hide a type’s representation: turn it into an Abstract Data
Type. This is simply a type that does not reveal its representation; a type
that abstracts away its representation. In this article I’ll look at two
abstract data type implementation techniques: Opaque Data Types, and Shadow
Data Types.

Opaque data types
The term Opaque Data Type is a well-established term for the technique
of exposing only the name of the type in its header file. This is done with
a forward type declaration. This certainly has the effect of not exposing
any representation!

typedef struct wibble wibble;

wibble * wopen(int);
...
void wclose(wibble *);

A definite downside with this approach is that clients cannot create objects
themselves.

#include "wibble.h"

void eg(void)
{
  wibble * ptr;                // ok
  wibble value;                // constraint violation
  ptr = malloc(sizeof *ptr);   // constraint violation
}

The wibble type’s representation is defined in its source file and so only
code in the source file can create wibble objects. Furthermore, these wibble
objects have to be returned to users as pointers. These two constraints mean
the created objects cannot have auto storage class. This is a great loss
since auto storage class, alone of the three storage class options, allows
the clients to decide where the objects live, which can greatly improve
locality of reference.

So how can wibble’s source file actually create them? The first
possibility is to use an object with auto storage class:

...
wibble * wopen(int value)
{
  wibble opened;
  ...
  return &opened; // very very bad
}
...

A second possibility is to create the objects with static storage class:

...
static wibble wstorage[42];
static size_t windex = 0;
...
wibble * wopen(int value)
{
  wibble * opened = &wstorage[windex];
  windex++;
  ...
  return opened;
}
...

The final possibility is to create the objects with allocated storage class.
That is, to create the objects dynamically on the heap:

...
wibble * wopen(int value)
{
  wibble * opened = malloc(sizeof *opened);
  ...
  return opened;
}
...
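To see the allocated-storage version end to end, here is a minimal sketch of an opaque wibble split across header, source file and client. The details are invented for illustration – the grommet and flange members are stood in for by a single int, the wvalue accessor is hypothetical, and error handling is minimal – so treat it as a sketch of the technique rather than the article's actual code.

/* wibble.h - only the name of the type is exposed */
#ifndef WIBBLE_H
#define WIBBLE_H

typedef struct wibble wibble;

wibble * wopen(int value);
int wvalue(const wibble * w);   /* hypothetical accessor */
void wclose(wibble * w);

#endif

/* wibble.c - the representation lives here, invisible to clients */
#include "wibble.h"
#include <stdlib.h>

struct wibble
{
  int value;   /* stand-in for the grommet and flange members */
};

wibble * wopen(int value)
{
  wibble * opened = malloc(sizeof *opened);
  if (opened != NULL)
    opened->value = value;
  return opened;
}

int wvalue(const wibble * w)
{
  return w->value;
}

void wclose(wibble * w)
{
  free(w);
}

/* main.c - clients hold only pointers; creation goes through wopen */
#include <stdio.h>
#include "wibble.h"

int main(void)
{
  wibble * w = wopen(42);
  if (w != NULL)
  {
    printf("%d\n", wvalue(w));
    wclose(w);
  }
  return 0;
}

Clients no longer need recompiling when struct wibble changes, but every object now has to live on the heap – the loss of auto storage that the article highlights.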
Jon Jagger is a self-employed software consultant-trainer-
coach-mentor-programmer who works on a no-win no-fee
basis. He likes the technical aspects of software development
but mostly enjoys working with people. He can be contacted at
jon@jaggersoft.com.