OVERLOAD 94
CONTENTS
Overload is a publication of ACCU
For details of ACCU, our publications
and activities, visit the ACCU website:
www.accu.org
December 2009
ISSN 1354-3172
Editor
Ric Parkin
overload@accu.org
Advisors
Phil Bass
phil@stoneymanor.demon.co.uk
Richard Blundell
richard.blundell@gmail.com
Simon Farnsworth
simon@farnz.co.uk
Matthew Jones
m@badcrumble.net
Alistair McDonald
alistair@inrevo.com
Roger Orr
rogero@howzatt.demon.co.uk
Simon Sebright
simon.sebright@ubs.com
Paul Thomas
pthomas@spongelava.com
Anthony Williams
anthony.ajw@gmail.com
4   Shadow Data Types
    Jon Jagger looks at hiding implementation details in C.

7   Creating a Framework for the iPhone
    Pete Goodliffe tries to share code between iPhone applications.

11  The Model Student: A Primal Skyline (Part 3)
    Richard Harris concludes his investigation of the prime factors of integers.

17  Project-Specific Language Dialects
    Yaakov Belch, Sergey Ignatchenko and Dmytro Ivanchykhin present a flexible approach to project-specific language dialects.

26  Quality Matters: A Case Study in Quality
    Matthew Wilson assesses the quality of a simple library.

Advertising enquiries
ads@accu.org
Cover art and design
Pete Goodliffe
pete@goodliffe.net
Copy deadlines
All articles intended for publication in
Overload 95 should be submitted by
1st January 2010 and for
Overload 96 by 1st March 2010.
Copyrights and Trade Marks
Some articles and other contributions use terms that are either registered trade marks or claimed
as such. The use of such terms is not intended to support nor disparage any trade mark claim.
On request we will withdraw all references to a specific trade mark and its owner.
ACCU
ACCU is an organisation of programmers
who care about professionalism in
programming. That is, we care about
writing good code, and about writing it in
a good way. We are dedicated to raising
the standard of programming.
The articles in this magazine have all
been written by ACCU members - by
programmers, for programmers - and
have been contributed free of charge.
By default, the copyright of all material published by ACCU is the exclusive property of the author.
By submitting material to ACCU for publication, an author is, by default, assumed to have granted
ACCU the right to publish and republish that material in any medium as they see fit. An author
of an article or column (not a letter or a review of software or a book) may explicitly offer single
(first serial) publication rights and thereby retain all other rights.
Except for licences granted to 1) Corporate Members to copy solely for internal distribution 2)
members to copy source code for use on their own computers, no material can be copied from
Overload without written permission from the copyright holder.
EDITORIAL
RIC PARKIN
A Crack in Time
Encoding messages has a long history.
Ric Parkin looks back at how this
affected computing.
A couple of weeks ago, I attended the new ACCU
autumn conference held at Bletchley Park – home of
the Enigma Code crackers. I won’t go into much detail
– there should be some writeups in the next CVu – but
suffice to say it was a splendid day, full of fascinating
details of the history of computing, held at what could
be described as its birthplace.
One remarkable part of the day was going around the National Museum
of Computing, which is based in some of the maze of sprawling huts that
housed much of the original code breaking activity. You can imagine that
wandering around with some 90 fellow computer geeks made for some
interesting anecdotes, especially once you got to the rooms holding the
earliest home and personal computers – coos and sighs of “Ah I had one
of them” abounded. But one thing that really stuck in the mind was how
limited those computers were compared to now, and yet how the designers
and users were so ingenious at getting around those limitations. My
personal favourite was on the WITCH [WITCH] – it could use punched
paper tapes to load a program to run from its central store, but as that store
was very small they also used loops of tape to hold subroutines that could be run directly from the paper (this had the side effect that you had to write your program so that no subroutine was called too many times – after about 500 calls the paper would break!)
But the main thrust of the day was all about codes and code breaking, and
it is significant that the first computers were created to make and break
codes. At the simplest level, it is because of the sheer number of repetitive actions that are involved, which have to be performed accurately. But I think there’s something deeper going on here: codes and cyphers are all about the manipulation of symbolic representations, which, when you think about it, is all that a computer ever does – even the simplest number type is actually stored as a pattern of bits, and it’s the interpretation of that pattern that makes that pattern ‘be’ the number. When you tell the computer to add two ‘numbers’, it actually manipulates the symbols so that the new pattern, when interpreted, is as if it had added the two interpreted input numbers. (This idea is not exactly new – see the sidebar.) How it achieves this can be very simple, or very complex. An example would be ‘multiply by two’ – a compiler could use the knowledge that numbers are stored as binary to turn that into a bitshift operation, or it could invoke a large multiplication routine.
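As a small illustration (the particular value and the unsigned type are my own, not from the editorial), the equivalence is easy to see in C – for a non-negative value the shifted pattern and the doubled value are the same thing:

    #include <stdio.h>

    int main(void)
    {
        unsigned x = 21;
        /* Shifting the bit pattern left by one place gives the same value as
           multiplying by two, because the value is stored in binary. */
        printf("%u %u\n", x * 2u, x << 1);   /* prints: 42 42 */
        return 0;
    }

(For signed or overflowing values the two forms are not interchangeable, which is one reason the choice is best left to the compiler.)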
This sort of thinking can be very useful at times. For example, when I was
trying to get my head around the various flavours of Unicode, what really
made things fall into place was realising that a
‘Character’ is really just an interpretation of one
or more ‘Values’ – years of generally only using
7-bit US ASCII had lulled me into conflating the two. Realising that the interpretation is just as important as the actual values involved suddenly made everything make sense. This then also led me to ‘get’ a lot of the 8-bit extended ASCII issues.
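As a minimal sketch of that realisation (the byte values are my own example, not from the editorial): the same pair of stored values is two characters under one interpretation and a single character under another.

    #include <stdio.h>

    int main(void)
    {
        unsigned char bytes[] = { 0xC3, 0xA9 };   /* the same two stored values */

        /* Interpreted as Latin-1, each byte is its own character. */
        printf("as Latin-1: U+%04X U+%04X\n",
               (unsigned)bytes[0], (unsigned)bytes[1]);

        /* Interpreted as UTF-8, the pair together encodes one character, U+00E9. */
        unsigned codepoint = ((bytes[0] & 0x1Fu) << 6) | (bytes[1] & 0x3Fu);
        printf("as UTF-8:   U+%04X\n", codepoint);
        return 0;
    }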
Symbolic Computing and Alan Turing
Precursors go all the way back to Babbage, but the purest form is surely that of Alan Turing (who of course is the most famous of code breakers at Bletchley), in his paper ‘On Computable Numbers, With an Application to the Entscheidungsproblem’ [Turing]. In it he proposed a thought experiment of a machine that worked on an infinitely long paper tape divided into cells that could either be empty or be filled with a symbol. It had a read/write head that would be positioned on a single cell, read the contents, and could write or erase a symbol. It could also move the tape left or right. A program was a simple table of rules. Each rule consisted of what actions to do depending on the symbol at the current position. Each action would erase or write a symbol at the current location, then move the tape left, right or stay still, and then change which rule to now apply.
Despite its extreme simplicity, Turing and Alonzo Church suggested that any computable algorithm could be written on a Turing machine (while not completely proven, this is now pretty much accepted as true). Turing went further and wrote a set of rules whose first act would be to read in from a tape the description of a second set of rules, and then process the rest of the tape as if it were the machine described by that input. In other words, a programmable computer.
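As a minimal sketch of that rule-table idea (the encoding, the finite tape and the particular one-state machine are my own illustration, not from Turing's paper), the whole mechanism fits in a few lines of C:

    #include <stdio.h>

    /* One rule per (state, symbol): what to write, how to move, which state next. */
    typedef struct { char write; int move; int next; } rule;

    #define HALT (-1)

    int main(void)
    {
        /* A one-state machine that overwrites 0s with 1s until it reads a 1. */
        rule table[1][2] = {
            { { '1', +1, 0 },       /* state 0, reading '0': write 1, move right */
              { '1',  0, HALT } }   /* state 0, reading '1': leave it and halt   */
        };
        char tape[] = "0001000";    /* a finite stand-in for the infinite tape   */
        int head = 0, state = 0;

        while (state != HALT && tape[head] != '\0') {
            rule r = table[state][tape[head] - '0'];
            tape[head] = r.write;   /* erase or write a symbol at the current cell */
            head += r.move;         /* move the head (equivalently, the tape)      */
            state = r.next;         /* change which rule to apply next             */
        }
        printf("%s\n", tape);       /* prints: 1111000 */
        return 0;
    }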
And it goes further. How we choose to represent information can have a
significant effect on what we can do with it. Choose the right data structure
and a program can perform quickly in a small memory footprint.
Conversely, a poor choice can mean it runs slowly, consumes excessive
resources, and perhaps it might not even run at all – a totally unsuitable
choice may make implementing an algorithm too difficult, such that it’ll
either never be completed or be too buggy to use. Take as an example the
problem of organising 24-hour support cover. Everyone in your team
volunteers the hours they can cover, and you need to check that someone
will be on call at any time. One approach would be to make a list for each
person of the blocks of hours they can do, then iterate over them and remove each of those hours from a list of uncovered hours (taking into account that
the hour might already be covered). That’s rather complex, will probably
have lots of memory allocations/deallocations, and will have some slow
lookups. Might even contain some bugs. An alternative is to represent the
hours a person can cover as a bitmask, then checking that all hours are
covered is just a matter of ORing together all the masks, and checking all the bits are set.
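As a minimal sketch of that approach (the type, names and 24-hour granularity are my own illustration, not from the editorial), the whole check collapses to a loop of ORs and a single comparison:

    #include <stdint.h>
    #include <stdio.h>

    /* One mask per volunteer: bit n set means they can cover hour n (0-23). */
    typedef uint32_t cover_mask;

    #define ALL_HOURS ((cover_mask)0x00FFFFFF)   /* bits 0..23 all set */

    /* Returns 1 if every hour of the day is covered by at least one volunteer. */
    static int all_hours_covered(const cover_mask *volunteers, size_t count)
    {
        cover_mask covered = 0;
        for (size_t i = 0; i < count; ++i)
            covered |= volunteers[i];                /* OR together all the masks  */
        return (covered & ALL_HOURS) == ALL_HOURS;   /* check all the bits are set */
    }

    int main(void)
    {
        cover_mask team[] = { 0x0000FFFF,    /* hours 0-15  */
                              0x00FFF000 };  /* hours 12-23 */
        printf("%s\n", all_hours_covered(team, 2) ? "covered" : "gap in cover");
        return 0;
    }

No allocations, no slow lookups, and the code reads almost exactly like the prose description.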
So in a sense, everything in a computer is already in code, it’s just that we know how to interpret it. Encryption is only slightly different in that the author is now actively trying to hide the method of interpretation. (Although technically what’s usually being hidden is the key used to drive an encryption algorithm, as it’s easy to create a new key but very very hard to create a new algorithm.) How hard they try to hide it depends on how much it costs to do the encryption, and how valuable the information is. Let’s quickly look at those two variables.

The cost of encryption is usually due to the difficulty of knowing what to do, the inconvenience of doing it, and the cost of actually doing so, in time and opportunity cost (that is, the value of what else you could have been doing instead). Many of these costs have plummeted over time, as encryption algorithms have been publicised and automated, and computing power has increased so it doesn’t take too long to encrypt to a fairly strong standard.

The value of the information can vary widely and depends on many things. Simple monetary value is obvious. From Bletchley we learn that government and military information can be very valuable, as knowing where a U-boat pack is can mean the difference between a convoy getting through with vital equipment, or being sunk. This also hints at the temporal aspect of information’s value – it’s not much use knowing where those
subs were a week after they’ve sunk your ships. So the value of information often decreases over time, sometimes very rapidly. This is a very useful guide for deciding how strongly to encrypt something – you want the information to be worthless by the time it is cracked. There’s also a rather subtle value in the mere existence of a message. This is exploited by traffic analysis, which tries to gather information from the patterns of messages over time and where they are sent, so you don’t even need to decrypt the messages to get information. For example, using mobile phone records to track a gang planning a robbery – from who calls whom you can often learn about the command structure, and from the number and time of the calls you can spot that something is about to happen. You can defeat traffic analysis by hiding the patterns in noise, such as always sending a message at the same time each day even if it only contains the equivalent of ‘This page left intentionally blank’, but this increases the cost of hiding your information.

Putting these two costs together, you should choose to encrypt your messages such that the cost to you of encryption is less than the expected cost of that information being disclosed at the time it can be deciphered.

So if the costs are so cheap, why aren’t we all encrypting as a matter of course? Well, for several decades, governments have been trying to control access to cryptography on the grounds that they need to be able to read your messages in the name of law enforcement. In the 90s the US government tried to impose the Clipper chip [Clipper] to give them a backdoor to voice communications. There was also the classification of encryption as a munition and so liable to strict export controls, which led to the absurdity of a T-shirt that would land you in prison for wearing it as you left the US, as it had a four-line implementation of the RSA algorithm printed on it [Back]. And now we recently had the full Regulation of Investigatory Powers Act come into force, which makes it a serious offence not to disclose your encryption keys. And which dangerous criminal or terrorist was the first to be jailed for this? A rocket hobbyist who distrusted the police [RIPA]. Best not forget those passwords.

Figure 1: One of the few pictures of Alan Turing as an adult, running in 1946.

Ric Parkin has been programming professionally for around 20 years, mostly in C++, for a range of companies from tiny startups to international corporations. Since joining ACCU in 2000, he’s left a trail of new members behind him. He can be contacted at ric.parkin@gmail.com.
References
[Back]    http://www.cypherspace.org/adam/shirt/
[Clipper] http://en.wikipedia.org/wiki/Clipper_chip
[RIPA]    http://www.theregister.co.uk/2009/11/24/ripa_jfl/
[Turing]  http://www.thocp.net/biographies/papers/turing_oncomputablenumbers_1936.pdf
[WITCH]   http://en.wikipedia.org/wiki/WITCH_(computer)
FEATURE
JON JAGGER
Shadow Data Types
Hiding implementation details is a good idea.
Jon Jagger shows a technique in C that avoids
some of the common problems.
Suppose we have a type called wibble defined as a concrete data type (that is, a type whose representation is fully visible) as shown in Listing 1.
    #include "grommet.h"
    #include "flange.h"

    typedef struct
    {
      grommet g;
      flange f;
    } wibble;

    void wopen(wibble * w, int i);
    ...
    void wclose(wibble * w);

Listing 1

The definition of wibble exposes the types involved in its representation (grommet and flange in this made-up example), and hence requires a #include for those types in its header file. This exposure has a price. One cost is that any change to the grommet or flange header files, or any header files they in turn #include, at any depth, will require a recompilation of any source file that #includes wibble.h (either directly or transitively). Another cost is that client code can easily become reliant on the exposed representation rather than relying solely on the functional API. Note that in C++ you can avoid this latter problem by declaring your data members private.

These costs are sufficiently high that software developers have invented techniques to hide a type’s representation: turn it into an Abstract Data Type. This is simply a type that does not reveal its representation; a type that abstracts away its representation. In this article I’ll look at two abstract data type implementation techniques: Opaque Data Types, and Shadow Data Types.

Opaque data types
The term Opaque Data Type is a well-established term for the technique of exposing only the name of the type in its header file. This is done with a forward type declaration. This certainly has the effect of not exposing any representation!

    typedef struct wibble wibble;
    wibble * wopen(int);
    ...
    void wclose(wibble *);
A definite downside with this approach is that clients cannot create objects themselves.

    #include <stdlib.h>
    #include "wibble.h"

    void eg(void)
    {
      wibble * ptr;               // ok
      wibble value;               // constraint violation
      ptr = malloc(sizeof *ptr);  // constraint violation
    }

The wibble type’s representation is defined in its source file and so only code in the source file can create wibble objects. Furthermore, these wibble objects have to be returned to users as pointers. These two constraints mean the created objects cannot have auto storage class. This is a great loss since, of the three storage class options, auto storage alone allows the clients to decide where the objects live, which can greatly improve locality of reference.

So how can wibble’s source file actually create them? The first possibility is to use an object with auto storage class:

    ...
    wibble * wopen(int value)
    {
      wibble opened;
      ...
      return &opened;  // very very bad
    }
    ...

A second possibility is to create the objects with static storage class:

    ...
    static wibble wstorage[42];
    static size_t windex = 0;
    ...
    wibble * wopen(int value)
    {
      wibble * opened = &wstorage[windex];
      windex++;
      ...
      return opened;
    }
    ...

The final possibility is to create the objects with allocated storage class. That is, to create the objects dynamically on the heap:

    ...
    wibble * wopen(int value)
    {
      wibble * opened = malloc(sizeof *opened);
      ...
      return opened;
    }
    ...
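To make the trade-off concrete, here is a hedged sketch of client code under the opaque approach (the error check is my own addition; the article only defines wopen and wclose). The client never sees the representation, but it also never owns an object of its own – only a pointer it must remember to hand back:

    #include "wibble.h"

    void client(void)
    {
        wibble * w = wopen(42);   /* creation has to go through wibble's source file */
        if (w != NULL) {
            /* ... use w solely through the functional API ... */
            wclose(w);            /* and the client must release it when done */
        }
    }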
Jon Jagger is a self-employed software consultant-trainer-coach-mentor-programmer who works on a no-win no-fee basis. He likes the technical aspects of software development but mostly enjoys working with people. He can be contacted at jon@jaggersoft.com.