50 years of Computing in Manchester

50 years of Computing in Structural Chemistry and Molecular Biology

Meeting Report - June 1998, Manchester


In addition to the main summary, there are some detailed reports:

This meeting was held June 16 to 17th 1998 in the Department of Chemistry, University of Manchester in conjunction with celebrations in Manchester to mark the 50th Anniversary of the First stored-Program Computer, which was built at the University of Manchester under the leadership of the late Sir F.C. Williams. The first calculation, involving 3.5 million operations, was run successfully in June 1948 by Tom Kilburn, now Emeritus Professor of Computer Science. Ten Conferences took place in the University through the week of June 15th to 21st 1998, which was termed "Digital Week" by the City of Manchester. All ten Conferences shared the central extravaganza event held in the Bridgewater Hall in Manchester, which was called "The Golden Anniversary". (A 'Golden CD-ROM has been produced as a souvenir, details in the 'Editor' s Note at the end of this report)

For the Conference in the Chemistry Department a range of leading speakers from the UK and abroad enthusiastically addressed the topic 'Computers in Structural Chemistry and Molecular Biology' over the one and a half days of the conference. The meeting was supported by the British Biophysical Society, the British Crystallographic Association and the BBSRC Collaborative Computational Project for Protein Crystallography (CCP4). The financial assistance of Bruker, MSC and the IUCr, who each had advertising stands, was also gratefully acknowledged.

The 84 delegates from the UK, Europe and Australia were welcomed by Professor John Helliwell in his opening remarks. He went on to make the observation that recent years had seen the 100 year celebration of Röntgen's discovery of X-rays, 100 years of the electron and the Bragg Centenary celebration (the latter held in the Manchester Physics Department in 1990), and now it was the turn of the computer for its 50 year celebration!
delgates Delegates at the Conference: speakers are seated on the front row, left to right:
Alwyn Jones, Janet Thornton, Frank Allen, Durward Cruickshank, Syd Hall, Eleanor Dodson, John Helliwell, Tom Blundell, Tony Crowther, Michael Woolfson

Emeritus Professor Durward Cruickshank, UMIST, spoke on "The Manchester Ferranti Mark I computer: a marvellous first tool for crystallographic calculations." Durward described the famous 'Williams tube'; this was the computer memory, a Cathode Ray Tube on which binary information was stored as 2560 spots or their absences. Durward's team made regular excursions from Cox's Department in Leeds starting in 1952 to undertake some of the earliest crystallographic computations. These included differential syntheses to study anisotropic vibrations and calculations for Dorothy Hodgkin on wet vitamin B12. Anecdotes of how to keep teleprinters going through long nights of calculations summoned up the correct pioneering images for the audience.

Professor Syd Hall, from the University of Western Australia spoke about "Coping with the structural chemistry information explosion." Syd entered into the full spirit of the event with computer lap top demonstrations of the power of the crystallographic information file (CIF), which he had pioneered. CIF provides a standard format for interchange of crystallographic data between applications, between authors and journals, and allows automatic data checking against standard dictionaries. Clearly this will be ever more vital to keep pace with rapid technical innovations in crystallography, the latest being laboratory-based CCD area detectors. The development of the 'smart' dictionary, containing procedures as well as raw data, will make manipulation of data explicit and customisable, and could reverse the trend towards the 'black-box' use of programs. Syd was the British Crystallographic Association Lecturer.

Dr Frank Allen, Cambridge Crystallographic Data Centre, discussed "Crystallographic databases, data mining and knowledge engineering in structural chemistry".
By charting thousands of individual structures systematic trends can be discerned, unexpected correlations are revealed or dynamic pathways can be mapped. Database content can be converted into libraries of geometric structures. A knowledge-based library of intermolecular interactions (IsoStar) is now available and a library of conformational preferences is under development. Both libraries can drive software solutions in structural chemistry and molecular design, for example, predicting protein-ligand docking modes, assisting in structure solution from powder data, etc.

Professor Janet Thornton, University College London, spoke on "Computing and structural molecular biology: the universe of protein folds." The explosion of genome sequences is upon us, and the 3-D structures are growing in number too: a total of around 6000 protein 3-D X-ray structures exist in the Protein Data Bank (PDB). Janet presented analyses of the relationships between protein functions and protein folds. Using a CATHerine wheel format, the CATH protein structural classification for each functional family could be easily generated, so that at a glance the coloured area fractions showed up the distribution of folds for a particular function. The number of unique folds is certainly expected to be limited, and could guide which new protein structures to target first for 3-D structure determination.

Professor Sir Tom Blundell, University of Cambridge, closed the first day at the Chemistry Department with a lecture on "Protein structure, genomics and the analysis of function". Tom opened up with comments about his use of a third generation Ferranti computer in Oxford. He then explored, via a wealth of examples, how functional relationships between distant proteins could be revealed, with computations leading to new experiments. Thus, AIDS anti-virals followed from crystallographic studies on renin and the aspartic proteinases. The human pentraxins were shown to have a family relationship with lectins. Other studies he discussed in detail were insulin, and many others. The potential of these approaches in the context of genomes would be very powerful, and software written within his group, for example, for domain identification would allow targeting of new proteins with similar domains. In a genome, sequence analysis reveals that a large percentage of the proteins are of unknown function. Many mysteries clearly lie in store.

Eleanor Dodson of the University of York, this year's winner of the ACA Fankuchen award, spoke after the conference dinner on "The past, present and future of protein crystallographic computing". Eleanor described the early days in Dorothy's group and how communication by air mail was reserved for special stages of calculations! She led on to recommendations for much more automated and intelligent help tools for users of programme packages.

Emeritus Professor Michael Woolfson, University of York opened the second day's proceedings and spoke on "Direct methods computations: from small molecules to biological macromolecules." and took us through the various stages of development of direct methods, and glimpses of the use of the Cambridge EDSAC machine as well as the Manchester machine. He commented that this year was also the 50th anniversary of the seminal publication by Harker and Kasper of their inequality relationships in Acta Crystallographica (whose 50 year anniversary it is as well!). From these inequalities modern direct methods were born. Modern workstations now allow a million point Fourier Transform in a couple of seconds, and opened up such direct methods to calculations on proteins. The relative merits of anomalous scattering, direct methods and combinations of these methods were explored in the discussion.

Professor Alwyn Jones, Uppsala University, Sweden spoke on "The great strides made with molecular graphics." A career tour involving especially Munich and Uppsala, and other places on the way, with machine specs in Vax units and costs in Ferrari units, showed just how far computer price performance has improved. Beautifully illustrated slides of his electron density maps showed the audience the wide range of examples and cases met in Alwyn's many years' experience of this area. Alwyn was the BBSRC CCP4 Lecturer.

Dr Tony Crowther, MRC Cambridge expanded the range of the meeting yet further with a lecture on "Image reconstruction and computational techniques: Getting the best from electron micrographs of biological structures." Tony illustrated the power of computer methods for analysing images of individual particles with hepatitis B virus, the causative agent of a hugely important medical problem worldwide with 350 million carriers and 1 million deaths per annum. The structure of the 240 subunit core shell was determined by Bettina Boettcher to 7.4 Å from images of over 6000 individual particles, revealing the largely alpha-helical protein fold. This map provided a basis for Andrew Leslie and Samantha Wynne to solve the structure to 3.5 Å by X-ray crystallography.

Professor Axel Brunger, Yale University, USA was The British Biophysical Society Lecturer. Sadly, Axel's flight connections were broken by bad storms on the eastern seaboard of the US, so he had to send his apologies. His topic was to have been "Searching for the best structure: combining computer simulation, X-ray diffraction, and NMR.". His abstract had explained this was to be a full treatment of these complementary and powerful techniques, no doubt to cover the capabilities of his latest CNS program.

Professor John Helliwell closed the meeting by offering his own personal comments on the computer revolution picking up on the great change in disk space capability, taking us into the 'terabyte era' of structural computing. This was hugely important, for example, for those great structural data generators of today, synchrotron X-ray beamlines.
He highlighted this with examples from his laboratory of ultra-high resolution, time-resolved and MAD protein crystal structure studies. Indeed he outlined the prospect of genome level numbers of protein 3-D crystal structure determinations by anomalous scattering and direct methods in the future.

Prof J R Helliwell, Dr J Raftery, University of Manchester
Dr M Winn, CCP4, Daresbury Laboratory.


Editor's Notes:
1. The celebrations included a special exhibition in the Manchester Museum and the manufacture of a replica of the 1st stored program computer, known as 'The Baby', with a ceremonial running of the 1st program. The replica 'Baby' will be kept on display in Manchester, at the Museum of Science and Technology.

2. A CD-ROM 'The Computer that Changed the World' has been produced as an educational text for schools, a reference work for libraries and a souvenir of the 50th Anniversary celebrations. It has many pictures, descriptions of early computers, especially those in Manchester between 1946 and 1974. Details of all the celebrations can be found on the Web at http://www.computer50.org , including the mechanism of the Williams storage tube memory, and how to order the CD-ROM, which runs only on high power PCs.

3. If you are interested in old computers you will be fascinated by the Virtual Museum of Computing ( at URL http://www.museums.reading.ac.uk/vmoc/ ) which is a completely virtual collection of exhibits on the history of computers, etc.

4. If you prefer to see real old computers (gathering real dust not virtual ) you may like to visit Bletchley, where the Bletchley Park Trust is setting up a Museum for Second World War memorabilia and technological exhibits such as old wireless equipment and the Computer Conservation Society's collection of old computers, including an Elliot 803 which it hopes to get running again. They are replicating, Colossus, the first electronic computer which ran a fixed program Colossus was kept secret for over 30 years because it was used during the Second World War to help decode wireless messages encrypted by the Enigma machine. All the workers conformed so well to the Official Secrets Act that details of Colossus only came to light in about 1980 when they were released by the UK Government. Bletchley Park is easily reached by train since it is only a few hundred yards from the railway station. Opening hours vary, most weekends during the summer, details by telephone +44 (0)1908 640 404 or either of 2 Websites at:
http://www.cranfield.ac.uk/ccc/bpark/ or the Bletchley Park Trust's own at
http://chide.museum.org.uk/bletchley.park.trust/bletchley.park.trust.index.html

5. If you are more interested in buildings associated with computing people, you can visit Maida Vale, North West London, where English Heritage recently unveiled a plaque on Alan Turing's birthplace. Alan Hodges, Turing's biographer, maintains a web site with details of his life and work on computable numbers at http://www.turing.org.uk/turing/



Types of data storage for early computers

paper tape
English computers used '5 hole' paper tape, The smaller 6th hole was a timing track used to measure the speed of the tape through the reader. Initially, the tapes were read mechanically with fingers poking through the holes to make contacts. problems were that tape length was limited by the mechanical strength of the paper between the small holes of the clock track, which tended to tear when the tapes became longer and heavier. Later tape readers were optical, and much faster. It took some time for computer manufacturers to agree on a code standard so that paper tape made o one computer could be read into another.

Operator
Some early computers could only read a program into memory if it was on a single pieceof paper tape. The operator above is copying a program tape to a new one and copying any utility routies it needs on the end of the new tape. The master copies of the utility routines were kept in small filing cabinets like the one on the left. Card
American computers tended to use punched card storage, again conducting fingers poked through the holes to make a contact so that the circuit could detect which holes had been punched. Cards were easier to edit than paper tape, but had to be sequenced (in cols 72-80) if you were going to be able to get them in the right order after accidentally dropping them.



Summary of Durward Cruickshank's talk


I asked Durward Cruickshank, for a copy of his notes and references so that I could give a few more details of early computing for historians of crystallographic computing..

The page opposite shows some types of storage media used to feed the programs into the computer memory.

Durward began by explaining that crystallographers were very well placed to make good use of computers because they had already developed ways of breaking down the calculations into parts, using Fourier transforms and analogue devices such as the Beevers-Lipson strips.


He gave a list of early references to the use of Fourier Transforms, in one dimension

2D Fourier transforms

He also gave a list of references to the use of computers for crystallographic calculations.

 Computer  Authors  References
EDSAC Bennett & Kendrew  Acta Cryst (1952) 5, 109
MUEC  Ahmed and Cruickshank  Acta Cryst (1958) 6, 765
MUEC  Fowweather  Acta Cryst (1955) 8, 633
SWAC  Sparks, Prosen. Kruse, Trueblood  Acta Cryst (1956) 9, 350
FERUT  Ahmed & Barnes  Acta Cryst (1958) 11, 669

Conference on Biological Structures and Computational Methods XRAG London
Nov 1956 British Journal of Applied Physics (1958) 9,1

Conference on Computing methods in X-Ray Crystal Analysis, Glasgow, August 1960
editors Pepinsky, Robertson and Speakman publisher Pergammon (1961)



DIRECT METHODS, 1998 AND ALL THAT!


In June 1998 a series of meetings in Manchester commemorated the 50th anniversary of the invention in that city of the stored-program computer. One meeting, organized by John Helliwell, reviewed the impact of computers in crystallography. As it turns out 1998 is the 50th anniversary of two other important developments in crystallography - the birth of Acta Crystallographica and the origin of direct methods.

The first paper on direct methods was by D. Harker & J.S. Kasper (Acta Cryst., 1948, 1, 70) which showed that for very simple centrosymmetric structures inequality relationships could unambiguously indicate the signs of some strong structure factors. An important extension of this general principle, which applied to larger structures, was that probable signs could be found from sign relationships. This idea was presented in three papers which appeared together. The first by D. Sayre (Acta Cryst., 1952, 5, 60) introduced an exact equation between structure factors for equal-atom structures, generally known as the Sayre equation. The second by W. Cochran (Acta Cryst. 1952, 5, 65) derived the sign relationship from a mathematical expression of the physical condition that electron density is concentrated around atomic centres. The third by W.H. Zachariasen (Acta Cryst., 1952, 5, 68) was theoretically unsound but introduced the idea of representing unknown phases by symbols and showed how this could be used to solve a simple previously-unknown structure.

More important progress was made in 1955 and 1956. The first computer program, for the Cambridge machine (Electronic Delay Storage Automatic Calculator), EDSAC, was written to apply sign relationships and it was shown that the known structure salicylic acid could be solved in projection (W. Cochran & A.S. Douglas, Proc. Ray. Sac., 1955, A227, 486). Features of the program were the representation of signs by symbols, the generation of a number of possible sets of signs and the recognition of the correct set by a figure-of-merit test. In another development Cochran introduced the three-phase relationship (Acta Cryst., 1955, 8, 473) which extended the application of direct methods to non-centrosymmetric structures. The probability density of the sum of the phases of three large structure factors, with a null sum of indices, was shown to be concentrated around zero (modulo 2p ). Once a base of phase information was available a process of phase extension was available since if two phases in a relationship were known then the value of the third could be estimated. Finally, in 1956 J. Karle and H. Hauptman (Acta Cryst., 1956, 9, 635) introduced the tangent formula, a powerful tool for phase extension and refinement that was to play a central role in much that followed.

Although all the ingredients were now in place for solving structures by direct methods there was a lull of several years - with some progress but none of it very critical. Then
J. Karle & I.L. Karle (Acta Cryst., 1963, 16, 969) introduced the symbolic addition method, a systematic approach to using symbols which they later applied to the solution of a non-centrosymmetric structure (Acta Cryst., 1964, 17, 835). This demonstration that direct methods could be applied to non-centrosymmetric structures opened the floodgates to new powerful developments. G.Germain & M.M.Woolfson wrote the first fully-automatic program, LSAM, based on symbols, to solve centrosymmetric structures (Acta Cryst. 1968, B24, 91 ) in which the input was the basic observed data and the output an image of the structure. In the same paper the concept of MULTAN was described, an automatic multi-solution method for solving non-centrosymmetric structures. This method was developed and described in a series of papers over the next few years and by the late 1970's the widely distributed, user-friendly MULTAN program was responsible for the solution of almost two-thirds of reported structures. Many other automatic program systems and new concepts were produced. SHELX (G.M. Sheldrick, 1975, SHELX. Program for Crystal Structure Determination. University of Cambridge) was similar to MULTAN but with some novel features. RANTAN (J-X Yao,Acta Cryst., 1981, A37, 642) initially allocated completely random phases to all reflections to be phased so that phase refinement occurred without phase extension. SIMPEL (H. Schenk & C.T. Kiers, 1984, in Methods and Applications in Crystallographic Computing. Eds. S.R. Hall and T. Ashida,, Clarendon Press, Oxford) was an automated symbolic addition procedure for non-centrosymmetric structures. SAYTAN (T. Debaerdemaeker, C. Tate & M.M. Woolfson, Acta Cryst., 1985, A44, 353) replaced the tangent formula by a more complicated phasing formula so giving phases that satisfied Sayre's equation. XMY (T. Debaerdemaeker & M.M. Woolfson, Acta Cryst., 1989, A45, 349) used parameter-shift refinement to optimize a target function based on Sayre's equation. In 1990 G.M. Sheldrick (Acta Cryst. A46, 467) and T.N. Bhat (Acta Cryst. A46, 735) introduced the idea of simulated annealing to refine phases, borrowing ideas from liquid theorists who numerically studied systems in thermodynamic equilibrium. With this armoury of direct methods available it could be claimed by the end of the 1980's that for smaller crystal structures, with up to 100 or so non-hydrogen atoms in the asymmetric unit, the problem of structure solution was essentially solved.

While these developments were happening in direct methods there was spectacular progress in protein crystallography which was at the forefront of the new biotechnology revolution. Those working in the direct methods field began to set their sights on the solution of protein structures and eventually M.M. Woolfson and J-X Yao (Acta Cryst., 1990, A46, 409) showed that a small known protein, Avian Pancreatic Polypeptide (App) with 36 amino acids plus zinc and some ordered water, was amenable to solution by SAYTAN. Sheldrick showed that this structure could also be solved by his simulated annealing process. M. Mukherjee and M.M. Woolfson ( Acta Cryst., 1993, D49, 9; Acta Cryst., D51, 626) demonstrated that MULTAN could solve a larger protein, 2-Zn insulin, with over 800 non-hydrogen atoms in the asymmetric unit, and that useful phases could also be found for this structure with lower resolution data, down to about 2.25Å.

A new contribution to protein solution, the SHAKE and BAKE method, was introduced by the Buffalo group (C.M. Weeks, G.T. DeTitta, R. Miler & H. Hauptman, Acta Cryst.,1993, D49, 1 79). This involved phase refinement, either by the tangent formula or by parameter shift optimization of a target function, alternating with real-space filtering by selecting large peaks from a Fourier map. This has been shown to be able to solve protein structures with several hundred atoms by perfectly objective procedures.

Most of the demonstrations of structure solution for larger proteins have been for structures containing heavy atoms. Recently M.Mukherjee and M.M. Woolfson (Acta Cryst. in press) have shown that RNApl, a 96 amino-acid protein containing five sulphur atoms, can be solved quite automatically from data to finally-refined structure at 1.17Å without any intervention. If this process is generally applicable and not just a flash-in-the-pan, which is still to be demonstrated, then it offers the prospect of doing for small-to-moderate size proteins what has already been achieved for larger non-protein structures.

Michael Woolfson
University of York


Coping with the Information Explosion in Structural Science

a summary of Syd Hall's talk


There is considerable interest among structural scientists at the moment in the future impact of area-detector diffractometers and fast computers. These advances pose a number of concerns: will these devices flood us with data and structures; how will labs archive all of this data, and how will journals absorb the surge of new structures? This talk explored the nature of the data crisis, and what role data languages, such as CIF, would play in handling future data growth, and turning these into a benefit rather than a burden.

Seeing into the future is never easy, but one prediction is obvious.. computers will continue to evolve and expand rapidly in performance. In the next two years workstations with 100 gigabyte disks will be available, and terabyte disks within the decade... gigabyte RAM memories will become standard... and Internet links will operate at gigabyte bandwidths. Despite such dramatic improvements in technical support, scientific data is unlikely to grow rapidly. Depositions to databases, such as the CCDC and PDB (currently at 1200 and 160 per month, respectively), will continue to accelerate... but at a lesser rate.

The reason for this is also obvious. Although computer and instrument capacities have improved a thousand-fold in the past decade, that of the unaided human remains constant! The production of scientific knowledge is critically dependent on human capacity. We are all well aware of our own limitations so it makes sense in these hectic times to ask computers to do more smart tasks so that we can concentrate on more important stuff. Universal data languages, such as CIF, are essential if we want computers to do clever things. A fundamental requirement is that data be presented in a flexible, self- descriptive and computer-parsible way. Then data dictionaries can be used to specify the relationships between data and provide automatic methods for the evaluation of data based on these relationships.

In crystallography existing CIF dictionaries already provide much of this information for core, macromolecular and powder items and these are actively used by journals for data validation. Considerable software is available for manipulating CIF's and for developing local applications. Syd demonstrated how to access the CIFs at the Acta C web site and display published structures on the screen using local software. A similar approach works for viewing macromolecular structures from the PDB using, say, RASMOL,( the PD protein viewing program available on many platforms,) called from a web browser on a PC.

Details of the CIF web link are at http://www.crystal.uwa.edu.au/Crystal/Software/Ciffilter/ where you will see the latest details what needs to be done to install it.

The data languages such as CIF, and their associated dictionaries, open up all sorts of new and intelligent ways to handle data. The use of a tool like cif2pdb to direct data to RASMOL is a simple but useful example of the sorts of general applications that are possible with this approach.

This sort of portability and versatility is essential for application to more complicated data handling tasks. Languages such as CIF provide opportunities for moving the knowledge of data relationships out of computer programs into dictionaries, so that they can be used as electronic text books, or as interactive algorithms for software applications. This prediction will cause some programmers to grimace, as clearly such an approach would be slow on today's computers, but interpretive processes are likely to be widely used in the future. Just think of the algorithmic transparency compared to the current software black boxes!


Page last updated 29 Oct 1998
BCA Home page WebMaster [email protected]
<-- Click here to return to BCA homepage