An early need was for structure validation software, to guard against local data entry mistakes and to locate the errors that occurred in some 10% of typed or typeset tables. Many errors were trivial to correct, but in the pre-email era a significant number had to be referred back to authors by letter. Crystallographers took these 'CCDC letters' in good part, and this was the beginning of a special relationship with the community that has enhanced the development of the CSD throughout the past 40 years.
An electronic bibliographic file was being regularly updated by 1970, and was disseminated via the Molecular Structures and Dimensions book series - itself one of the earliest handbooks to be typeset directly by computer. Meanwhile, the first 5,000 crystal structures were being validated and entered into a CSD data file. Finally, it was realised that a system of chemical structure representation was needed and a third component, a file of chemical connection tables, was created. 2D and 3D substructure searches were now possible, adding tremendous value to the underlying crystal structure information. These three separate files were eventually amalgamated into the CSD that we know today.
The CCDC is responsible for three types of code:
The CCDC itself has been heavily involved in this research effort, and has published applications papers covering both intramolecular and intermolecular topics. Tables of mean bond lengths published in J. Chem. Soc., Perkin Trans. (1987, pp. S1-S19) and J. Chem. Soc., Dalton Trans. (1989, pp. S1-S83) have now jointly received more than 10,000 citations. In the study of intermolecular interactions, the CSD has underpinned many fundamental contributions. These have helped to provide tools for studying protein-ligand interactions, and played a part in the emergence of crystal engineering as a sub-discipline. The CCDC's most cited paper in this area - more than 1,000 citations and the 60th most cited paper in the first 125 years of JACS - is the categorisation of short C-H...O interactions as true H-bonds (Taylor & Kennard, J. Amer. Chem. Soc., 104, 5063-70, 1982), work that re-shaped the global view of weaker interactions.
The CCDC maintains a web-accessible database of published applications of its products, and the 1,200 current entries chart the many and varied uses of the CSD. The CCDC is well represented with over 150 papers, but more than 1,000 other references show the truly international impact of CSD-based research.
Current CSD statistics are also available on the website. Although the CCDC encourages direct deposition of Private Communications, these statistics refer primarily to published data. The very large number of structures that languish unpublished in laboratory records is quite another matter, but one that must surely be addressed. Software for data processing and maintenance of both the CIF archive and the CSD is currently undergoing a major overhaul, and the new software will incorporate much of the expert knowledge gained over the past 40 years.
Recent years have also seen the CCDC diversify into developing and marketing specific software applications for rational drug design (GOLD, SuperStar, Relibase+) and for structure solution from powder diffraction data (DASH). All of these products make use of crystal structure data from the CSD or PDB in some way, and all except SuperStar are being developed through collaborations with industry and academia. The life sciences products, concentrating essentially on protein-ligand interactions and protein-ligand docking, help to solve difficult problems, and promote the value of small-molecule crystal structure data in structural biology and in the pharmaceutical and agrochemicals industries. The CCDC continues to broaden its horizons, by seeking new areas of science in which crystal structure data adds value to research and development activities.
The CCDC was grant-funded from 1965 until 1989, when it became an independent institution: a non-profit Company Limited by Guarantee with charitable status. This means that the CCDC must be financially self-sufficient, and that any surplus income must be ploughed back into the company (e.g. for new equipment) or into specific charitable activities. Thus, the CCDC provides grants-in-aid for access to the CSD System in developing countries, sponsorship for students who are working on projects allied to the CCDC's interests, and support for the activities of relevant professional organisations. The CCDC's affairs are overseen by an international Board of Governors, eight eminent scientists who, in their turn, are responsible to UK Companies House and to the Charity Commissioners for England and Wales.
The CCDC has expanded steadily, and now has 50 employees divided between database creation, product development, research, scientific and technical support, and administration. It has customers in academia and industry all over the world: in 2004, nearly 2,000 CSD System licences were distributed across 56 countries. The CCDC has a long history of scientific collaboration with academia and industry, and this work has fuelled our research output and fed into our product developments. Currently, the Pfizer Institute for Pharmaceuticals Materials Research, a major partnership involving the CCDC, Cambridge University and Pfizer Inc., is generating exciting results and further extending our areas of scientific interest.
We do not have a precise count of the staff and visitors who have worked at the CCDC over the past 40 years, but it must be 250 or more. What we do know is that they have left, or are leaving, their own mark on the organisation. It is the stronger for their contributions. Customers, scientific collaborators and data depositors also leave their mark, through their constructive input and feedback on our efforts. The CSD, our products, and ultimately all of our customers, have benefited enormously from these interactions, and we are grateful for their involvement.
We look forward to the next 40 years.
Frank Allen
www.ccdc.cam.ac.uk