Newsletter

Issue 46

Summer 1997


Personalia

ChemWeb Inc., which produces ChemWeb.com, the global Web club for chemical researchers, has appointed Bill Town to the new position of operations director. Bill will manage the day-to-day running of the Web environment for chemists while continuing to build the ChemWeb.com library of chemical databases and journals.

For the academic year 1997/1998, Julia Fletcher will be in the US on sabbatical from the Quisitor office, which will be closing at the end of June 1997. Julia will not be taking on any patent searching while she is away. She plans to re-open the office in September 1998 and from then on it will be business as usual.

Sheila Ash has returned from Tripos in the US to work on chemoinformatics products at Oxford Molecular, where she has two roles: one with the New Business Development Unit, and another with the European sales force. She can be contacted on +44-(0)1865-784600.

Muriel Levenbach is now the CSA representative at Solvay-Duphar, replacing Mr J.R. Nienhuis, who has retired.


CSA Internet Workshop

The recent two-day symposium at Sheffield was one of those little gems of which we small company types dream. It was pitched at a brilliant price, it had fewer than twenty delegates, and it was conducted by very knowledgeable staff, who could communicate at just the right level. Our party consisted of an arm-waver (me), our IT manager and our technical manager. To have fulfilled the needs of all three was pretty good, I thought.

Day one was run by David Miller, who gave a very lucid introduction to the Internet, passing through e-mail, mailing lists and newsgroups. The real star of the show, however, was the Web, and quite rightly, most of the course was spent explaining how to find information on the Web, and how to provide information in return.

The overall view which I received was that only those areas with a Web champion were going to be well represented. Mark Winter was an excellent example of a dabbler who quickly became a professional, and in the process produced a Web area of world class significance. Such areas seem to be sadly lacking for organic and physical chemistry... we live in hope!

One area which we felt had been given insufficient attention was that of molecular graphics viewers. It would appear that there are several on the market, some free, some shareware, and some crippleware; however, the pros and cons were not really explained. Perhaps the CSA should put up a Web page which compares and contrasts the various offerings.

Julian Driver, Chief Executive,
Vickers Laboratories Ltd


Joint RSC CIG/CSA Autumn Meeting

Chemical Structures and Networking:

an Overview of Recent Developments and Trends

Thursday, October 16th 1997,

Burlington House,

Piccadilly, London.

Rapid development continues in the application of increasingly sophisticated computer technology, both hardware and software, to the location, retrieval, registration, manipulation and exploitation of chemical structure and related data. These developments are helping users cope with the new demands arising from the continuing growth in the volumes and complexity of data, especially from such techniques as combinatorial chemistry and high-throughput screening. All of this is taking place alongside the continuing growth in the use of Internet/Intranet technology. Most of the new developments allow the use of a browser interface, and many of them are accessible via the Internet.

The meeting will review recent developments and trends in the availability of tools for manipulating chemical structure and related data. Key presentations will provide overviews, supplemented by presentations and demonstrations from major systems suppliers.

After the presentations, there will be a wine and cheese reception, which will provide a further opportunity to network and to view the various demonstrations.

This one-day meeting presents a highly cost-effective way of keeping up to date with this fast-moving field. The cost is just £20 to members of the CIG or CSA and £25 to non-members. Further details and registration forms are available from:

Doug Veal, Doverton Ltd., 46 The Knoll, Hayes, Bromley, Kent, BR2 7DHP
Phone/fax +44-(0)181-325-7608
e-mail: doverton@compuserve.com


CSA Trust 1997 Annual Award

Applications are invited for the 1997 Chemical Structure Association Trust award. The Trust is offering an award of £2000 for the best applicant seeking funds for education or research in chemical information.

Anyone working in the field of chemical information research can apply and application can be made for funds to attend a relevant conference, for travel (eg to collaborate with another research group) or for hardware or software to assist with the research project.

The application should include:

The Trust has previously supported the continuation of research studies in biomedical interactions, including molecular recognition processes and drug design; a novel combination of reaction indexing and synthesis planning; clustering of chemical structures for property prediction; and investigation of reaction mechanisms. The work of younger scientists in developing countries has also been made possible in conjunction with some of the awards.

Recent award winners, and their areas of interest, are:

Applications must be submitted by July 31 1997, preferably by e-mail, to Professor Michael Lynch at: M.Lynch@sheffield.ac.uk

Any postal applications should be sent to:

Professor M.F. Lynch, Chairman of CSA Trust Awards Sub-Committee
Rural Route #1, Avonport, Kings County
Nova Scotia B0P 1B0, Canada

The award will be presented this year at the International Chemical Information Meeting in Nîmes in October.


ACS San Francisco, 13-17 April 1997

John Barnard takes a slightly jaundiced look at the most recent incarnation of the twice-yearly bash

The 213th American Chemical Society National Meeting was held in San Francisco, one of my favourite American cities. The weather was pleasantly spring-like, and so I was not too unhappy about leaving behind some of the best spring weather we have had in Britain for years.

Both the COMP (Computers in Chemistry) and CINF (Chemical Information) Division programs were full - too full in fact, and overlapping both in timing and subject-matter. At least they were in the same building this time (the spacious Moscone Conference Center), and so I was not reduced to dashing across ten-lane highways between hotels to get between the papers I wanted to hear. It is high time, however, that these two Divisions got their acts together to ensure that they offered complementary rather than competing symposia, and persuaded the ACS authorities to put them in adjacent rooms, rather than two escalators and a quarter-mile of corridor apart. COMP division also seems to have a pathological inability to secure rooms big enough for the audience, with the result that when you arrive, breathless after the dash from the previous paper in CINF, you find all the seats taken, and twenty people crowded just inside the door, and the only available standing room is immediately in front of the projector. The Registration Fee for an ACS meeting has doubled in the past six and a half years and even for ACS Members, it is now $210, which does not include a cup of coffee (another gripe: why can they not schedule some coffee breaks in the middle of long afternoon sessions, even if we have to buy our own?). We really do deserve better than this!

COMP presented a two-and-a-half day symposium on 'Pharmacophore Identification', most of which clashed with CINF's one-and-a-half days on 'Clustering and Similarity Searching Techniques for Studying Molecular Diversity', and half a day on each of 'Management of Spectroscopic Information' and 'Information Needs for Planning and Synthesis of Combinatorial Libraries'.

On the Tuesday, however, COMP restricted itself to 'traditional' molecular modelling, whilst CINF covered academic libraries and pesticides, neither of which interested me, and so I spent the day in the exposition. Unfortunately, I had to leave early on the Thursday morning, and so missed the chance to improve on my personal best sprint time between CINF's symposium on 'Database Mining and Data Visualisation' and COMP's simultaneous one on 'Multivariate Analysis'.

A further problem with the longer symposia seemed to me to be the unnecessary duplication between papers. We had an awful lot of people from different companies describing essentially the same work, and though there was some encouragement to be gained from the fact that they were generally getting the same results, a much more useful symposium would probably have resulted if the organisers had been more selective, and had chosen just one speaker to represent each of the main techniques being tried. I lost count of the number of genetic algorithms being used to select precursors for combinatorial library synthesis (the only people who do not seem to have any are the software vendors).

Whilst on the subject of vendors, there was also the usual problem of people offering papers simply because they felt that their company's name should be on the programme (probably just because their competitors' were) despite the fact that they had nothing new or interesting to say. In fact, I suspect that such papers are extremely counterproductive, and leave the audience (composed of people who are (a) your potential customers and (b) not stupid) with the impression that your company is not doing anything worth looking at, even if it actually has superb implementations of (someone else's) research work published two (or even twenty) years ago, which knocks spots off all the competition. Perhaps, as was the case at last year's Noordwijkerhout conference, there should actually be separate sessions for commercial product reviews (even a special session for vapourware!) so that such talks do not get in, heavily but ineffectively disguised as research.

Partly because of all the above, and partly because I am not going to do for nothing what Wendy Warr charges good money for doing, I will not give detailed accounts of the papers I did get to hear. Rather, I will give a brief mention to a few papers and themes which particularly attracted my own biased interests and prejudices. Incidentally, despite all my complaints, many of the papers were extremely interesting, and led to some excellent discussions both during the symposia and informally with the speakers afterwards. This was certainly one of the most useful ACS meetings which I have attended.

In the CINF Molecular Diversity symposium, Robin Spencer of Pfizer discussed the question of the dimensionality of chemistry space. Much diversity analysis work using structure fingerprints effectively treats such fingerprints as if they were composed of unit vectors in a high- (e.g. 1024-) dimensional space. Using an analysis based on fractal geometry, Spencer concluded that the real dimensionality, at least of the Pfizer collection, was closer to 7. A similar conclusion is reached by Principal Components Analysis or Multidimensional Scaling (also discussed by Eric Martin of Chiron). There was some discussion of whether or not different types of fingerprint might give different dimensionalities.

Several speakers were trying to measure diversity, using a diverse range of measures, though Colin Edge (SmithKline Beecham) pointed out that one of the problems is that whilst similarity is best defined at its maximum value (identity), diversity is most ill-defined at its maximum value, which unfortunately is also its most useful. Robert Clark (Tripos) drew attention to the difference between diversity and representativeness in choosing subsets from a large collection. Yours truly described the use of Markush structure handling techniques for very rapid calculation of diversity measures for large combinatorial libraries, and John Blankley (Parke-Davis) used a Principal Components Analysis to compare a wide variety of different diversity metrics.

Different similarity measures are still cropping up, and being modified, re-invented, and arrived at from different directions. Gerry Maggiora (Pharmacia and Upjohn), deputising for an absent Mic Lajiness, described using a weighting with the Tanimoto index to correct biases that tend to occur with small molecules, and on his own account, used an approach based on fuzzy set theory to arrive at what was later concluded to be a type of Tversky Similarity, an asymmetric measure (the similarity of A to B is not equal to the similarity of B to A) that John Bradshaw described at the Daylight MUG meeting in February.
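For readers who have not met these measures, the following is a minimal sketch in Python of the standard set-based definitions of the Tanimoto index and the Tversky similarity. It is not the weighting scheme or the fuzzy-set derivation described by the speakers. The fingerprints are represented simply as sets of 'on' bit positions, and the function names, the example bit sets and the value returned for empty fingerprints are all assumptions made for illustration.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto index: bits in common divided by bits set in either fingerprint."""
    union = len(a | b)
    # Two empty fingerprints have nothing to compare; 0.0 is an arbitrary convention here.
    return len(a & b) / union if union else 0.0


def tversky(a: frozenset, b: frozenset, alpha: float, beta: float) -> float:
    """Tversky similarity of a to b.

    alpha weights the bits found only in a, beta the bits found only in b.
    With alpha == beta == 1 this reduces to the Tanimoto index; with
    alpha != beta the measure is asymmetric.
    """
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


# Hypothetical fingerprints: a small molecule whose bits are all present in a larger one.
small = frozenset({1, 2, 3})
large = frozenset({1, 2, 3, 4, 5, 6})

print(tanimoto(small, large))            # symmetric
print(tversky(small, large, 1.0, 0.0))   # how much of 'small' is found in 'large'
print(tversky(large, small, 1.0, 0.0))   # how much of 'large' is found in 'small'
```

With alpha set to 1 and beta to 0, the small fingerprint is judged fully contained in the large one, while the reverse comparison gives a lower value. This is the asymmetry referred to above.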

Lots of people are still comparing different sorts of descriptor (based on 2D or 3D representations of molecules) and coming to the same conclusions published earlier this year by Brown and Martin (JCICS, 1997, 37, 1-9) that 2D descriptors seem to do better at finding actives. Nevertheless, there is also a lot of interest in shape similarity, including ideas from Dave Weininger (Daylight) and Mark Hermesmeier (Bristol-Myers Squibb) based, like Vincent van Geerestein's earlier SPERM program (JCICS 1992, 32, 601), on placing the molecule inside an icosahedron. Several authors also reported work on choosing appropriate descriptors for use in fingerprints or other analyses.

Twenty years ago, CINF division concentrated primarily on the problems of chemical documentation, and its interest in computers was restricted to the fact that they were a more efficient tool than card indexes for doing what amounted to the same thing. Since then, chemical databases have become regarded as research tools in their own right, and the advent of high-throughput screening and combinatorial chemistry has further emphasised their role in this respect. CINF division has thus effectively split into those whose primary interest remains the storage and retrieval of information about chemistry and chemicals, and those whose interest is in analysing and drawing general conclusions from the information in large databases. COMP division, on the other hand, has divided into those whose computer-based analyses deal with one molecule at a time ('traditional' computational chemistry and molecular modelling), and those who look at whole databases of molecules at once. There is a group of us from the two divisions who, having come from different directions, have now collided in the middle, and spend our time at ACS meetings running up and down escalators and along corridors (where we sometimes quite literally collide with each other) trying to hear papers in competing symposia organised by the two divisions on essentially the same subject, and quite often given by the same speakers.

Something needs to be done about this. Either the two divisions need to get together and agree where the dividing line between them ought to be (and I would suggest the distinction between work involving individual molecules, and work involving whole databases of them), or we perhaps need a new Division of Chemical Informatics to take over the interface area (and perhaps the peripheral ones in both divisions as well). Obviously there are always going to be overlaps between the areas of interest of different divisions, but someone needs to deal with the present situation in CINF and COMP, which regularly repeats itself at ACS meeting after ACS meeting.

There is a similar problem with the CSA and the MGMS, as I pointed out when I spoke in favour of a merger between them at the CSA AGM in Noordwijkerhout last year. Though my proposal was voted down then, I do not believe that the problem has gone away.

John Barnard can be contacted on +44-(0)114-233-3170,
or at barnard@bci1.demon.co.uk


RSC-CIG Meeting: Chemical Information on the Internet

The Council Room at the RSC Headquarters in London was packed for the one-day RSC-CIG meeting in March. The meeting, organised and chaired by Doug Veal, aimed to outline some of the most recent developments in chemical information on the Internet, notably the Open Molecule Foundation (OMF), and Chemical Markup Language (CML).

To start the day, Don Parkin from CDS, Daresbury, covered the current problems encountered when transferring chemical data over the Internet, with the variety of different file formats and standards. Some of the problems have been overcome by the use of file conversion programs, plug-ins such as Chemscape Chime, which can be downloaded free, and Java applets, which are platform independent. This talk was followed by Philip Judson discussing deficiencies which he had encountered when using the Internet to find chemical information. Although it is possible to access chemical structure information in some chemical databases, few biological or physicochemical databases have structures. Internet sites cannot always be accessed, and there is a need for a quality approval scheme. Philip's conclusion was that in many ways the Internet is 20 years out of date.

We were, however, brought fully up-to-date in the rest of the morning sessions, which were devoted to the Open Molecule Foundation, introduced by Adam Precious of McDonnell Information Systems Ltd, and further described by Peter Murray-Rust. Adam pointed out that everything happens seven times faster on the Web than in any other form of communication. At the time of the meeting, Java had been in existence only 571 days, but it had already made a major impact on the transfer of chemical information. Peter Murray-Rust described Java as 'the only programming language that does what I want', adding that 'Netscape running Java is very recent and should be admired for what it does and not what it doesn't do'!

The Open Molecule Foundation aims to provide re-usable Java applets which would be freely available over the Internet for use by the bioinformatics and chemoinformatics communities. Platform independent Java applets eliminate the need for conversion for all the various systems. In this way data can be organised for the global community, and therefore it is vital that vendors, industry and academia work together. It is proposed that chemical information should be described using CML, such that the molecule carries all the related information with it.

In the afternoon, Peter Murray-Rust demonstrated some of the OMF functions, in parallel with the RSC-CIG AGM, at which Doug Veal was elected Chairman. Then Bernard Blessington from the University of Bradford discussed standardisation of chemical information formats and stressed the need for a central UK chemistry Web site, offering free software and open published file formats. It is essential also that any new file formats have backwards compatibility. The afternoon concluded with Peter Murray-Rust giving further details of the CML. All existing molecular data, such as 2D and 3D structures, spectra, sequences, citations, tables and graphs, can be converted into CML. It can be thought of as a set of components, posing no threat to vendors.

Information on the OMF and CML can be found at:

http://www.ch.ic.ac.uk/omf

http://www.venus.co.uk/omf/cml/

Peter Murray-Rust, CML Project Leader, can be contacted at: peter.murray-rust@nottingham.ac.uk

The CSA is grateful to the RSC-CIG not only for organising such an informative and interesting meeting, but also for giving a discount to CSA members. We look forward to the joint meeting with the RSC-CIG in October.


Quality Control of Data on the Net

Gary Wiggins

Although there are many free data sources on the Internet, some of the free compilations lack the kind of rigorous quality control found in commercial data sets. Without standards, questions of accuracy and reliability of Internet data invariably arise. In October 1996, a survey set out to determine how to improve the quality of Web data. Questions sent to CHMINF-L and CHEMWEB were intended to gauge the extent of inaccurate data in Web databases, to define desirable characteristics, and to determine the best guides to data.

Some respondents roundly criticised the inaccuracy of data on the Web, citing the frequent omission of units, and the many transcription errors. Few sources have quality assurance statements, or indicate the origin of the data, and if they do, often the data has been copied from outdated sources. A minimal level of auxiliary information (metadata) is desirable, providing such information as authorship, units, conditions of measurement, and references to primary and secondary sources.
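As an illustration only, the minimal level of metadata described above could be carried alongside each value in a simple record, and a compilation could be checked for values that lack it. The sketch below is in Python. The class name, field names and example entries are hypothetical and are not taken from the survey or from any existing database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataRecord:
    """One value from a data compilation, together with its auxiliary information."""
    property_name: str
    value: float
    units: Optional[str] = None        # omitted units were a frequent complaint
    author: Optional[str] = None       # who measured or compiled the value
    conditions: Optional[str] = None   # conditions of measurement
    references: List[str] = field(default_factory=list)  # primary and secondary sources


def missing_metadata(record: DataRecord) -> List[str]:
    """Return the names of the metadata fields that have been left empty."""
    checks = [
        ("units", record.units),
        ("author", record.author),
        ("conditions", record.conditions),
        ("references", record.references),
    ]
    return [name for name, value in checks if not value]


# A bare number of the kind criticised by respondents, and a documented version of it.
# The author and reference entries are placeholders, not real citations.
bare = DataRecord("boiling point of water", 100.0)
documented = DataRecord(
    "boiling point of water",
    100.0,
    units="degC",
    author="(name of compiler)",
    conditions="101.325 kPa",
    references=["(citation of primary source)"],
)

print(missing_metadata(bare))
print(missing_metadata(documented))
```

A check of this kind says nothing about whether the value itself is correct; it only shows whether a reader has been given enough information to judge it.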

The following were suggested to improve Web data:

At a minimum, compilers ought to provide descriptions of physical theories on which data are based, full references to literature, and descriptions of the format of the database and its search capabilities.

There are some standardisation efforts underway on the Net, particularly the CLIC project, Chemical MIME and CML. Although one person questioned whether or not standardisation was worthwhile, there was generally thought to be a role for IUPAC, CODATA and other bodies in the area of data certification.

One respondent to the survey noted the relevance of Lebedev's study of Internet search engines (http://www.chem.msu.su./eng/comparison.html). Lebedev searched for data using words on 11 Web search engines. He concluded that Excite retrieves a comparable number of documents to Altavista and that Metacrawler is the most powerful search engine for scientific and technical information. The author compared his Internet searches to INSPEC results for the same information covering 1994 and 1995, and found that only five to ten per cent of relevant information is on the Net. He does consider, however, that the Web is a good source of supplementary information on authors, on their work and research projects, and on foundations supporting them.

It is possible to find data on the Internet by following some generally accepted procedures, such as:

Some examples of quoted sites

Indiana University

http://www.indiana.edu/~cheminfo/ca_ppi.html
http://www.indiana.edu/~cheminfo/ca_accc.html

Sheffield Chemputer
http://www.shef.ac.uk/~chem/chemputer/

Biocatalysis/Biodegradation Database
http://dragon.labmed.umn.edu/~lynda/index.html

Chemfinder
http://chemfinder.camsoft.com

WWW Chemical Structure Database
http://schiele.organik.uni-erlangen.de/services/webmol.html

Molecule of the Month
http://www.bris.ac.uk/Depts/Chemistry/MOTM/motm.html

Molecules R Us (text search of PDB database)
http://molbio.info.nih.gov/cgi-bin/pdb

Based on a talk given at the US National Institute of Standards and Technology, December 1996


ADVANCE NOTICE OF AGM AND DINNER, 8TH DECEMBER!

The CSA Annual General Meeting will be held at 4.00 pm on Monday 8th December (the day before the Online Meeting) at the Linnean Society, Burlington House, Piccadilly, London.

The AGM will be followed by the CSA Annual Dinner, which will be held this year at the Cadogan Hotel, Sloane Street, at 7.30 pm. The dinner will be a set price of £27.50 in a private room and will include one pre-dinner drink on arrival, a set three-course meal with coffee and half a bottle of house wine per person.

The Cadogan Hotel has had a long and distinguished career as a discreet and traditional hotel of high quality. Built in 1887-8, it became linked with two of the most fascinating personalities of the age - Lillie Langtry and Oscar Wilde. Originally, the hotel consisted just of the corner of the present building, but gradually over forty years, almost all of the rest of the building was added, including number 21 Pont Street, the former home of Lillie Langtry. Lillie Langtry was a famous actress and was a confidante and close friend of Edward VII. The CSA dinner will be held in her drawing room and dining room.

Oscar Wilde had a less happy association with the hotel as he was arrested in room 118 in 1895!

Make a note in your diary now!


Mike Lynch Retires from the Department of Information Studies

Friends and colleagues gathered together at the University of Sheffield on the 9th May to honour the retirement of Mike Lynch, our CSA President, from the Department of Information Studies. The contribution made by Mike to the Department, and to the whole of the chemical information profession, has established the Department as a world-class centre for research in chemical information.

After gaining a PhD in chemistry at the National University of Ireland, Mike pursued postdoctoral research in Zurich. Then, following two years in industry in the UK and four years at CAS, he joined the staff of what was then the Postgraduate School of Librarianship and Information Studies in Sheffield, in 1965. His research covered such diverse areas as the development of automatic methods for the production of articulated subject indexes; indexing, storage and retrieval of reactions; fragment screening for sub-structure searching; the application of screen-set techniques to textual databases; storage and retrieval of generic structures; and information extraction from natural language patent descriptions. Throughout the chemical information profession there are people who have worked with Mike at Sheffield University, and products which have been directly influenced by his research.

In 1975, Mike was awarded a Personal Chair by the University, and in addition to receiving awards from the Journal of the American Society for Information Science, and the Institute of Information Scientists, he was awarded the ACS Skolnik Award in 1989.

With his Canadian wife, Mary, he plans to live partly in Canada and partly in Sheffield but we hope that he will be able to continue as our President and remain closely involved with the CSA in the future.


Please note: Gez Cross (+44-(0)171-344-2800) maintains a current membership list with full addresses and telephone numbers of all CSA members. If you change your address or telephone number please let him know. If you need the telephone number of a CSA member, Gez can help you.

Your committee ...

Chairman
Janet Ash
+31-20-6269610 or
+44-(0)1580-852270
e-mail ash@euronet.nl or 100547.562@compuserve.com

Vice-Chairman
Peter Nichols
+44-(0)181-441-7495
e-mail pwn22@xtrn.org

Secretary
Barbara Nicholson
+44-(0)1625-871126
e-mail 100645.3654@compuserve.com

Membership Secretary
Gez Cross
+44-(0)171-344-2800
e-mail GCROSS@derwent.co.uk

Treasurer
Andrew Poirrette
+44-(0)114-2222650
e-mail apoirrette@sheffield.ac.uk

Articles printed in the CSA Newsletter express the views of individual contributors, not those of the editor.