International Biometric Society
British and Irish Region

211th Ordinary Meeting
Chemometrics

Friday, March 31st, 2006

The 211th Ordinary Meeting of the British and Irish Region of the International Biometric Society will be held at the Central Science Laboratory, Sand Hutton, York on Friday March 31st. The meeting will start at 11.00 (coffee and registration from 10.30am) and finish at about 4.45pm.

This meeting addresses an area possibly on the periphery of the range of topics usually addressed by the IBS, but, we hope, provides an interesting combination of theoretical developments, a range of practical applications, and an opportunity to see the data collection in action. There is also a link back to the meeting held last November on “Proteomics and Metabolomics”, as many chemometrics techniques are applied in these research areas. We are fortunate to have attracted Prof. Svante Wold, probably the world's leading expert on chemometrics, to talk at the meeting, together with two experts from the UK from different application areas.

Advance registration is essential for this meeting, which will also provide an opportunity to see some of the biological applications in the work of the Central Science Laboratory that require substantial statistical and mathematical input.

There is a small charge for attending the meeting, essentially to cover lunch. This is £15 for IBS members, £20 for student non-members and £30 for non-members. Please note that associate membership of the society is available for £15 a year, and membership forms can be downloaded from the British and Irish Region web-site at:

http://www.maths.qmul.ac.uk/~rab/biometrics/british.html

Registration forms should be returned to the address shown on the registration form, together with appropriate payment, to be received by Wednesday March 22nd.

Please note that access to the site is by pass. If possible, a pass will be sent to you by email in advance of the meeting. Otherwise, please check in at reception on the main gate to collect your pass and return it there when you leave. Parking is available on site – please indicate on the registration form whether you will need a parking space.

Background to the Central Science Laboratory

CSL (http://www.csl.gov.uk/) is an Executive Agency of the UK Government Department for Environment, Food and Rural Affairs (DEFRA). CSL specialises in the sciences underpinning agriculture for sustainable crop production, environmental management and conservation, and in food safety and quality. The Agency provides policy advice and technical support underpinned by high-quality R&D to help DEFRA and other customers safeguard food supplies and protect consumers and the environment. Consequently, the organisation has a very wide range of scientific capabilities.

There is a strong emphasis on statistics and modelling throughout CSL's work: statistics and mathematics are fundamental to sound evidence-based science, as are measures of uncertainty that quantify the strength of that evidence. Rigorous, science-based approaches to risk assessment are increasingly seen as a fundamental requirement for decision-making by governments, industry and international institutions. CSL pioneers the development of quantitative approaches by estimating the frequency and magnitude of impacts, and providing confidence bounds to indicate the extent of quantifiable scientific uncertainty.

Understanding the spatial aspects of phenomena is crucial to generating cross-links between different sources of information and creating the holistic view required to achieve sustainability in the environment. Geographic Information Science uses data from monitoring, surveillance, computer models and other sources to extract scientific evidence to underpin policy, risk analysis and contingency response. It plays a major role in linking research findings to the wider economic, environmental and social context and in evaluating the social dimensions of risk. Sound spatial statistics and modelling are fundamental to GI Science.

Many of the quantitative applications and underlying databases are web-enabled, and there is an active network of scientists involved in knowledge management applications. Examples include the Foot and Mouth data archive (http://footandmouth.csl.gov.uk/) and Pesticide Usage Statistics (http://pusstats.csl.gov.uk/).

CSL is located just outside York in a modern, purpose-built laboratory. Directions to CSL can be obtained from its web-site at:

http://www.csl.gov.uk/aboutcsl/where_to_find_us.cfm

York is easily accessible by road (dual carriageway A64 from the A1) and rail (~2 hours from London King's Cross). There are regular buses from the station – see http://www.yorkshirecoastliner.co.uk/.

York is an attractive historic university town – good for a short break or long weekend. Information on York is available from http://www.thisisyork.co.uk/ and many other web sites.

Programme

10.30 Registration and Coffee
11.00 Introduction – Joe Perry (President, IBS British and Irish Region)
Welcome to CSL – Alistair Murray (Team Leader, Statistics, CSL)
11.15 35 Years of chemometrics, from a sidekick to an obsession, illustrated by some case stories
Svante Wold (RG Chemometrics, Umeå University, Sweden, and Umetrics Inc., Kinnelon, NJ, USA)
Since its infancy around 1970, chemometrics has been driven by the need to find useful information (knowledge) in complicated data sets with many variables. This information usually relates to (a) summary and overview of a data set, (b) classification and discriminant analysis, or (c) quantitative relationships.

In 1970, it came as a surprise to almost everybody that data sets with more variables than observations could be analysed at all, and even more surprising that a large number of variables was actually an advantage. This prompted some thinking about the foundations of chemometric methods, with latent variables, Taylor expansions and projections emerging as essential concepts.

At the same time, the experimental basis of chemistry made it necessary to adopt efficient approaches to designing experiments to investigate a given problem. Design of experiments (DoE) à la Box provides a most useful starting point, and combining it with latent variables gives the interesting approach of design in latent variables.

The situation in 2006 is very much the same as in 1971, except that the typical number of variables has increased by several orders of magnitude. The chemometrics approaches of 1971 and 1981 still work, but need some additional refinement to work well. An interesting approach is to divide the variables into hierarchical blocks, creating layered models with improved interpretability while retaining efficiency and simplicity, as well as a workable DoE approach for very complicated systems.

The development of chemometrics from 1971 to 2006 is illustrated with a number of examples from basic research (science), development and production (technology), which also briefly touch on the statistical properties of the chemometrics approaches.

12.15 Lunch
13.15 Data Processing Requirements for Chromatography - Mass Spectrometry Analysis
Richard Fussell (Central Science Laboratory, Sand Hutton, York, UK)
(r.fussell@csl.gov.uk)
Analysts involved in chromatographic analysis use a number of different proprietary data processing software systems to detect individual peaks and then measure peak area or height. These systems, usually designed to process instrument-specific file formats, adequately perform the relatively simple task of detecting and measuring a single peak or a limited number of peaks. However, the increasing capability of chromatography-mass spectrometry systems, particularly improved signal-to-noise ratios and higher peak capacity, means that analysts now expect to analyse hundreds of peaks in a complex matrix in a single analysis. Data processing becomes very demanding because of inevitable differences in peak shapes, drift in retention time, variation in the response for different compounds, and the need to differentiate the analyte signal from instrument noise and chemical noise. The task becomes even more complex, requiring spectral deconvolution of peaks, when chromatographic resolution is sacrificed for speed. The search for ‘unknown’ compounds rather than target compounds, and the use of two-dimensional separations, add further difficulties.

After peak detection and quantification, calibration curves must be constructed in order to calculate analyte concentrations, and retention time data and ion-ratio statistics must be collated to confirm the identity of the compound causing the response. The final report is usually an Excel spreadsheet or an HTML report.
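
As a rough illustration of the calibration step, the sketch below fits a straight-line calibration curve (peak area against concentration) to a set of standards by ordinary least squares, then inverts it to estimate the concentration in a sample. It assumes a linear detector response over the calibrated range; the function names, the standard concentrations and the peak areas are hypothetical, chosen only for illustration, and real methods may well use weighted or non-linear calibration instead.

```python
# Minimal sketch of a linear calibration curve (hypothetical names and data).
# Assumption: detector response is linear in concentration over the calibrated range.


def fit_calibration(concentrations, responses):
    """Fit response = intercept + slope * concentration by ordinary least squares."""
    n = len(concentrations)
    if n != len(responses) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_concentration(response, intercept, slope):
    """Invert the calibration line to turn a measured response into a concentration."""
    if slope == 0:
        raise ValueError("a flat calibration line cannot be inverted")
    return (response - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards (mg/kg) and their peak areas (arbitrary units).
    standards = [0.0, 0.01, 0.05, 0.10, 0.20]
    areas = [50.0, 150.0, 550.0, 1050.0, 2050.0]
    b0, b1 = fit_calibration(standards, areas)
    sample_area = 800.0
    print(f"estimated concentration: {estimate_concentration(sample_area, b0, b1):.3f} mg/kg")
```

In a multi-analyte method this fit would be repeated for each compound, and a working procedure would also check linearity and report an uncertainty for each estimate; the sketch leaves both out.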

Often, the data-handling component of a multi-analyte procedure accounts for a significant proportion (up to 50%) of the overall analytical cost. More efficient, automated, flexible and reliable data handling systems are required.

The aim of this short presentation is to provide examples of the data outputs from different types of chromatographic systems, to highlight the requirements and difficulties, and to promote discussion on possible solutions.

13.35 Analysis of Metabolomics data acquired by NMR Spectrometry
Adrian Charlton (Central Science Laboratory, Sand Hutton, York, UK)
(adrian.charlton@csl.gov.uk)
Improvements in the resolution and sensitivity of analytical instrumentation (e.g. NMR, GC-MS, LC-MS) have been rapid in recent years. This has greatly improved the capabilities of these instruments and broadened the range of applications for the data they produce. Data profiles (often spectra) resulting from the analysis of complex mixtures generate a wealth of information that can be “mined” in many different ways to address a wide range of questions. For example, it is possible to ask questions about the disease status of cattle from the NMR spectrum of their blood whilst also detecting the misuse of veterinary drugs from the same profile. To answer such varied questions, multivariate analysis tools such as genetic programming and chemometrics have been adapted and deployed. Unlike simple conventional statistical methods, these algorithms can find trends in data that may depend on concurrent changes in multiple variables. Until recently this methodology was not widely understood within the scientific community, and it has yet to reach its full potential. There is, therefore, great scope for further application and promotion of these techniques, and we are keen to build on existing academic collaborations as part of our ongoing research and development in this area.

Topics of interest include:

  • Improved pattern-matching algorithms, which are of crucial importance in supporting current work programmes on food traceability.
  • Characterisation of biomarkers.
  • Mathematical algorithms for data visualisation, exploration, storage and retrieval, which are also an inherent part of the development process.

Signal fusion is the use of multiple analytical measurement techniques to record data, which are then subject to combined multivariate analysis. This concept has recently arisen from the multidisciplinary research projects that have been initiated in the metabolomics field. With the range of facilities available at CSL, we are well positioned to perform these measurements and are seeking to develop research applications in this area.

13.55 Tours of CSL facilities associated with chemometrics
To include the NMR facility, the chromatography labs and the statistics team room for a demonstration of some of their software applications
14.55 Tea
15.15 NIR spectroscopy and chemometrics in food analysis
Tom Fearn (Department of Statistical Science, University College London)
NIR spectroscopy using chemometrics has had many successful applications in food analysis. The talk will describe two such applications. One involves measuring the composition of biscuit doughs for quality control purposes. For this quantitative calibration problem, variable selection, principal components regression and partial least squares regression all give good results. The other application involves discrimination between meat from different animal sources. This qualitative problem is tackled using a hierarchical approach combined with two chemometric methods: linear discriminant analysis using principal component scores, and SIMCA.
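
Principal components regression, one of the calibration methods named in the abstract, regresses the reference values on the scores of a few leading principal components of the spectra rather than on the individual wavelengths, which is what lets it cope with more variables than samples. The sketch below is a generic pure-Python illustration, not the speaker's procedure: it extracts components with the NIPALS algorithm, fits the regression on the mutually orthogonal scores, and predicts a new sample. The function names and toy data are hypothetical; a real NIR calibration would use a numerical library, spectral preprocessing and cross-validation to choose the number of components.

```python
# Minimal principal components regression (PCR) sketch in pure Python.
# Hypothetical function names and toy data; not a production NIR calibration.


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _column(X, j):
    return [row[j] for row in X]


def fit_pcr(X, y, n_components, tol=1e-12, max_iter=500):
    """Extract principal components of centred X by NIPALS, then regress y on the scores."""
    n, p = len(X), len(X[0])
    x_mean = [sum(_column(X, j)) / n for j in range(p)]
    y_mean = sum(y) / n
    R = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred residual matrix
    yc = [v - y_mean for v in y]
    loadings, coefs = [], []
    for _ in range(n_components):
        # Start the score vector from the column with the largest remaining variance.
        j0 = max(range(p), key=lambda j: _dot(_column(R, j), _column(R, j)))
        t = _column(R, j0)
        for _ in range(max_iter):
            tt = _dot(t, t)
            if tt == 0:
                raise ValueError("no variance left for another component")
            # Loading: regress each column of R on t, then normalise to unit length.
            load = [_dot(_column(R, j), t) / tt for j in range(p)]
            norm = _dot(load, load) ** 0.5
            load = [v / norm for v in load]
            t_new = [_dot(row, load) for row in R]  # updated scores
            diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if diff <= tol * _dot(t, t):
                break
        # Deflate: remove this component before extracting the next one.
        R = [[r - ti * lj for r, lj in zip(row, load)] for row, ti in zip(R, t)]
        loadings.append(load)
        # Scores are orthogonal, so each regression coefficient can be fitted separately.
        coefs.append(_dot(t, yc) / _dot(t, t))
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings, "coefs": coefs}


def predict_pcr(model, x):
    """Predict y for one new sample by projecting it onto the stored loadings."""
    resid = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for load, b in zip(model["loadings"], model["coefs"]):
        score = _dot(resid, load)
        resid = [r - score * lj for r, lj in zip(resid, load)]
        y_hat += b * score
    return y_hat


if __name__ == "__main__":
    # Toy "spectra": four variables all driven by one latent quantity z, with y = 2z + 1.
    zs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    X = [[z, 2 * z, 3 * z, 4 * z] for z in zs]
    y = [2 * z + 1 for z in zs]
    model = fit_pcr(X, y, n_components=1)
    print(predict_pcr(model, [1.5, 3.0, 4.5, 6.0]))
```

The number of components is the key tuning choice: too few miss real structure in the spectra, too many start fitting noise.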

16.00 On the statistical calibration of analytical fingerprints for rapid characterisation of complex hydrocarbon mixtures
Philip Jonathan (Statistical Consulting, Shell Global Solutions, Chester)
Analytical fingerprinting techniques are widely used as surrogate measurements for rapid physico-chemical characterisation in many fields; each area of application holds its own challenges for the analytical chemist and the statistician. From a statistical perspective, the range of approaches and ideas for multivariate calibration has increased in recent years. This talk addresses spectroscopic fingerprinting of complex hydrocarbon mixtures within the oil refining industry, outlining the motivation and the challenges faced, and exploring some of the statistical solutions available.

16.45 Conclusions – Joe Perry (President, IBS British and Irish Region)