Data analysis in chemometrics: selection versus compression (DOI: 10.2436/20.2003.01.30)

Authors

  • Alberto Ferrer, Departament d’Estadística i Investigació Operativa Aplicada i Qualitat, Universitat Politècnica de València.

Keywords:

Chemometrics, latent structures, principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), compression, selection, data mining, soft sensor, multivariate process diagnosis.

Abstract

Chemometrics uses data mining tools for the empirical modeling of biochemical systems. The explosive development of information and communications technology has enabled the manufacture of a wide variety of sensors that can record large amounts of data, which are then stored on computers. The challenge is to extract efficiently the information these data potentially contain, and how much of it is extracted depends heavily on the analysis strategy used. With so much data available, some procedure is needed to reduce the number of variables to be analyzed. In this paper we present two strategies for this simplification: compression and selection. The key difference between them is that selection discards some variables, whereas after compression all variables can be recovered. If selection is performed at the start of an investigation, there is a risk of eliminating variables that carry information useful for solving the problem at hand. The recommendation is therefore to compress first and then, if needed, select. The benefits of this recommendation are illustrated with real examples.
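The contrast between compression and selection can be illustrated with a minimal sketch. It assumes principal component analysis (PCA), computed via NumPy's SVD, as the compression method, and uses synthetic, hypothetical data (50 correlated "sensor" variables driven by 3 latent factors); the function names `pca_compress`, `pca_reconstruct` and `select_variables` are illustrative only and are not taken from the paper. Compression keeps a loading for every original variable, so all 50 variables can be approximately reconstructed from 3 scores, whereas selection simply drops the columns it does not keep.

```python
import numpy as np

def pca_compress(X, n_components):
    """Compress X (n samples x p variables) onto n_components principal components.

    Returns the column means, the scores T (n x k) and the loadings P (p x k).
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on mean-centred data
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T  # p x k loadings: one row per original variable
    T = Xc @ P               # n x k scores: the compressed data
    return mean, T, P

def pca_reconstruct(mean, T, P):
    """Map the scores back onto all p original variables: X_hat = T P^T + mean."""
    return T @ P.T + mean

def select_variables(X, keep):
    """Selection: keep only the listed columns; all others are discarded."""
    return X[:, keep]

# Hypothetical data: 50 correlated variables driven by 3 latent factors plus noise
rng = np.random.default_rng(0)
n, p, k = 200, 50, 3
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, p))
X = latent @ mixing + 0.05 * rng.normal(size=(n, p))

mean, T, P = pca_compress(X, n_components=k)
X_hat = pca_reconstruct(mean, T, P)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mean)

X_sel = select_variables(X, keep=[0, 1, 2])

print("compressed scores:", T.shape)      # (200, 3)
print("reconstructed:", X_hat.shape)      # (200, 50)
print(f"relative reconstruction error: {rel_err:.3f}")
print("selected:", X_sel.shape)           # (200, 3)
```

Both reductions end up with three columns, but only the compressed version still carries information about all 50 variables through the loadings `P`; the 47 columns dropped by selection cannot be recovered from `X_sel`.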

Author Biography

Alberto Ferrer, Departament d’Estadística i Investigació Operativa Aplicada i Qualitat, Universitat Politècnica de València.

Alberto Ferrer holds a degree in agricultural engineering and a PhD from the Universitat Politècnica de València. He is currently a full professor in the Departament d’Estadística i Investigació Operativa Aplicades i Qualitat at the Universitat Politècnica de València, where he leads the Multivariate Statistical Engineering research group, which develops statistical methodology for the analysis, monitoring and diagnosis of complex processes. He is an associate editor of the journal Technometrics, a member of the editorial board of the journal Quality Engineering, a member of the Council of the International Society for Business and Industrial Statistics (ISBIS), and a member of both the European Network for Business and Industrial Statistics (ENBIS) and the Spanish Chemometrics Network (Xarxa Espanyola de Quimiometria).

Section: Articles