Noise reduction for near-infrared spectroscopy data using extreme learning machines

https://doi.org/10.1016/j.engappai.2018.12.005

Highlights

  • There are many pre-processing techniques for NIR data to choose from.

  • We propose to avoid the pre-processing by using a novel algorithm called C-PL-ELM.

  • C-PL-ELM uses two Lagrange multipliers as optimization constraints.

  • Results for regression and classification tasks confirm the advantages of C-PL-ELM.

Abstract

The near-infrared (NIR) spectra technique is an effective approach for predicting chemical properties and is typically applied in the petrochemical, agricultural, medical, and environmental sectors. NIR spectra are usually very high-dimensional and contain huge amounts of information. Most of this information is irrelevant to the target problem, and some is simply noise. Thus, it is not easy to discover the relationship between NIR spectra and the variable to be predicted. This kind of regression analysis is, however, one of the main topics of machine learning, so machine learning techniques play a key role in NIR-based analytical approaches. Pre-processing of NIR spectral data has become an integral part of chemometrics modeling. The objective of pre-processing is to remove physical phenomena (noise) from the spectra in order to improve the regression or classification model. In this work, we propose to reduce the noise using extreme learning machines, which have shown good predictive performance in regression applications as well as in large-dataset classification tasks. For this, we use a novel algorithm called C-PL-ELM, whose parallel architecture places one non-linear layer in parallel with another non-linear layer. Using the soft margin loss function concept, we incorporate two Lagrange multipliers with the objective of including the noise of the spectral data. Six real-life datasets were analyzed to illustrate the performance of the developed models. The results for regression and classification problems confirm the advantages of the proposed method in terms of root mean square error and accuracy.

Introduction

Near-infrared (NIR) spectroscopy (Pierna et al., 2011) is mainly used to measure the absorption of near-infrared light in order to identify and quantify various materials. Spectroscopy in combination with various multivariate algorithms has played an important role in fast and nondestructive analysis in the petrochemical, agricultural, medical, and environmental sectors (Liu et al., 2015a, Luypaert et al., 2007, Kim et al., 2010, Park et al., 2012).

According to the Beer–Lambert law, the absorption of light in a medium is proportional to the path length and the concentration of the absorbing agent. That is, there is a linear relationship between absorbance and concentration when the path length remains constant, which motivated the use of linear multivariate calibration techniques, such as multiple linear regression (MLR), principal components regression (PCR) (Keithley et al., 2009) and partial least squares regression (PLS) (Wilcox et al., 2016).
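The linear relationship described above can be illustrated with a minimal sketch. It assumes a synthetic mixture of three absorbing components observed at six wavelengths; the absorptivities, concentrations, noise level, and variable names are all hypothetical and are not taken from the paper. Python is used here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 absorbing components measured at 6 wavelengths.
n_samples, n_components, n_wavelengths = 40, 3, 6
path_length = 1.0  # constant path length (arbitrary units)

# Assumed absorptivities (components x wavelengths) and concentrations.
absorptivity = rng.uniform(0.1, 1.0, size=(n_components, n_wavelengths))
concentration = rng.uniform(0.0, 2.0, size=(n_samples, n_components))

# Beer-Lambert law: absorbance is linear in concentration at fixed path length.
absorbance = path_length * concentration @ absorptivity
absorbance += rng.normal(scale=1e-4, size=absorbance.shape)  # small synthetic noise

# Multiple linear regression: predict concentrations from spectra (with intercept).
train, test = slice(0, 30), slice(30, 40)
X = np.hstack([np.ones((n_samples, 1)), absorbance])
coef, *_ = np.linalg.lstsq(X[train], concentration[train], rcond=None)

predicted = X[test] @ coef
rmse = float(np.sqrt(np.mean((predicted - concentration[test]) ** 2)))
print(f"test RMSE: {rmse:.4f}")
```

With far more wavelengths than samples, as is typical for real NIR spectra, plain least squares becomes ill-posed, which is one reason PCR and PLS are commonly preferred over MLR.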

However, the linearity of the Beer–Lambert law is limited by chemical and instrumental factors, such as deviations in absorptivity coefficients at high concentrations, non-symmetrical chemical equilibrium, intermolecular reactions, humidity-induced hydrogen bonding, changes in temperature, non-monochromatic radiation, scattering of light, fluorescence or phosphorescence of the sample, stray light, and nonlinear detector response (Despagne and Luc Massart, 1998). When the system exhibits strongly nonlinear behavior, classical linear methods may not completely identify the relationship between the spectra and the corresponding concentrations and would thus produce large errors in regression and classification problems. Many nonlinear techniques have been developed, such as artificial neural networks (ANN) (Hajnayeb et al., 2011), support vector machines (SVM) (Li et al., 2009), and nonlinear partial least squares (Rosipal and Trejo, 2001). These methods may perform well on nonlinear data, but they are computationally more complex than linear methods and are prone to overfitting (Peng et al., 2013).

When the experimental measurements deviate from the Beer–Lambert law, suitable pre-processing must be considered to compensate for this nonlinear behavior. The disadvantage of including such additional factors is an increase in the complexity of the model, which in turn is likely to reduce the robustness of the model for future predictions. All pre-processing techniques aim to reduce the noise in the data in order to enhance the spectral features of interest. However, there is always the danger of choosing the wrong type of pre-processing, or of applying one so severe that valuable information is unintentionally removed. This problem is described in detail by Rinnan et al. (2009).

Near-infrared spectroscopy data generally contain both linear and non-linear components in regression or classification problems. Linear methods may have difficulty achieving good performance, since they model the non-linearity only in a limited way; they are, however, simpler and more stable. Non-linear methods can provide better performance than linear methods, but they are more complex. Therefore, a simple, fast, precise, and effective method is required.

In the early 1990s, different authors (Schmidt et al., 1992, Pao et al., 1994) independently proposed feedforward neural networks comprising randomly initialized and untrained connections between the input layer and a hidden layer of non-linear neurons. In the 2000s, this type of network was revisited under the name of Extreme Learning Machine (ELM) (Huang et al., 2004). Recently, Henríquez and Ruz (2018a) compared these neural networks with random weights on classification and regression problems, and Zhang and Suganthan (2016b) presented a survey of randomized algorithms for training neural networks. In this paper, we use the term ELM to refer to randomized feed-forward neural networks (Henríquez and Ruz, 2018b).
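A minimal sketch of such a randomized feed-forward network is given below, as an illustration of the basic idea only and not of the C-PL-ELM algorithm. It assumes a tanh activation, input weights and biases drawn uniformly from [-1, 1], and output weights obtained with the Moore–Penrose pseudo-inverse; the function names `elm_fit` and `elm_predict` and the toy data are hypothetical.

```python
import numpy as np

def elm_fit(X, y, n_hidden=40, seed=0):
    """Fit a basic ELM: random, untrained hidden layer plus least-squares output layer."""
    rng = np.random.default_rng(seed)
    # Input-to-hidden weights and biases are drawn once and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y      # output weights via pseudo-inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy non-linear regression problem (synthetic, for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

W, b, beta = elm_fit(X[:150], y[:150])
predictions = elm_predict(X[150:], W, b, beta)
test_rmse = float(np.sqrt(np.mean((predictions - y[150:]) ** 2)))
print(f"test RMSE: {test_rmse:.4f}")
```

Only the output weights are learned, and they come from a single linear solve, which is why training is fast compared with iterative back-propagation.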

ELM has been successfully applied in many fields (Samat et al., 2014, Cao et al., 2016). For NIR data in particular, ELM combined with feature selection techniques has been used to determine amino acid nitrogen in soy sauce (Ouyang et al., 2013), total acid content in vinegar (Chen et al., 2012), and pear internal quality attributes (Jiang and Zhu, 2013), among others. Recently, Li et al. (2016) analyzed the feasibility of Fourier transform infrared transmission (FT-IR) spectroscopy for detecting talcum powder illegally added to tea. Yang and Sun (2016) compared the abilities of six popular multivariate classification techniques, including ELM. Bian et al. (2017) used ELM for near-infrared spectral quantitative analysis of diesel fuel and edible blended oil samples.

A main difficulty when working with NIR data is that there are many pre-processing techniques to choose from (more details in Section 2), as well as combinations of them, and different researchers apply them arbitrarily in order to achieve good performance in classification and prediction problems. This motivates the need for a robust methodology. We therefore propose to reduce the noise in NIR data by using a novel algorithm called C-PL-ELM, which has a parallel architecture. This algorithm has a non-linear layer in parallel with another non-linear layer, generating a more powerful nonlinear mapping. Using the soft margin loss function concept (Bennett and Mangasarian, 1992), we incorporate two Lagrange multipliers as optimization constraints (similar to the concept of support vector regression (Drucker et al., 1997)) with the objective of including the noise in spectroscopy data, thus avoiding the pre-processing of the experimental measures.
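For orientation, the support vector regression concept referred to above can be written in its standard textbook form. This is the usual ε-insensitive SVR problem, not the C-PL-ELM objective, which is not stated in this text; the symbols (w, b, φ, C, ε, ξ) follow common SVR notation and are assumptions here.

```latex
\begin{aligned}
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}} \quad
  & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
    + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \\
\text{s.t.} \quad
  & y_i - \mathbf{w}^{\top}\phi(\mathbf{x}_i) - b \le \varepsilon + \xi_i, \\
  & \mathbf{w}^{\top}\phi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, N .
\end{aligned}
```

In the Lagrangian of this problem, the two families of inequality constraints receive separate non-negative multipliers, usually written α_i and α_i^*, one for deviations above the ε-tube and one for deviations below it. The slack variables let noisy samples fall outside the tube at a cost controlled by C.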

The remainder of the paper is organized as follows. In Section 2, we briefly review some of the most popular pre-processing techniques. The details of the proposed C-PL-ELM algorithm are presented in Section 3. Simulation results and comparisons are provided in Section 4. Section 5 discusses the performance of C-PL-ELM on the different datasets. Conclusions are drawn in Section 6.

Section snippets

Brief review of pre-processing techniques

The aim of signal pre-processing is to improve the data quality before modeling and to remove physical phenomena from the spectra. Applying pre-processing can increase the repeatability and reproducibility of the method, as well as model robustness and accuracy, although there is no guarantee that it will do so.

The most widely used pre-processing techniques in NIR spectroscopy can be divided into two groups: scatter-correction methods and spectral derivatives.
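As a rough illustration of the two groups, the sketch below applies one example of each to synthetic spectra. Standard normal variate (SNV) is chosen as a commonly used scatter-correction method, and a plain finite-difference first derivative stands in for smoothed derivative filters such as Savitzky–Golay; both choices, the function names, and the synthetic spectra are assumptions made for illustration and are not taken from this text.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def first_derivative(spectra, step=1.0):
    """Finite-difference first derivative along the wavelength axis."""
    return np.gradient(np.asarray(spectra, dtype=float), step, axis=1)

# Synthetic spectra: one absorption band with sample-specific
# multiplicative (scatter-like) and additive (baseline) effects.
wavelengths = np.linspace(1000.0, 2500.0, 200)
peak = np.exp(-((wavelengths - 1700.0) / 80.0) ** 2)
raw = np.vstack([1.0 * peak + 0.2, 1.5 * peak + 0.5, 0.8 * peak + 0.1])

corrected = snv(raw)                                   # removes offset and scale
derivative = first_derivative(raw, step=wavelengths[1] - wavelengths[0])  # removes offset only
```

In this idealized case SNV makes the three spectra coincide, while the derivative removes the constant baseline but leaves the multiplicative effect in place. Real spectra contain noise, which differentiation amplifies, so smoothing is normally applied together with the derivative.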

The first group of scatter-corrective

Extreme learning machines

Extreme learning machine (ELM) (Huang et al., 2006b) is a unifying learning algorithm which can be used for several learning tasks. It was originally developed for the single hidden-layer feed forward neural networks (SLFNs), and then extended to the generalized SLFNs (Huang et al., 2012). In ELM, the feature mapping function is also called the activation function. The network structure of ELM is shown in Fig. 1. In accordance with the ELM’s universal approximation capability theorems (Huang et

Performance evaluation

In this section, the performance of the proposed C-PL-ELM learning algorithm is measured. All the simulations are carried out using R, the free software environment for statistical computing, running on a computer with a 2.6 GHz Intel Core i5 and 8 GB of RAM. Regarding the scope of the random weights and biases, the random values are typically drawn from the range [-1, 1] (as we do in this paper); some additional conditions are considered in Gorban et al., 2016, Li and Wang, 2017, Tyukin and

Discussions

The selection of an appropriate pre-processing technique for NIR data is a difficult problem. As mentioned in Rinnan et al. (2009), this choice can affect the robustness of the model. In the previous sections, we have proposed a novel method to include the noise from the data in the model. We use two Lagrange multipliers as optimization constraints (similar to the concept of SVR proposed by Drucker et al. (1997)). The proposed algorithm does not use pre-processing techniques. We believe that a robust

Conclusion

In this paper, we propose a novel algorithm for the analysis of near-infrared spectroscopy data. We use the C-PL-ELM algorithm, whose parallel architecture places one non-linear layer in parallel with another non-linear layer. We incorporate two Lagrange multipliers as optimization constraints with the aim of avoiding the pre-processing of the spectra. The experimental results of this paper are promising and indicate that C-PL-ELM performs well in the presence of noisy spectra. More

Acknowledgment

The authors would like to thank CONICYT-Chile under grant CONICYT Doctoral scholarship (2015-21150790) (P.H.), Basal(CONICYT)-CMM (G.A.R.), and the Research Center Millennium Nucleus Models of Crisis, Chile (NS130017) (G.A.R.), for financially supporting this research.

References (64)

  • Gorban, A.N., et al. Approximation with random bases: Pro et contra. Inform. Sci. (2016)
  • Hajnayeb, A., et al. Application and comparison of an ANN-based feature selection method and the genetic algorithm in gearbox fault diagnosis. Expert Syst. Appl. (2011)
  • Henríquez, P.A., et al. Extreme learning machine with a deterministic assignment of hidden weights in two parallel layers. Neurocomputing (2017)
  • Henríquez, P.A., et al. A non-iterative method for pruning hidden neurons in neural networks with random weights. Appl. Soft Comput. (2018)
  • Huang, G.B., et al. Extreme learning machine: Theory and applications. Neurocomputing (2006)
  • Keithley, R., et al. Multivariate concentration determination using principal component regression with residual analysis. TrAC Trends Anal. Chem. (2009)
  • Kim, S.B., et al. An effective classification procedure for diagnosis of prostate cancer in near infrared spectra. Expert Syst. Appl. (2010)
  • Li, H., et al. Support vector machines and its applications in chemistry. Chemometr. Intell. Lab. Syst. (2009)
  • Li, M., et al. Insights into randomized algorithms for neural networks: Practical issues and common pitfalls. Inform. Sci. (2017)
  • Liu, Y., et al. Predicting soil salinity with Vis–NIR spectra after removing the effects of soil moisture using external parameter orthogonalization. PLoS One (2015)
  • Liu, C., et al. A comparative study for least angle regression on NIR spectra analysis to determine internal qualities of navel oranges. Expert Syst. Appl. (2015)
  • Lorente, D., et al. Visible–NIR reflectance spectroscopy and manifold learning methods applied to the detection of fungal infections on citrus fruit. J. Food Eng. (2015)
  • Luypaert, J., et al. Near-infrared spectroscopy applications in pharmaceutical analysis. Talanta (2007)
  • Mabood, F., et al. Detection and estimation of super premium 95 gasoline adulteration with premium 91 gasoline using new NIR spectroscopy combined with multivariate methods. Fuel (2017)
  • Pao, Y.H., et al. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing (1994)
  • Park, J.I., et al. Improved prediction of biomass composition for switchgrass using reproducing kernel methods with wavelet compressed FT-NIR spectra. Expert Syst. Appl. (2012)
  • Peng, J., et al. Combination of activation functions in extreme learning machines for multivariate calibration. Chemometr. Intell. Lab. Syst. (2013)
  • Pierna, J.F., et al. Comparison of various chemometric approaches for large near infrared spectroscopic data of feed and feed products. Anal. Chim. Acta (2011)
  • Rinnan, A., van den Berg, F., Engelsen, S.B. Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal. Chem. (2009)
  • Xu, L., et al. Rapid and nondestructive detection of multiple adulterants in kudzu starch by near infrared (NIR) spectroscopy and chemometrics. LWT - Food Sci. Technol. (2015)
  • Ye, M., et al. Rapid detection of volatile compounds in apple wines using FT-NIR spectroscopy. Food Chem. (2016)
  • Zhang, L., et al. A comprehensive evaluation of random vector functional link networks. Inform. Sci. (2016)

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2018.12.005.
