
Studies on Model Optimization and Model Transfer Methods of Near Infrared Spectroscopy

Author: ZhengKaiYi
Tutor: DuYiPing
School: East China University of Science and Technology
Course: Analytical Chemistry
Keywords: NIR spectroscopy; fractional order Savitzky-Golay differentiation; SCARS; overfitting; calibration transfer based on informative components
CLC: O657.33
Type: PhD thesis
Year: 2013

Abstract


Near-infrared (NIR) spectroscopy suffers from weak absorption and heavily overlapping bands, so chemometric methods are used to build models that extract the chemical information. To improve the predictive ability of these models, they should be optimized through spectral pretreatment and variable selection. To improve their generality, calibration transfer should be applied.

For spectral pretreatment, this thesis applies fractional order Savitzky-Golay differentiation to NIR spectra. Fractional order Savitzky-Golay differentiation generalizes ordinary (integer order) Savitzky-Golay differentiation, which is its special case at integer orders. Like the ordinary method, the fractional order version obtains the polynomial parameters by fitting the data in a window of the spectrum. Then, using Riemann-Liouville fractional calculus together with these polynomial parameters, the derivative is obtained as a linear combination of the data in the window. As a result, no complex formula has to be evaluated: the fractional order Savitzky-Golay derivative is obtained simply by multiplying the raw spectra on the right by a band diagonal matrix. Three datasets (diesel, wheat and corn) were used to test the method.
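A minimal sketch of the fractional order Savitzky-Golay filter described above, assuming unit wavelength spacing, a Riemann-Liouville lower terminal at the left edge of each window, the derivative evaluated at the window centre, and edge columns simply left as zeros (none of these conventions is stated in the abstract). It relies on the standard Riemann-Liouville result for a power function, D^a u^k = Gamma(k+1)/Gamma(k+1-a) * u^(k-a), so the derivative at the centre is a fixed linear combination of the window data, and stacking those weights gives the band diagonal matrix D with derivative = X @ D. The names `fsg_weights` and `fsg_matrix` are hypothetical.

```python
import math

import numpy as np


def _rgamma(x):
    """Reciprocal gamma 1/Gamma(x), taken as 0 at the poles x = 0, -1, -2, ..."""
    if x <= 0 and float(x).is_integer():
        return 0.0
    return 1.0 / math.gamma(x)


def fsg_weights(half_width, poly_order, alpha):
    """Window weights of a fractional order Savitzky-Golay filter (sketch).

    Assumed conventions: unit spacing, local coordinate u = 0..2*half_width
    (half_width >= 1), lower terminal at u = 0, evaluation at u = half_width.
    """
    m = half_width
    u = np.arange(2 * m + 1, dtype=float)
    V = np.vander(u, poly_order + 1, increasing=True)  # V[i, k] = u_i**k
    fit = np.linalg.pinv(V)  # least-squares coefficients c = fit @ window
    # Riemann-Liouville derivative of each monomial u**k, evaluated at u = m
    g = np.array([math.gamma(k + 1) * _rgamma(k + 1 - alpha) * m ** (k - alpha)
                  for k in range(poly_order + 1)])
    return g @ fit  # derivative at the centre = weights @ window


def fsg_matrix(n_points, half_width, poly_order, alpha):
    """Band diagonal matrix D so that X @ D is the fractional SG derivative.

    Columns closer than half_width to either spectrum edge are left as zeros.
    """
    w = fsg_weights(half_width, poly_order, alpha)
    D = np.zeros((n_points, n_points))
    for j in range(half_width, n_points - half_width):
        D[j - half_width:j + half_width + 1, j] = w
    return D
```

For integer `alpha` the gamma-function poles zero out the low-order terms, so the weights reduce to ordinary Savitzky-Golay derivative weights (for example, `alpha=1.0` gives the usual first derivative under these window conventions), while a non-integer value such as `alpha=1.5` interpolates between orders.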
The results showed that, compared with ordinary Savitzky-Golay differentiation, the proposed method captures more spectral detail and yields lower root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP), especially for properties governed by non-chemical information, such as viscosity, density and hardness.

A new variable selection method, stability competitive adaptive reweighted sampling (SCARS), was proposed. In SCARS, variables are selected by a stability index, defined as the absolute value of a regression coefficient divided by its standard deviation. The SCARS algorithm consists of a number of loops. In each loop, the stability of each variable is computed; then, based on stability, enforced wavelength selection and adaptive reweighted sampling (ARS) are used to select the important variables. The selected variables are kept as a variable subset and used in the next loop. After all loops have run, a series of variable subsets is obtained, and the RMSECV of the partial least squares (PLS) model built on each subset is computed. The subset with the lowest RMSECV is taken as the optimal variable subset. The performance of the algorithm was evaluated on three NIR datasets: tobacco, corn and wheat. The results show that SCARS gives the lowest RMSECV and RMSEP compared with moving window PLS (MWPLS), Monte Carlo uninformative variable elimination (MCUVE) and competitive adaptive reweighted sampling (CARS).

The overfitting caused by variable selection was also explored. Variable selection methods including SCARS, CARS and MCUVE were applied to a dataset of randomly generated variables that carries no class information.
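The SCARS loop summarized above can be sketched as follows. This is a simplified illustration, not the thesis implementation, and several details are assumptions: the standard deviation in the stability index is estimated over Monte Carlo sub-sampled PLS models (`n_mc`, `frac`), enforced wavelength selection uses the exponentially decreasing retention ratio known from CARS (r_1 = 1, r_N = 2/p), RMSECV comes from 5-fold cross-validation, and PLS1 is a hand-written NIPALS. All function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """PLS1 (NIPALS) on mean-centred data; returns coefficients b, x-mean, y-mean."""
    xm, ym = X.mean(axis=0), y.mean()
    Xk, yk = X - xm, y - ym
    W, P, q = [], [], []
    for _ in range(min(n_comp, X.shape[1])):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        qa = yk @ t / tt
        Xk, yk = Xk - np.outer(t, p), yk - qa * t  # deflate X and y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(q))
    return b, xm, ym


def rmsecv(X, y, n_comp, n_folds=5):
    """RMSECV from k-fold cross-validation with interleaved folds."""
    fold = np.arange(len(y)) % n_folds
    residuals = []
    for f in range(n_folds):
        tr, te = fold != f, fold == f
        b, xm, ym = pls1_fit(X[tr], y[tr], n_comp)
        residuals.append((X[te] - xm) @ b + ym - y[te])
    return float(np.sqrt(np.mean(np.concatenate(residuals) ** 2)))


def stability(X, y, n_comp, rng, n_mc=50, frac=0.8):
    """Stability |mean(b_j)| / std(b_j) over Monte Carlo sub-sampled PLS1 models."""
    n = len(y)
    B = []
    for _ in range(n_mc):
        idx = rng.choice(n, int(frac * n), replace=False)
        B.append(pls1_fit(X[idx], y[idx], n_comp)[0])
    B = np.array(B)
    return np.abs(B.mean(axis=0)) / B.std(axis=0)


def scars(X, y, n_loops=20, n_comp=3, seed=0):
    """Simplified SCARS (n_loops >= 2): returns (best subset, all subsets, RMSECV path)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    k = np.log(p / 2) / (n_loops - 1)  # retention ratio a*exp(-k*i) runs from 1 to 2/p
    a = np.exp(k)
    keep = np.arange(p)
    subsets, path = [], []
    for i in range(1, n_loops + 1):
        s = stability(X[:, keep], y, n_comp, rng)
        n_keep = max(2, int(round(a * np.exp(-k * i) * p)))
        top = np.argsort(s)[::-1][:n_keep]  # enforced selection by stability
        drawn = rng.choice(top, size=n_keep, p=s[top] / s[top].sum())  # ARS
        keep = keep[np.unique(drawn)]
        subsets.append(keep.copy())
        path.append(rmsecv(X[:, keep], y, n_comp))
    best = int(np.argmin(path))
    return subsets[best], subsets, path
```

Because each loop draws only from the previous subset, the subsets are nested and shrink monotonically. Running `scars` on purely random `X` and `y` (a regression analogue of the random-variable experiment, not the two-class setup) and comparing `min(path)` with `rmsecv(X, y, 3)` on all variables is one way to probe the optimistic RMSECV the abstract describes; this sketch makes no claim about the exact numbers.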
Surprisingly, even on this dataset without class information, the variable selection methods still found "good" variable combinations that separated the "two classes" with "low" prediction errors, and these prediction errors decreased as the number of raw variables increased. Beyond classification, when random variables without regression information were generated, SCARS still selected "good" variable combinations that gave low prediction errors. In essence, this ability of variable selection to find "good" combinations among uninformative variables is overfitting. To study the causes of this overfitting and how to diagnose it, simulated data were generated from the tobacco dataset by adding uninformative data to the raw spectra at different ratios. The simulated data were divided into two parts, a calibration set and an independent test set, and variable selection was then run to compare the path of RMSECV on the calibration set with the corresponding path of RMSEP on the independent test set. The results show that when the ratio of uninformative data to spectra is small (at most 0.02 when noise is used as the uninformative data, and at most 0.1 when randomly permuted spectra are used as the uninformative data), the RMSECV paths are similar to the RMSEP paths. When the ratio exceeds 0.02 for noise or 0.1 for randomly permuted spectra, the RMSECV paths differ from the RMSEP paths. Comparing the two paths can therefore be used to evaluate variable selection: high similarity indicates that variable selection is effective, while low similarity indicates that it is not.

For calibration transfer, a new method was proposed that corrects the informative components instead of the full spectra.
The method uses partial least squares for a vector response (PLS1) to extract the informative components related to the predicted property from the raw spectra, and then corrects these components with a spectral transfer method such as canonical correlation analysis (CCA), direct standardization (DS) or partial least squares for a matrix response (PLS2). The algorithm was tested on three datasets: a corn dataset, a tri-component solvent dataset and a dataset of dimethyl fumarate in milk. The results show that correcting the informative components reduces prediction errors significantly compared with correcting the full spectra.
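A minimal sketch of correcting informative components rather than full spectra, using DS as the transfer step (CCA and PLS2 are not shown). It assumes that the "informative components" are the projection of the centred spectra onto the master PLS1 subspace, (X - c) R P^T, and that each instrument's transfer set is centred by its own mean; both are illustrative choices, not details confirmed by the abstract. Function names are hypothetical.

```python
import numpy as np


def pls1_model(X, y, n_comp):
    """PLS1 via NIPALS on mean-centred data.

    Returns (x_mean, y_mean, R, P, q): scores T = (X - x_mean) @ R,
    loadings P with P.T @ R = I, and y-loadings q.
    """
    xm, ym = X.mean(axis=0), y.mean()
    Xk, yk = X - xm, y - ym
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        qa = yk @ t / tt
        Xk, yk = Xk - np.outer(t, p), yk - qa * t  # deflate X and y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    R = W @ np.linalg.inv(P.T @ W)
    return xm, ym, R, P, np.array(q)


def informative(X, center, R, P):
    """Informative part of spectra: projection onto the PLS1 components."""
    return (X - center) @ R @ P.T


def transfer_predict(model, master_t, slave_t, slave_new):
    """Correct the informative components of slave spectra with DS, then predict."""
    xm, ym, R, P, q = model
    m_c, s_c = master_t.mean(axis=0), slave_t.mean(axis=0)
    Mi = informative(master_t, m_c, R, P)  # transfer samples on the master
    Si = informative(slave_t, s_c, R, P)   # the same samples on the slave
    F = np.linalg.pinv(Si, rcond=1e-10) @ Mi  # DS transfer matrix: Si @ F ~ Mi
    corrected = informative(slave_new, s_c, R, P) @ F
    scores = corrected @ R + (m_c - xm) @ R  # move back to the model's centring
    return scores @ q + ym
```

Because DS here only has to map a low-rank subspace (one dimension per PLS component) instead of every wavelength, the transfer matrix is estimated from much less structure than full-spectrum DS. In a simulated case where the slave differs from the master by a pure gain (for example 1.2 times the master spectra, a hypothetical setup), the corrected slave predictions reproduce the master predictions.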

