One attractive possibility of infrared (IR) spectroscopy is that it may be applied to investigate class-specific alterations in the absorption signature, where a class may be, e.g., a treatment or a tissue type. Such alterations can act as biomarkers of mechanism associated with pathways or effects. One may be interested in investigating such alterations from different standpoints, including (1) intensity; (2) statistical significance; and (3) composite (multi-spectral-region) alterations. These three viewpoints were implemented computationally and named BM1, BM2 and BM3. They can be easily applied to datasets of classed IR spectra through a user-friendly MATLAB interface.
BM1. The most intuitive of the methods. The mean spectrum of a given class is subtracted from the mean spectrum of a reference class (e.g., “vehicle control”), yielding a “difference-between-means curve”.
BM2. Each variable (wavenumber) is taken one at a time as the input to a univariate linear classifier, yielding a per-wavenumber “classification rate curve”1. Classification rates are estimated by cross-validation. This method is close to the t-test criterion2, but more precise.
BM3. When multiple variables are assessed together, the jointly best variables for classification may differ substantially from the ranking of the individually best variables. This method generates a histogram of how many times each wavenumber appeared in the “best variable set” obtained through feature selection. The size of this set is given by TopVars (method parameter: number of “best variables”), and the feature selection is repeated NoBootstraps times (method parameter: number of validation bootstraps).
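The three methods above can be illustrated with a minimal Python sketch. It is not the implementation behind the MATLAB interface, and several choices in it are assumptions made only for illustration: two classes with at least two spectra each, a nearest-class-mean rule standing in for the univariate linear classifier, leave-one-out standing in for cross-validation, and greedy forward selection on a Fisher-like score with stratified bootstrap resampling standing in for the feature selection of BM3. All function names are hypothetical; `top_vars` and `no_bootstraps` mirror the TopVars and NoBootstraps parameters.

```python
import random
from collections import Counter
from statistics import fmean, pvariance

def columns_of(spectra):
    """Transpose a list of spectra into one list of values per wavenumber."""
    return [list(col) for col in zip(*spectra)]

def split(values, labels, label):
    """Values belonging to one class."""
    return [v for v, l in zip(values, labels) if l == label]

def bm1_difference(spectra, labels, reference):
    """BM1 sketch: reference-class mean minus other-class mean, per wavenumber."""
    other = next(l for l in sorted(set(labels)) if l != reference)
    return [fmean(split(col, labels, reference)) - fmean(split(col, labels, other))
            for col in columns_of(spectra)]

def bm2_rates(spectra, labels):
    """BM2 sketch: leave-one-out accuracy of a 1-D nearest-class-mean rule."""
    classes = sorted(set(labels))
    rates = []
    for col in columns_of(spectra):
        correct = 0
        for i, (x, y) in enumerate(zip(col, labels)):
            # Class means are estimated without the held-out spectrum i.
            train_v = col[:i] + col[i + 1:]
            train_l = labels[:i] + labels[i + 1:]
            means = {c: fmean(split(train_v, train_l, c)) for c in classes}
            correct += min(classes, key=lambda c: abs(x - means[c])) == y
        rates.append(correct / len(col))
    return rates

def separation(cols, labels, subset):
    """Fisher-like score of a variable subset (assumed stand-in criterion)."""
    a, b = sorted(set(labels))  # exactly two classes assumed
    between = within = 0.0
    for k in subset:
        xa, xb = split(cols[k], labels, a), split(cols[k], labels, b)
        between += (fmean(xa) - fmean(xb)) ** 2
        within += pvariance(xa) + pvariance(xb)
    return between / (within + 1e-12)

def bm3_histogram(spectra, labels, top_vars, no_bootstraps, seed=0):
    """BM3 sketch: how often each wavenumber enters the selected variable set."""
    rng = random.Random(seed)
    by_class = {}
    for i, l in enumerate(labels):
        by_class.setdefault(l, []).append(i)
    counts = Counter()
    for _ in range(no_bootstraps):
        # Stratified bootstrap: resample with replacement within each class.
        sample = [i for idxs in by_class.values()
                  for i in rng.choices(idxs, k=len(idxs))]
        cols = columns_of([spectra[i] for i in sample])
        labs = [labels[i] for i in sample]
        chosen = []
        while len(chosen) < min(top_vars, len(cols)):
            # Greedy forward selection: add the variable that helps the set most.
            rest = [k for k in range(len(cols)) if k not in chosen]
            chosen.append(max(rest, key=lambda k: separation(cols, labs, chosen + [k])))
        counts.update(chosen)
    return [counts[k] for k in range(len(spectra[0]))]

# Toy data: 8 spectra x 3 wavenumbers; only the first wavenumber separates the classes.
labels = ["control"] * 4 + ["treated"] * 4
spectra = [
    [1.00, 5.0, 3.0], [1.01, 5.4, 3.3], [0.99, 4.6, 2.7], [1.02, 5.2, 3.5],
    [2.00, 5.1, 3.1], [2.01, 4.7, 2.8], [1.99, 5.3, 3.4], [2.02, 4.9, 3.2],
]
diff = bm1_difference(spectra, labels, reference="control")
rates = bm2_rates(spectra, labels)
hist = bm3_histogram(spectra, labels, top_vars=1, no_bootstraps=20)
```

In this toy dataset all three curves single out the first wavenumber: it has the largest absolute difference between means, the highest classification rate, and the highest count in the selection histogram.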
The aim of this protocol is to identify and visualize class-related biomarkers in IR spectral datasets by means of a simple sequence of steps executed under a user-friendly interface (Figure 1). Two visual representations are provided in which all BM results are presented concurrently, allowing comparison of the results generated by each method.