Demonstration of protocol
Continuous sensor data (pH and DO) were collected at 15-minute intervals for a 60-day period. During the first 40 days of operation, the bioreactor transitioned toward steady state operation (less than 5% deviation in treatment) characterized by dampened oscillatory behavior. Starting on day 40, a pulse of 1.0 mg/L AgNP was added, and then at subsequent 30-day intervals two more pulses incremented by 1.0 mg/L were added (1.0 mg/L on day 40; 2.0 mg/L on day 70; 3.0 mg/L on day 100). Throughout this period the reactor was operating continuously. The protocol below describes the computational packages and steps required to repeat the analysis, and the approach can be used for any system with the appropriate amount of data. Time series data were stored each week (and backed up on an external hard drive) for subsequent processing vai the protocol below. Excel files with representative data from the experiments (pH and DO data after three pulse additions of nanoparticles) can be found in the supplemental section of this protocol.
Computational packages: The steps in the protocol were run with R 3.6.1 and the following packages: forecast v8.12 (Fourier spectrum)15, Rssa 0.13-1 (singular spectrum analysis)16; tseriesChaos 0.1-13 (phase space reconstruction and surrogate data analysis) 17; fractal 2.0-1 (compute AAFT surrogates) 18; multispatialCCM19 (convergent cross mapping); and Code 9.8 (phenomenological modeling) detailed in Huffaker, Bitelli, and Rosa6, and available at http://www.dista.unibo.it/~bittelli/.
Step 1: Signal processing of the three AgNP pulses for the O2 and pH records as described in McLamore et al. (2020)1.
Critical Step: See troubleshooting section for a summary of the failure of signal processing to detect strong signals in sensor data.
A. Fourier Frequency Spectrum
The raw data and results of Fourier analysis are shown in Fig. 2. Fig 2A shows the raw O2 data (15 min time scale) and Fig 2B shows the raw pH data (60 sec timescale) for three consecutive pulse additions of AgNP (see Excel file in supplemental section for sample data). Fourier spectra (Fig 2C-D) are developed using the forecast v8.12 package and plotted (the spectrum is generated by the R package). As demonstrated by the peaks in Fig 2C-D, the three-pulse series for both O2 and pH each display a dominant cycle length of 200 (15-minute blocks) = 50 hours. This 50-hour dominant cycle length was easily identifiable in the plots.
B. Singular Spectrum Analysis
Singular Spectrum Analysis (SSA) involves analysis of waveform(s), and then determination of the strength of the signals. To prepare for SSA, each sensor time series is standardized by removing the mean and dividing by the standard deviation.
The standardized data (black curves) and isolated signals (red curves) resulting from SSA are plotted in Fig. 3. Each time course refers to a AgNP concentration following the convention in Fig 2. The dominant 50-hour cycle can be adequately captured by resampling the data in 2-hour blocks (i.e., averaging every eight 15-minute blocks). Benefits of this reduction include working with substantially fewer observations, and reducing noise by averaging higher-frequency volatility. Fig 3 shows the records replotted on a 2-hour time scale, where damped oscillations are apparent for both O2 and pH.
Critical Step: Rerun the Fourier Spectrum on the resampled 2-hour blocks to ensure that it reproduces the dominant 50-hour oscillation in 25 two-hour blocks.
Critical Step: When the raw data (black curves in Fig 3) are plotted with the isolated signals (red curves in Fig 3), the plots will be aligned if there is no noise in the system; that is, if the signal accounts for all of the variation in the sensor record. Since we expect measurements to have some noise (i.e., unstructured variation), we do not expect the curves to be coincident.
Critical Step: Some signals contain only one dominant oscillation (see for example Table 1 and Fig 3). For example, pulse 1 for the O2 sensor data contained one 50-hour dominant oscillation (signal strength of 0.82 in Table 1 and also black/red curves in Fig 3A). Some signals are composed of multiple oscillations, for example, pulse 2 for the O2 sensor data contained three oscillations (signal strengths in Table 1 and also black/red/blue curves in Fig 3C).
The package Rssa 0.13-1 is used for singular spectrum analysis. Table 1 reports strength of these signals and constituent oscillations. We note that: (1) Signal strengths are relatively strong in explaining over half the total variation in the corresponding records (with the exception of pH Pulse 3); (2) Strengths are stronger for the O2 record than the pH record; (3) Strengths tend to decrease as AgNP contamination increases from pulse 1 to 3; and (4) As expected from the Fourier Spectra, 50-hour cycles account for most of signal strength.
Critical Step: Determination of whether signal strengths classify as “strong” is done on a case-by-case basis, and requires some expert judgement. In this case, signal strengths were strong in explaining over half the total variation in the corresponding records, as noted above. Care must be taken to establish a benchmark for making the evaluation of signal strengths when conducting singular spectrum analysis. This is similar to the interpretation of regression coefficients in linear plots, where users make judgements on what is a “good” R2.
Step 2: Phase Space Reconstruction
Time delay embedding and surrogate testing were carried out using the package tseriesChaos 0.1-13; fractal 2.0-1 was used to compute AAFT surrogates.
A. Time-Delay Embedding
Empirical attractors reconstructed from O2 and pH signals for the three AgNP pulse series are shown in the Fig. 4A. These attractors exhibit geometric regularity in the form of three-dimensional spiral-sink dynamics for each AgNP addition using coordinates of: (1) real-time (current) values of dissolved oxygen, denoted as O2(t); (2) dissolved oxygen values with a delay of five two-hour blocks, denoted as O2 (t+5); and (3) current values of pH, denoted as pH(t).
B. Surrogate Data Testing
The results of testing the null hypothesis that apparent geometric regularity shown in the empirical attractors was most likely generated by a mimicking linear stochastic process are found in Table 2. We see that all O2 AgNP pulse series’ pass surrogate tests for: (1) predictive skill since NSE values taken from empirical attractors (first column) are among the k largest values taken from surrogate attractors (the k largest values rest above the NSE values reported in the second column); and (2) permutation entropy since values taken from empirical attractors (third column) are among the k smallest values taken from surrogate attractors (the k smallest values rest below the those reported in the fourth column). The surrogate results for the pH signals are more inconclusive. Two of AgNP pulse series (1 and 3) fail the prediction test; while all three border-line pass the entropy test.
Critical Step: Compare empirical attractor characteristics to corresponding surrogate attractor characteristics. Test the null hypothesis that there is no significant difference in the measured characteristics. If null hypothesis is valid, apparent regularity in the empirical attractor is most likely due to linear stochastic dynamics and not deterministic non-linear dynamics.
Critical Step: Summary tables of surrogate data testing should be prepared for rapid analysis of predictive skill and permutation entropy tests. See for example summary Table 2 and also Overview section B.
Step 3: Convergent Cross Mapping (CCM)
Convergent cross mapping was run with package multispatialCCM19. CCM results indicate that causal interactions between O2 and pH shift with increased pulses of AgNP contamination (Fig. 4B). In this example dataset we see that pH remains a relatively strong driver of O2 for all concentrations of AgNP contamination, since CCM curves converge to correlation coefficients close to one (black curves). In the other direction, O2 becomes an increasingly weak driver of pH as injected contamination increases (from left to right in the figure), and by AgNP pulse 3 the interaction is indistinguishable from non-causal synchronized behavior, since cross mappings run at backward and forward delays peak at a positive level (Fig. 4C, right-most plot).
Critical Step: If CCM curves converge to correlation coefficients close to 1.0, this indicates that the process X is a strong driver of Y.
Critical Step: The convergence level for the CCM curve is used to infer the relative strength of the causal interaction.
Critical Step: When conducting extended CCM, non-zero positive peak values (i.e., greater than one) indicate that the interaction noted in CCM is indistinguishable from non-causal synchronized behavior.
These shifting interactions are summarized in the causal diagram shown in Fig. 5. In this simple example the causal maps are intuitive since there is only two variables and three test conditions, but in more complex data sets the use of causal maps is a highly useful visualization tool for understanding interactions between more than 3 nodes with multiple interactions.
Critical step: Causal diagrams are generated in external software based on exported output from package multispatialCCM19.
Step 4: Phenomenological Modeling
Code 9.86 was used for phenomenological modeling. The phenomenological digital proxies extracted from the O2 and pH signals are shown in Fig. 6A for each AgNP pulse, along with stability information (equilibria and eigenvalues) in Fig. 6B. This indicates that the models successfully reproduce the spiral-sink dynamics characterizing empirically-reconstructed bioreactor dynamics. Another indication of success is that the attractors reconstructed from solutions of the phenomenological models bear striking visual resemblance to the empirical attractors (Fig. 6C). Finally, maximum Lyapunov Exponents computed from the empirical and phenomenological attractors were similar as described in McLamore et al 1.
Critical Step: Eigenvalues can be used to determine if the digital proxy successfully reproduces the empirically reconstructed spiral-sink dynamics as described in Huffaker, Bitelli, and Rosa6, and available at http://www.dista.unibo.it/~bittelli/. In the case of the example data set shown here, this is the case since the computed eigenvalues are a triplet of a negative real eigenvalue and a complex conjugate pair with negative real parts (as in Fig 6).
Critical Step: To confirm reproduction of time series dynamics, compute the maximum Lyapunov Exponents from the empirical and phenomenological attractors as described in McLamore et al 1.
A. Causality Quantification
We used numerical partial derivatives computed from the phenomenological models—measuring the marginal response of a response variable to an incremental change in a driving variable—to quantify the interactions detected with CCM (Fig. 7). We observe that when AgNP concentrations are at or below 2.0 mg/L, pH (the system driver) has a negative marginal impact on O2 (black curves in Fig. 7A-B), while O2 (as the driver) has a positive marginal impact on the pH (red curves in Fig. 7A-B). However, the impact of pH on O2 decays to zero as time progresses. When AgNP concentration is 3.0 mg/L, pH has a detrimental impact on O2 and this relationship decays with time (Fig. 7C).
Critical Step: If the curve of X driver on Y output is positive, this indicates a positive marginal impact. Negative values indicate a negative marginal impact.
Critical Step: If the curve decays to zero (as in this case for some of the curves), this indicates that over time the impact decays (i.e., is non-persistent).