Direct Biomolecule Discrimination in Mixed Samples using Nanogap-Based Single-Molecule Electrical Measurement

doi:10.21203/rs.3.pex-2259/v1

Method Article

This work is licensed under a CC BY 4.0 License

Journal Publication

published 05 Jun, 2023

Read the published version in Scientific Reports →

Version 1

In single-molecule measurements, metal nanogap electrodes directly measure the current of a single molecule. This technique has been actively investigated as a new detection method for a variety of samples. Machine learning has been applied to analyze signals derived from single molecules to improve the identification accuracy. However, conventional identification methods have drawbacks, such as the requirement of data to be measured for each target molecule and the electronic structure variation of the nanogap electrode. In this study, we report a technique for identifying molecules based on single-molecule measurement data measured only in mixed sample solutions. Compared with conventional methods that require training classifiers on measurement data from individual samples, our proposed method successfully predicts the mixing ratio from the measurement data in mixed solutions. This demonstrates the possibility of identifying single molecules using only data from mixed solutions, without prior training. This method is anticipated to be particularly useful for the analysis of biological samples in which chemical separation methods are not applicable, thereby increasing the potential for single-molecule measurements to be widely adopted as an analytical technique.

Single-molecule discrimination

Machine learning

Nano-Gap

Unlabeled data and unlabeled data classification (UUC)

Kernel density estimation (KDE)

Deoxyguanosine monophosphate (dGMP, Sigma-Aldrich) and deoxythymidine monophosphate (dTMP, Sigma-Aldrich) were diluted in Milli-Q water without further purification. The concentration of each dGMP and dTMP solution used in the measurements was 10 μM. Measurements at dGMP:dTMP = 3:1 used a mixture of 750 μM dGMP and 250 μM dTMP, and measurements at dGMP:dTMP = 1:3 used a mixture of 250 μM dGMP and 750 μM dTMP.

Single-molecule measurements were performed using a self-made mechanically controllable break junction (MCBJ) chip and a homemade jig.

A PC running Python 3.10.4 is used to build the machine learning classifiers that identify single molecules and predict mixing ratios from the measurement data.

The main purpose of this study is to make single-molecule measurement and data analysis faster and more accurate so that the platform can be widely adopted as a new analytical technology. To that end, the new method was compared with the conventional method for predicting the mixing ratio of dGMP (deoxyguanosine monophosphate) and dTMP (deoxythymidine monophosphate), and the new method was shown to be faster and more accurate.

(1) Single-molecule measurement

After preparing the measurement solution, inject 10 μL of the solution into the PDMS well of the nanogap electrode chip. Set the nanogap distance to 0.52, 0.54, and 0.56 nm in turn, and maintain each distance by feedback control. For measurement, apply a bias voltage of 100 mV to the electrodes. Each measurement run lasts 5 minutes, for a total of 60 minutes of measurement at each nanogap distance.

(2) Classification process

[Conventional method]

1. Signal extraction from raw data

Signals with a maximum current of 20 pA or more and a dwell time of 10 ms or more are individually extracted from the single-molecule measurement data.
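The two extraction criteria above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' code: the protocol does not say how candidate signals are delimited, so the sketch assumes a candidate is any contiguous run of samples above a baseline threshold. The function name `extract_signals`, the `baseline_pa` value, and the 10 kHz sampling rate are all hypothetical.

```python
import numpy as np

def extract_signals(current_pa, sample_rate_hz, baseline_pa=5.0,
                    min_peak_pa=20.0, min_dwell_s=0.010):
    """Return segments whose peak is >= 20 pA and whose dwell time is >= 10 ms.

    Assumption: a candidate signal is a contiguous run above `baseline_pa`.
    """
    current_pa = np.asarray(current_pa, dtype=float)
    above = current_pa > baseline_pa
    # Pad with False so that every run has both a start and a stop edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]

    signals = []
    for start, stop in zip(starts, stops):
        segment = current_pa[start:stop]
        dwell_s = (stop - start) / sample_rate_hz
        if segment.max() >= min_peak_pa and dwell_s >= min_dwell_s:
            signals.append(segment)
    return signals

# Synthetic 1 s trace at an assumed 10 kHz sampling rate.
trace = np.zeros(10_000)
trace[1000:1200] = 30.0   # 20 ms at 30 pA: kept
trace[3000:3050] = 40.0   # 5 ms at 40 pA: rejected, dwell time too short
trace[5000:5300] = 10.0   # 30 ms at 10 pA: rejected, peak too low
signals = extract_signals(trace, sample_rate_hz=10_000)
```

Only the first event passes both criteria, so `signals` holds a single 200-sample segment.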

2. Feature extraction

Extract features from the signal files. The features are Ip (peak current), Td (dwell time), a 10-dimensional normalized current shape, and the average current value.
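A possible way to assemble this 13-dimensional feature vector is sketched below. The protocol does not define how the 10-dimensional normalized current shape is computed, so the sketch assumes the signal is split into 10 consecutive time bins whose mean currents are divided by the peak current. The function name `extract_features` and the sampling rate are hypothetical.

```python
import numpy as np

def extract_features(signal_pa, sample_rate_hz, n_shape=10):
    """Build [Ip, Td, mean current, 10 shape values] for one extracted signal.

    Assumption: the shape vector is the mean of each of `n_shape` equal time
    bins, normalized by the peak current. The signal needs >= n_shape samples.
    """
    signal_pa = np.asarray(signal_pa, dtype=float)
    peak_pa = signal_pa.max()                      # Ip
    dwell_s = signal_pa.size / sample_rate_hz      # Td
    mean_pa = signal_pa.mean()                     # average current
    bins = np.array_split(signal_pa, n_shape)
    shape = np.array([b.mean() for b in bins]) / peak_pa
    return np.concatenate(([peak_pa, dwell_s, mean_pa], shape))

# Example: a 20 ms ramp from 20 pA to 40 pA at an assumed 10 kHz sampling rate.
features = extract_features(np.linspace(20.0, 40.0, 200), sample_rate_hz=10_000)
```

Normalizing the shape by the peak keeps the shape values at or below 1, so they describe the signal profile independently of its amplitude.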

3. Random Forest-based Classification with 10-fold Cross-Validation

In this study, a Random Forest (RF) classifier was used for data classification, and its performance was evaluated by 10-fold cross-validation. The dataset was divided into 10 subsets; in each iteration, one subset was used for testing and the remaining subsets for training. The RF classifier, with "n_estimators" set to 100, was trained on the dGMP and dTMP datasets to construct a single-molecule machine learning classifier.
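The cross-validation step can be sketched with scikit-learn as follows. This is a minimal sketch on synthetic data, not the measured dGMP and dTMP datasets: the feature values, class separation, sample counts, and `random_state` are assumptions chosen only so that the example runs on its own. Only `n_estimators=100` and the 10 folds come from the protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_per_class, n_features = 300, 13   # 13 = Ip, Td, mean current, 10 shape values

# Synthetic stand-ins for the single-molecule feature tables (assumption).
X_dgmp = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, n_features))
X_dtmp = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
X = np.vstack([X_dgmp, X_dtmp])
y = np.array([0] * n_per_class + [1] * n_per_class)   # 0 = dGMP, 1 = dTMP

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# cv=10 gives stratified 10-fold cross-validation for a classifier.
scores = cross_val_score(clf, X, y, cv=10)
mean_accuracy = scores.mean()
```

`scores` contains one accuracy value per fold; their mean is the usual summary figure.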

4. Prediction of the mixing ratio of mixed solutions

The dataset is divided into training and testing sets. The Random Forest classifier is instantiated and trained using the training set. It constructs multiple decision trees and combines their predictions for accurate classifications. The trained classifier is then used to predict the mixing ratio of the mixture samples in the testing set. These predictions are compared against the true labels to evaluate the classifier's performance. Performance metrics such as accuracy, precision, and recall are calculated to assess the classifier's effectiveness.
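One simple way to turn per-signal predictions into a mixing ratio is to count the fraction of signals assigned to each class. The sketch below assumes that reading of the step; the synthetic features, sample counts, and variable names are hypothetical, and the 3:1 mixture is simulated rather than measured.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def synthetic_signals(n, loc):
    """Synthetic 13-dimensional feature vectors for one molecule (assumption)."""
    return rng.normal(loc=loc, scale=1.0, size=(n, 13))

# Train on single-component data: 0 = dGMP, 1 = dTMP.
X_train = np.vstack([synthetic_signals(500, 1.0), synthetic_signals(500, 0.0)])
y_train = np.array([0] * 500 + [1] * 500)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Simulated dGMP:dTMP = 3:1 mixture.
X_mix = np.vstack([synthetic_signals(750, 1.0), synthetic_signals(250, 0.0)])
predicted = clf.predict(X_mix)

# Predicted mixing ratio = share of signals assigned to each class.
dgmp_fraction = float(np.mean(predicted == 0))
dtmp_fraction = 1.0 - dgmp_fraction
```

Because misclassifications in both directions partly cancel, the predicted fraction is pulled slightly toward 0.5 relative to the true value when the two error rates are similar.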

[New method]

The new method classifies the two molecules without training data from individual samples, using only the mixture measurement data that were also used in the conventional method.

1. Signal extraction from raw data

Same as Conventional method

2. Feature extraction

Same as Conventional method

3. Classification with Kernel Density Estimation (KDE)

Classification is performed using kernel density estimation (KDE), which is used here as one of the algorithms in the unlabeled data and unlabeled data classification (UUC) family. Probability densities are estimated from the given training data, which in this method are the mixture measurements, and the weights are updated to carry out the classification. First, preprocess the training data by applying upper and lower bounds, shuffling the data, and undersampling if needed. Then, train the UUC algorithm on the preprocessed data. Next, load the prediction data using the specified file paths and labels. Preprocess the prediction data according to the specified features and conditions, set the prediction ratios, and finally use the trained model to make predictions on the data.
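To illustrate how two unlabeled mixtures can separate two classes, the sketch below uses a generic density-unmixing argument and should not be read as the authors' UUC implementation. It assumes a single feature, known class fractions for the two mixtures, and a fixed KDE bandwidth. If p1 = θ1·pA + (1 − θ1)·pB and p2 = θ2·pA + (1 − θ2)·pB, then pA and pB follow by solving the two linear equations. All names, the synthetic peak-current values, and the bandwidth are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * bandwidth)

def unmix(p1, p2, theta1, theta2):
    """Solve for the class densities given two mixtures with known A fractions."""
    p_a = ((1 - theta2) * p1 - (1 - theta1) * p2) / (theta1 - theta2)
    p_b = (theta1 * p2 - theta2 * p1) / (theta1 - theta2)
    # Estimation noise can push densities slightly below zero; clip them.
    return np.clip(p_a, 0.0, None), np.clip(p_b, 0.0, None)

rng = np.random.default_rng(0)

def draw_mixture(n, frac_a):
    """Synthetic peak currents in pA: class A near 60, class B near 40 (assumption)."""
    n_a = int(round(n * frac_a))
    return np.concatenate([rng.normal(60.0, 8.0, n_a),
                           rng.normal(40.0, 8.0, n - n_a)])

mix_31 = draw_mixture(6000, 0.75)   # A:B = 3:1
mix_13 = draw_mixture(6000, 0.25)   # A:B = 1:3
grid = np.linspace(0.0, 100.0, 501)  # 0.2 pA steps

p1 = gaussian_kde_1d(mix_31, grid, bandwidth=3.0)
p2 = gaussian_kde_1d(mix_13, grid, bandwidth=3.0)
p_a, p_b = unmix(p1, p2, theta1=0.75, theta2=0.25)

mode_a = grid[np.argmax(p_a)]
mode_b = grid[np.argmax(p_b)]
label_is_a = p_a > p_b   # label each grid point by the larger class density
```

No labeled single-component data enter the calculation; the only supervision is the pair of mixing ratios, which must differ for the linear system to be solvable.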

Time Taken

[Conventional method]

Step (1)

Approximate time: 12 to 16 hours for 4 samples

Step (2)

Approximate time: 30 to 60 minutes

[New method]

Step (1)

Approximate time: 6 to 8 hours for 2 samples

Step (2)

Approximate time: 20 to 30 minutes
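The measurement times above are consistent with the measurement schedule in Step (1), as the short calculation below shows. It assumes that the stated ranges start from the pure acquisition time (three nanogap distances at 60 minutes each per sample) and that the remainder covers setup and handling; that interpretation and the function name are assumptions.

```python
DISTANCES_NM = (0.52, 0.54, 0.56)
MINUTES_PER_DISTANCE = 60
MINUTES_PER_RUN = 5

def acquisition_hours(n_samples):
    """Pure acquisition time in hours, excluding setup and sample exchange."""
    return n_samples * len(DISTANCES_NM) * MINUTES_PER_DISTANCE / 60

runs_per_distance = MINUTES_PER_DISTANCE // MINUTES_PER_RUN   # 12 runs of 5 min
conventional_hours = acquisition_hours(4)   # 12.0 h for 4 samples
new_hours = acquisition_hours(2)            # 6.0 h for 2 samples
```

The new method halves the acquisition time because it measures only the 2 mixed samples rather than 4 samples.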

Compared with the conventional method, which must be trained on data from individual samples, the new method is simpler and is anticipated to achieve high classification accuracy.

1. Ryu, J. et al. Single‐Molecule Classification of Aspartic Acid and Leucine by Molecular Recognition through Hydrogen Bonding and Time‐Series Analysis. Chem. Asian J. 17, e202200179 (2022).

2. Ohshiro, T. et al. Detection of post-translational modifications in single peptides using electron tunnelling currents. Nat. Nanotechnol. 9, 835–840 (2014).

3. Komoto, Y. et al. Time-resolved neurotransmitter detection in mouse brain tissue using an artificial intelligence-nanogap. Sci. Rep. 10, 1–7 (2020).

4. Yoshida, T. et al. Classification from positive and unlabeled data based on likelihood invariance for measurement. Intell. Data Anal. 25, 57–79 (2021).

5. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

This work was supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers 19H00852, 21H01741, 22K14566 and Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST) Grant Number JPMJCR1666 and JST Support for Pioneering Research Initiated by the Next Generation (SPRING) Grant Number JPMJSP2138, Japan. We would like to thank Editage (www.editage.com) for English language editing.

SupplementaryInformations.pdf
