R. Thomson, D. Luettel, F. Healey, and S. Scobie, “Safer care for the acutely ill patient: Learning from serious incidents,” National Patient Safety Agency, 2007.
 K. E. Henry, D. N. Hager, P. J. Pronovost, and S. Saria, “A targeted real-time early warning score (trewscore) for septic shock,” Science Translational Medicine, vol. 7, no. 299, pp. 299ra122–299ra122, 2015.
 A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun, P. Sundberg, H. Yee, K. Zhang, Y. Zhang, G. Flores, G. E. Duggan, J. Irvine, Q. Le, K. Litsch, A. Mossin, J. Tansuwan, D. Wang, J. Wexler, J. Wilson, D. Ludwig, S. L. Volchenboum, K. Chou, M. Pearson, S. Madabushi, N. H. Shah, A. J. Butte, M. Howell, C. Cui, G. Corrado, and J. Dean, “Scalable and accurate deep learning with electronic health records,” NPJ Digital Medicine, vol. 1, no. 1, 2018.
 J. L. Koyner, R. Adhikari, D. P. Edelson, and M. M. Churpek, “Development of a multicenter ward based AKI prediction model,” Clinical Journal of the American Society of Nephrology, pp. 1935–1943, 2016.
 P. Cheng, L. R. Waitman, Y. Hu, and M. Liu, “Predicting inpatient acute kidney injury over different time horizons: How early and accurate?,” in AMIA Annual Symposium Proceedings, vol. 2017, p. 565, American Medical Informatics Association, 2017.
 J. L. Koyner, K. A. Carey, D. P. Edelson, and M. M. Churpek, “The development of a machine learning inpatient acute kidney injury prediction model,” Critical Care Medicine, vol. 46, no. 7, pp. 1070–1077, 2018.
 M. Komorowski, L. A. Celi, O. Badawi, A. Gordon, and A. Faisal, “The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care,” Nature Medicine, vol. 24, pp. 1716–1720, 2018.
 A. Avati, K. Jung, S. Harman, L. Downing, A. Y. Ng, and N. H. Shah, “Improving palliative care with deep learning,” 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 311–316, 2017.
 B. Lim and M. van der Schaar, “Disease-Atlas: Navigating disease trajectories with deep learning,” Proceedings of Machine Learning Research, vol. 85, 2018.
 J. Futoma, S. Hariharan, and K. A. Heller, “Learning to detect sepsis with a multitask gaussian process RNN classifier,” in Proceedings of the International Conference on Machine Learning (D. Precup and Y. W. Teh, eds.), pp. 1174–1182, 2017.
 P. Nguyen, T. Tran, N. Wickramasinghe, and S. Venkatesh, “Deepr: A convolutional net for medical records,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 22–30, 2017.
 R. Miotto, L. Li, B. Kidd, and J. T. Dudley, “Deep Patient: An unsupervised representation to predict the future of patients from the electronic health records,” Scientific Reports, vol. 6, no. 26094, 2016.
 Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to diagnose with LSTM recurrent neural networks,” International Conference on Learning Representations, 2016.
 P. Z. J. H. Yu Cheng, Fei Wang, “Risk prediction with electronic health records a deep learning approach,” in Proceedings of the SIAM International Conference on Data Mining, 850pp. 432–440, 2016.
 H. Soleimani, A. Subbaswamy, and S. Saria, “Treatment-response models for counter-factual reasoning with continuous-time, continuous-valued interventions,” arXiv Preprint arXiv:1704.02038, 2017.
 A. M. Alaa, J. Yoon, S. Hu, and M. van der Schaar, “Personalized risk scoring for critical care patients using mixtures of gaussian process experts,” arXiv Preprint arXiv:1605.00959, 2016.
 A. Perotte, N. Elhadad, J. S. Hirsch, R. Ranganath, and D. Blei, “Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis,” Journal of the American Medical Informatics Association, vol. 22, no. 4, pp. 872–880, 2015.
 A. Bihorac, T. Ozrazgat-Baslanti, A. Ebadi, A. Motaei, M. Madkour, P. M. Pardalos, G. Lipori, W. R. Hogan, P. A. Efron, F. Moore, et al., “MySurgeryRisk: Development and validation of a machine-learning risk algorithm for major complications and death after surgery,” Annals of Surgery, 2018.
 A. E. W. Johnson, M. M. Ghassemi, S. Nemati, K. E. Niehaus, D. A. Clifton, and G. D. Clifford, “Machine learning and decision support in critical care,” Proceedings of the IEEE, vol. 104, no. 2, pp. 444–466, 2016.
 N. Tomasev, X. Glorot, J. W. Rae, M. Zielinski, H. Askham, A. Saraiva, A. Mottram, C. Meyer, S. Ravuri, I. Protsyuk, A. Connell, C. O. Hughes, A. Karthikesalingam, J. Cornebise, H. Montgomery, G. Rees, C. Laing, C. R. Baker, K. Peterson, R. Reeves, D. Hassabis, D. King, M. Suleyman, T. Back, C. Nielson, J. R. Ledsam, and S. Mohamed, “A clinically applicable approach to the continuous prediction of future acute kidney injury,” Nature, 2019.
 H. E. Wang, P. Muntner, G. M. Chertow, and D. G. Warnock, “Acute kidney injury and mortality in hospitalized patients,” American Journal of Nephrology, vol. 35, pp. 349–355, 2012.
 M. Kerr, M. Bedford, B. Matthews, and D. O’Donoghue, “The economic impact of acute kidney injury in England,” Nephrology Dialysis Transplantation, vol. 29, no. 7, pp. 1362–1368, 2014.
 A. MacLeod, “NCEPOD report on acute kidney injury—must do better,” The Lancet, vol. 374, no. 9699, pp. 1405–1406, 2009.
 A. Khwaja, “KDIGO clinical practice guidelines for acute kidney injury,” Nephron Clinical Practice, vol. 120, no. 4, pp. c179–c184, 2012.
 F. P. Wilson, M. G. S. Shashaty, J. M. Testani, I. Aqeel, Y. Borovskiy, S. S. Ellenberg, H. I. Feldman, H. E. Fernandez, Y. Gitelman, J. Lin, D. Negoianu, C. R. Parikh, P. P. Reese, R. Urbani, and B. D. Fuchs, “Automated, electronic alerts for acute kidney injury: a single-blind, parallel-group, randomised controlled trial,” The Lancet, vol. 385, pp. 1966–1974, 2015.
 S. J. Weisenthal, C. M. Quill, S. A. Farooq, H. A. Kautz, and M. S. Zand, “Predicting acute kidney injury at hospital re-entry using high dimensional electronic health record data,” arXiv Preprint arXiv:1807.09865, 2018.
 R. M. Cronin, J. P. VanHouten, E. D. Siew, S. K. Eden, S. D. Fihn, C. D. Nielson, J. F. Peterson, C. R. Baker, T. A. Ikizler, T. Speroff, and M. E. Matheny, “National Veterans Health Administration inpatient risk stratification models for hospital acquired acute kidney injury,” Journal of the American Medical Informatics Association, vol. 22, no. 5, pp. 1054–1071, 2015.
 S. J. Weisenthal, H. Liao, P. Ng, and M. S. Zand, “Sum of previous inpatient serum creatinine measurements predicts acute kidney injury in rehospitalized patients,” arXiv Preprint arXiv:1712.01880, 2017.
 H. Mohamadlou, A. Lynn-Palevsky, C. Barton, U. Chettipally, L. Shieh, J. Calvert, N. R. Saber, and R. Das, “Prediction of acute kidney injury with a machine learning algorithm using electronic health record data,” Canadian Journal of Kidney Health And Disease, vol. 5, 2018.
 Z. Pan, H. Du, K. Yuan Ngiam, F. Wang, P. Shum, and M. Feng, “A self-correcting deep learning approach to predict acute conditions in critical care,” arXiv Preprint arXiv:1901.04364, 2019.
 M. Singer, C. S. Deutschman, C. W. Seymour, M. Shankar-Hari, D. Annane, M. Bauer, R. Bellomo, G. R. Bernard, J.-D. Chiche, C. M. Coopersmith,et al., “The third international consensus definitions for sepsis and septic shock (sepsis-3),” Jama, vol. 315, no. 8, pp. 801–810, 2016.
 C. Rhee, R. Dantes, L. Epstein, D. J. Murphy, C. W. Seymour, T. J. Iwashyna, S. S. Kadri, D. C. Angus, R. L. Danner, A. E. Fiore,et al., “Incidence and trends of sepsis in us hospitals using clinical vs claims data, 2009-2014,” Jama, vol. 318, no. 13, pp. 1241–1249, 2017.
 P. Yadav, L. Pruinelli, A. Hoff, M. Steinbach, B. L. Westra, V. Kumar, and G. J. Simon, “Causal inference in observational data,” CoRR, vol. abs/1611.04660, 2016.
 Department of Veterans Affairs, “Veterans Health Administration: Providing health care for Veterans.”https://www.va.gov/health/, 2018 (accessed November 9, 2018).
 J. D. Hunter, “Matplotlib: A 2d graphics environment,”
Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.
 T. Oliphant, “NumPy: A guide to NumPy.”
http://www.numpy.org/, 2019 (accessed June 10, 2019).
 E. Jones, T. Oliphant, P. Peterson, et al., “SciPy: Open source scientific tools for Python.” http://www.scipy.org/, 2019 (accessed June 10, 2019).
 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M.Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
 M. Reynolds, G. Barth-Maron, F. Besse, D. de Las Casas, A. Fidjeland, T. Green, A. Puigdomènech, S. Racanière, J. Rae, and F. Viola, “Open sourcing Sonnet - a new library for constructing neural networks.” https://deepmind.com/blog/open-sourcing-sonnet/, 2017.
 M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Van-houcke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” arXiv Preprint arXiv:1603.04467,
2015. Software available from tensorflow.org.
 T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and DataMining, KDD ’16, (New York, NY, USA), pp. 785–794, ACM, 2016.
 D. B. Suits, “Use of dummy variables in regression equations,” Journal of the American Statistical Association, vol. 52, no. 280, pp. 548–551, 1957.
 Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural Networks: Tricks of the Trade, This Book is an Outgrowth of a 1996 NIPS Workshop, (London, UK), pp. 9–50, Springer-Verlag, 1998.
 S. Romero-Brufau, J. M. Huddleston, G. J. Escobar, and M. Liebow, “Why the c-statistic is not informative to evaluate early warning scores and what metrics to use,” Critical Care, vol. 19, no. 1, p. 285, 2015.
 X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feed forward neural networks,” in International Conference on Artificial Intelligence and Statistics (Y. W. Teh and M. Titterington, eds.), vol. 9, pp. 249–256, 2010.
 D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference on Learning Representations, 2015.
 M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” European Conference on Computer Vision, 2014.
 J. M. Steppe and K. W. Bauer Jr, “Feature saliency measures,” Computers & Mathematics With Applications, vol. 33, no. 8, pp. 109–126, 1997.
 J. D. Fauw, J. R. Ledsam, B. Romera-Paredes, S. Nikolov, N. Tomasev, S. Blackwell, H. Askham, X. Glorot, B. O’Donoghue, D. Visentin, G. van den Driessche, B. Lakshminarayanan, C. Meyer, F. Mackinder, S. Bouton, K. W. Ayoub, R. Chopra, D. King, A. Karthikesalingam, C. O. Hughes, R. A. Raine, J. C. Hughes, D. A. Sim, C. A. Egan, A. Tufail, H. Montgomery, D. Hassabis, G. Rees, T. Back, P. T. Khaw, M. Suleyman, J. Cornebise, P. A. Keane, and O. Ronneberger, “Clinically applicable deep learning for diagnosis and referral in retinal disease,” Nature Medicine, vol. 24, pp. 1342–1350, 2018.
 B. Efron and R. J. Tibshirani, An introduction to the bootstrap. CRC press, 1994.
 B. Zadrozny and C. Elkan, “Transforming classifier scores into accurate multiclass probability estimates,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 694–699, ACM, 2002.
 G. W. Brier, “Verification of forecasts expressed in terms of probability,” Monthly Weather Review, vol. 78, no. 1, pp. 1–3, 1950.
 A. Niculescu-Mizil and R. Caruana, “Predicting good probabilities with supervised learning,” in Proceedings of the International Conference on Machine Learning (L. D. Raedt and S. Wrobel, eds.), pp. 625–632, ACM, 2005.
 E. J. Topol, “High-performance medicine: the convergence of human and artificial intelligence,” Nat. Med., vol. 25, pp. 44–56, Jan 2019.
 J. Collins, J. Sohl-Dickstein, and D. Sussillo, “Capacity and learnability in recurrent neural networks,” International Conference on Learning Representations, 2017.
 K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
 D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” International Conference on Learning Representations, 2014.
 B. Shickel, P. J. Tighe, A. Bihorac, and P. Rashidi, “Deep ehr: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis,” IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 5, pp. 1589–1604, 2018.
 M. Z. Nezhad, D. Zhu, N. Sadati, and K. Yang, “A predictive approach using deep feature learning for electronic medical records: A comparative study,” arXiv Preprint arXiv:1801.02961, 2018.
 J. Bradbury, S. Merity, C. Xiong, and R. Socher, “Quasi-recurrent neural networks,” International Conference on Learning Representations, 2017.
 T. Lei and Y. Zhang, “Training RNNs as fast as CNNs,” arXiv Preprint arXiv:1709.02755, 2017.
 A. Graves, G. Wayne, and I. Danihelka, “Neural turing machines,” arXiv Preprint arXiv:1410.5401, 2014.
 S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, “Meta-learning with memory-augmented neural networks,” in Proceedings of the International Conference on Machine Learning (M. F. Balcan and K. Q. Weinberger, eds.), pp. 1842–1850, 2016.
 A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwi ́nska, S. G. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou,et al., “Hybrid computing using a neural network with dynamic external memory,” Nature, vol. 538, no. 7626, pp. 471–476, 2016.
 A. Santoro, R. Faulkner, D. Raposo, J. Rae, M. Chrzanowski, T. Weber, D. Wierstra, O. Vinyals, R. Pascanu, and T. Lillicrap, “Relational recurrent neural networks,” arXiv Preprint arXiv:1806.01822, 2018.
 X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (G. Gordon, D. Dunson, and M. Dudík, eds.), vol. 15, pp. 315–323, PMLR, 2011.
 A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proceedings of the International Conference on Machine Learning (S. Dasgupta and D. McAllester, eds.), vol. 30, p. 3, 2013.
 P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for activation functions,” International Conference on Learning Representations, 2018.
 D. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” International Conference on Learning Representations, 2016.
 G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, “Self-normalizing neural networks,” in Advances in Neural Information Processing Systems (I. Guyon, U. Luxburg,1000S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), vol. 30, pp. 971–980, 2017.
 M. Basirat and P. M. Roth, “The quest for the golden activation function,” arXiv Preprint arXiv:1808.00783, 2018.