Notes on nomenclature: italics denotes a parameter that is supplied to a function. A bracketed [parameter] is optional. >> denotes the Matlab command line; anything following >> is meant to be entered on the command line. All time estimates for the functions are predicate on a model of about 1200 genes, 2300 reactions, 1800, metabolites, and a 2.4 GHz Intel Core 2 Duo processor. When substantial preprocessing efforts are required, we provide time estimates based on personal experience.
Initializing the Toolbox
1 | Navigate to the directory where you installed the Toolbox: >> initCobraToolbox()
2| Save the paths added if desired: >> savepath()
Changing COBRA solvers
3| Set the solvers used by the COBRA Toolbox using the following function:
Variables are defined as follows: solverName specifies the solver package to use; the COBRA Toolbox currently supports ̳gurobi‘, 'tomlab_cplex', 'glpk', and 'qpng'. solverType (default 'LP') specifies the type of problems ('LP', 'MILP', 'QP', 'MIQP', 'NLP') to solve with the solver specified by solverName. When changeCobraSolver is called without any arguments, it will return the names of the current solvers settings.
Run COBRA Toolbox test suite [~103 s]
4 | The test suite contains scripts that test the functionality of scripts within the Toolbox. The scripts in the the testing directory provide useful examples of many of the Toolbox's functions.
testAll sequentially navigates the test suite directory (testing) and runs each test. Upon completion, it displays which tests were completed successfully and which failed.
Caution! For solver suites other than Gurobi or Tomlab, the user may encounter failures that require tuning of solver parameters.
Critical! It is not necessary for all tests to be passed to use the COBRA Toolbox for your applications. It is only necessary that the tests related to your work do not fail.
Read COBRA-compliant SBML models into MATLAB [~102 s]
5 | Load a COBRA-compliant model into MATLAB. To load a model, navigate within MATLAB to the directory containing the model and call the following function from the command window:
model = readCbModel([filename]);
When called with no arguments, readCbModel will prompt the user to select a file using a dialog box. readCbModel supports SBML-formatted (Level 2 versions 1 or 4) files. SBML files for a variety of organisms are available from the BiGG database31. The function returns a COBRA Toolbox model structure containing the necessary fields to describe the model for use with subsequent steps. See SSupplementary Material for a description of the fields in a COBRA Toolbox model structure; hereafter, model denotes a COBRA Toolbox model structure.
CRITICAL STEP! If the model is not properly loaded into MATLAB, none of the following functions will work. Ensure that libSBML and SBML Toolbox are properly installed and accessible by MATLAB and that the SBML file is formatted correctly.
Saving the model
6 | COBRA Toolbox model structures may be saved as text or SBML files. On Microsoft Windows, the structures may also be written to an Excel (xls) file.
writeCbModel(model, format, [fileName], [compSymbolList], [compNameList], [SBMLLevel], [SBMLVersion]);
For format use 'sbml' for SBML file format or 'xls' for Excel format (only available on MS Windows). For filename use the name of the file. If not provided, a dialog box will prompt the user to specify name and location of the output file. This feature is dependent on the SBML Toolbox to generate the XML file. The toolbox is able to output SBML level 2 versions 1 or 4.
Modify COBRA Toolbox models
7 | Once the model is loaded into MATLAB by readCbModel, the model can be modified to simulate different conditions such as altering reaction bonds (A), adding (B) or removing reactions (C) or changing the model objective (D).
(A) to alter reaction bounds:
model = changeRxnBounds(model, rxnNameList, value, boundType);
rxnNameList is a cell array of reaction ids corresponding to reaction ids in model.rxns; value is a floating point number; boundType specifies which bounds to change for the reactions and can take values of 'l', 'u', or 'b' for lower, upper, or both, respectively. This function is useful for defining the in silico media composition by changing the lower bounds of exchange reactions.
(B) New reactions can be added to a COBRA Toolbox model using the following function:
[model] = addReaction(model, rxnName, metaboliteList, stoichCoeffList, [revFlag], [lowerBound], [upperBound], [objCoeff], [subsystem], [grRule], [geneNameList], [systNameList], [checkDuplicate]);
metaboliteList is a list of metabolites involved in the reaction (if a metabolite does not exist in model.mets then this function will add it);
stoichCoeffList is the stoichiometric coefficients for the corresponding elements in metaboliteList. This function checks for reactions with the same name or stoichiometic coefficients, however this can be disabled by setting checkDuplicate to false.
(C) To remove a reaction, call the following function:
[model] = removeRxns(model, rxnRemoveList)
rxnRemoveList a cell array of reaction ids corresponding to elements in model.rxns. Metabolites that are no longer involved in any reactions are removed from the model. The model may no longer function after reactions have been removed.
(D) COBRA modeling often entails performing calculations that focus on a specified objective, such as growth59. To change the objective function, use the following function:
model = changeObjective(model, rxnNameList, [objectiveCoeff]);
rxnNameList is either a string or a cell array of strings containing reaction ids corresponding to elements in model.rxns that should be included in the objective function; objectiveCoeff specifies the weight given to the respective reaction in rxnNameList. If left empty, objectiveCoeff is assumed to be 1.
Omics-Guided Creation of Context-Specific Models. Timing ~102 s + >1 hr to format data
8 | An emerging application of genome-scale reconstructions is analyzing omics data in a systems context14, 25, 26. In particular, this procedure is useful for building cell-, tissue-, or condition-specific models.
createTissueSpecificModel is designed to map transcriptomic or proteomic data onto a reconstruction using two established algorithms (GIMME25 or Shlomi26). The GIMME algorithm is an LP procedure that best matches high-throughput data to an original flux distribution derived from the full model; thus the algorithm requires a predefined objective function. The Shlomi algorithm is an MILP procedure that best matches high-throughput data to pathway length, thus avoiding the need for a predefined objective function. Novice users can utilize the GIMME algorithm with two inputs: the COBRA model and expression data; while more experienced users can tweak additional parameters.
[tissueModel,Rxns] = createTissueSpecificModel(model,expressionData, [proceedExp],[orphan],[exRxnRemove],[solver],[options], [funcModel]);
Required Inputs: model is a reconstruction with gene-protein-reaction associations; expressionData is a structure that contains two inputs: .Locus (a vector of GeneIDs matching gene ids in model.genes), and .Data (a vector of presence/absence calls). Optional Inputs: proceedExp (default value of 1, to process multiple data sets, set proceedExp to 0); orphan (default value of 1) controls whether or not reactions with no known gene-protein-reaction associated are included when peforming Shlomi-based network trimming (orphan reactions are always included when the GIMME reaction is employed, regardless of the orphan setting); exRxnRemove is a list of select exchange reactions that are excluded (that is if a specific cell or tissue is known not to have a particular metabolite transporter); solver is either 'GIMME' or 'Shlomi' and defaults to 'GIMME'; options is only used with the GIMME algorithm, and it specifies which reactions comprise the objective function (by default, the objective function is chosen from model.c with a 90% (0.9) threshold); funcModel controls whether the output tissueModel is fully functional (every reaction can carry a flux) or not when using the GIMME algorithm. Output: tissueModel is the final cell-, tissue-, or condition-specific model generated from the function; Rxns is a structure containing statistics about what reactions were or were not expressed based on the omics data and what reactions were added or removed from the model (see Anticipated Results).
Visualization Timing ~101 s
9 | Visual representation of a metabolic network can aid in understanding the model. Maps for a variety of metabolic pathways are available for many of the models hosted in the BiGG knowledgebase (http://bigg.ucsd.edu). See Supplementary Material for a description of the map file format. These maps may be used for other organisms that have similar metabolic pathways, given that the user uses the same metabolite and reaction ids as the BiGG model that was used to create the map. To load a map the following command is used:
map = readCbMap([filename])
If readCbMap is called with no arguments, a dialog box will prompt the user to select a map file. After the map has been read into MATLAB, it can be viewed as a MATLAB figure or a scalable vector graphic (svg).
10| To view a map as a MATLAB figure the following commands are used
where, options is a map options structure created by setMapOptions. See software documentation for description of optional parameters and ANTICIPATED RESULTS for an example.
11| To save a map as an SVG file the following commands are used:
By default, drawCbMap will create a file named 'target.svg' in the current working directory. The filename can be set by inputting additional parameters:
Simulate optimal growth using flux-balance analysis (FBA) Timing <102 s
12| Simulating optimal growth using FBA is one of the fundamental COBRA phenotypic calculations for metabolic network models. FBA is a method that calculates the flow of metabolites through a metabolic network28. Growth is simulated by optimizing the model for flux through the model's biomass function; however, it is also possible to perform simulations that focus on optimizing other biological characteristics, such as ATP production. The reaction to optimize is set using the model.c vector (see step 7D).
In addition to specifying an objective, it is also necessary to define the in silico growth medium; this is accomplished by modifying the bounds of exchange reactions. Exchange reactions for metabolites comprising the in silico growth medium should have a lower bound less than 0; all other exchange reactions should have a lower bound of 0. All exchange reactions should have an upper bound greater than 0 to prevent metabolite build up. The solution returned will have units based on the units used in the model (typically mmol·gDW-1· h-1). FBA can be performed either in (A) standard (B) geometric mode:
(A) Standard FBA is performed with:
[solution] = optimizeCbModel(model, [osenseStr], [minNorm], [allowLoops])
where: osenseStr is either 'max' or 'min' to maximize or minimize the value of the objective; minNorm (default 0, if nonzero, attempt to find a solution that minimizes the presence of loops0; allowLoops (default true, if set to false, use the loop law algorithm49 to remove loops—this proceedure can be time consuming).
optimizeCbModel will return a solution structure containing: the objective value 'f', the primal solution 'x', the dual solution 'y', the reduced cost 'w', a universal status flag 'stat', a solver specific status flag 'origStat', and the time to compute the solution 'time'. The primal solution, 'x' represents the flux carried by each reaction within the model. The dual solution, 'y' represents the shadow prices for each metabolite and indicates how much the addition of the corresponding metabolite will increase or decrease the objective value28, 60. The reduced cost, 'w', indicates how much each reaction affects the objective. A solver status of 1 indicates that an optimal solution was found.
(B) Geometric FBA44 is an alternative to standard FBA. Geometric FBA attempts to return the minimal flux distribution central to the bounds of the solution space while still maintaining optimal growth rate. The flux distribution returned should then be reproducible regardless of the solver used.
flux = geometricFBA(model,[varargin])
The function returns the vector 'flux' which contains the centered optimal flux distribution.
13| Visualizing an Optimal Flux Distribution
The optimal flux distribution obtained using optimizeCbModel or geometricFBA can be overlaid onto an existing map of the model using:
drawFlux(map, model, flux, [options], [varargin])
where: map is a map object created with readCbMap (see Visualization Step 9); model is the COBRA model structure that was used for performing FBA or Geometric FBA; options is a drawCbMap options structure.
14| Classification of Model Genes Based on Optimal FBA Solution Parsimonious FBA (pFBA) is an FBA approach that incorporates flux parsimony as a constraint to categorize the solution space61.
The concept of flux parsimony, in the context of a metabolic network, means to minimize the total material flow required to achieve an objective.
In this method, genes are classified into six categories: (1) essential genes, metabolic genes necessary for in silico growth in the given media; (2) pFBA optima, non-essential genes contributing to the optimal growth rate and minimum gene-associated flux; (3) enzymatically less efficient (ELE), genes requiring more flux through enzymatic steps than alternative pathways that meet the same predicted growth rate; (4) metabolically less efficient (MLE), genes requiring a growth rate reduction if used; (5) pFBA no-flux, genes that are unable to carry flux in the experimental conditions; and (6) Blocked, genes that are only associated with the reactions that cannot carry a flux under any condition ("blocked" reactions).
To categorize the genes and reactions within a model and return a model with flux minimization constraints, execute the following:
[GeneClasses, RxnClasses, modelIrrevFM] = pFBA(model, [varargin])
Where, GeneClasses contains a list of all genes that are within the categories above; RxnClasses contains a list of all reactions that are within the categories above; and modelIrrevFM is a model that contains the flux minimization constraints. If a map is available for the model, the results from this function can be visualized by using the 'map' and 'mapoutname' flags in the varargin input. A test case may be found in the ANTICIPATED RESULTS section. Additional options are described in the software documentation directory.
The subsequent steps in this protocol rely on the functionality of optimizeCbModel. If optimizeCbModel fails to return a feasible flux distribution for the examples within this protocol, the problem may be due to the installation of the LP solver. It is not necessary that geometricFBA return a solution for the subsequent steps.
Solving COBRA problem structures (Advanced User) Timing >10 s
15 | The COBRA toolbox has five function calls used for solving different optimization problems. Basic users will not need to call these low level functions directly as higher level functions encapsulate these calls. These functions act as a common interface for different LP, MILP, QP, MIQP, and NLP solvers ensuring that labs can share code even when using different installed solvers.
The five solver functions use a similar input argument structure: problem structure followed by optional argument/value pairs. The required fields in the problem structure vary for each function to supply the required information to solve the type of problem. For example, the mixed integer problem structures require a field which specifies variable type (continuous, integer, binary). A description on the format of COBRA problem structures can be found in Supplementary Material. The COBRA solution structure also provides a common output format regardless of the solver used.
[solution] = solveCobraLP(LPproblem, [varargin])
[solution] = solveCobraMILP(MILPproblem, [varargin])
[solution] = solveCobraQP(QPproblem, [varargin])
[solution] = solveCobraMIQP(MIQPproblem, [varargin])
[solution] = solveCobraNLP(NLPproblem, [varargin])
Simulating deletion studies Timing ~102-104 s
16 | Deletion studies can be easily simulated with in silico models. Gene deletion methods within the Toolbox are dependent on the proper setup of the gene-reaction matrix as well as the rules defining the Boolean relationship between genes and reactions. Reactions that are affected by a gene deletion have their upper and lower flux bounds set to zero and are therefore not functional. The set of reactions on which a gene deletion has an effect is calculated using the gene reaction association and rules.
It is possible to study either (A) single essential gene deletions or (B) pairs of synthetic lethal genes. The possible results from deletion studies are: 1) unchanged maximal growth, 2) reduced maximal growth, or 3) no growth (lethal). Deletion studies can be used to predict gene/reaction essentiality.
(A) Essential Gene Study
[grRatio, grRateKO, grRateWT, hasEffect, delRxns, fluxSolution] = singleGeneDeletion(model, method, [geneList])
where: method can be either 'FBA' (default) 'MOMA'62 or linear MOMA ('lMOMA'); geneList, is a cell array of genes corresponding to model.genes (if not provided deletion simulations are performed for all genes in the model); grRatio is the growth rate of the knockout / growth rate of WT; grRateKO is the growth rate of the knockouts; grRateWT is the wild-type growth rate; hasEffect is a Boolean list that contains true for each gene whose deletion alters the growth rate; delRxns contains a list of the reactions, the bounds of which are set to 0 for each gene deletion; and fluxSolution is the flux solution for each deletion.
(B) Synthetic Lethal Study
[grRatioDble, grRateKO, grRateWT] = doubleGeneDeletion(model, method, [geneList1], [geneList2])
where: method can be either 'FBA' (default) 'MOMA'62 or linear MOMA ('lMOMA'); geneList1 is a cell array of genes corresponding to model.genes (if not provided, the function assumes all genes in model.genes are to be interrogated); geneList2 is a cell array of genes that correspond to the second set of genes in the synthetic lethal pair (if not provided, the function assumes that all genes in model.genes are to be interrogated); grRatioDble is the growth rate of the knockout / growth rate of WT; grRateKO is the growth rate of the knockouts; and grRateWT is the wild-type growth rate.
Flux Variability Analysis (FVA) Timing ~102 s
17 | FBA only returns a single flux distribution that corresponds to maximal growth under given growth conditions. However, alternate optimal solutions may exist which correspond to maximal growth. FVA calculates the full range of numerical values for each reaction flux within the network63.
To determine the minimum and maximum flux values that the reactions within the model can carry while obtaining a specific percentage of optimal growth rate:
[minFlux maxFlux] = fluxVariability(model, optPercentage, [rxnNameList], [verbFlag], [allowLoops])
where optPercentage (default 100) specifies the percentage of optimal that an alternate flux distribution must realize to be considered an acceptable alternative flux distribution.
Visualization of Flux Variability Analysis Results
18| To visualize the results from this function, a flux variability map can be generated from an existing reaction map, color coding reactions based on flux directionality.
drawFluxVariability(map, model, minFlux, maxFlux, [options])
where: map is the map structure corresponding to the model read in using readCbMap; model is the COBRA model structure used in the fluxVariability function; minFlux and maxFlux are vectors generated by the fluxVariability function described above; and options is a structure containing optional parameters such as edge and node color and size: bi-directional reversible reactions are colored green, unidirectional reversible reactions that carry flux in the forward direction are colored magenta, unidirectional reversible reactions that carry flux only in the reverse direction are colored cyan, and irreversible fluxes are colored blue.
Sampling the solution space [>102 s] (Advanced User)
19| FBA only returns a single optimal point and thus gives little information about the entire solution space. An alternative approach is to characterize the solution space using sampling27. The generalized parallel sampler samples any arbitrary linearly- constrained space by moving a fixed number of points in parallel.
[sampleStructOut, mixedFrac] = gpSampler(sampleStruct, [nPoints], [bias], [maxTime], [maxSteps])
where: sampleStruct is the COBRA Toolbox problem structure for linear programming problems (see Supplementary Information); nPoints is the number of sample points is set through; maxTime is the maximum sampling time; bias is a structure that imposes marginal distributions on reactions; sampleStructOut is sampleStruct with the addition of the 'points' field containing the solutions; and mixedFrac gives an estimate of how mixed the sampling solution is relative to the warmup points—a mixedFrac value of 0.5 indicates complete mixing.
*Fluxomics (Advanced User)
Timing >102 s*
20 | Carbon 13 tracing experiments provide the ability to measure internal flux rates in a metabolic network64. To use this data, additional information about carbon tracking must be added to the COBRA model. This is stored in the .isotopomer field as described in Supplementary Information Section S.3. In order to use the 13C solver, the functions must be generated:
[experiment] = generateIsotopomerSolver(model, inputMet, [experiment], [FVAflag])
where: model is the COBRA model with an .isotopomer field; inputMet is a string corresponding to the 13C labeled input; experiment is a list of metabolites that must be measured; and FVAflag removes reactions that cannot carry a flux.
21| Two solvers are generated, one based on the cumomer method65 and one on the faster EMU method66. The solvers are called internally during the scoreC13Fit function below. A given flux distribution can be scored against a set of 13C data:
output = scoreC13Fit(v0,expdata,model)
where: vo is the initial guess for fitting; and expdata is one or more sets of experimental data described in Supplementary Information Section S.3.
22| Next, the most optimal flux distribution can be found with a non-linear optimization as such:
[vout] = fitC13Data(v0,expdata,model, [majorIterationLimit])
This function will return the flux with the lowest experimental score found by the NLP solver. Very often it is useful to compute the confidence intervals of reactions which are consistent with 13C data.
[vs, output, v0] = C13ConfidenceInterval(v0, expdata, model, max_score, [directions], [majorIterationLimit]); (~102 s)
where: v0 is the initial guess; expdata is the experimental data that must be fit; max_score is the highest acceptable score; and directions is the list of reactions and reaction ratios which will be maximized and minimized (by default all reactions).
Gap Filling [~103 s]
23 | Due to incomplete knowledge, a metabolic model may possess gaps. A gap is defined as missing biochemical information which can explain discrepancies between model predictions and experimental data. Gaps are typically reactions that facilitate the conversion of an available metabolite in the model to one necessary to achieve an objective. Identifying gaps in metabolic models can be attempted using either (A) detectDeadEnds or (B) gapFind.
(A) Detect Dead Ends in a Model
outputMets = detectDeadEnds(model,[removeExternalMets])
The detectDeadEnds function searches the model.S matrix for metabolites that participate in only one reaction (can only be produced or only be consumed) and returns the corresponding indices for the metabolites in the model.mets field. Setting removeExternalMets to true removes external metabolites from the results. Not all gaps can be identified by simply inspecting the model.S matrix.
(B) Find All Gaps in a Model
The GapFind algorithm45 allows one to find all gaps in a model and all metabolites that are downstream from a model gap.
[allGaps, rootGaps, downstreamGaps] = gapFind(model, findNCgaps, verbFlag)
where: allGaps is a list of the metabolite indices for a metabolite at a gap; rootGaps is a list of metabolites that cannot be produced; and downstreamGaps is a list of metabolites that are produced in a reaction that requires a metabolite that cannot be produced.
This function is run in an interactive and iterative fashion to guarantee that all gaps are identified. Set the lower bound of all exchange reactions within model to -1, the upper bound on all reactions to a relatively large positive number (for example 105), and the lower bound of all reversible reactions to a relatively large negative number (for example -105) within model. The appropriate bound magnitude required varies from model to model. If the bound magnitudes are too small, the algorithm will incorrectly identify many metabolites as gaps; if this occurs, increase the bound magnitudes by 10-fold. Repeat this process as necessary until the algorithm does not identify all metabolites as gaps.
24| In addition to these two gap identification functions, the Toolbox includes an optimization-based algorithm (growthExpMatch) that identifies candidate reactions to fill gaps in the model53. growthExpMatch identifies the minimum number of reactions from a universal reaction database that are required for a metabolic model to grow on a specified substrate.
[solution] = growthExpMatch(model, KEGGFilename, compartment, iterations, dictionary, logFile, threshold)
where: KEGGFilename is the name of the reaction .lst file downloaded from KEGG67, 68; compartment is a string denoting which compartment to generate exchange reactions for; iterations controls the number iterations to run the function; dictionary is an n by 2 cell array that maps metabolites to KEGG IDs; logFile is the name of the .mat file to save the solution to; and threshold is the minimum value that the biomass function can take for the model to be considered growing.
25| Display the growthExpMatch solution by printing the log file using the following function:
where: dir is the directory containing the growthExpMatch solution .mat file; and matFile is the name of the growthExpMatch solution .mat file.
Metabolic Engineering Timing ~102-103 s
25 | The COBRA Toolbox version 2.0 provides three methods for in silico metabolic engineering: (A) OptKnock46, (B) OptGene47, and (C) GDLS48.
OptKnock runs the OptKnock algorithm46 to determine reaction sets to knock out for the overproduction of a specific product when the model is optimized for internal cellular objectives.
[OptKnockSol, biLevelMILPproblem] = OptKnock(model, selectedRxnList, [options], [constrOpt], [prevSolutions], [verbFlag])
where: OptKnockSol contains the best knockout set; and biLevelMILPproblem is the MILP problem generated by the algorithm and subsequently solved. See ANTICIPATED RESULTS for an example setup of options and constrOpt structures.
There are several things to take note of when calling the OptKnock function. First the function does not use the upper and lower bounds set within the model that is passed in. The model is first converted into irreversible format, splitting reactions with a lower bound < 0 and upper bound > 0. The resulting set of reactions has its lower bounds set to 0 and upper bounds set to options.vMax. Use the constrOpt structure to apply constraints on reactions, such as a minimal flux through the biomass function or ATP maintenance. Failure to set the proper constraints may lead to incorrect predictions generated by the function.
OptGene is an evolutionary programming-based method to determine gene knockout strategies for overproduction of a specific product47. It can handle non-linear objective functions such as product flux multiplied by biomass.
[x, population, scores, optGeneSol] = OptGene(model, targetRxn, substrateRxn, generxnList, maxKOs, [population])
where: targetRxn specifies the reaction to optimize; substrateRxn specifies the exchange reaction for the growth; generxnList is a cell array of strings that specifies which genes or reactions are allowed to be deleted; and maxKOs sets the maximum number of knockouts; x is the best scoring set as determined by the functions optGeneFitness or optGeneFitnessTilt; population is the binary matrix representing the knockout sets; and optGeneSol is the structure summarizing the results. If resuming a previous simulation, the binary matrix (population) can be specified.
(C) Genetic Design Local Search
The Genetic Design Local Search (GDLS) algorithm48 may be used to identify what to knock out to increase in silico production of desired metabolites
[gdlsSolution, biLevelMILPproblem, gdlsSolutionStructs] = GDLS(model, targetRxns, [vargin])
where: targetRxns is a specific list of genes, gene sets, or reactions to knock; gdlsSolution is the knockout solution; biLevelMILPproblem is the bi-level MILP problem for the solution; and gdlsSolutionStructs containes the intermediate solutions. This approach typically runs faster than the global search performed by OptKnock, however, it is not guaranteed to identify the global optima.