This protocol is Part 2 of a 5-part series describing the methods for this initiative. Part 2 covers the methods used to conduct keyword searches and implement machine-assisted screening of documents against inclusion and exclusion criteria. See Figure 1.
1.0 Objective
The goal of the screening phase was to identify documents that met our PICoST search criteria (see Introduction):
* Population (P): Global human or natural systems of importance to humans that are impacted by climate change
* Interest (I): Observed/documented adaptation responses to climate change within human systems (or human-assisted in natural systems) in the scientific literature
* Context (Co): Any empirically documented/observed adaptation response by humans
* Time & Scope (T/S): Published between 2013 and 2020 in the scientific literature
2.0 Scoping
To ensure we captured relevant documents, we conducted initial scoping to identify appropriate search terms. A list of 10 a priori identified publications was used to construct search terms and refine the search string (Table 1). We did not replicate the search strings of review articles within this list, but rather used these papers to identify potential search terms and better understand the range of terminology used in this field. This informed the development of unique search strings for this protocol.
3.0 Search String
Search strings were developed for each bibliographic database as shown below (Table 3). The searches focus on documents combining two concepts: climate change, and adaptation or response. Given the very large number of publications referring to environment and resilience, we restricted our search string to documents that refer to climate change or global warming in their titles, abstracts, or keywords; articles referring to weather, environmental variability, or meteorological variables without explicit reference to climate change are thus not captured. We included terms such as ‘resilience’ and ‘risk management’ to reflect the breadth of literature relevant to climate adaptation that is indexed using these terms. We used natural language terms only, since Scopus and Web of Science do not employ controlled vocabularies (e.g. MeSH terms).
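For illustration only, a query of this general shape (climate change terms AND adaptation/response terms, limited to titles, abstracts, and keywords, with year limits) could be expressed in Scopus syntax roughly as in the sketch below. The terms and limits shown are an abbreviated, hypothetical version, not the exact strings, which are given in Table 3.

```python
# Hypothetical, abbreviated example of the kind of Boolean query used;
# the full database-specific search strings are listed in Table 3.
SCOPUS_QUERY_SKETCH = (
    'TITLE-ABS-KEY ( ( "climate change" OR "global warming" ) '
    'AND ( adapt* OR resilien* OR "risk management" OR respond* OR response* ) ) '
    'AND PUBYEAR > 2012 AND PUBYEAR < 2021'  # illustrative year limits
)
```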
Documents retrieved from searches will be uploaded to a customized platform, MCC-APSIS, for management and screening.
3.1 Languages
Database searches (including bibliographic databases, organizational websites, and web-based search engines) will be conducted in English only, but screening will not exclude by language. This means that documents written in any language are eligible for inclusion as long as they are indexed in English within the selected databases. Given the global scope of this review, it is not considered feasible to search in all global languages. In addition, the bibliographic databases that we will search typically catalogue records using translated English titles and abstracts, so non-English searches are largely unnecessary for these resources. We did retrieve and include a number of non-English documents in this review.
3.2 Estimating the comprehensiveness of the search
The search involved screening a large volume of literature (casting a wide net) to identify a much smaller number of relevant documents (<5,000). We anticipated that the major screening restriction would be the requirement for empirical documentation of activities directly linked to potential risk/vulnerability reduction. Yet much literature relevant to adaptation remains unreported, not labelled or tagged as adaptation or climate-related, or reported in forms or platforms inaccessible within a global review.
In particular, the review includes scientific literature only, excluding important sources of adaptation action in grey literature and other sources of Indigenous Knowledge and Local Knowledge (IKLK). Our initial protocol intended to include grey literature and web-based IKLK sources by including documents retrieved via Google searches. This was found to be infeasible for two reasons: 1) grey literature documents and links frequently do not include abstracts or summaries, and could thus not be screened and prioritized by our machine learning methods; 2) the volume of scientific literature was already substantial and exceeded initially available resources. We thus chose to focus on scientific literature only for this first GAMI initiative, recognizing that our results will not reflect all relevant adaptation actions.
4.0 Article Screening and Study Inclusion Criteria
4.1 Screening strategy
Screening combined manual (human) review and machine learning to assess the title and/or abstract or summary of each potentially relevant document and determine whether it could be included in a database of recent empirical research on human adaptation to climate change. Screening involved working with a very large volume of literature (close to 50,000 documents). We therefore leveraged machine learning methods to automate part of the process so that the screening team could focus on tasks requiring human judgment.
4.2 Inclusion & exclusion criteria
The goal of the screening team was to assemble a database of papers published between 2013 and 2019 on actions undertaken by people in response to climate change, or to environmental conditions, events, and processes attributed or theorized to be linked, at least in part, to climate change. The focus was on adaptation; documents focusing on mitigation responses (i.e. reducing greenhouse gas emissions) were excluded. Adaptation actions could take place at any level of social organization (individual, household, community, institution, government). Adaptation responses to perceived climate change impacts were eligible for inclusion. Documents synthesizing climate change impacts on populations without explicit and primary emphasis on adaptation responses were also excluded, except when climate responses were synonymous with climate impacts (e.g. human migration or species shifts). Documents whose contributions are primarily conceptual or theoretical were treated as non-empirical and therefore excluded. We focused on documents reporting responses that constituted adaptation based on a strict definition of the term: behaviors that directly aimed to reduce risk or vulnerability. Documents presenting empirical syntheses of vulnerability or adaptive capacity without a primary or substantive focus on tangible adaptation responses (reactive or proactive) were excluded. Documents were considered eligible for inclusion if they explicitly documented adaptation actions that were theorized or conceptually linked to risk or vulnerability reduction. This excluded assessments of potential adaptation, intentions/plans to adapt, and discussion of adaptation constraints or barriers in the absence of documented actions that might reduce risk, exposure, or vulnerability.
Documents published between 2013 and 2019 were considered, including documents reporting on adaptations undertaken prior to 2013. Documents were not excluded from screening based on language as long as they are indexed in English. Documents were not excluded by geographical region, population, ecosystem, species, or sector. Grey literature was not included.
The screening team evaluated the suitability of each paper using a set of seven inclusion and exclusion criteria (Table 3).
These criteria were converted to a set of decision steps to facilitate efficient screening decisions and inter-screener reliability (a simple code sketch of this decision cascade follows the list):
1. Does the paper have anything to do with climate change? If the paper does not explicitly or implicitly draw connections between its objectives, methods, or findings and global climate change, global warming, global change, or changes driven by global atmospheric variables, then the answer is no. If the answer is yes, proceed to criterion 2.
2. Does the paper report on analyses of empirical data (i.e., data derived from observation or experience; not theoretical or simulated) or a systematic review of empirical research? If the paper presents concepts and theories not grounded in empirical research, puts forth propositions without clear descriptions of methods, or reports the results of simulations that are not based on empirical data, then the answer is no. If the answer is yes, proceed to criterion 3.
3. Does the paper report on findings about changes in human systems OR human-assisted changes in natural systems intended for adaptation in human systems (i.e., what people think and do)? If the paper reports on biological/ecological conditions and processes, then the answer is no. If the answer is yes, proceed to criterion 4.
4. Does the paper report on how people respond to environmental change, including factors that influence how people respond? If the paper reports the results of an assessment of vulnerability or impacts from climate change, the answer is no. If the answer is yes, proceed to criterion 5.
5. Do the responses have to do with adaptation through reduction of risk or impacts, or improvements in well-being or suitability to the environment, beyond mitigation? If the paper only reports on efforts to prevent, slow, or reverse climate change, the answer is no. If the answer is yes, proceed to criterion 6.
6. Is the timeframe of the research current (e.g. within the past 10 years) or recent? If the timeframe was prehistoric or historic, the answer is no. If the answer is yes, proceed to criterion 7.
7. Does the paper report on tangible/observed behavioral responses (e.g., actions, practices, improved knowledge, altered social structure) that people have undertaken and that could arguably reduce risk to people or improve people's ability to cope with/adapt to environmental change? If the paper reports on planned or recommended behavior change, the answer is no. If the answer is yes, the article should be included in the sample.
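Read as a whole, these steps form a short-circuiting cascade: the first 'no' excludes a document. The sketch below is purely illustrative; the field names and function are hypothetical and are not part of the screening platform.

```python
from dataclasses import dataclass

@dataclass
class ScreeningAnswers:
    """Hypothetical yes/no answers a screener records for one document."""
    climate_change_related: bool      # criterion 1
    empirical_or_systematic: bool     # criterion 2
    human_system_change: bool         # criterion 3
    reports_human_response: bool      # criterion 4
    adaptation_not_mitigation: bool   # criterion 5
    current_or_recent: bool           # criterion 6
    tangible_observed_response: bool  # criterion 7

def include_document(a: ScreeningAnswers) -> bool:
    """Apply the seven criteria in order; the first 'no' excludes the document."""
    return all([
        a.climate_change_related,
        a.empirical_or_systematic,
        a.human_system_change,
        a.reports_human_response,
        a.adaptation_not_mitigation,
        a.current_or_recent,
        a.tangible_observed_response,
    ])
```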
4.3 Machine learning contributions to screening
Given the large volume of documents requiring screening, we used machine learning techniques to filter and prioritize screening of documents that were most likely to meet the inclusion criteria. To identify relevant documents within the larger set of retrieved documents, we used supervised machine learning. This approach involves manually screening (human coding) a subset of documents to ‘teach’ an automated classifier which documents are relevant according to a set of pre-defined criteria, and then using this trained classifier to predict the ‘most likely to be relevant’ literature (see for example: O’Mara-Eves et al., 2015). To be labelled as relevant, documents needed to meet the inclusion and exclusion criteria (Table 3) based on their title, abstract, and keywords.
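As a rough illustration of this kind of supervised relevance classifier, a minimal sketch using scikit-learn is shown below. The feature representation, model choice, and settings are assumptions made for illustration; they are not necessarily those used in MCC-APSIS.

```python
# Minimal sketch of a supervised relevance classifier of the kind described above.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_classifier(texts, labels):
    """texts: title + abstract + keywords of manually screened documents;
    labels: 1 = meets inclusion criteria, 0 = excluded."""
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=3)),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])
    model.fit(texts, labels)
    return model

def predict_relevance(model, unscreened_texts):
    """Return the predicted probability of relevance for un-screened documents."""
    return model.predict_proba(unscreened_texts)[:, 1]
```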
All searches were performed within a bespoke platform developed by the APSIS group at the Mercator Research Institute on Global Commons and Climate Change (https://github.com/mcallaghan/tmv), and duplicates were removed.
Initial manual screening: We first screened a random sample of documents retrieved via the search strings. This sample of documents was reviewed by multiple team members; the documents that were labelled differently by different team members were then discussed until consensus was reached, to reduce bias and ensure consistency between team members. This initial phase created the first of several training samples used to train the machine-learning algorithm to predict relevant documents.
Iterative screening and training of algorithm: This sample of manually screened documents was used to train a machine learning classifier to predict the relevance of remaining documents. The algorithm generated a ‘probability of relevance’ for all un-screened documents, allowing the screening team to prioritize screening of documents most likely to be relevant. Batches of documents with the highest predicted probability of relevance were then screened by hand, with iterative re-training of the classifier after each batch to continuously improve prediction. All documents identified by the algorithm as potentially highly relevant were manually screened by the screening team, with these results acting to improve the algorithm’s prediction of relevance with each manually screened batch.
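The prioritized, iterative screening described above is, in effect, an active-learning loop. A minimal sketch is shown below, reusing the hypothetical train_classifier and predict_relevance helpers from the previous sketch; the batch size, seeding, and stopping rule are assumptions, not the exact procedure used.

```python
import numpy as np

def prioritized_screening(all_texts, screen_manually, seed_idx, batch_size=500):
    """Illustrative active-learning loop: screen_manually is a hypothetical
    callback returning 1 (relevant) or 0 (not relevant) for a document."""
    # Seed the training set with a randomly sampled, manually screened batch.
    labels = {i: screen_manually(all_texts[i]) for i in seed_idx}
    while True:
        model = train_classifier([all_texts[i] for i in labels],
                                 [labels[i] for i in labels])
        unscreened = [i for i in range(len(all_texts)) if i not in labels]
        if not unscreened:
            break
        # Rank un-screened documents by predicted probability of relevance.
        scores = predict_relevance(model, [all_texts[i] for i in unscreened])
        ranked = [unscreened[j] for j in np.argsort(scores)[::-1]]
        new_labels = {i: screen_manually(all_texts[i]) for i in ranked[:batch_size]}
        labels.update(new_labels)
        # Stop when a prioritized batch yields no new relevant documents.
        if sum(new_labels.values()) == 0:
            break
    return labels
```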
Assessment of ‘borderline’ documents: This iterative process continued until the classifier stopped predicting new relevant documents, and most documents being identified were only borderline relevant. While not all documents had been manually screened by the screening team at this stage, none of the documents being identified by the algorithm as ‘likely to be most relevant’ were deemed by the screening team to be highly relevant. Further manual screening of documents identified by the algorithm found increasingly borderline and irrelevant documents, suggesting that the majority of relevant documents had already been identified and screened.
Estimating the proportion of relevant documents retrieved through machine learning: We used a random sample of the remaining un-screened documents to estimate how many of these documents might still be relevant. In total, 43,462 documents were not screened. Based on a sample of 200 of these documents, we found 3 potentially relevant documents, all of which were deemed only marginally relevant. From these results, we estimate that the probability that we achieved a recall of less than 68% is less than 50%, and the probability that we achieved a recall of less than 50% is less than 5%. We consider these numbers conservative, as the three relevant documents in the sample were only of marginal relevance. We therefore concluded that the returns of additional screening would be low.
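The exact statistical procedure behind these estimates is not detailed here. As one way to reproduce bounds of this general kind, the sketch below treats the 200 documents as a simple random sample and uses a one-sided binomial (Clopper-Pearson style) upper bound on the proportion of relevant documents remaining. The value of n_included (the number of relevant documents already identified through screening) is a placeholder, not a figure reported in this protocol.

```python
from scipy import stats

n_sample, k_relevant = 200, 3   # random sample of un-screened documents
n_unscreened = 43_462           # documents never screened manually
n_included = 1_500              # placeholder: relevant documents already identified

# One-sided upper bound on the proportion of relevant documents among the
# un-screened set at two confidence levels, and the implied lower bound on recall.
for conf in (0.50, 0.95):
    p_upper = stats.beta.ppf(conf, k_relevant + 1, n_sample - k_relevant)
    max_missed = p_upper * n_unscreened
    recall_lower = n_included / (n_included + max_missed)
    print(f"{conf:.0%} confidence: <= {max_missed:.0f} missed, recall >= {recall_lower:.2f}")
```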
All documents deemed eligible for the final database were manually screened by the screening team. The machine learning component of this review means, however, that the screening team did not screen 100% of documents retrieved via the search strings. Instead, the screening team screened a non-random sample of the documents predicted by the machine learning classifier to be the most relevant to the inclusion criteria. Some relevant documents will therefore have been missed by this process: cases where the classifier was unable to recognize that a document met the inclusion criteria. In most cases, these were borderline-relevant documents.
4.4 Inter-screener reliability
To ensure consistent interpretation of the screening criteria, each member of the screening team screened the same initial set of 50 documents; we then compared and contrasted our application of the inclusion and exclusion criteria. After refining our collective interpretation of the criteria, we repeated this process with 100 documents, at which point we were confident that we were able to apply the criteria in a similar fashion.
Screeners were given the option of responding ‘Maybe’ to inclusion/exclusion where there was uncertainty regarding inclusion criteria, or where inclusion was borderline. In these cases, the Screening Coordinator (PF) double-screened the document.
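The protocol does not name a specific agreement statistic for these consistency checks; one common option for quantifying agreement on a shared batch is Cohen's kappa, as in the hypothetical sketch below.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical include (1) / exclude (0) decisions by two screeners on the same batch.
screener_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
screener_b = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]

kappa = cohen_kappa_score(screener_a, screener_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level agreement
```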
4.5 Sample size
Of a total of 48,816 documents retrieved following duplicate removal, the screening team manually screened 4,500 documents through an iterative screening process detailed in Table 4. Batches 1 and 3 were small batches of 100 documents each, used to train screeners, conduct consistency checks, and ensure quality control of screening across team members. Batch 2 was conducted on a random sample of documents and was used as the primary training batch for the machine classifier. From Batch 3 onwards, each screened batch represented a non-random sample of documents, selecting the documents identified by the classifier as most likely to be relevant. After each batch, the classifier was re-run to improve prediction performance. As the sample of manually screened documents grew, the classifier became increasingly able to differentiate the relevance of documents to the inclusion and exclusion criteria.
In later batches, the proportion of relevant documents decreased, as the most relevant documents had increasingly already been screened. Beginning with Batch 10, therefore, small batches were screened to iteratively assess the number of remaining relevant documents. Once this number was judged to be minimal, Batches 12 and 13 were conducted on random samples of the remaining literature. These random samples allowed us to estimate the proportion of relevant documents that had not yet been manually screened.
Performance statistics generated by the machine learning classifier showed negligible potential to increase recall further, meaning that the remaining un-screened documents were likely to be: a) not relevant, and would have been excluded if screened manually; b) relevant but only borderline or marginally relevant; or c) relevant but containing limited reference to key climate adaptation vocabulary (Figure 3). A total of 347 borderline or unclear documents were double-screened by the Screening Coordinator.
5.0 Coding of meta-data
Although the majority of data were extracted during a separate coding phase, we did extract some data during screening. This included:
1. Reason for exclusion
2. Sector (multiple answers allowed)
3. Region (multiple answers allowed)
Data on sector and region were collected only for documents meeting the inclusion criteria for either database, and were designed to: a) allow sorting of relevant documents by sector and region, and b) allow assignment of documents to coders with relevant expertise. Our extraction form also collected data on cross-cutting themes, though these were found to be unreliable and difficult to determine from the title and abstract only, and were thus not used for further coding or analysis.
6.0 Critical Appraisal
Critical appraisal was not used for article inclusion or exclusion since this review includes literature with a range of methods. Quality appraisal was, however, undertaken on all documents/studies meeting inclusion criteria, and was part of the assessment of confidence in evidence. Details are thus summarized in Protocol 3 (Data extraction).