1. Introduction

Data, both experimental and observational, have always played a central role in science, as Strasser (2012) points out, but the creation and curation of data, datasets, and datastreams have recently become of intense interest to information scientists as well as to a new generation of data scientists (Halevi & Moed 2012). Yet how data from any specific source are found, used, and cited remains understudied compared with citations to individual papers, scientists, and theories. Studies of those citations have been greatly facilitated by the increasing standardization of citation practices over time (Williams 2011), a standardization that data citation practices have yet to achieve (Altman et al. 2015; Mayernik 2012; Mooney & Newton 2012).

Acknowledging the use of externally obtained data was once considered merely a ‘scholar’s courtesy’ (Cronin 1995); today, such acknowledgment is often elicited by disciplinary or publication pressures, though it remains far from universal despite increasing funding agency and institutional requirements for data attribution and sharing (Kim & Stanton 2016). Even sophisticated approaches to identifying datasets, such as personalized, contextualized dataset search engines (Singhal, Kasturi, & Srivastava 2014; Singhal & Srivastava 2013; Singhal & Srivastava 2017), are hampered by this lack of uniformity.

According to Parsons & Fox (2013), actual data sources are much more diverse than is often recognized, ranging from small datasets developed by individual scientists, through scientific support systems that manage data production for entire laboratories, to large-scale installations that produce massive datastreams for different disciplines or in different locations. Data themselves also vary widely: from qualitative to quantitative, from potentially replicable experimental data to unique, unrepeatable observational data, and from small datasets to enormous datastreams. This variety makes the interrelated questions of data attribution (Hou & Mayernik 2016; Hou & Mayernik 2017), data citation (Borgman 2015), and data provenance (Tilmes, Yesha & Halem 2010) even more complex.

Although recent data-centric initiatives such as the Joint Declaration of Data Citation Principles (Data Citation Synthesis Working Group 2014), DataCite (Rueda, Fenner & Cruse 2016), and the Data Citation Index (Robinson-García, Jiménez-Contreras & Torres-Salinas 2016) have helped to create and communicate data identification and citation standards, their adoption and use remain limited within many scientific communities (Goldstein, Mayernik & Ramapriyan 2017). Borgman et al. (2014) observe, moreover, that even very large ‘knowledge infrastructures’ that produce and use data in such well-established areas as astronomy vary in their robustness: ground- and space-based telescopic arrays and other astronomical instrumentation must rely on economic, policy, and scientific support to remain viable over time. Tracking data use is becoming an important part of efforts to secure this support, as is communicating best practices in data curation and data citation.

Accordingly, institutional studies from major knowledge infrastructures are now gaining visibility. Examples include Apai et al. (2010) on the use of Hubble Space Telescope data in scientific papers over a ten-year period; Parsons et al. (2011) on the management of diverse, interdisciplinary data within the International Polar Year project; Belter (2014) on the value of oceanographic datasets from the National Oceanic & Atmospheric Administration (NOAA); Peng et al. (2015) on the development of a unified data stewardship maturity model at NOAA’s Cooperative Institute for Climate and Satellites; and Mayernik (2016) on institutional data curation and dissemination practices at organizations such as the University Corporation for Atmospheric Research. These studies suggest varying solutions to these data-related issues and should help to promote standardization across disciplines.

Adding to the challenges faced by knowledge infrastructures, the rapid increase in the number of environmentally embedded wireless sensors providing real-time, heterogeneous data for a wide variety of applications has contributed significantly to the so-called ‘data deluge’ (Porter, Hanson, & Lin 2012). Connected sensors of all types have been predicted to number a trillion by 2025 (Johnson 2015). Studies focusing on the various factors involved in the creation, curation, and citation of sensor data, particularly sensor datastreams in the sciences, include work by Bates, Lin & Goodale (2016), Borgman (2015), Cragin, Palmer & Chao (2010), Ganguly et al. (2007), and Mayernik, Wallis & Borgman (2013).

The present article is a contribution to this literature on the citation of specific sensor datastreams, as it examines the use of environmental sensor data over the past 20 years from a well-established knowledge infrastructure, the extensive mesoscale network (or ‘Mesonet’) for weather-related information throughout the state of Oklahoma. The history of the standardization of weather-related data in the United States has been described by Edwards (2010), Fiebrich (2009), Fleming (2016), and Klemm & McPherson (2017), while the evolution of global weather prediction techniques and tools has been detailed by Bauer, Thorpe & Brunet (2015) and Lynch (2008). A study of scientific papers focusing on the various uses of sensor data from a statewide mesoscale network such as that of Oklahoma, however, has not previously been performed, and should be of value in understanding the opportunities and challenges presented by such studies.

2. The Oklahoma Mesonet

This section describes the history and current operations of the Oklahoma Mesonet. Discussions concerning the mesonetwork installation studied here began in the 1980s, as the result of an ongoing collaboration between agricultural researchers at Oklahoma State University and meteorological researchers at the University of Oklahoma to develop a near real-time, extremely reliable source of surface observational data about local weather conditions across the state (Brock 2013; Crawford 2013). This need was felt to be particularly critical for Oklahoma, given the state’s long history of drought and drainage issues. Losses due to crop failure (Ding, Hayes & Widhalm 2011) and urban flooding (Waite 2011) ran into the billions of dollars. The initiative garnered support throughout the state, largely due to the efforts of those involved in this unprecedented partnership among scientists at Oklahoma’s two major research universities. Improvements in both research into and mitigation of weather impacts were the intended outcomes of this extensive project.

As a result, the Oklahoma Mesonet, a statewide network of 121 automated environmental monitoring stations, was officially launched in March 1994. These 10-meter-tall towers, at least one of which is located in each of Oklahoma’s 77 counties, provide regular measurements of air and soil temperature, barometric pressure, rainfall, relative humidity, solar radiation, soil moisture, and wind speed and direction. These measurements, both direct and calculated, together with instrumentation data, are packaged into ‘observations’ that are transmitted every 5 minutes to the Oklahoma Climatological Survey, where data quality is immediately verified before the data are made available to Mesonet users.

The Oklahoma Mesonet has been termed the ‘gold standard’ among statewide climate and weather networks because of its well-known attention to quality assurance and quality control (National Research Council 2008: ix). As explained by Campbell et al. (2013: 575), quality assurance involves specific processes to ensure that the sensor networks and associated protocols are developed and adhered to in a way that minimizes inaccuracies in the data produced. This proactive approach attempts to avoid system problems that may produce suspect data, while the subsequent quality control measures are employed to identify and flag any suspect data after they have been generated. It should be noted that the explosive growth in sophisticated mathematical models for numerical weather prediction and environmental problem solving increases rather than lessens the importance of the quality of the data being utilized (Apte 2015).

Mesonet data and products are stored and delivered in several ways. The official data archive is stored as self-describing Network Common Data Form (NetCDF) files (Unidata 2017). From the NetCDF archive, approximately 1,300 products are routinely produced, many of which are updated every five minutes. Data files are delivered in several text formats (e.g., comma-delimited or fixed column length) for users to download via Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), or Local Data Manager (LDM).
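As an illustration of the two text layouts mentioned above, the sketch below reads the same records from a comma-delimited file and from a fixed-column file. The field names (STID, TIME, TAIR, RAIN), the station identifier, and the column positions are hypothetical placeholders, not the Mesonet’s published file specification, which should be consulted before working with real data.

```python
import csv
import io

# Hypothetical records in two text layouts. The field names (STID = station ID,
# TIME = minutes after the start of the file, TAIR = air temperature in deg C,
# RAIN = rainfall in mm) and the station ID are placeholders for illustration.
CSV_SAMPLE = """STID,TIME,TAIR,RAIN
STN1,0,21.4,0.00
STN1,5,21.6,0.25
"""

FIXED_SAMPLE = """STID TIME  TAIR  RAIN
STN1    0  21.4  0.00
STN1    5  21.6  0.25
"""

# Assumed (start, end) character positions of each fixed-width column.
FIXED_COLUMNS = {"STID": (0, 4), "TIME": (4, 9), "TAIR": (9, 15), "RAIN": (15, 21)}


def parse_comma_delimited(text):
    """Return one dict per data row, keyed by the header names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_fixed_column(text, columns):
    """Return one dict per data row by slicing each line at fixed positions."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        if line.strip():
            rows.append({name: line[a:b].strip() for name, (a, b) in columns.items()})
    return rows


if __name__ == "__main__":
    # Both layouts should yield the same records.
    print(parse_comma_delimited(CSV_SAMPLE) == parse_fixed_column(FIXED_SAMPLE, FIXED_COLUMNS))
```

Values are kept as strings here; converting them to numbers, and handling any missing-value codes a real file might use, is left to the reader because those conventions are not described in this article.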

Although some commercial and media entities are licensed users that receive Mesonet data on a paid basis, the data are also made available online free of charge at www.mesonet.org to other users who wish to access them (Klockow, McPherson & Sutter 2010). Examples of Mesonet data users include emergency and public safety personnel, other state and local government agencies, aviation and other transportation firms, farmers, ranchers, construction firms, water experts, weather forecasters, and a variety of others vitally interested in Oklahoma’s severe storms, dry line movements, soil moisture, drought conditions, heat impact, and other weather-related phenomena (Ziolkowska et al. 2017).

This direct distribution of data is in addition to the Oklahoma Mesonet’s participation in various data distribution channels, such as the National Oceanic & Atmospheric Administration’s National Mesonet Program, the U.S. Department of Agriculture’s National Agricultural Statistics Service, and the U.S. Geological Survey’s National Soil Moisture Network, as well as to various other provisions of data made upon request.

The Oklahoma Mesonet’s requested acknowledgment statement in its agreement for researchers using its data (as shown under the heading ‘Oklahoma Mesonet Data Access Policy’ on its website) is that ‘Oklahoma Mesonet data are provided courtesy of the Oklahoma Mesonet, which is jointly operated by Oklahoma State University and the University of Oklahoma. Continued funding for maintenance of the network is provided by the taxpayers of Oklahoma.’

Since June 2014, the ‘For Researchers’ section of the website has listed two technically oriented articles about the Mesonet (Brock et al. 1995; McPherson et al. 2007) as potential citations for researchers working with Oklahoma Mesonet data. Oklahoma Mesonet staff also often informally recommend these articles to anyone wishing to learn more about the technical details of the Mesonet.

Since March 2016, the ‘About the Mesonet’ section of the website has listed the following DOIs for Oklahoma Mesonet data and datasets: 10.15763/dbs.mesonet (for all Mesonet products) and 10.15763/dbs.mesonet.standard (for standard Mesonet observations). These are the first permanent identifiers of Oklahoma Mesonet data made available on its website.

3. Methodology: Oklahoma Mesonet Sensor Data Publication Analysis

The specifics of the study and its methods are described in this section. The overarching goal of the research effort is to identify, collect, and analyze all peer-reviewed scientific articles published in English that may have used sensor data from the Oklahoma Mesonet during the first decades of its existence, to determine who used the data, how they were used, and which specific data and data sites were used. Although the Oklahoma Mesonet has routinely compiled a bibliography of publications pertinent to its efforts since its inception, the study described here is an initial attempt at a formal analysis of the peer-reviewed scientific publications and their uses of Oklahoma Mesonet data, and a step toward further efforts to understand and facilitate such data use.

Mayernik et al. (2017: 1344) assert that ‘Manual processes for collecting publications that cite data sets … are fraught with problems given that they are time-consuming, require multiple tools (e.g., WoS, Google Scholar, and literature databases), and do not scale.’ This project was undertaken with the understanding that efforts to identify and analyze all published journal articles using the Mesonet data will be time-consuming and complex, so a pilot project would be a useful initial step towards a full analysis, which would in turn further inform and potentially impact Mesonet data operations and delivery. The initial pilot project processes are shown in Figure 1.

Figure 1 

Pilot project processes for Oklahoma Mesonet data.

Prior to the start of this project in May 2015, the Oklahoma Mesonet online bibliography listed 509 peer-reviewed journal articles published between 1994 and 2014. The bibliography, although extensive, was known to be incomplete, as its collecting emphasis had been on research in the agricultural and atmospheric sciences and closely related areas. No 2015 journal publications had been identified and listed in the bibliography at that time. An important focus of the project, therefore, has been to identify, obtain, and add all relevant peer-reviewed scientific papers published between 1994 and the present day to the online bibliography.

The second step was to plan a pilot project that would inform future analysis of the corpus being constructed from articles obtained for the bibliography. To obtain a historical perspective on the development and diffusion of these articles, as well as an understanding of whether and how the Mesonet data and their uses have been described in various disciplines, three publication years were selected for this initial analysis: the first full year of Mesonet operations (1994–5, hereafter referred to as 1995 for brevity), the end of the first decade of Mesonet operations (2005), and the end of the second decade of Mesonet operations (2015).

This study’s methodology combines the standard bibliometric techniques of citation analysis and co-word analysis with a close reading of the identified articles’ text, graphic content, and references. Beginning with the online bibliography of articles routinely maintained by the Oklahoma Mesonet itself, an initial set of 11 journal articles from the Mesonet’s first operational year was identified, providing the preliminary pilot project corpus. (Four of these articles were found outside the bibliography and added during the pilot project.) A second set of 21 articles from 2005 was then identified. (Three of these articles were found outside the bibliography and added during the pilot project.) Next, author, full-text, keyword, and topic searches for the phrases ‘Oklahoma Mesonet,’ ‘Oklahoma Mesonetwork,’ and ‘Oklahoma Climatological Survey’ were performed in the Web of Science and Google Scholar for the publication year 2015, to identify any peer-reviewed English-language journal articles potentially using Oklahoma Mesonet data. (Because the Mesonet DOIs mentioned above were not publicly available before March 2016, they were not part of the pilot project search process.) Additionally, citations to the 1995 articles in 2005 and 2015 publications were examined as possible sources of additional Oklahoma Mesonet data uses. All articles found in these searches were examined to determine whether they explicitly mentioned the use of Oklahoma Mesonet data. Articles not making explicit use of Mesonet data were removed from the pilot project corpus, though they were retained in the bibliography for future study.
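The phrase-screening step described above can be sketched as follows. This is a minimal illustration assuming the full text of each candidate article is available as a string; the article IDs and the matching rules (case-insensitive, tolerant of line breaks between words) are assumptions, not the study’s actual procedure, which relied on database searches followed by manual reading.

```python
import re

# Search phrases named in the methodology.
SEARCH_PHRASES = (
    "Oklahoma Mesonet",
    "Oklahoma Mesonetwork",
    "Oklahoma Climatological Survey",
)


def matched_phrases(text):
    """Return the search phrases found in text (case-insensitive)."""
    found = []
    for phrase in SEARCH_PHRASES:
        # \s+ lets a phrase match across line breaks; the trailing \b keeps
        # 'Oklahoma Mesonet' from matching inside 'Oklahoma Mesonetwork'.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found


def screen_candidates(articles):
    """Split {article_id: full_text} into flagged articles and the rest.

    A phrase match only flags an article for the manual check of explicit
    data use; it does not establish that Mesonet data were actually used.
    """
    flagged, not_flagged = {}, []
    for article_id, text in articles.items():
        hits = matched_phrases(text)
        if hits:
            flagged[article_id] = hits
        else:
            not_flagged.append(article_id)
    return flagged, not_flagged
```

Keeping the matched phrases for each flagged article makes it easier to see later which naming convention an author used, which is relevant to the citation-practice analysis in Section 4.2.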

All initial identification, categorization and coding of data usage were first done by project researchers not affiliated with the Oklahoma Mesonet, then checked for accuracy by researchers affiliated with the Mesonet, then re-checked by the researchers not affiliated with the Mesonet. This was to ensure that both scientific and non-scientific aspects of the data usage were captured correctly.

As shown in Table 1, the number of articles analyzed for each year is as follows: 1995 (11 articles), 2005 (21 articles), 2015 (78 articles) for a total of 110 articles using Oklahoma Mesonet data. Of these 110 articles, 85 were new additions to the online bibliography as well.

Table 1

Analyzed articles from Oklahoma Mesonet corpus.

Year    Articles analyzed
1995    11
2005    21
2015    78
Total   110

The 110 articles in this analysis were published in 62 different journals, in such diverse fields as agriculture (representing 18% of the 62 journals), biological sciences (13%), environmental sciences (27%), hydrology (16%), meteorology (18%), and other interdisciplinary areas (8%). The use of the data is clearly not confined to a single research specialty or discipline.

4. Findings

The findings from the pilot study of these 110 scientific publications, including their citations to and uses of the Oklahoma Mesonet sensor data, are described in detail in this section.

4.1 Analysis of Oklahoma Mesonet Pilot Project Publications

1995 Publications

As shown in Table 1, there were 11 peer-reviewed journal articles published during the initial year of Oklahoma Mesonet operations. These earliest publications focused on the economic considerations, technological specifications, and potential uses of the mesoscale network and its data for agriculture and meteorology.

The most highly cited article from this period (Brock et al. 1995), written by eight authors from the University of Oklahoma and Oklahoma State University who were all closely involved in the development of the Oklahoma Mesonet, is a technical overview of the network itself, including its physical composition and quality assurance procedures. The second most highly cited article (Rasmussen et al. 1994), written by seven authors from the National Oceanic & Atmospheric Administration’s National Severe Storms Laboratory, also located in Norman, Oklahoma, describes the planned use of Mesonet data in the study of tornadogenesis and associated severe weather phenomena. The next three most-cited articles (Morrissey et al. 1995; Ryzhkov & Zrnic 1995a; Ryzhkov & Zrnic 1995b) deal with techniques for validating precipitation estimates through comparisons with empirical data from Mesonet rain gauges.

Thus, unsurprisingly, most early attention to the possibilities provided by data from the Oklahoma Mesonet came from Oklahoma-affiliated scientists interested in evaluating and extending existing measurement instruments and techniques, particularly in hydrology and meteorology.

2005 Publications

By 2005, the scientific scope of research using Oklahoma Mesonet data had broadened, as the 21 papers published that year show. The Brock et al. (1995) paper continued to be well cited for its technical information, for instance in Schroeder et al.’s (2005) description of the development of the West Texas Mesonet, which was modeled after the Oklahoma Mesonet. It was also cited in atmospheric research such as Demoz et al.’s (2005) determination of the causes of undular bores in the boundary layer, in agricultural research such as MacKown & Carver’s (2005) study of dual-purpose winter wheat cultivars, and in interdisciplinary investigations such as those of Haugland & Crawford (2005) and McPherson & Stensrud (2005) on the impact of the replacement of tallgrass prairie by agricultural cropland on daily atmospheric conditions at the mesoscale.

A number of publications focused on uses of Oklahoma Mesonet data in other disciplines, such as biology (e.g., Major et al.’s (2005) study of algal mass microbial communities in the Oklahoma salt flats), ecology (e.g., Verburg et al.’s (2005) experimental study of soil CO2 efflux within a tallgrass prairie ecosystem), and geology (e.g., Harris, Tapp & Sublett’s (2005) research on remediation of oil-field brine impacted soil). Mesonet data were also employed in several data assimilation projects, notably Dunne & Entekhabi’s (2005) extensive use of precipitation data to compare the improvements in estimates produced by various filter ensembles for satellite remote sensing of soil moisture.

The geographic distribution of scientists using Mesonet data continued to broaden as well. While a third of the papers published in 2005 were authored by scientists at Oklahoma universities, the other two-thirds had lead authors from six other states, as well as from the National Aeronautics & Space Administration’s Goddard Space Flight Center in Maryland and the European Centre for Medium-Range Weather Forecasts in Great Britain.

2015 Publications

The further expansion in scope for use of Oklahoma Mesonet data is readily apparent in the 78 papers published in 2015. Interestingly, both the Brock et al. (1995) and the McPherson et al. (2007) articles were cited by 25% of these 2015 papers, though it was not possible to ascertain whether this was due more to their prevalence in the literature prior to June 2014 or to their recommendation as potential citations on the Oklahoma Mesonet website beginning in June 2014.

Atmospheric research, especially on severe weather phenomena, continued to make extensive use of the atmospheric pressure, wind, and precipitation data, as exemplified by Brotzge & Lutrell (2015) on interactions between boundary layers and mesocyclonic winds in tornadogenesis, Xu et al. (2015) on vortex wind analysis in tornadic mesocyclones, and French et al. (2015) on bulk ‘hook echo’ precipitation drop size distributions in tornadic and nontornadic supercells. A growing number of papers used the soil moisture data to explore terrestrial impacts, particularly warning signs for flash droughts (Ford et al. 2015) and wildfires (Krueger et al. 2015), as well as the effects of long-term drought (Bajgain et al. 2015).

Relatedly, environmental research using Oklahoma Mesonet data was performed for various species, ranging from the increasingly endangered greater prairie chicken (Hovick et al. 2015), to the more common Western chicken-turtle (MacKnight et al. 2015), to the wide assortment of crustaceans found in seemingly desiccated Oklahoma prairie playa (Bright & Bergey 2015).

Agricultural uses of the data remained strong, especially in research on improving the management of resources such as cattle (Scasta et al. 2015), crops (Lollato & Edwards 2015), and timber (Bendixsen, Hallgren & Frazier 2015). Research reflecting increased attention to potential energy resources, such as biofuels (Yimam, Ochsner & Kakani 2015), solar (McGovern et al. 2015), and wind (Stadler, Dryden & Greene 2015), also drew upon Mesonet sensor data for economic and technical model building.

The most striking increase, however, was in the number of data assimilation projects, in which Mesonet surface data were combined with a variety of remote sensing data to develop models of complex systems in research on the boundary layer (Endo et al. 2015), convection (Jones et al. 2015), erosion (Garbrecht & Zhang 2015), groundwater (Guzman et al. 2015), precipitation (Liu, Sorooshian & Gao 2015), river basins (Islam & Gan 2015), soil moisture (Kim, Mohanty & Shin 2015), watersheds (Goldstein & Tarhule 2015), and wind speed (Genton, Padoan & Sang 2015). Validation efforts, such as Kuster, Lunday & McPherson’s (2015) comparison of North American Regional Climate Change Assessment Program output with Oklahoma Mesonet observations, emphasized the need to attend to the data themselves in all these endeavors.

Geographic diversity of lead authorship for these 78 papers using Mesonet data increased substantially in 2015 as well, indicating diffusion of data usage among different institutions during the decade since 2005. While Oklahoma-based researchers were lead authors of approximately 55% of the papers, the remaining 45% showed primary authorship by scientists from thirteen other states and four foreign countries.

4.2 Analysis of Oklahoma Mesonet Data Citation and Data Uses Categorization

This section describes how the Oklahoma Mesonet data were referenced, and how the data were used, in the 110 papers from the pilot study. In ‘How to Cite Earth Science Datasets,’ Parsons (2012) observes that current data citation practices in earth sciences publications include the following:

  • citing a traditional publication that contains the data;
  • not mentioning the source of the data, just using the data in tables or figures;
  • referencing the name or source of the data in text;
  • including a Uniform Resource Locator (URL) in text (with variable degrees of specificity);
  • citing a related paper;
  • citing an actual data set, typically using a recommended citation format given by the data center; or
  • citing a data set including a persistent identifier, typically a DOI.

Because the Oklahoma Mesonet made its dataset DOIs available on its website in 2016, after the 2015 papers were published, no use of these DOIs was observed in this pilot study. Of the other six practices, the study found only three: referencing the name or source of data in text, including a URL in text (with variable degrees of specificity), and citing a related paper. The phrases most commonly used to refer to the Mesonet data sources in text were ‘Oklahoma Mesonet’ and ‘Oklahoma Climatological Survey,’ although these were only sometimes cited as sources in the papers’ reference lists. A number of papers (2 in 2005; 33 in 2015) provided the URL of either the Oklahoma Climatological Survey or the Oklahoma Mesonet in the text or the reference list. Finally, the Brock et al. (1995) paper often appeared as a reference accompanying an initial mention of the ‘Oklahoma Mesonet’ and could be considered a ‘related paper.’
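The three practices actually observed lend themselves to a simple screening rule. The following minimal sketch, assuming each paper’s body text and reference list are available as plain strings, flags which of the three practices a paper exhibits. The practice labels, the `mesonet.org` URL test (the Oklahoma Climatological Survey URL is not checked), and the surname-and-year match for the two papers recommended on the Mesonet website are illustrative assumptions, not the coding scheme used in this study, which relied on manual reading.

```python
import re
from dataclasses import dataclass, field

# Source names observed in the pilot corpus.
SOURCE_NAMES = ("Oklahoma Mesonet", "Oklahoma Climatological Survey")
# Illustrative URL test: only the mesonet.org domain is checked here.
URL_PATTERN = re.compile(r"\bmesonet\.org\b", re.IGNORECASE)
# Papers recommended on the Mesonet website, matched by first-author surname and year.
RELATED_PAPERS = {"Brock et al. 1995": ("Brock", "1995"),
                  "McPherson et al. 2007": ("McPherson", "2007")}


@dataclass
class Paper:
    body: str
    references: list = field(default_factory=list)  # one string per reference entry


def citation_practices(paper):
    """Return the set of observed practices (hypothetical labels) a paper exhibits."""
    practices = set()
    body = paper.body.lower()
    if any(name.lower() in body for name in SOURCE_NAMES):
        practices.add("name or source in text")
    if URL_PATTERN.search(paper.body) or any(URL_PATTERN.search(r) for r in paper.references):
        practices.add("URL in text or reference list")
    for surname, year in RELATED_PAPERS.values():
        if any(ref.startswith(surname) and year in ref for ref in paper.references):
            practices.add("related paper")
    return practices
```

A rule this simple will miss variant spellings and abbreviated URLs, which is one reason the pilot study relied on close reading rather than automated matching.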

As noted above, citations to Oklahoma Mesonet data have varied so much over the past 20 years, even within disciplines, that identifying them and determining their respective formats remains a time-consuming task. As a general observation, however, most researchers used only highly selective portions of the entire datastream available to them, and most did not explain why these specific selections were made, presumably because the rationale was considered obvious to readers within the particular scientific context.

Because the Mesonet datastreams include multiple observations (e.g., temperature and wind speed) from specific station locations at specific times, and because the papers in the pilot project seldom included all details of the data actually used, further investigation would be necessary in most cases to identify and match the actual Mesonet data from specific sites and times in order to recreate or replicate the research performed. This problem of so-called ‘data granularity’ (that is, the level of detail at which data can be referenced) is endemic to large datasets and datastreams, however, and in the case of the Mesonet may eventually be addressed through new data curation and identification techniques. At present, it remains a very labor-intensive process to ‘match’ the stored Mesonet data to particular studies.
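To make the granularity problem concrete, the sketch below shows the kind of fine-grained record a paper would need to supply for its data to be matched back to the archive: station, variables, and time window, alongside the dataset DOI that the Mesonet began listing in 2016. The class name, field names, and note format are hypothetical; the 5-minute interval follows the observation frequency described in Section 2, and the DOI is the one listed for standard Mesonet observations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# DOI listed on the Mesonet website (since March 2016) for standard observations.
STANDARD_OBS_DOI = "10.15763/dbs.mesonet.standard"
# Observation interval described for the network: one observation every 5 minutes.
OBS_INTERVAL = timedelta(minutes=5)


@dataclass(frozen=True)
class MesonetDataReference:
    """Hypothetical fine-grained reference to one slice of Mesonet data."""
    station: str            # station name or identifier
    variables: tuple        # e.g. ("air temperature", "pressure")
    start: datetime         # first observation time used (UTC)
    end: datetime           # last observation time used (UTC), inclusive
    doi: str = STANDARD_OBS_DOI

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")

    def observation_count(self):
        """Observation times per variable, assuming start falls on the 5-minute grid."""
        return (self.end - self.start) // OBS_INTERVAL + 1

    def as_citation_note(self):
        """Render the reference as a one-line note that could accompany a citation."""
        return (f"Oklahoma Mesonet, {self.station} station, "
                f"{', '.join(self.variables)}, "
                f"{self.start:%Y-%m-%d %H:%M} to {self.end:%Y-%m-%d %H:%M} UTC, "
                f"doi:{self.doi}")
```

A dataset-level DOI identifies the whole collection; the station, variable, and time fields are what would let a later reader find the specific slice a study used.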

Uses of the Mesonet data in the pilot study were diverse, though the most prevalent uses tended to be event-oriented (focused on phenomena such as severe storms), geographically oriented (focused on settings such as prairie environments), or time-oriented (focused on phenomena such as long-term droughts). Notably, the same data were seldom explicitly reused or replicated across studies, with the exception of data from a few specific severe storm events. This may result from the ‘data granularity’ issue discussed in the preceding paragraph.

Based on the descriptions given of various uses of the specific data within each of the 110 papers citing Oklahoma Mesonet data, we empirically derived a broad typology of data uses across disciplines, noting of course that different scientists may use the same data in different ways. The five major categories are assimilation, experimentation, observation, simulation, and validation. A sixth category, utilization, includes all other uses not otherwise categorized.

  • Assimilation describes the use of Oklahoma Mesonet sensor data in an ensemble with a variety of other data for purposes of system-level prediction or similar large-scale pattern analyses.
  • Experimentation involves the use of Oklahoma Mesonet sensor data as a controlled factor within an experimental or quasi-experimental environment.
  • Observation involves the use of Oklahoma Mesonet sensor data as a non-controlled factor within an experimental, quasi-experimental, or other environment.
  • Simulation involves the use of the Oklahoma Mesonet sensor data with or without other data for purposes of modeling.
  • Utilization indicates a direct utilization of Oklahoma Mesonet data not otherwise categorized here.
  • Validation involves the use of the Oklahoma Mesonet sensor data to validate other data or to ground-truth other instruments.
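For anyone recording such coding in a script, the typology can be represented as a closed set of values. The minimal sketch below assumes, for illustration, that each paper is coded with a set of categories (possibly more than one) and that a paper with no specific category falls back to utilization, the residual category. The category assignments in the example are those of three samples in Table 2; the tallying function itself is hypothetical.

```python
from collections import Counter
from enum import Enum


class DataUse(Enum):
    """The six data-use categories derived in the pilot study."""
    ASSIMILATION = "assimilation"
    EXPERIMENTATION = "experimentation"
    OBSERVATION = "observation"
    SIMULATION = "simulation"
    UTILIZATION = "utilization"  # residual: uses not otherwise categorized
    VALIDATION = "validation"


def tally_uses(coded_papers):
    """Count papers per category.

    coded_papers maps a paper label to the set of DataUse values assigned to it.
    An empty set is counted as UTILIZATION, the residual category (an assumption).
    """
    counts = Counter()
    for uses in coded_papers.values():
        counts.update(uses or {DataUse.UTILIZATION})
    return counts


if __name__ == "__main__":
    # Category assignments for three of the samples in Table 2.
    coded = {
        "Tanamachi, Heinselman & Wicker 2015": {DataUse.ASSIMILATION},
        "Verburg et al. 2005": {DataUse.EXPERIMENTATION},
        "Bajgain et al. 2015": {DataUse.VALIDATION},
    }
    for use, n in sorted(tally_uses(coded).items(), key=lambda kv: kv[0].value):
        print(use.value, n)
```

Using a closed enumeration rather than free-text labels keeps coders from introducing near-duplicate categories, which matters when coding is checked and re-checked by different researchers.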

Table 2 provides textual samples showing the variety in data citation formats and the six derived categories (assimilation, experimentation, observation, simulation, utilization, validation) from the data usages.

Table 2

Textual samples of Oklahoma Mesonet data citation categorization.

Category Source Text of Data Citation

Assimilation (Tanamachi, Heinselman & Wicker 2015: 511–512) Twenty-eight Oklahoma Mesonet (Brock et al. 1995) stations were operating in the analysis domain on 24 May 2011 … recording observations of relative humidity (RH), aspirated air temperature, wind speed, wind direction, and atmospheric pressure every 1 min. Of particular interest are the data from the El Reno Mesonet station [http://ticker.mesonet.org/select.php?mo=05&da=27&yr=2011]. As the tornado passed, the atmospheric pressure decreased 17 hPa at 2120 UTC … while the 1-min-average wind speed at 10 m AGL increased from 10 to 51 m s⁻¹ … We derived simulated near-surface observations for comparison by interpolating the model variables at 125 m AGL from the grid points closest to the El Reno Mesonet station …. We adjusted the simulated ensemble of pressure traces from the lowest model scalar level (125 m AGL) to the surface using the hydrostatic equation and the surface pressure in the initial sounding (947 hPa).
Experimentation (Verburg et al. 2005: 1721–1722) Starting on 10 February, 2002, temperatures inside the EcoCELLs were maintained based on 8-year (1993–2000), 5-min averages from a MESONET station (Brock et al., 1995) less than 1.6 km from the excavation site [in Purcell, Oklahoma] … Using a heating and cooling underneath the monoliths, soil temperatures in the deepest horizons (1.45 m depth) were kept as close as possible to the 1993–2000 mean annual ambient air temperature (16 °C) measured at the excavation site to simulate realistic soil temperature profiles … Precipitation was applied using an overhead rain simulator with a fixed intensity. We based frequency and amounts on measurements from the MESONET site (same 8-year period used for temperature) with precipitation at each watering applied at the mean monthly total divided by the mean frequency for that month.
Observation (Tanner et al. 2015: 477–478) Our objectives were to determine if sympatric populations of bobwhite and scaled quail respond behaviorally to artificial surface-water sources in a semiarid region at the species distribution extremes… Over the course of our study (2012–2014), average temperatures in summer ranged from 19.56–22.28, 25.72–27.22, and 26.78–30.06 °C during May, June, and July, respectively … Average temperatures in the winter ranged from –0.83 to 2.17, 1.28–1.33, and –0.33 to 2.39 °C during December, January, and February, respectively … Annual precipitation was 34.44, 50.29, and 39.42 cm in 2012, 2013, and 2014, respectively…. Climate data were obtained from the Beaver Mesonet station (Brock et al., 1995; McPherson et al., 2007).
Simulation (Stadler, Dryden & Greene 2015: 210–211) Wind measurements at individual wind farm turbines would have been ideal for use in this research, but wind farm owners are notoriously protective of such data, which could be used by competitors to plot the efficiency of performance. Therefore, we used proxy data from the Oklahoma Mesonetwork. The Oklahoma Mesonetwork (Mesonet) is a network of 120 automated meteorological observation stations across all 77 counties in Oklahoma [36]. A full suite of data is reported and quality-controlled for five-minute time increments. Wind was measured at the standard 10-m height by the RM Young wind monitor with an accuracy of ±0.3 m s⁻¹ [37]. Each wind farm in Oklahoma is within 20 kilometers of a Mesonet station. We chose two wind farms in the western Oklahoma within areas of the most dramatic projected wind velocity changes in the NARCCAP output. Each wind farm was paired with the closest Mesonet station to represent that wind farm with a year of five-minute data (Figure 3). The Centennial wind farm in the northern portion of the study domain was paired with the Buffalo Mesonet station, and the Weatherford Wind Energy Center and the Weatherford Mesonet station were paired further south.
Utilization (Kenkel & Norris 1995: 365) The two objectives of the study were to estimate average willingness to pay and to determine the characteristics of producers who would pay to access and use mesoscale weather information. To this end, two maximum likelihood models were estimated: one for the raw [Oklahoma] Mesonet data and one for the raw data/value-added information combination …. Variables representing payments for agricultural publications, full- versus part-time farming, gross sales, use of irrigation, and weather-related crop income losses were found to significantly impact the willingness to pay for raw mesoscale weather data.
Validation (Bajgain et al. 2015: 153) Daily precipitation and soil moisture data from 2000–2013 at the Oklahoma Mesonet Marena and El Reno stations were downloaded from the Oklahoma Mesonet website (www.mesonet.org/index.php/weather/daily_data_retrieval). The daily data were aggregated into 8-day periods to match with the temporal resolution of the Moderate-Resolution Imaging Spectroradiometer (MODIS) derived VIs. Three different soil moisture data products (soil water potential, fractional water index and volumetric water content) are available at the Mesonet website. These soil moisture data products were derived based on the calibrated change in soil temperature over time after a heat pulse is introduced (Illston et al., 2008). In our analysis, we used volumetric soil water content (SWC) collected at three different soil profiles (5, 25, and 60 cm depth).
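The Validation example depends on a temporal-alignment step: daily Mesonet values are aggregated into 8-day periods so they match the MODIS vegetation indices. The sketch below illustrates that kind of preprocessing. It is not the procedure Bajgain et al. actually used, and it rests on several assumptions. Periods are counted from 1 January of each year, so the last period of a year is truncated at 31 December, because the excerpt does not say how the year boundary was handled. Precipitation is summed and SWC is averaged within each period. The function names and record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def period_start(day: date) -> date:
    """Return the first day of the 8-day period that contains `day`.

    Assumption: periods are counted from 1 January of each year
    (days 1-8, 9-16, ...), so the last period of a year is cut short
    at 31 December instead of spilling into the next year.
    """
    doy = day.timetuple().tm_yday            # day of year, 1-based
    start_doy = ((doy - 1) // 8) * 8 + 1     # 1, 9, 17, ..., 361
    return date(day.year, 1, 1) + timedelta(days=start_doy - 1)


def aggregate_8day(records):
    """Aggregate daily (date, precip_mm, swc) records into 8-day periods.

    Hypothetical rule: precipitation is summed and volumetric soil water
    content (SWC) is averaged over the days present in each period.
    """
    groups = defaultdict(list)
    for day, precip_mm, swc in records:
        groups[period_start(day)].append((precip_mm, swc))
    return {
        start: {
            "precip_mm": sum(p for p, _ in rows),
            "swc_mean": mean(s for _, s in rows),
            "n_days": len(rows),
        }
        for start, rows in sorted(groups.items())
    }


# Example with made-up values: ten days of 1 mm/day rain and constant SWC.
daily = [(date(2013, 1, 1) + timedelta(days=i), 1.0, 0.30) for i in range(10)]
periods = aggregate_8day(daily)
# Two periods: 1-8 January (8 days) and 9-10 January (2 days).
```

The function keeps `n_days` with each aggregate, which makes a shortened year-end period visible. That period can then be excluded or handled separately when the data are compared with the satellite-derived indices.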

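As shown in Table 3, the data were put to an increasing variety of uses as their availability and utility became more widely known among different research communities. The 1995 papers fall into only three use types (experimentation, utilization, and validation), whereas the 2005 and 2015 papers each span five. Assimilation, experimentation, and simulation together account for 60 of the 78 papers from 2015.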

Table 3

Typology of Oklahoma Mesonet data uses (number of papers per sample year).

Use type 1995 2005 2015 TOTAL

Assimilation 0 7 25 32
Experimentation 3 4 20 27
Observation 0 3 8 11
Simulation 0 1 15 16
Utilization 2 0 0 2
Validation 6 6 10 22
TOTAL 11 21 78 110
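The counts in Table 3 can be checked, and the shift in the mix of uses summarized, with a few lines of code. The sketch below simply re-enters the published counts. The variable and function names are illustrative. The per-year shares it computes are derived from the table and are not figures reported by the study itself.

```python
# Counts of papers by use type and sample year, re-entered from Table 3.
YEARS = (1995, 2005, 2015)
COUNTS = {
    "Assimilation":    (0, 7, 25),
    "Experimentation": (3, 4, 20),
    "Observation":     (0, 3, 8),
    "Simulation":      (0, 1, 15),
    "Utilization":     (2, 0, 0),
    "Validation":      (6, 6, 10),
}
REPORTED_ROW_TOTALS = {"Assimilation": 32, "Experimentation": 27, "Observation": 11,
                       "Simulation": 16, "Utilization": 2, "Validation": 22}
REPORTED_YEAR_TOTALS = (11, 21, 78)
REPORTED_GRAND_TOTAL = 110


def year_totals(counts):
    """Column sums: number of papers in each sample year."""
    return tuple(sum(row[i] for row in counts.values()) for i in range(len(YEARS)))


def types_in_use(counts):
    """Number of distinct use types with at least one paper, per year."""
    return {year: sum(1 for row in counts.values() if row[i] > 0)
            for i, year in enumerate(YEARS)}


def shares(counts):
    """Each use type's share (%) of that year's papers, rounded to one decimal."""
    totals = year_totals(counts)
    return {use: tuple(round(100 * n / t, 1) for n, t in zip(row, totals))
            for use, row in counts.items()}


# Consistency checks against the totals printed in the table.
assert {use: sum(row) for use, row in COUNTS.items()} == REPORTED_ROW_TOTALS
assert year_totals(COUNTS) == REPORTED_YEAR_TOTALS
assert sum(REPORTED_YEAR_TOTALS) == REPORTED_GRAND_TOTAL

print(types_in_use(COUNTS))              # {1995: 3, 2005: 5, 2015: 5}
print(shares(COUNTS)["Assimilation"])    # (0.0, 33.3, 32.1)
```

Expressing the counts as shares shows, for example, how assimilation changed over the sample years. It accounted for no papers in 1995 and for roughly a third of the papers in both 2005 and 2015.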

5. Conclusion

This pilot study of 110 papers from various scientific disciplines that used Oklahoma Mesonet data over the past 20 years demonstrates the value of understanding the diverse uses of a robust, rigorously maintained, easily accessible source of current and historical environmental data. Such a source can support experimentation, explanation, and prediction in a wide variety of disciplines. The pilot study shows that Mesonet data are used by multiple scientific communities around the world. The Mesonet’s founders and funders originally anticipated that its users would be the region’s agricultural and meteorological researchers, but its use extends well beyond them. The data can potentially contribute to a much broader range of endeavors, which makes the Mesonet all the more valuable. The initial typology of Mesonet data uses developed here is an important step toward understanding that broader range, and it suggests that these uses are changing over time.

Besides the obvious value of Oklahoma Mesonet data for forecasting and for current meteorological and hydrological uses, the data’s increasing historical value has become apparent, as shown by the numerous climate-related papers in the corpus. Oklahoma Mesonet DOIs were recently implemented to facilitate future research uses and to increase stakeholders’ awareness of the importance of such uses. At the same time, the current growth in simulation and data assimilation studies has a further implication. Any single infrastructure, such as the Mesonet, that routinely supplies data to these rapidly evolving, data-intensive computer models should be aware of the other stakeholders involved in such efforts. It should also seek to work with them toward mutually beneficial outcomes, just as the researchers themselves do.

Hou & Mayernik (2016, 2017) have discussed efforts to correctly attribute assimilated data in reference to climate data models, and Tilmes, Yesha & Halem (2010) have done so in reference to data transfers and transformations throughout the earth sciences. Such efforts will likely become even more critical to data providers and other stakeholders. This realization has new implications for both the administrative and the advocacy goals of the Mesonet. The results of the pilot study have therefore already been of practical value, and they provide an incentive for further investigation of the past, present, and potential uses of Oklahoma Mesonet sensor data.

In an extensive review of research efforts similar to the Oklahoma Mesonet pilot project described here, Mayernik et al. (2017: 1354) suggest that practical, in-depth studies of specific ‘research infrastructures’ also offer unique opportunities to contribute to theoretical knowledge of citation purposes in general. Such infrastructures routinely provide significant research assets, such as datasets and datastreams, though contributing to citation theory is seldom an explicit aim of studies of them. The authors note that data citations and acknowledgments “do not fall cleanly into either the traditional scholarly ‘citation as reward’ or ‘citation as persuasion’ categories but may represent a mix of both. Citing a data source may thus serve to recognize the contribution of the data provider and also to illustrate ‘how data of sufficiently high visibility or quality underlie a particular scientific result.’” As an example, they describe several analyses of successive ‘release articles’ for a specific oceanographic and atmospheric data set. These analyses showed that roughly 80% of the citations to those articles were related to data usage, which may indicate that certain published articles are highly cited as proxies for data sets.

Although the Oklahoma Mesonet pilot study was not designed to focus on the Brock et al. (1995) article in this way, the findings do suggest that this article may serve not only as a proxy for the data source but also as an indication of the data source’s high quality. Quality assurance plays a key role in sensor data research. Researchers continue to cite this article and other Mesonet articles that provide the technical and instrumental specifications for data collection and communication. Given that key role, these continued references may be viewed as implicit signals of the quality of the data source and, accordingly, of the datastreams and data used in each particular study. A future study focused on this ‘signal’ aspect of sensor data citation is now being planned, with special attention to its possible significance and to the various channels through which such ‘signal papers’ may be diffused.

Another planned study will apply, and potentially modify, the preliminary typology of data uses derived from the pilot-study articles in an analysis of all journal articles in the full corpus. A comparison is also being planned between these uses and those described in peer-reviewed journal publications that use sensor data from the growing number of similar mesonetworks in other states. That comparison would offer an opportunity for further collaboration with other researchers interested in this type of study.

In conclusion, just as the Oklahoma Mesonet provides a uniquely valuable source of environmental sensor data to those who study a wide range of weather-related phenomena, it can also provide an invaluable source of information to researchers interested in how those data, their uses, and their users may be studied.