1. INTRODUCTION

Data has become the new currency of both the global economy and the scholarly community (). Thus, scientific, research, and scholarly communities around the globe are striving for sound research data management and sharing practices. Many funders, such as the Arts and Humanities Research Council (AHRC), the Biotechnology and Biological Sciences Research Council (BBSRC), the Engineering and Physical Sciences Research Council (EPSRC), the European Commission (EC), the National Science Foundation (NSF), the United States Geological Survey (USGS), and the Australian Research Council (ARC), among others, require data management plans for proposals to conduct sponsored research (). Large data centers, like the Finnish Social Science Data Archives, the Australian National Data Service (ANDS), and the United Kingdom (UK) Data Service, have accepted responsibilities for national long-term data preservation. Universities and libraries offer training opportunities to improve the skills, knowledge, and capabilities of those who are accepting responsibilities for data stewardship (; ; ). The Consultative Committee for Space Data Systems (), the Digital Curation Centre (DCC) (), DataONE (), OpenAIRE (), and many other international organizations have established guidelines for managing research data and the systems for hosting the data. These and many other government agencies, research institutions, and other stakeholder entities of the research enterprise have developed holistic data policies to address the recognized need to improve capabilities for responsible open data access and stewardship (; ; ).

Recognizing that, in addition to published research articles, data represent first-class research deliverables, progress is materializing. Publishers of scientific journals, within various disciplines, are establishing data policies to facilitate the reproducibility of data that have been used for studies reported in their scholarly publications (; ). Furthermore, several instruments have been developed for evaluating the maturity of research data and for assessing the trustworthiness of the repositories where research data are housed (). The Findable, Accessible, Interoperable, and Reusable (FAIR) principles for data, initially proposed in 2014, have been widely adopted (; ; ; ; ). And, the importance of Transparency, Responsibility, User focus, Sustainability, and Technology (TRUST) for the stewardship of research data also has been recognized (). These efforts to improve capabilities for open data access and stewardship contribute to responsible open research data policies and practices, internationally. Taken together, this progress reflects an international movement for ensuring continuing accessibility and usability of open data products, services, and research-related information that is produced by research and scientific endeavors.

Like many of the countries involved in the international efforts to improve capabilities for open research data sharing and stewardship, China, one of the most scientifically productive countries, plays a key role in improving data policies and practices (; ; ). According to the National Science and Technology Infrastructure Center (NSTI) of the China Ministry of Science and Technology (MOST) (), the volume of original research data generated in China had reached 83.72 PB by the end of 2017. Information about the evolution of data policies and corresponding practices in China can improve understanding of current progress and of additional opportunities for research data management and sharing, internationally.

Here, an ecological perspective is used to provide a systematic review of the general progress of open research data in China. As is common in information ecology (; ), data ecology borrows its central concept from biological science: it studies the environment and the relationships among organisms within and across ecosystems, here constituted by data, people, technologies, and their interactions, as well as other intersecting aspects, like platforms, work, and value (; ; ). These potential components form a complex dynamic equilibrium and can be organized into three general components: context, content, and driving forces (see Figure 1). Such an open data ecology also emphasizes open service trends within different ecosystems.

Figure 1 

Potential components within an open data ecology and its ecosystem.

Furthermore, we envision that improvements in data policies and data practices can serve as an entry point, from the context and content aspects, into the current state of the data ecology, as well as its prospects for the future. Taken together, the general data ecology analysis encompasses data policy (content), data practice (context), and people (driving forces), who include policymakers and research sponsors, as well as the data producers, end-users, data stewards, and others whose work is supported by the sponsors. Among the different elements in the ecology of data, the needs of the people are the driving forces, especially while an ideal and harmonious data ecology is still emerging.

The design and methodology of this paper includes the following elements:

  • The literature review is conducted within the China National Knowledge Infrastructure (CNKI) (www.cnki.net), which is one of the largest online platforms for Chinese journals, dissertations, standards, and patents, along with reference databases for Chinese policies. Governmental portals are also explored for particular content-level policy studies.
  • Field observation has been carried out by performing analysis and participating in data program planning, development, and operations activities and related efforts, as part of the authors’ work in the Computer Network Information Center (CNIC), Chinese Academy of Sciences (CAS). CNIC is the CAS-level leading institution for general data infrastructure and various data services and has been managing data programs sponsored by major funding agencies in China for years. Thus, members of this community have gained in-depth understanding and insight into the development of data policies and data practices in China. Typical open research data practice examples are selected from research programs, data repositories, data journals, and citizen science. Furthermore, the top four exemplars, representing salient initiatives in China, from both policy and funding perspectives, are identified and their endeavors to foster open data have been introduced.
  • The conceptual model for open research data ecology is framed, and future work toward a better world of open data is discussed.

2. EVOLVING DATA POLICY ENVIRONMENT

2.1 NATIONAL LEVEL LEGISLATIONS

Data policy should not only simplify the path towards “effective research data stewardship and infrastructure development” (), but also help maximize the data benefits through data sharing (). Table 1 provides an overview of the current state-level rules governing open research data in China.

Table 1

Key legislations guiding open research data in China.


LEGISLATIONS | ISSUED BY

Law of the People’s Republic of China on Science and Technology Progress (2008 amended) | Standing Committee of the National People’s Congress (SCNPC, P.R.C.)
Copyright Law of the People’s Republic of China (2010 amended) | Standing Committee of the National People’s Congress (SCNPC, P.R.C.)
Law of the People’s Republic of China on Promoting the Transformation of Scientific and Technological Achievements (2015 amended) | Standing Committee of the National People’s Congress (SCNPC, P.R.C.)
Cybersecurity Law of the People’s Republic of China (2017) | Standing Committee of the National People’s Congress (SCNPC, P.R.C.)
Measures for Managing Scientific Data (2018) | General Office of the State Council, P.R.C.

As listed in Table 1, the “Law of the People’s Republic of China on Science and Technology Progress” (2008) established the fundamental rules for research data stewardship, stating that, “The Science and Technology Administrative Department of the State Council shall, in conjunction with the relevant competent departments of the State Council, establish information systems for scientific and technological resources, such as S&T research bases, scientific instruments and scholarly literature, S&T data and natural resources…and should release the distribution and usages of all the research sources as well”. That law also addresses the disclosure of governmental information, since such information is one of the largest sources of open research data in China.

Other rules also affect research data stewardship through provisions on cybersecurity and intellectual property, as well as provisions governing particular elements, such as research outcomes. In particular, the “Data Security Law of the People’s Republic of China (Draft Version)” was released in June 2020 for public comment. Like the General Data Protection Regulation (GDPR) in Europe, this law aims to ensure the flow of data and the protection of data rights within a safe environment, nationwide. Key measures include clarifying and implementing data security protection obligations for different stakeholders, promoting the sound flow of various kinds of data, and providing guidance for institutional measures to ensure the safety of governmental data sharing.

By contrast, the “Measures for Managing Scientific Data” bring research data management and sharing to a new phase, in which general rules covering comprehensive data aspects have been set up for implementation, including enhanced data management across the entire lifecycle, strengthened capability for data asset protection, a sustained open data ecology supported by continuing funding for specific data programs, clear data property rights, and responsibility for long-term data stewardship. As a result, many institutions are revising their implementation guidelines and plans accordingly. For example, CAS plans to initiate several steps to complete this implementation, including the development of CAS-level data policies following national laws, construction of strengthened CAS data centers and mature data infrastructure, exploration of innovative open data models, and improvements in fair incentive mechanisms for multiple stakeholders (). By the end of 2019, fifteen provincial-level administrative regions, including Anhui, Chongqing, Gansu, Guangxi Autonomous Region, Hainan, Heilongjiang, Hubei, Inner Mongolia Autonomous Region, Jiangsu, Jilin, Shaanxi, Shandong, Sichuan, Tianjin, Xinjiang, and Yunnan, had published regional rules to govern scientific data in line with the national-level law. More steps, in accordance with the “Measures for Managing Scientific Data” (2018), are also being planned.

2.2 INSTITUTIONAL DATA POLICIES

In addition to these national laws, administrative institutions have produced further guidelines for sound data stewardship, which call for open data and the sharing of other research resources generated from publicly funded research programs.

Government information, one of the major sources for data generation and sharing, is governed by the “Regulations of the People’s Republic of China on Disclosure of Government Information”. In addition, the “Interim Measures for the Management and Sharing of Government Information Resources” provide guidance for electronic records, which helps to facilitate the governance of scientific data from an administrative perspective.

The Ministry of Science and Technology in China (MOST) provides another research data governance example. As early as 2001, the “Interim Provisions on the Administration of National S&T Plans” stated the responsibilities of the MOST to establish databases, archiving systems, and rules for the preservation, usage, and sharing of data and related information. Following up with rules for program management, the MOST also pointed out that “All national science and technology plans should establish compatible databases to share information and data resources”. In particular, as a key component of research outcomes, scientific data is highlighted for deposit into archives and research facilities that provide guidance to ensure their integrity, completeness, and accuracy (MOST, 2003). As a result, the MOST became the driving force behind initiating and implementing the first state-level guidance, the “Measures for Managing Scientific Data” (2018). In line with these efforts, the Chinese Academy of Sciences (2019), the Chinese Academy of Agricultural Sciences (2019), and others also published their institutional open data rules based on a decade of data work experience, in which mechanisms, responsibilities, workflows, and plans for data center development are set out accordingly.

2.3 DISCIPLINARY DATA POLICIES

Data has not been emphasized enough in every field to be regulated by specific data rules, but some data-intensive research fields are taking the lead in developing data policies. For example, a search of the China Legal Knowledge Database (CLKD) retrieved 254 excerpts of disciplinary policies with “data” in their titles. Table 2 shows several selected disciplinary research data policy examples covering geoscience, medical and health sciences, meteorology, ocean science, seismology, and space science.

Table 2

Examples of disciplinary rules governing research data sharing in China.


SUBJECTS | POLICYMAKER(S) (EFFECTIVE SINCE): POLICY TITLE

Geoscience | State Oceanic Administration, PRC (2018): Measures for Managing Polar Expedition Data of China
Geoscience | Ministry of Natural Resources, PRC (Former Ministry of Land and Resources, PRC) (2010): Interim Measures for Managing Land and Resources Data
Medical and health sciences | National Health Commission, PRC (2018): Interim Provisions on National Health Care Big Data Standards, Safety and Service Management
Meteorology | China Meteorological Administration (2018): Fengyun Meteorological Satellite Data Management Measures (for Trial Implementation)
Meteorology | China Meteorological Administration (2017): Measures for Meteorological Data Exchange and Management
Ocean science | China Oceanic Information Network (2015): Data Sharing and Service Procedures in Marine Ecological Environment Monitoring (for Trial Implementation)
Seismology | China Earthquake Administration (2006): Measures for Managing Seismological Science Data
Space science | State Administration of Science, Technology and Industry for National Defense, PRC; National Development and Reform Commission, PRC; Ministry of Finance, PRC (2018): Interim Measures for the Management of National Civil Satellite Remote Sensing Data
Space science | State Administration of Science, Technology and Industry for National Defense, PRC; China National Space Administration (2016): Measures for Managing Scientific Data in Lunar and Deep Space Exploration

3. PROACTIVE OPEN DATA PRACTICES

Data practices often reflect policies and also demonstrate how such policies have been implemented. Based on their importance and popularity, exemplars of four different types of open data practices are described below: open data in research programs, repositories, data journals, and citizen science.

3.1 IN RESEARCH PROGRAMS

According to the 13th Five-Year science plan, there are five types of major projects, and among those, three have policies governing data and information. The other two are talent-related and enterprise innovation-related grants, for which we suppose more flexibility is left due to the complexity of such work as well as priorities for intellectual property protection.

As shown in Table 3, data responsibilities, data infrastructures (e.g. data platforms and databases), and data curation and sharing are stressed within these three program provisions. Moreover, rules for sharing national major scientific research infrastructures and large-scale scientific research instruments also strengthen the capability of data production and broaden the scope of data sharing by reducing investment costs and increasing the efficiency of facility usage (Table 3).

Table 3

Typical cases of program-level rules governing research data sharing in China.


POLICYMAKER(S) (ENACTED SINCE): POLICY TITLE | CONTENTS

NSFC (2015): Measures for the Management of Research Outputs in Projects Funded by the National Natural Science Foundation of China
  • PI should collect and preserve the original records in a sound manner and submit them to the supporting unit as required, whilst also ensuring the accuracy, completeness and consistency of the scientific data within the report for research outputs…. (Art.9)
  • NSFC shall set up platform(s) for the exchange of research outputs and ensure the persistence, completeness and openness of research resources. Supporting units should share valuable research outputs (e.g. databases, samples, research facilities), free of charge or for a fee. (Art.13)

MOST & MOF (2017): Interim Measures for the Administration of National Key R&D Programs
  • Research information disclosure (Art.4).
  • Establishment and implementation of Public Service Platform for National Science and Technology Information System (Art.5)
  • Assignment of responsibilities for the application of project outcomes and for information sharing to MOST, program management institutions, PI-affiliated institutions, etc. (Art. 8/11/12/39)

MOST, NDRC, MOF (2017): Measures for Managing National S&T Major Projects
  • Establishment of an information management platform and sub-branches for major special programs, to be merged into the National Science and Technology Management Information System (Art.53).
  • Detailed definition of “information contents”, covering the entire research workflow (Art.54), together with information security rules (Art.59).

MOST, NDRC, MOF (2017): Administrative Measures for the Opening and Sharing of National Major Scientific Research Infrastructures and Large-scale Scientific Research Instruments
  • Scientific research facilities and equipment should be open to the public … except as otherwise provided by law (Chap.1, Art.5).
  • Responsibilities of managing units (e.g. MOST, MOF) for sharing research facilities and for providing guidance on assessment, awards, and punishments are addressed in detail (Chap.2 & 4).

Furthermore, especially after the release of the national-level rules, “Measures for Managing Scientific Data” (2018), the “data submission agreement” is compulsory for MOST grant programs (). This requirement can be traced back to early 2016 () and the main purpose is to reach legal agreements between the funding agency (MOST), individual researchers, and their affiliated institutions to guarantee complete and on-time data capture and submission to the MOST platforms during the research and data sharing that commences after an embargo period. Such conditions are considered mandatory requirements during proposal review and award processes.

3.2 IN DATA REPOSITORIES

Data repositories, such as data centers and archives, often provide users with integrated data platforms that offer data curation capabilities to enable efficient data publication. Data services can include in-depth quality control as well as data sharing. Some data repositories also provide linkage between datasets, data papers, and publications (). Many data repositories specialize in providing services to specific disciplines, such as the geoscience data repository GSCloud (www.gscloud.cn) and the omics data center GSA (bigd.big.ac.cn/gsa/); others are institutional repositories (e.g., the Peking University Open Research Data Platform, opendata.pku.edu.cn) or facilities serving the general public (e.g., ScienceDB, www.sciencedb.cn).

3.3 IN DATA JOURNALS

In addition to data published as supplements to scholarly journals, data journals, which jointly publish data papers with datasets, have become popular. China Scientific Data (www.csdata.org), which was established in 2015 and began releasing data in 2016, and Global Change Research Data Publishing & Repository (www.geodoi.ac.cn), which established online services in 2014 and began publishing in 2017, take the lead in promoting FAIR data by publishing data papers and datasets. As of June 2020, China Scientific Data had shared over 224 data papers and datasets across various disciplines, with around 300,000 page views yearly. Among the represented disciplines, geoscience and biology data sharing rank highly. In addition, Global Change Research Data Publishing & Repository (GCRDP) has published 267 GB of datasets online, with over 245,000 data files downloaded cumulatively. The distribution of disciplinary data articles in the two journals is shown in Figure 2. Furthermore, GigaScience also publishes datasets in a joint effort with GigaDB (). Also, Big Earth Data includes geoscience big data publishing as part of its scope ().

Figure 2 

Disciplinary distribution of data articles in CSTDATA & GCdataPR.

In addition, the implementation of norms for data also guarantees and facilitates the reuse of data. According to the CNKI standard database, there are over 700 data-related national-level and disciplinary standards, which contribute to the quality of data throughout the data life cycle. Such data norms cover quality control for metadata and data (e.g. “Information Technology – Big Data – Terminology GB/T 35295-2017”), methods for data security (e.g. “Information Security Technology – Personal Information Security Specification GB/T 35273-2017”), data processing, exchange, and communication (e.g. “Technical Specification for Environmental Thematic Spatial Data Processing HJ 927-2017”, “Specification for Drafting Basic Dataset of Ecology and Environment Information HJ 966-2018”), data sharing (e.g. “Information Technology Big Data Governmental Data Sharing Part 1: General Provisions GB/T 38664.1-2020”), data metrics and evaluation (; ), as well as information systems (; ) and information technology (e.g. “Information Technology – Big Data – Technical Reference Model GB/T 35589-2017”) for data.

3.4 IN CITIZEN SCIENCE

Citizen science techniques offer another way of collecting open data, leveraging multiple contributions to capture and analyze data. This approach is becoming popular for researchers who study daily life and have access to data collection communities, such as Birdnet (www.birdnet.cn) and the Chinese Field Herbarium (CFH, www.cfh.ac.cn). Records in the “Database of Cetacean Stranding Records around Hainan Island” are also partly contributed by volunteers (). However, data exchanges are not enough, as some citizen scientists go even further. For example, some contribute to the exploration of new species or participate in a non-governmental organization (NGO) for more citizen science data collection and analysis opportunities ().

4. RESPONSIBLE DRIVING FORCES

Understanding the driving forces behind data policy efforts offers insight into the intentions and objectives of data policy initiatives (). It appears that key national data policymakers in China include, but may not be limited to, the MOST, the Chinese Academy of Sciences (CAS), the China Association for Science and Technology, and the National Natural Science Foundation of China (; .). These agencies have promoted improvements in data practices to facilitate the implementation of data policies ().

4.1 MINISTRY OF SCIENCE AND TECHNOLOGY (MOST)

As illustrated in the discussion of institutional data policies, the MOST, being one of the major funding agencies for research, takes the lead in pushing open data across domains. In 2001, the MOST began supporting the NSTI program with the initial establishment of 13 scientific data centers covering agriculture, forestry, seismicity, meteorology, marine science, Earth systems, population and health, biology, chemistry, materials, and energy, as well as others. In recent years, the funding mechanism has evolved from pre-funding awards to subsidies for further development of selected data portals to foster data sustainability (). The design, establishment, and evaluation of national scientific data centers is another major contribution, enhancing the generic service capability of national data infrastructures for open data and open science ().

4.2 CHINESE ACADEMY OF SCIENCES (CAS)

As one of the most important and largest research institutions in China, CAS takes the lead in promoting the production of research data, contributing substantially to the efforts of the science community in different disciplines (i.e. ; ). Among all of the CAS programs, the Scientific Database Program (SDP) has been focusing on research data generation, curation, and sharing as its primary goals since 2006 ().

Before 2000, data was disseminated offline, near-line, and, to some extent, online. The SDP mainly focused on expanding the scale of research data and curation to improve data management and data sharing capabilities (). Subsequently, during the next five years, data sharing capabilities were emphasized as online data services were developed further (). Then, data-sharing efforts expanded considerably through 2010, as the volume of research data increased and the data grid was employed to integrate data from different sources to foster new scientific discoveries (; ). From 2011 to 2015, the adoption of the data cloud provided an opportunity for attaining flexible but more robust data infrastructures, and also for supporting value-added data analysis (). By the end of 2015, multidisciplinary scientific data volume had reached 655 TB, with over 96,290,000 unique visits and 456 TB of downloads in total (). The subsequent “Big Data Engineering” program continues the trend of open data based on consolidating engineering construction (). In addition, the “Measures for Managing Scientific Data” further clarify the general duty to provide open, publicly funded data by default. These actions leave no doubt that open data shall prevail across domains and regions in accordance with the open science paradigm. The evolving history of the SDP is depicted in Figure 3.

Figure 3 

Data management and sharing in CAS Scientific Database Program.

4.3 NATIONAL NATURAL SCIENCE FOUNDATION OF CHINA (NSFC)

As one of the vital funding agencies in China, the NSFC supports the sharing of research outcomes, including data. In 2014, the NSFC jointly announced, with CAS, open-access rules for all their programs, providing direction to release articles as open access, with an embargo of no more than 12 months after publishing (). Moreover, during the last 30 years, data-intensive research has gained notable support from the NSFC through approximately 6,000 different programs, with up to 3.58 billion yuan in total. The annual trend for data-related grants is depicted in Figure 4. The NSFC also supports data stewardship activities as a necessary part of other types of research. Incidentally, the NSFC was merged into the MOST in 2018 (; ), but will continue to be one of the major funding agencies, especially for fundamental research.

Figure 4 

Tendency of Yearly NSFC Funding on Data Topics.

4.4 CHINA ASSOCIATION FOR SCIENCE AND TECHNOLOGY (CAST)

CAST is the largest society of science and technology (S&T) professionals that operates as a non-governmental organization in China. The focus of CAST includes various topics, like database sharing and data exchanges. In a representative project, titled “Discipline development in CAST member societies”, data have been recognized as the main source of research outcomes, and the sharing of databases for scientific research, provided by over 200 national-level academic societies, has been highlighted in particular (.).

Moreover, other driving forces include international organizations in China, such as CODATA China, and the World Data System (WDS). These international organizations and their members promote regional data exchanges through comprehensive rules, showcases, trainings, and workshops, as well as through other innovative ways to facilitate lifecycle data sharing and communication. Furthermore, local administrative departments, research associations and their sub-branches also serve as stakeholders, contributing to the driving forces within the data ecosystem. While the importance of people and the research community is implied in the discussion of such organizational stakeholders, it is vital to emphasize that data producers, data stewards, and data users are the stakeholders whose efforts and needs are of paramount importance and necessary for the development and evolution of an ecosystem for curating and sharing open data.

5. LANDSCAPE OF OPEN RESEARCH DATA ECOLOGY IN CHINA

In China, policymakers at national, regional, and local levels develop and disseminate data policies and promote recommended implementation practices, while funding agencies provide support to improve open data practices via various data programs, which often request additional funding. Along with the research community, China’s policymakers and funding agencies have served as catalysts for much of the evolution of open data practices in China, and their decisions guide the mainstream of the data ecology. Therefore, we identify them as the driving forces that initiate the open data cycle in the data ecology.

In practice, data policies serve as guidance for developing and operating data programs; the latter provide data curation and sharing experiences and expose problematic issues to be considered when revising data policies (; ). Furthermore, open research data practices offer insight into the adoption of open data policies by providing feedback about the enforcement of those policies. Data programs support data practices with direct funding and strict program guidance, including data requirements. Moreover, policy, program, and practice can be taken together as a compound of context and content for the open data ecology, since they are usually reflected simultaneously through the actions of entities. In effect, certain open data practices simultaneously decode the policy constraints and program support to afford progress.

Essentially, driven by the initiatives of policymakers and funding agencies, the three primary elements that contribute to the open data ecology in China are data policies, data programs, and open research data practices (). These interactions are depicted in Figure 5.

Figure 5 

Landscape of open research data ecology in China.

Comprehensive data policies employ a bottom-up approach driven by the rapid growth of research data scales and the recognition of community needs. Some provinces (e.g. Shandong, Henan, Hainan, and Guizhou) have taken the lead in setting up independent bureaus for data-driven city governance. Moreover, data programs serve as effective data engines to support data production with grants. Such programs also emphasize issues about data sharing and data management, encouraging clarification to differentiate between restricted and open data, and stimulating the development of strategies for managing sensitive data to address issues of privacy and public safety. Data policies are being developed to guide such data practices and are expected to be implemented fully as data practices evolve over time. Although the ecosystem for data policy is just beginning to flourish, the flexible approach of many data programs in China is fostering the acquisition of research resources and raising awareness of the need to continually improve scientific data management and sharing practices.

6. DISCUSSION AND CONCLUSION

In essence, an ideal and harmonious data ecology is under construction in China. Data policies have evolved from constituent parts of general legislation to independent rules focusing mainly on publicly funded data governance. Also, like the evolution of data sharing practices internationally (), the focus in China has shifted from primarily data management to both data governance and data sharing throughout the entire scientific data lifecycle. Furthermore, like the increasing acceptance of data sharing practices among researchers internationally (), we can observe that the gap between positive attitudes toward sharing data and not-so-active data sharing behaviors in China is shrinking (; ). Relatedly, as improvements to data management and data sharing policies are occurring in the United States to face the complexities of confidentiality, privacy, and intellectual property concerns (), China also spares no effort in the governance of those areas, as analyzed earlier in this paper. These improvements in open data sharing policies and practices in China also appear to be consistent with observed trends in open knowledge practices in China (). Stronger participation in open access practices among research communities (), as well as regular and close collaboration (), should be encouraged across disciplines, including the natural sciences and social sciences. While challenges remain for scientific data management and data sharing in China and internationally, cultural norms for data sharing appear to be improving, and these may work together with data policies and data practices to create a friendly data culture that also nourishes the whole society ().

Moreover, research data, as global public goods (), are increasingly used to address grand human and societal challenges, and the sharing of data has become a vital part of research collaboration. Based on the review of evolving data policies and data practices in China, we see the future of open data progressing across the international realm in the following manner:

  • Open data is being adopted into mainstream practices, but faces many challenges in various sectors. The outbreak of COVID-19 presents a compelling case for the adoption of data sharing, globally: the leading role of the World Health Organization (WHO); the commitment to share data during COVID-19, reaffirmed by almost 160 key stakeholders, internationally, including government agencies, research institutions, publishers, universities, and companies (); the emergence of ongoing international efforts like the “RDA COVID-19 Recommendations and Guidelines on Data Sharing” (), the 2019-nCoV related genomic databases released by the Chinese Academy of Sciences (), and the case and testing data collections shared by the Johns Hopkins Coronavirus Resource Center and many other institutions around the world. These and other endeavors that have been initiated, and the data that are shared, demonstrate how we have united for the sake of the global village. However, there are still obstacles that must be addressed, such as privacy, security, intellectual property, and other ethical concerns. Therefore, more innovation is needed to maintain a balance between closed and open data.
  • A stable and robust open data ecology calls for diversity. Diversity can contribute to the stability and resilience of an ecosystem. When considering an open data ecology, diversity may refer to: (1) Involvement of diversified stakeholders. As illustrated partly in Figure 4, stakeholders within an open data ecology may include driving forces (such as policymakers and funding agencies), affiliated institutions, individual researchers, publishers, and research associations. Participation of different stakeholders jointly contributes to the ecological balance through optimal allocation of resources throughout the entire data lifecycle. (2) Multi-factor open data governance. Data programs and data policies that drive progress, together with various data practices, all serve as indispensable factors in an open data ecology. Effective operation of the open data ecosystem may require an intermediate approach in which there are top-down policies providing regional and national-level guidance, but also bottom-up exemplars to follow. The routine of following general data practices when developing data policy might be cost-effective, but some data policy development could take the lead, if necessary (). (3) Various open data models. Diversity also could mean that there is no single fixed way to facilitate the sharing and use of open data, but better exploration of information technology may help. For instance, the application of blockchain technology (; ) can avoid direct access to sensitive datasets by sharing models, algorithms, and results online without the need to migrate local data, which can make sharing sensitive data possible without violating complex data protection rules.
  • Under the umbrella of open science, open data also needs to be curated through efficient data facilities. Such facilities should be networked across international research efforts, interoperable between platforms, and integrated to leverage different research resources. The European Open Science Cloud, the United States National Research Cloud, the Chinese Science & Technology Cloud, and the African Open Science Platform, as well as others, are current examples. Such data infrastructures serve as integrated platforms that foster the implementation of data policy and technical support of data programs, while also serving as consolidated bases for various data practices. But that is not enough. If we envision a future with open data at larger scales and better linkage across regions and domains, then we need improved data infrastructure to guarantee capabilities for connecting diverse research resources. The idea of building a global open science cloud is still at an early stage, with active participation from the CNIC, CAS, EGI Foundation, and their African partners, as well as other partners around the world.
  • According to our field observations, fairness, trust, and sustainability are essential ingredients that involve many interdependent components within an open data ecology. Fairness is necessary for data policy-making and program design. Likewise, for data practices, fairness provides general principles for facilitating the use of open data, as previously mentioned in terms of FAIR capabilities. TRUST also is necessary for implementing capabilities to adopt data policies and programs while ensuring the continuing quality of data for reuse and protecting the legal rights of contributors. Such efforts also can improve trust between data producers and data re-users. Often, data programs survive on trust, especially when data policies and programs are under development. With trust, successful data transactions are possible. Providing such systems sustainably also requires efficiency, as well as incentives and metrics for assessing data policies and programs and the business models that have been adopted to foster enduring data practices.