1. Introduction

The first decade of the 21st century has seen increasing support for open access to research data, whereby more governments, research funding agencies and research institutions have affirmed open access principles for publicly funded research data. From 2010 to the present, numerous principles/policies/guidelines on open access to research data have been developed in jurisdictions such as Australia, Canada, the EU, the UK and the USA. Open access to research data can be seen as part of the broader access to knowledge (A2K) movement, which advocates the online distribution of educational, intellectual, scientific, creative and innovative works under permissive licenses granted by the right holders (Uhlir, 2010).

Open access to research data principles entail more than just granting access to research data with limited or no restrictions. At their core, these principles aspire to make research data available for any type of reuse by any user (Swan, 2010). Under these principles, research data are freely available on the public internet, permitting any user to download, copy, analyse, reprocess, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself (Panton Principles, 2010).

Although enabling open access to research data is technically feasible with the internet and ICT, there are social, legal and ethical issues which become barriers to open access. A recent study by Lämmerhirt on open research data identifies legal and ethical issues as primary factors influencing data sharing practices. Intellectual property law, security law, information policies, institutional guidelines and contracts at the national and international levels often impede data access and sharing practices. The existence of these legal impediments is problematic because they restrict, obstruct, hinder and slow down the objective of enabling open access to research data.

This study focuses on the legal impediments to open access to publicly funded research data in Malaysia. For the purpose of this study, “Publicly Funded Research” means research projects using funds which are allocated wholly or partly by government departments or agencies at any level. Next, “Research data” refers to data sets generated through research that are commonly accepted in the research community as necessary to validate research findings (OECD, 2007). These data are typically derived from experiments, measurements, simulations or fieldwork activities such as surveys, case studies, observations or interviews. Types of research data include: i) raw data; ii) processed data in the form of texts (transcript, report), graphics (table, chart, diagrams, animations, simulations, models), numeric data (equation, statistics, algorithms), images (whether fixed or moving, such as pictures, photos, visual recordings) and sounds (audio recordings); iii) published data used to support scholarly publications; and iv) associated metadata (National Academy of Sciences, 2009).

A legal impediment arises when the existence or absence of legal rights and duties has the effect of restricting, obstructing, hindering or slowing down the objective of enabling open access to publicly funded research data. Previous studies have identified 11 legal impediments to open access to research data arising from intellectual property, confidentiality, privacy, national security, patent and tort laws (RECODE, 2014). These legal impediments are summarized in Table 1 below.

Table 1

Legal impediments to the Objective of Enabling Open Access to Research Data.

| No. | Legal Impediments | How Does The Legal Impediment Arise? |
|---|---|---|
| 1 | Intellectual property protection in research data | Access to and reuse of the research data protected by intellectual property rights is restricted and subject to permission from data owner. |
| 2 | Ambiguity about ownership of research data | The ambiguity hinders data sharing/self-archiving practices/open access participations among university researchers as the researchers are unsure whether they have the right to deposit the research data in open access repository. |
| 3 | Data owner’s exclusive rights in research data | A data owner who does not want to lose control over the research data may exercise their exclusive rights by refusing to release the research data in open access environment. |
| 4 | The restrictive scope of the legitimate use of research data | Data users are in a state of uncertainty whether their usage is within the permitted acts, preventing them from utilizing the research data deposited in open access repositories. |
| 5 | Complex and lengthy licensing procedures for research data | Licensing of research data which are protected under copyright law is costly and time consuming, and is not well suited to be used in the digital environment. |
| 6 | Data creator’s moral right of integrity/attribution | Lack of attribution discourages data creators from sharing their research, while moral right of integrity enables data creators to prevent data users from making alteration or modification to the research data that tarnishes their honor or reputation. |
| 7 | Non-disclosure duty of confidential research data | Disclosure of research data which are subject to promise of confidentiality or under non-disclosure agreement is prohibited unless the research participants can be re-contacted for permission. |
| 8 | The right to informational privacy of subjects of research data | Disclosure and use of personal information against the will or consent of identified or identifiable data subjects will violate their right to informational privacy. |
| 9 | Protection of national security | Disclosure of research data which is classified as prejudicial to national security is restricted. |
| 10 | Novelty requirements in patent law | Researchers are required by the law to restrict, limit, delay or withhold disclosure of research data until the patent application has been filed. |
| 11 | Lack of a legal duty to ensure data quality | Since open access data providers have no legal duty to ensure data quality, data users are at risk of accessing and re-using incomplete, unfit, inaccurate or erroneous research data. |

A report prepared by Christian for the International Development Research Centre suggests that in order to derive the maximum benefit from open access to data, the legal impediments must first be resolved. Greenleaf also agrees that there is a need to deal with a myriad of legal issues surrounding support for open access to government-funded research output. Intellectual property legal experts and scholars such as Lievesley, Uhlir and Schroder, Moskovkin, Ambruster (2008), Fitzgerald & Fitzgerald and Arzberger et al. have all argued for the legal impediments to be addressed through a set of principles and guidelines. Therefore, it is submitted that opening up access to publicly funded research data in Malaysia requires these legal impediments to be addressed through a proper instrument. The research question to be answered in this study is: “How should the legal impediments to open access to publicly funded research data be addressed?”

A review of the literature found that the report by Sveinsdottir et al. identifies legal and ethical obstacles to open access research data and provides good practice solutions from the perspectives of a range of different stakeholders. Earlier, in 2013, Guibault & Andreas addressed the legal issues that arise when implementing open access to research data by examining the legal requirements for different kinds of data usage in an open access infrastructure. In addition, they analysed the existing intellectual property legal framework in Europe and different licence models in order to identify the licence best suited to the aim of open access to research data. Based on the outcomes of these analyses, they made recommendations aimed at improving the rights situation in relation to open access to research data.

Further, the Open Access to Knowledge (OAK) Law Project has published several reports on open access which identify various legal issues surrounding open access (Fitzgerald, Pappalardo & Austin, 2008). The reports also contain guidance on managing the legal rights in research outputs with respect to ownership, data sharing, access and reuse, patents, confidentiality, contract and privacy law. However, the OAK Law Project’s guidelines were developed mostly for open access publications, rather than for open access to research data.

The guidelines developed by this study can be seen as an extension of the works reviewed above, as this study aims to address the legal impediments identified by previous studies by way of guidelines. This study also makes its own original contribution in Malaysia, as it develops model guidelines on open access to publicly funded research data which can be applied by research funding agencies, universities and other research institutions in the country.

2. Methodology

As this is a legal study, the research methodology is purely qualitative. Data were drawn mostly from secondary sources. The principles, policies and guidelines addressing the legal impediments to open access to research data were selected as data samples. Those data samples were collected from the official websites of civil society organisations, government bodies, research funding agencies and research institutions in Australia, Canada, the EU, the UK and the USA which support open access to research data. Altogether, 24 data samples were collected for analysis. The data samples are listed in Table 2 below.

Table 2

Principles/Policies/Guidelines Which Have Been Selected as Data Samples.

| Institutions | Principles/Policies/Guidelines |
|---|---|
| National Health & Medical Research Council (Australia) | ‘NHMRC Statement on Data Sharing’ (2016). |
| Australian Research Council | ARC National Principles of Intellectual Property Management for Publicly Funded Research 2015. |
| International Development Research Centre (Canada) | Open Access Policy for IDRC-Funded Project Outputs 2015. |
| Government of Canada | Policies and Guidelines: Research Data (2011). |
| Directorate-General for Research & Innovation, European Commission | ‘H2020 Programme Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020’. |
| Government of the Republic of Slovenia | ‘National Strategy Of Open Access To Scientific Publications And Research Data In Slovenia 2015–2020’ (2015). |
| European Union | EU Guidelines on recommended standard licences, datasets and charging for the reuse of documents (2014/C 240/01). |
| RECODE | Policy Guidelines For Open Access And Data Dissemination And Preservation: A Practical Guide For Developing Policies For Research Funders (2014). |
| Research Council of Norway | Open Access to Research Data Policy for The Research Council of Norway. |
| Secretary-General of the OECD | OECD Principles and Guidelines for Access to Research Data from Public Funding 2007. |
| Natural Environment Research Council (UK) | NERC Data Policy – Guidance Notes Version 2.1 (May 2016). |
| Biotechnology and Biological Sciences Research Council (UK) | BBSRC Data Sharing Policy: Version 1.2 (March 2016 update). |
| Research Councils UK | RCUK Guidance On Best Practice In The Management Of Research Data 2015. |
| Economic and Social Research Council (UK) | ESRC Research Data Policy 2015. |
| The UK Government | UK Cabinet Office, ‘G8 Open Data Charter 2013’. |
| Science and Technology Facilities Council (STFC) | STFC Scientific Data Policy 2011. |
| Cancer Research UK | CRUK Data Sharing Guidelines 2009. |
| Institute of Education Sciences | IES Implementation Guide for Public Access to Research Data 2016. |
| US Office of Science & Technology | US Office of Science & Technology Policy: ‘Increasing Access to the Results of Federally Funded Scientific Research’ (2013). |
| National Institutes of Health (US) | Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research 2015. |
| Department of Veterans Affairs (US) | Policy and Implementation Plan for Public Access to Scientific Publications and Digital Data from Research Funded by the Department of Veterans Affairs (2015). |
| University of North Texas | Denton Declaration on Open Access to Research Data 2012. |
| The USA Government | US Open Government Data Principles (OGD) 2007. |
| Open Knowledge Foundation Working Group on Open Data in Science | Panton Principles for Open Data in Science 2010. |

Analysis of the data samples applied a positive analysis approach, which asks ‘What are the legal impediments which have been addressed?’. The positive analysis approach requires the data samples to be critically analysed. In addition, a normative analysis approach, which asks ‘What are the appropriate measures that ought to be adopted by the model guidelines to address the legal impediments?’, was applied in the data analysis. The data samples were also analysed using a comparative analysis method.

The scope of comparison covers the measures adopted in the principles/policies/guidelines of the government bodies, research funding agencies and research institutions to address the legal impediments. The criteria for comparison are the similarities and differences between the measures adopted to address the legal impediments. Another criterion of comparison is any special feature or uniqueness of the measures adopted to address the legal impediments.

3. Results

Analysis of the principles/policies/guidelines of civil society, government bodies, research funding agencies and research institutions in Australia, Canada, the EU, the UK and the USA has identified various measures which have been adopted to address the legal impediments to open access to research data. These measures are presented in Table 3 below:

Table 3

Measures Adopted to Address the Legal Impediments to Open Access to Research Data.

| LEGAL IMPEDIMENTS | MEASURE 1 | MEASURE 2 | MEASURE 3 |
|---|---|---|---|
| Intellectual property protection in research data | The policy should set open access for research data as the default and mandatory requirement (RECODE). | Research data have to be shared freely on the internet, as open as possible, accessible with as few restrictions as possible through public database or repositories (Government of the Republic of Slovenia; NHMRC). | Non-proprietary research data have to be made available in a format over which no entity has exclusive control (OGD Principles). |
| Ambiguity about ownership of research data | Ownership will initially be vested in the employer/research institutions receiving and administering the grants (NERC; ARC). | IP generated as a result of collaborative endeavours between research institutions will vest as agreed between those institutions (ARC). | Research institutions must have policies relating to the ownership of IP generated as a result of public funding (NHMRC). |
| Data owner’s exclusive rights in research data | Embargo periods to enable researchers to publish findings are between 30 and 60 days after data collection (NIH), no longer than 12 months from the end of the grant (ESRC), a maximum of two years from the end of data collection (NERC), never later than three years after the project has concluded (Research Council of Norway). | ‘Published’ data should be made available as soon as possible (NIH), never later than at the time of publication (Research Council of Norway), at the time of publication in machine readable format (NIH), within six months of the date of the relevant publication (STFC). | Data owner will be required to grant to the funder a non-exclusive licence to allow the funder to manage and supply the data for reuse (NERC). |
| The restrictive scope of the legitimate use of research data | Data owner to grant rights to use and reuse the research data, to the widest range of users for the widest range of purposes, permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose (RCUK; Government of the Republic of Slovenia; EU; Research Council of Norway; UK Cabinet Office; Panton Principles; OGD Principles). | The use of licenses which limit commercial reuse or limit the production of derivative works by excluding particular purposes or persons or organizations is strongly discouraged (Panton Principles). | Any restrictions should be outlined in the data sharing plan and applicants should explore ways data sharing requests can be considered by the body that owns the data (CRUK). |
| Complex and lengthy licensing procedures for research data | The license should be internationally recognized/worldwide, perpetual, royalty-free, irrevocable by using Creative Commons (CC) licenses (version 4.0)/CC Zero Public Domain Dedication and Licence (Research Council of Norway; EU; Panton Principles). | Research data related to publication should be explicitly placed in the public domain/in the form of waiver of license (Panton Principles). | Research data for which no restrictions apply should be in the public domain by using CC0 Public Domain Dedication to make the research data license-free (OGD Principles). |
| Author’s moral right of integrity/attribution | Data users should acknowledge the sources of their data (RCUK). | Data users must provide citation of the research data (Denton Declaration). | Data users must give appropriate attribution/credit to the originator/proprietor of the research data (NIH). |
| Non-disclosure duty of confidential research data | Anonymization/Confidentiality procedures that ensure a satisfactory level of confidentiality to preserve as much data utility as possible for researchers (OECD). | Researchers to develop a data management plan that protects the rights of study participants and confidentiality of the data (IES). | Researchers can opt out at any stage (either before or after signing the grant) to free themselves from the obligations of open access (European Commission). |
| The right to informational privacy of subjects of research data | Research data should be de-identified/redacted to strip all identifiers that would permit linkages to individual research participants and variables that could lead to deductive disclosure of the identity of individual participants (Government of Canada; IES). | Depositing data in data secure access facility/data archives or enclaves/making personal data protection a contractual obligation/sign data sharing agreement before data release/use of ‘Smart Notices’ to indicate the original purpose of personal data collection (ESRC; IES; RCUK; EU). | Where data cannot be stripped of identifiers, data may be exempted from the data sharing (Government of Canada). Researchers to apply for Certificates of Confidentiality to protect identifiable research information from forced disclosure (NIH). |
| Protection of national security | When open access to the data may threaten personal or national security, the datasets must not be made openly accessible (Research Council of Norway). | Specific aspects of the data may need to be kept protected (Government of Canada). | |
| Novelty requirements in patent law | There may be a need to delay data release/sharing for a period of time, until the patent applications have been filed by the institutions or researchers (CRUK). | Policies may permit delays in sharing research data for a period of time, in cases whereby institutions or researchers are applying for patents or developing new applications based on that data (Government of Canada). | If the outcomes of the research result in inventions, the provisions of the Bayh-Dole Act of 1980 apply (NIH). |
| Lack of a legal duty to ensure data quality | The licensor provides the information ‘as is’ and assumes no responsibility for its correctness or completeness (EU). | | |

4. Discussions

This section interprets the findings derived from data analysis with the aim of identifying the appropriate measures to address the legal impediments to open access to research data.

4.1. Intellectual Property Protection in Research Data

From the above findings, it can be concluded that an appropriate measure to address the legal impediment arising from intellectual property protection in research data is to make open access to research data in digital format the default (RECODE, D.5.1 Open Access as Default). This means mandating the data owner to facilitate access to publicly funded research data for public research or other public-interest purposes (OECD, para E Protection of Intellectual Property), with as few restrictions as possible and in a timely and responsible manner (RCUK, Principle 1), on the internet through publicly accessible databases or repositories (NIH), in a format over which no entity has exclusive control (OGD Principles).

4.2. Ambiguity About Ownership of Research Data

The appropriate measure to address ambiguity about ownership of research data is to require research institutions to have policies relating to the ownership of intellectual property generated as a result of public funding (NHMRC). The policy should clarify ownership of research data by vesting ownership of publicly funded research data in the employer of the researcher (NERC, 4(f) Intellectual Property Rights), or in the research institutions receiving and administering the grants (ARC, para (c)). The policy should also clarify ownership of intellectual property in publicly funded research data generated as a result of collaborative research between research institutions (ARC, para (c)).

4.3. Data Owner’s Exclusive Rights in Research Data

The legal impediment arising from data owner’s exclusive rights in research data was addressed by the NIH and the Research Council of Norway by requiring published data to be made freely available no later than the time of initial publication (Research Council of Norway, 3.1 The Research Council’s guidelines). This position contrasts with that of the STFC (para xi), which provides that published data should be made available within six months of the date of the relevant publication. However, the STFC also provides that where there are accepted norms within a scientific field or for a specific archive, they should generally be followed. The legal impediment was also addressed by imposing an embargo period to allow researchers to publish their research findings, as provided by ESRC, NERC, NIH, RCUK and the Research Council of Norway. The embargo period varies from 30 to 60 days (NIH, Intellectual Property Protection) to a maximum of two years from the end of data collection (NERC, 3(a) Restrictions to Access). By comparison, ESRC (Principle 5) imposes an embargo of no longer than 12 months from the end of the grant, while the Research Council of Norway (para 3.2) fixes the embargo at no later than three years after the project has concluded. In contrast, RCUK’s position is that the length of the embargo period varies by research discipline (RCUK, Principle 5). Apart from embargoes, NERC requires data owners to grant a non-exclusive license to allow the funder to manage and supply the data for reuse (NERC, 4(f) Intellectual Property Rights). Based on the above findings, the legal impediment arising from data owner’s exclusive rights in research data could be addressed by imposing a minimum period of exclusive use for the researchers/data owners to exploit the research data.
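Because the funders count their embargo periods from different reference events, the latest permissible release date depends on both the funder and the event date. The following is a minimal sketch, in Python, of how a repository might compute that date from the maximum periods reported above. The table `MAX_EMBARGO_MONTHS` and the functions `add_months` and `latest_release_date` are hypothetical names introduced for illustration only; the sketch is not drawn from any funder's system and does not replace the wording of the policies themselves.

```python
import calendar
from datetime import date

# Maximum embargo lengths in months and the event each is counted from,
# as reported in the discussion above. Illustrative only.
MAX_EMBARGO_MONTHS = {
    "ESRC": (12, "end of grant"),
    "NERC": (24, "end of data collection"),
    "Research Council of Norway": (36, "conclusion of project"),
}


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month, so that
    29 February plus 12 months gives 28 February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def latest_release_date(funder: str, reference_date: date) -> date:
    """Latest date by which data must be released under the funder's embargo."""
    months, _reference_event = MAX_EMBARGO_MONTHS[funder]
    return add_months(reference_date, months)
```

For example, `latest_release_date("ESRC", date(2016, 2, 29))` counts 12 months from a grant that ended on 29 February 2016. A policy that fixes a single embargo period would reduce the table to one entry.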

4.4. The Restrictive Scope of the Legitimate Use of Research Data

Analysis found that most research funders require the research data to be accessible free of charge (UK Cabinet Office, Principle 3(19) & (20)), on the internet (Panton Principles), with as few restrictions as possible (RCUK, Principle 1), by anyone on equal terms (Research Council of Norway, para 2.1), for the widest range of purposes (OGD Principles, para 4 Accessible), including for reuse (Government of the Republic of Slovenia, 2.1.2 Open Access to Research Data), re-purposing (Panton Principles, 2010), redistribution (Research Council of Norway, 5.0 The Research Council’s Guidelines) and commercial gain (NERC, 3(a) Restrictions to Access), as long as there are no legal, ethical or security-related reasons to preclude this (Research Council of Norway, para 2.1). Besides the research funders, the Panton Principles propose an open data access policy which permits any data user to download, copy, analyze, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The Panton Principles also discourage the use of licenses which limit commercial reuse or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations (Panton Principles, para 3). In addition, CRUK requires any restrictions on data access to be outlined in the data sharing plan (CRUK, Intellectual Property Rights and Proprietary Data). It can be concluded that the legal impediment arising from the restrictive scope of legitimate use of research data could be appropriately addressed by allowing the right to use and reuse research data beyond fair dealing exceptions.

4.5. Complex and Lengthy Licensing Procedures for Research Data

In terms of licensing, the Research Council of Norway requires the license to be internationally recognized (Research Council of Norway, para 5.1 & 5.2), while the EU requires a licensor to grant a worldwide, perpetual, royalty-free, irrevocable, non-exclusive license to use research data (EU, para 2.2 Open Licences). Further, the EU recommends Creative Commons (CC) licenses (version 4.0), especially the Creative Commons Zero (CC0) Public Domain Dedication, as it avoids the need to develop and update custom-made licenses (EU, 2.2 Open Licences) and makes a work license-free (OGD Principles, para 8). A similar position can be observed in the Panton Principles (para 2–4). As for published research data, the Panton Principles require the research data to be explicitly placed in the public domain with a clear waiver or license (Panton Principles, para 1). Based on the above findings, the appropriate measure to address the legal impediment arising from complex and lengthy licensing procedures is for the policy to adopt an open content licensing regime based on advance permission, which removes the permission barrier and makes licensing faster, simpler and more flexible.

4.6. Author’s Moral Right of Integrity/Attribution

Several research funders, such as the NIH, require data users to recognise the proprietary interests of the originator of the research data by giving them appropriate credit for their work (NIH, para 3 & 12). Besides the NIH, RCUK also requires data users to acknowledge the sources of their data as a way to recognize the intellectual contributions of researchers who generate, preserve and share key research datasets (RCUK, Principle 6). Further, the Denton Declaration on Open Access to Research Data states that the principles of open access should not be in conflict with the intellectual property rights of researchers, and that a culture of citation and acknowledgement should be cultivated (Denton Declaration, para 12). NERC reportedly adopts the citation and DOI-specific metadata laid out in the DataCite metadata schema to ensure that the researchers responsible for creating the research data receive appropriate recognition for their efforts. Based on the above findings, one appropriate measure to address the legal impediment arising from an author’s moral right of integrity/attribution is to introduce data attribution/citation standards that provide a basis for incentives, recognition and rewards for data sharing activities.
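A data citation standard of the kind discussed above could be applied mechanically once a dataset's descriptive metadata has been recorded. The following Python sketch is illustrative only: the `DatasetRecord` fields and the citation pattern (creators, year, title, publisher, DOI link) are simplifying assumptions made for this example and are not taken from the DataCite metadata schema or from any funder's policy. The sample values in the usage note are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to attribute a dataset."""
    creators: List[str]  # names in "Family, Initials" form
    year: int            # year the dataset was made available
    title: str
    publisher: str       # repository or institution holding the data
    doi: str             # bare DOI, without the resolver prefix


def format_citation(record: DatasetRecord) -> str:
    """Build a citation string that credits every listed data creator."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}). {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )
```

A data user could then call `format_citation(DatasetRecord(["Tan, A.", "Lee, B."], 2016, "Survey Data", "Example Repository", "10.1234/example"))` to obtain a single line of attribution to include alongside any reuse of the data.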

4.7. Non-Disclosure Duty of Confidential Research Data

The OECD proposed that data custodians consider using anonymization or confidentiality procedures that ensure a satisfactory level of confidentiality while preserving as much data utility as possible for researchers (OECD, D. Legal Conformity). The IES requires its grant holders to develop a data management plan that protects the rights of study participants and the confidentiality of the data as required by laws and regulations (IES, Human Subjects and Privacy Issues). In contrast, the EU allows researchers to opt out at any stage (either before or after signing the grant), which frees them from the obligations of open access (EU, 4. Extended Pilot on Open Access to Research Data). Hence, the most appropriate measure to address the legal impediment arising from the non-disclosure duty of confidential research data is the use of anonymization processes, confidentiality procedures and a data management plan.

4.8. The Right to Informational Privacy of Subjects of Research Data

To address the privacy issue, the research data need to be free of identifiers that would permit linkages to individual research participants and of variables that could lead to deductive disclosure of the identity of individual participants (Government of Canada, Policy Environment; IES, Human Subjects and Privacy Issues). This could be done through a de-identification/redaction process which strips all identifiers (Government of Canada, 2.7 Challenges for Policy Implementation). In addition, the ESRC has proposed that sensitive and confidential data, or data which pose a disclosure risk after anonymization, be deposited in the data provider’s secure access facilities (ESRC, Data Security). Where data cannot be freed of identifiers, or when identifiers are important for linking datasets, apart from qualifying/exempting the data from data sharing requirements (Government of Canada, 2.2 Policy Environment), the researchers should also consider restrictions on data sharing as provided by data archives or enclaves (IES, Human Subjects and Privacy Issues). RCUK requires a data sharing agreement to be signed before data are released, prohibiting use of the released data to identify participants or to make unapproved contact with participants (RCUK, Principle 4). On the other hand, the EU introduced ‘Smart Notices’, which are stored in a permanent online location, to indicate the original purpose of personal data collection and processing and to serve as a reminder of the obligations with regard to EU rules and national law on personal data protection (EU, 2.4 Personal Data). Where data cannot be stripped of identifiers, the NIH requires the researchers to apply for Certificates of Confidentiality to protect identifiable research information from forced disclosure (NIH, para 2 Protecting Confidentiality and Personal Privacy). To conclude, there are various methods of data redaction and data release that can be adopted to address the legal impediment arising from the right to informational privacy of subjects of research data.
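The de-identification step described above, which strips direct identifiers and reduces the risk of deductive disclosure, can be illustrated with a short Python sketch. The field names in `DIRECT_IDENTIFIERS`, the ten-year age bands and the functions `age_band` and `deidentify` are hypothetical choices made for this example. They are assumptions, not requirements drawn from the policies analysed, and removing these fields alone would not guarantee that a dataset is safe to release.

```python
from typing import Any, Dict

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "email", "national_id"}


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band, e.g. 37 becomes '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with direct identifiers removed and the
    'age' variable (if present) coarsened to limit deductive disclosure."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age"] = age_band(cleaned["age"])
    return cleaned
```

Applied to a survey record containing a name, an email address, an exact age and a response, the function keeps only the age band and the response. Records that remain identifiable after such processing would fall under the alternative measures described above, such as secure access facilities or data sharing agreements.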

4.9. Protection of National Security

While the research funders recognized national security as one of the exceptions to open access, no specific measure to address the legal impediment arising from protection of national security has been included in the principles/policies/guidelines analysed. In a report entitled “Seeking Security: Pathogens, Open Access and Genome Databases”, the US National Research Council’s Committee on Genomics Database for Bioterrorism Threat Agents states that the classification system has traditionally been used to restrict access to information that poses a national security risk. Further, the US National Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age (2009) has recommended that policy makers draw the line between classified and unclassified data and balance restrictions on access to sensitive data against the potential costs of such restrictions. Based on the above recommendations, the most appropriate measure to address the legal impediment arising from protection of national security is to draw a clear line between classified and unclassified research data.

4.10. Novelty Requirements in Patent Law

Most research funders addressed the legal impediment arising from novelty requirements in patent law by allowing data release to be delayed until patent applications have been filed (CRUK, Intellectual Property Rights and Proprietary Data). In contrast, the Government of Canada permits delays in data sharing where institutions or researchers are applying for patents or developing new applications based on that data (Government of Canada, 2.2 Policy Environment). The NIH requires the provisions of the Bayh-Dole Act of 1980 (equivalent to the Intellectual Property Commercialization Policy for Research & Development Projects Funded by the Government of Malaysia 2009) to be applied if the outcomes of the research result in inventions (NIH, Intellectual Property Protection). Based on the above findings, the appropriate measure to address the legal impediment arising from novelty requirements in patent law is for the policy to fix a timeframe for the patent application to be filed, to avoid prolonged and unnecessary delay/restriction of data release.

The EU Guidelines state that the licensor provides the information ‘as is’ and assumes no responsibility for its correctness or completeness (EU, 2.3.5 Disclaimer of Liability). The US Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age () proposed that a standard of care be developed as part of the strategy to ensure data quality. Among the standards of care recommended to be imposed on data providers is the responsibility to properly inform, advise and warn data users of the potential risks related to the use/reuse of the data (). There is also a recommendation for data providers to supply information on the content of the data and on any limitation, defect or potential risk in its utilisation (). Based on the above recommendations, the appropriate measure to address the legal impediment arising from the lack of a legal duty to ensure data quality is to develop a standard of care that requires data providers to ensure data quality.

5. Proposals/Recommendations

This section proposes model guidelines to address the legal impediments to open access to publicly funded research data in Malaysia. The model guidelines are developed with reference to the principles/policies/guidelines analysed in this study and to the recommendations made by previous studies. The measures considered most appropriate to address the legal impediments to open access to publicly funded research data in Malaysia have been adopted in the model guidelines, which are set out below.

5.1. Guidelines Recommendation 1

RESEARCH DATA PROTECTED AS INTELLECTUAL PROPERTY

1.Research data may be protected as intellectual property, especially where sufficient effort has been expended to make the research data original works.
2.The intellectual property protection of research data does not exempt the research data from data release under the policy.
3.The data owner is to permit open access to research data in accordance with the requirements of the funding agency.
4.Where data owner is an institution, the researcher who is the creator/originator of the research data must be appointed as data custodian to give effect to data release.

5.2. Guidelines Recommendation 2

OWNERSHIP OF PUBLICLY FUNDED RESEARCH DATA

1.To avoid any ambiguity about ownership and worldwide right, title and interest to or in all publicly funded research data in Malaysia which are covered under these Guidelines, it is hereby clarified that:
i.Where the research data is created/originated individually by a researcher who is an employee/registered student of the institution administering the research grant, full ownership and worldwide right, title and interest to or in the research data is vested in the institution, regardless of whether the research data is created/originated in or outside the course of employment/learning activities.
ii.Where the research data is created/originated jointly under a research collaboration, ownership and worldwide right, title and interest to or in the research data is vested in the institution where the researcher is employed/attached/registered, in equal share with the collaborating party.
For the purpose of these guidelines:
i.the terms “employee” and “student” are to be interpreted in accordance with the law, constitution or policy of each institution;
ii.the research data is created/originated individually when the research data is a work of a singular nature, is made up of distinguishable contributions (where each contribution can be identified as coming from a particular researcher) and the research data is independently copyrightable;
iii.the research data is created/originated jointly when the research data is a unified/composite/blended work, is made up of indistinguishable contributions (where each contribution cannot be identified as coming from a particular researcher) or the contribution is distinguishable but copyright in the research data is dependent on the work of other researchers.

5.3. Guidelines Recommendation 3

DATA EXCLUSIVITY

1.Data owner/creator/originator has a legitimate interest in benefiting from the research data but not in prolonged exclusive use of the research data.
2.Data owner/creator/originator is allowed a limited period of data exclusivity, during which a data owner has the exclusive rights in research data.
3.The period of data exclusivity depends on the requirement of the funding agency.
4.Where the period of data exclusivity is not fixed by the funding agency, it is expected that data release is to be given effect:
i)not later than two years from the collection/creation of the research data; or
ii)immediately upon the first publication based on the research data; or
iii)not later than one year from the end (either by expiry or termination) of the award/grant which funds the collection/creation of the research data; or
iv)not later than one year upon completion of the research project for which the research data is collected/created.
5.The earliest data release date among the four options above shall mark the expiry of the period of data exclusivity.
6.A longer period of data exclusivity shall be allowed only in exceptional circumstances and subject to approval by the funding agency.
7.Upon the expiry of the data exclusivity, the research data must be released in accordance with the policy of the funding agency.
8.Data owner is required to grant to the funding agency a non-exclusive licence to allow the funder to manage and supply the released data for reuse.
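
The rule in paragraphs 4 and 5 above can be read as taking the earliest of up to four candidate dates. The following is a minimal sketch of that reading in Python, not part of the guidelines themselves; the function names `add_years` and `exclusivity_expiry` and their parameters are hypothetical, and the handling of 29 February is an assumption made only so that the arithmetic is well defined.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Assumption: a 29 February start date maps to 28 February.
        return d.replace(year=d.year + years, day=28)


def exclusivity_expiry(
    data_created: date,
    first_publication: Optional[date] = None,
    grant_end: Optional[date] = None,
    project_completed: Optional[date] = None,
) -> date:
    """Latest permissible release date when the funder fixes no period."""
    # Option (i): two years from collection/creation of the data.
    candidates = [add_years(data_created, 2)]
    # Option (ii): immediately upon the first publication based on the data.
    if first_publication is not None:
        candidates.append(first_publication)
    # Option (iii): one year from the end of the award/grant.
    if grant_end is not None:
        candidates.append(add_years(grant_end, 1))
    # Option (iv): one year from completion of the research project.
    if project_completed is not None:
        candidates.append(add_years(project_completed, 1))
    # Paragraph 5: the earliest of the options ends the exclusivity period.
    return min(candidates)
```

For example, for data created on 1 March 2020, first published on 15 June 2021 and funded by a grant ending on 31 December 2021, the sketch returns 15 June 2021, since the first publication precedes both the two-year and the grant-based limits. A longer period approved by the funding agency under paragraph 6 would simply override this result.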

5.4. Guidelines Recommendation 4

THE LEGITIMATE USE OF RESEARCH DATA

1.Pursuant to the principles of open access, which require the research data to be released with as few restrictions as possible, the data owner must expand the scope of the legitimate use of research data which are protected by copyright beyond the fair dealing exceptions.
2.For the purpose of clarity, the expansion of the scope of the legitimate use of research data beyond fair dealing exceptions should include permission:
1)to use the research data for commercial gain;
2)to download, copy, analyse, re-process, pass the research data to software or use them for any other purpose;
3)to distribute full-copies of the research data to the public;
4)to burn copies of the research data on CDs for bandwidth-poor parts of the world;
5)to distribute semantically-tagged or otherwise enhanced (modified) versions of the research data;
6)to migrate the research data to new formats or media to keep them readable as technologies change;
7)to create and archive the research data for long term preservation;
8)to include the research data in a database or mash-up;
9)to make an audio recording of a textual research data;
10)to translate a text of the research data into another language; and
11)to copy a text of the research data for indexing, text mining and other kinds of processing.

5.5. Guidelines Recommendation 5

LICENSING RESEARCH DATA

1.Research data which are protected by copyright, sui generis database rights or other “copyright-like” rights and which are released under the policy must be licensed under a Creative Commons licence, adopting the most liberal CC licence that reserves only the right to be attributed as data owner (CC-BY).
2.While the Creative Commons Zero Waiver (CC0) and the Open Data Commons Public Domain Dedication and Licence (PDDL) are more liberal than the CC-BY licence, both reserve no rights and are therefore inconsistent with the open access principles of not harming the intellectual property rights in research data and of balancing the interests of all stakeholders.

5.6. Guidelines Recommendation 6

MORAL RIGHTS OF DATA CREATOR/ORIGINATOR

1.Data creator/originator is required to permit alteration and modification of the research data which are released under open access policy through a non-assertion pledge of his/her moral right of integrity in the research data.
2.In return, data users are required to recognise the intellectual contributions of researchers who create/originate/generate, preserve and share the research data.
3.Data users are required to acknowledge the sources of their data by giving data creator/originator appropriate attribution/credit for the research data which they exploit.
4.Data users may use the citation and DOI-specific metadata laid out by DataCite or another appropriate citation and metadata scheme.

5.7. Guidelines Recommendation 7

CONFIDENTIAL RESEARCH DATA

1.Data release must be given effect without violating the non-disclosure duty of confidential research data arising from promise of confidentiality, common law duty (tort or equity) or contractual duty such as confidential agreement or non-disclosure agreement.
2.Confidential research data must be released using statistical methods such as data suppression, random data perturbation, data coding and recoding, which protect the confidentiality of the research data. The statistical methods recommended above must balance the non-disclosure duty against the possibility that the methods applied will also reduce the quality and integrity of the research data.
3.Where statistical methods recommended above are not appropriate/possible, data release must not be given effect. Instead, confidential research data must be deposited in data archive/enclave which is provided by the research institution/funding agency.
4.The data archive/data enclave shall provide a secured, controlled environment where technical mechanisms such as encryption and password are to be used to protect the research data from unauthorized third party’s access and reuse.
5.Where the confidential research data is deposited in a data archive/enclave, disclosure of the research data may be considered upon an ad hoc request made by a third party, whether an individual or an organisation.
6.Where an ad hoc request is made by a third party, disclosure of confidential research data can only take effect after full compliance with the Data Security Procedure of the policy.

5.8. Guidelines Recommendation 8

THE INFORMATIONAL PRIVACY OF SUBJECTS OF RESEARCH DATA

1.The research data may contain:
i.personal information which directly identifies or which could be used to identify subject of research data such as name, address, passport, identity card number, telephone number, e-mail address, photograph, fingerprint, DNA and social security numbers (hereinafter referred as “direct identifier”);
ii.indirect identifier that could lead to “deductive disclosure” of subject of research data. Deductive disclosure of subject of research data becomes more likely when samples are drawn from small geographic areas, rare populations or linked data sets; or
iii.sensitive personal information such as health information, genetic information, race, religion, culture, ethnicity, national origin, gender, age, marital status, socio economic status, political opinion, educational background, geographic location, sexual orientation or physical or mental health, ability or condition, criminal or prosecution record of identified or identifiable subject of research data.
2.The research data which contains direct/indirect identifier or sensitive personal information of identified/identifiable subject of research data must only be released in a form that protects the right to informational privacy of subject of research data.
3.The research data which contains direct/indirect identifier or sensitive personal information of identified/identifiable subject of research data can only be released with prior-informed consent of subject of research data.
4.In the absence of consent or where consent is not given, the research data can only be released for the purpose that is compatible with the purpose for which the research data was collected.
5.Alternatively, the research data can be released for different purposes and without consent from subject of research data after one of the following data redaction techniques is applied:
i.anonymization/de-identification by stripping or removing personal information which serves as a direct identifier;
ii.pseudonymization by replacing direct identifier such as names with numerical identifiers;
iii.obfuscation by aggregating or reducing the precision of data, information or a variable;
iv.perturbation by introducing random errors into individual records whilst preserving descriptive statistics;
v.generalizing the meaning of detailed text; or
vi.restricting the upper or lower ranges of a variable to hide outliers.
6.Where redaction techniques are not feasible, the research data which contains direct/indirect identifier or personal information of identified/identifiable subject of research data must be deposited in data archive/enclave and can only be released in accordance with the Data Security Procedure of the policy.
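
Three of the redaction techniques listed in paragraph 5 can be illustrated with a short Python sketch: pseudonymization (ii), obfuscation by reducing precision (iii), and restricting the range of a variable (vi). This is only an illustration under stated assumptions, not a method prescribed by the guidelines; the function names `pseudonymize`, `band` and `top_code`, the field names, and the band width are all hypothetical. Any mapping from names to numerical identifiers would itself need to be kept confidential or discarded.

```python
from typing import Dict, List

Record = Dict[str, object]


def pseudonymize(records: List[Record], key: str = "name") -> List[Record]:
    """Replace a direct identifier with a numerical identifier (technique ii)."""
    ids: Dict[object, int] = {}
    redacted_records: List[Record] = []
    for rec in records:
        # The same subject always receives the same number.
        subject_id = ids.setdefault(rec[key], len(ids) + 1)
        redacted = {k: v for k, v in rec.items() if k != key}
        redacted["subject_id"] = subject_id
        redacted_records.append(redacted)
    return redacted_records


def band(value: int, width: int = 10) -> str:
    """Reduce the precision of a variable by reporting a range (technique iii)."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def top_code(value: float, lower: float, upper: float) -> float:
    """Restrict the lower and upper range of a variable to hide outliers (technique vi)."""
    return max(lower, min(value, upper))
```

Applied to a record such as a named subject aged 34 with an unusually high income, the sketch would release a numerical subject identifier, the age band "30-39" and an income capped at the chosen upper limit. Whether such steps are sufficient to prevent deductive disclosure depends on the data set and remains a matter for case-by-case assessment.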

5.9. Guidelines Recommendation 9

CLASSIFIED RESEARCH DATA

1.Release of research data the disclosure of which is prejudicial to national security is strictly prohibited, regardless of whether or not there is any specific law on this matter.
2.The Data Management and Sharing Plans must clarify whether the research data created/originated by the university researcher may contain information which is prejudicial to national security.
3.Disclosure of research data which contains the following information is classified as prejudicial to national security:
i.instructions and guidance on bomb-making, biological weapon, illegal drug production or counterfeit products;
ii.information and statements with regards to possible terrorist attacks;
iii.information which compromises law enforcement activities, incites violence, or counsels disobedience to the law or to any lawful order;
iv.information pertaining to prohibited place, munitions of war, apparatus, equipment, and machinery which are used in the maintenance of the safety and security of Malaysia;
v.information with regard to the outbreak of a deadly or contagious disease;
vi.information which could likely lead to a breach of the peace or to promote feelings of hostility between different races or classes of the population which has a seditious tendency;
vii.information which could likely lead to outbreak of racial, sectarian or political disturbances in general or a specific part of the country; and
viii.documents relating to affairs of state such as military secrets, international affairs or Cabinet documents.
4.The research data which contains any of the information classified above, must be deposited in data archive/enclave and its disclosure is subject to Data Security Procedure of the policy.

5.10. Guidelines Recommendation 10

RESEARCH DATA ABOUT AN INVENTION

1.Release of research data about an invention needs to be delayed until a patent application is filed in order not to violate the novelty requirements in patent law.
2.To avoid prolonged and unnecessary restriction/delay, the decision to patent the invention must be made by the institution within six (6) months of formal notification of the invention by the researcher.
3.Prior to the decision by the institution, disclosure of the research data about an invention may be given effect in accordance with the Data Security Procedure of the policy.
4.Where the institution’s decision is not to patent the invention, the research data about an invention must be immediately released in accordance with the Data Release Procedure of the policy.
5.Where the decision is to patent the invention, the patent application should be filed within six (6) months from the date the decision was made, unless it is shown that it is not possible due to the complexity of the patent to be filed.
6.Regardless of the above provisions, the research data about an invention may be disclosed without violating the novelty requirements in patent law, provided the patent application is filed within one year after its disclosure to the public.
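
The two six-month periods in paragraphs 2 and 5 above can be expressed as simple date arithmetic. The sketch below is a hypothetical illustration in Python rather than part of the guidelines; the names `add_months` and `patent_timeline` are assumptions, as is the convention of clamping to the last day of a shorter month (for example, six months after 31 August is taken to be the last day of February).

```python
import calendar
from datetime import date
from typing import Dict, Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def patent_timeline(notified: date, decided: Optional[date] = None) -> Dict[str, date]:
    """Deadlines implied by paragraphs 2 and 5 of Recommendation 10."""
    # Paragraph 2: decision within six months of formal notification.
    timeline = {"decision_deadline": add_months(notified, 6)}
    # Paragraph 5: filing within six months of a decision to patent.
    if decided is not None:
        timeline["filing_deadline"] = add_months(decided, 6)
    return timeline
```

For an invention notified on 15 January 2023 and a decision to patent taken on 10 May 2023, the sketch gives a decision deadline of 15 July 2023 and a filing deadline of 10 November 2023. The exception in paragraph 5 for complex patents is not modelled.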

5.11. Guidelines Recommendation 11

DATA PROVIDERS’ DUTY TO ENSURE DATA QUALITY

1.The duty to ensure the quality of the research data is shared between the researcher as creator/originator/custodian of the research data (hereinafter known as the “Primary Data Provider”), the institution as data owner and the online repository/archive/enclave where the research data is deposited (the institution and the data repository/archive/enclave are collectively known as the “Secondary Data Providers”).
2.For the purpose of the policy, the definition of data quality given by the US Office of Management and Budget Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility and Integrity of Information Disseminated by Federal Agencies 2002 (hereinafter referred to as the “OMB Guidelines”) is adopted.
3.Under the OMB Guidelines “Quality” is defined as encompassing utility, objectivity and integrity.
4.Being the Primary Data Provider, the researcher ultimately bears the responsibility to ensure data quality. The researcher must supply the metadata describing the research data which enables data users to assess the quality of the research data. The metadata must be in accordance with the minimum standard required under the Data Documentation and Record Keeping Procedure of the Policy.
5.The Data Repository/Archive/Enclave Manager must ensure that the research data is deposited together with the metadata. The Data Repository/Archive/Enclave Manager must require the depositors to declare whether the research data has been subject to evaluation, validation and verification by formal, independent, external peer review in line with accepted best practice to determine its quality.
6.Where the research data is not subject to peer review prior to data release, the Data Repository/Archive/Enclave Manager must require the university researcher who is the creator/originator of the research data to properly advise and warn the data users of that fact.
7.Regardless of whether or not the research data is peer-reviewed prior to data release, the university researcher must advise and warn the non-expert/non-professional data users of the potential risks related to the use/reuse of the research data.
8.The warning should cover information such as data quality, source materials, the date the data was last updated, any known limitations of the data, and any defect or potential risk in the data utilization. The warning should also include advice on the need to obtain independent or professional advice and verification before acting or relying on research data which are not subject to peer review.
9.The institution as owner of the research data must treat data quality assurance as integral to data release. The institution should adopt the standard of care to ensure data quality which is provided under the OMB Guidelines and is applicable to the institution and the researchers.

6. Conclusion

Based on the key findings from data analysis, this study developed model guidelines addressing the legal impediments to open access to research data. As the model guidelines were developed after analyzing principles/policies/guidelines on open access to research data from Australia, Canada, the EU, the UK and the USA, they are of international standard and suitable for adoption by research funding agencies or research institutions that plan to introduce a policy on open access to publicly funded research data in Malaysia. It is suggested that more research be conducted in the future to fill the gaps left by this study. A gap exists because of the emphasis given by this study to the legal impediments, as opposed to other types of impediments which also become barriers to open access to research data. It is therefore suggested that future research address the technical, technological and cultural impediments to the objective of enabling open access to publicly funded research data. To complement the open access initiative for research data, future research should also focus on research data in non-digital formats which cannot be released online. Finally, since the model guidelines are still at an early stage of development, it is suggested that future research be conducted to determine what other substantive and procedural provisions ought to be introduced to support the implementation of the guidelines.