Proposed Guideline for Minimum Information Stroke Research and Clinical Data Reporting

Judit Kumuthini1, Lyndon Zass1, Melek Chaouch2, Michael Thompson3, Paul Olowoyo4, Mamana Mbiyavanga5, Faniyan Moyinoluwalogo6, Gordon Wells1, Victoria Nembaware5, Nicola J. Mulder5, Mayowa Owolabi6, H3ABioNet Consortium’s Data and Standards Working Group as members of the H3Africa Consortium
1 Centre for Proteomic and Genomic Research, Cape Town, ZA
2 Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Tunis, TN
3 National Institute of Mathematical Sciences, Kumasi, GH
4 Federal Teaching Hospital, Ido-Ekiti/Afe Babalola University, Ado-Ekiti, NG
5 Computational Biology Division, IDM, University of Cape Town, Cape Town, ZA
6 University of Ibadan, Ibadan, NG


Introduction
High-throughput technologies are increasingly being employed in biomedical and healthcare informatics research, producing large biological and clinical data sets at rapid speed (Luo et al., 2016). Modern biomedical and clinical research is thus characterised by an exponentially increasing volume of varied data types and structures, produced and processed at unprecedented velocity (Luo et al., 2016). Integrating these different sources of information holds great potential to elucidate the aetiologies of complex medical conditions, develop novel treatments for such conditions and revolutionise modern health care with the incorporation

Results
The online survey was completed by 20 international stroke specialists. The largest group of respondents was based in Africa (10), followed by America (4), Europe (4) and Australia (2). Of these respondents, 28% were working as clinicians, 15% as researchers, and 57% as dual clinician-researchers. The largest share of respondents had between 10 and 20 years' experience in the field (38%), whilst 34% had more than 20 years' experience, 15% had between 5 and 10 years' experience and 13% had less than 5 years' experience. Figures 1 and 2 illustrate the survey response to the proposed elements. Furthermore, respondents proposed additional elements, which shaped the final structure of the reporting guideline. These include, but are not limited to, Diet and Dyslipidaemia. The raw survey results can be found in Supplementary File 2.
The Minimum Information Required Guideline: Stroke Research and Clinical Data Reporting is summarised in Table 1.
The information reported using the standard is separated into three fields: participant-, study- and experiment-level information. The standard further divides elements into essential and optional information. Optional elements refer to information which is not necessary for the interoperation of studies within the same field but is useful for integrating studies from varying disease fields. Participant-level information contains 13 subsections of varying essential and optional elements, including Demographics, Lifestyle Factors, Anthropometrics, Blood Pressure, Adverse Drug Reactions, Urine-Related Test Index, Stroke History, Sample-Specific Information, Stroke-related Information, Prescribed Medication, Non-Prescribed Medication, and Therapy. The study-level information includes elements which describe the details of a given study, including essential elements such as Study ID, Research Institute and Study Design, and optional elements such as Study Duration, Study Start Date, and PubMed Unique Identifier. Finally, experiment-level information includes elements which describe the experiments within a given study, including essential elements such as Biospecimen Type, Instrumentation employed, Sample Management Protocol, Quality Control Protocol and Experimental Aim, and optional elements such as Output Location, which describes where the data will be saved. Although descriptions of the Output Location are widely encouraged, this data element remains optional to accommodate scenarios where data is private, under embargo and/or the reporting guideline is used explicitly for internal research data management.
Figure 1: Survey response to proposed participant-level information.
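The three-level layout described above, with each element marked as essential or optional, can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions and is not part of the published guideline: the names `Element`, `ELEMENTS` and `missing_essential` are hypothetical, the record values are invented for the example, and only a few of the study- and experiment-level elements named in the text are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    level: str       # "participant", "study" or "experiment"
    essential: bool  # True = essential, False = optional


# A small subset of the elements named in the text, for illustration only.
ELEMENTS = [
    Element("Study ID", "study", True),
    Element("Research Institute", "study", True),
    Element("Study Design", "study", True),
    Element("Study Duration", "study", False),
    Element("Biospecimen Type", "experiment", True),
    Element("Experimental Aim", "experiment", True),
    Element("Output Location", "experiment", False),
]


def missing_essential(record: dict, level: str) -> list:
    """Return the essential elements at `level` that are absent or empty in `record`."""
    return [
        e.name
        for e in ELEMENTS
        if e.level == level and e.essential and not record.get(e.name)
    ]


# Invented example values; optional elements may be omitted without complaint.
study = {"Study ID": "S-001", "Study Design": "case-control"}
print(missing_essential(study, "study"))
```

A check of this kind would flag a study record that omits an essential element such as Research Institute, while leaving optional elements such as Output Location unenforced.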
The complete reporting guideline can be found in Supplementary File 3, specifying each element's data type, collection format and/or accepted values, and related ontologies and standards. Herein, the Ontology ID column contains the most appropriate ontology to which the element is mapped, whilst the Concordant Ontologies and Concordant Standards columns describe ontologies and standards which include similar data elements. These lists are not meant to be comprehensive or exhaustive, but to illustrate the utilization of, and overlap with, existing resources. A comprehensive guideline explaining how to employ the reporting guideline locally can also be found in Supplementary File 4.
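To make the idea of a per-element data type and accepted-value list concrete, the sketch below models one row of such a data dictionary and validates a captured value against it. It is an assumption-laden illustration rather than the content of Supplementary File 3: `DictionaryEntry` is a hypothetical name, the response options shown are assumed, and the ontology identifier is a placeholder rather than a real term.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    element: str
    data_type: type
    accepted_values: Optional[frozenset] = None  # None = any value of data_type
    ontology_id: str = ""                        # placeholder field, see below

    def is_valid(self, value) -> bool:
        # bool is a subclass of int in Python, so reject it unless bool is expected.
        if isinstance(value, bool) and self.data_type is not bool:
            return False
        if not isinstance(value, self.data_type):
            return False
        return self.accepted_values is None or value in self.accepted_values


tobacco_lifetime_use = DictionaryEntry(
    element="Tobacco Use: Lifetime Use",
    data_type=str,
    accepted_values=frozenset({"Yes", "No", "Unknown"}),  # assumed response options
    ontology_id="EXAMPLE:0000001",  # placeholder, not a real ontology term
)
age_of_initiation = DictionaryEntry(
    element="Tobacco Use: Age of Initiation",
    data_type=int,
)

print(tobacco_lifetime_use.is_valid("Yes"))
print(tobacco_lifetime_use.is_valid("Sometimes"))
print(age_of_initiation.is_valid(17))
```

Keeping accepted values next to the element definition is one way to enforce a controlled vocabulary at data entry rather than during later harmonisation.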
An associated XML schema was developed for REDCap implementation, consisting of three sections (participant-, experiment- and study-level information), and can be found in Supplementary File 5. The relationship between these sections is illustrated in Supplementary File 6. The XML schema represents and requests information as outlined in the proposed reporting guideline, and therefore functions as a standard format of the reporting guideline. Importantly, the reporting guideline and the associated XML schema can also be obtained from the H3ABioNet website (www.h3abionet.org/data-standards/datastds), along with a guideline document on how to employ the reporting guideline locally.
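As a rough illustration of how a record with these three sections could be serialised to XML, the sketch below uses Python's standard `xml.etree.ElementTree` module. The tag names (`record`, `participant`, `study`, `experiment`, `element`) and the example values are assumptions made for this example; they do not reproduce the schema in Supplementary File 5 or the format that REDCap itself imports.

```python
import xml.etree.ElementTree as ET


def build_record(participant: dict, study: dict, experiment: dict) -> ET.Element:
    """Build a three-section XML record; all tag names here are hypothetical."""
    root = ET.Element("record")
    sections = (
        ("participant", participant),
        ("study", study),
        ("experiment", experiment),
    )
    for section_name, values in sections:
        section = ET.SubElement(root, section_name)
        for element_name, value in values.items():
            item = ET.SubElement(section, "element", name=element_name)
            item.text = str(value)
    return root


# Invented example values for elements named in the text.
record = build_record(
    participant={"Stroke Impact": "left-sided weakness"},
    study={"Study ID": "S-001"},
    experiment={"Biospecimen Type": "blood"},
)
print(ET.tostring(record, encoding="unicode"))
```

Storing the element name as an attribute, rather than as a tag, avoids problems with names that contain spaces, which are not permitted in XML tag names.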

Elements | Importance | Definition
Intensity | | The average energy expended per physical activity. Light exercise is 20-60 minutes and elevates heart rate to 35-60% of maximum heart rate (e.g. housework, gardening, slow walking); moderate exercise is 20-60 minutes and elevates heart rate to 35-60% of maximum heart rate (e.g. basketball, singles tennis, brisk walking); strenuous exercise elevates heart rate to over 60% of maximum heart rate (e.g. jogging, swimming, bicycling).
Alcohol Use: Lifetime Use | E | A description of an individual's current and past experience with alcoholic beverage consumption.
Alcohol Use: Age of Initiation | | The age of initiation of alcoholic beverage consumption.
Alcohol Use: 30-Day Frequency | | The number of occurrences of alcoholic beverage consumption per unit time (past 30 days).
Alcohol Use: 30-Day Quantity | | A record of the quantity of alcohol consumption (in standard drinks) (past 30 days).
Tobacco Use: Lifetime Use | E | Record of whether the participant has ever used any tobacco product during his or her entire life.
Tobacco Use: Lifetime Frequency | |
Tobacco Use: Age of Initiation | | The age of initiation of tobacco use.
Recreational Drug Use: Lifetime Use | E | Record of whether the participant has ever used a drug during his or her entire life.
Recreational Drug Use: Age of Initiation | | The age of initiation of drug use.
Recreational Drug Use: 30-Day Type | | A record of the participant's type of drug use within the past 30 days.
Recreational Drug Use: 30-Day Frequency | | The number of occurrences of drug use per unit time (past 30 days).
Stroke Impact | E | Disabilities and impairments due to a stroke.
Histopathology | E | The visual examination of cells or tissue (or images of them) with an assessment regarding the quality of the cells or tissue.
Pre-Stroke Co-morbidities (Systemic) | E | The presence of co-existing or additional medical conditions pre-stroke.
Post-Stroke Co-morbidities (Systemic) | E | The presence of co-existing or additional medical conditions post-stroke.
Pathogenic Co-morbidities | E | The presence of co-existing or additional pathogenic diseases with reference to an initial diagnosis or with reference to the index condition that is the subject of study.
Allergies | O | An immune response or reaction to substances that are usually not harmful.
Prescribed Medication: Medication | E | A record of the prescribed drug product currently in use.
Prescribed Medication: Dosage | | The size or frequency of a dose of a medicine or drug.
Prescribed Medication: Strength | | The amount of the medicine or drug that provides its particular effect.
Prescribed Medication: Reason | | The cause of the prescription.
Prescribed Medication: Start Date | | The calendar date on which treatment was initiated.
Prescribed Medication: Stop Date | | The calendar date on which treatment is to be or was terminated.
Non-Prescribed Medication: Medication | E | A record of the non-prescribed drug product use in the past 2 weeks.
Non-Prescribed Medication: Dosage | | The size or frequency of a dose of a medicine or drug.
Non-Prescribed Medication: Reason | | The cause of the prescription.
Non-Prescribed Medication: Start Date | | The calendar date on which treatment was initiated.
Non-Prescribed Medication: Stop Date | | The calendar date on which treatment is to be or was terminated.

Discussion
The paper outlines the development of the Minimum Information Required: Stroke Research and Clinical Data Reporting Guideline. To our knowledge, though ontologies and collection standards have previously been described for stroke-related clinical care and research, no reporting guideline for stroke research and clinical data has previously been proposed or published. Most notably, the Stroke Ontology (https://bioportal.bioontology.org/ontologies/STO-DRAFT) defines the terms and relationships of the knowledge domain of stroke and the Human Phenotype Ontology (HPO) (Robinson and Mundlos, 2010) defines stroke-related phenotypes. Similarly, PhenX, the Clinical Data Interchange Standards Consortium (CDISC) and Health Level Seven (HL7) have previously developed and proposed standards fit for clinical data collection in various disease fields. In the development of our reporting guideline, we utilised these existing resources to harmonise collection measures and terms and develop a comprehensive and harmonised data management tool which allows centralised management of both clinical and research data in a complex disease field (stroke) which requires collaborative and inter-disciplinary research. Combining the clinical and research data elements in one standard allows principal investigators to maintain various levels of data access whilst still centralising comprehensive data management and storage. This empowers principal investigators to manage their research data in a coordinated and comprehensive manner, and to maintain the participant-level data associated with various studies and/or experiments in a user-friendly way (and vice versa).
Employing the reporting guideline can thus add great benefit to stroke research studies, as it references stroke-based ontologies, data dictionaries and collection standards, ensuring comprehensive, harmonised data reporting which is re-usable and enhances interoperability. The reporting guideline is designed for use by research clinicians and healthcare workers, researchers, data managers and bioinformaticians involved in stroke research, bearing in mind different levels of data access. Assigning appropriate levels of data input and access rights allows the reporting guideline to be used in both research and clinical settings, whilst defining the information as essential or optional allows the guideline to be adapted to various types of stroke research. Additionally, the reporting guideline goes beyond listing "minimum required" data elements and aims to provide a comprehensive data dictionary and controlled vocabulary with standardised response options, which is scalable and can be adapted for broader or custom use.
In multidisciplinary fields, standardization can often be difficult to implement; the reporting standard is therefore accompanied by an associated platform-specific XML schema. Although XML is not inherently user friendly, it is highly computationally amenable, and the schema was specifically designed with the REDCap platform in mind. It is therefore immediately implementable and promotes user friendliness in terms of both data capturing and governance, allowing accurate and seamless duplication in the local setting (Eito-Brun, 2018). Additionally, the accompanying Recommendations For Use guideline (in supplementary material) further enables use and user friendliness. XML has been used extensively for describing data in many applications for storage or transport (Eito-Brun, 2018). The language, by its design, allows for extensibility and self-description. Its openly documented standards, wide adoption, and support in many applications and existing tools make it a good first choice for describing scientific data that could be exchanged between healthcare systems (Eito-Brun, 2018). It has previously been used in health reporting for such purposes (Huser et al., 2015; Schweiger et al., 2005).
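The extensibility mentioned above can be illustrated with a reader that processes the sections it recognises and skips any it does not, so that a document extended by another group still loads. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the tag names, the `environment` extension section and the example values are all hypothetical and are not taken from the published schema.

```python
import xml.etree.ElementTree as ET

# Sections this reader understands; anything else is treated as an extension.
KNOWN_SECTIONS = {"participant", "study", "experiment"}


def read_known_sections(xml_text: str) -> dict:
    """Parse a record and return only the sections this reader recognises."""
    root = ET.fromstring(xml_text)
    result = {}
    for section in root:
        if section.tag not in KNOWN_SECTIONS:
            continue  # unknown extension: skip it rather than fail
        result[section.tag] = {
            item.get("name"): item.text for item in section.findall("element")
        }
    return result


# Hypothetical document containing one known and one unknown section.
doc = """<record>
  <study><element name="Study ID">S-001</element></study>
  <environment><element name="Air Quality">unrecorded</element></environment>
</record>"""

print(read_known_sections(doc))
```

Because each value is labelled by its own tag and attribute, the reader does not rely on element order or position, which is the self-description property referred to in the text.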
As previously exhibited in oncology research, widespread utilization of the developed reporting guideline can reduce data and reporting inconsistency and redundancy across systems, as well as promote collaboration and/or interoperability between systems (Biology et al., 2000; Hartwell et al., 2012; MacCarthy et al., 2018). Promoting such broad use could allow for improved data mapping in clinical registries, improving data quality and interoperability (Rastegar-Mojarad et al., 2017). A given standard may be more widely adopted if advocated or endorsed by "omics" databases, funding bodies and scientific journals geared specifically towards stroke research. To promote the adoption of the reporting guideline, we hope to employ it within our own consortia studies and to advocate its use on an international platform.
In the future, the H3ABioNet's Data & Standards Work Package aims to develop more domain-specific reporting guidelines which are relevant to both African health and the H3Africa consortia. We also aim to align our efforts with the standardization efforts driven by GA4GH. This will include further refining elements such as ethnicity, diet and prescribed medicine to accommodate African-specific considerations. The Minimum Information Required Guideline: Stroke Research and Clinical Data Reporting aims to promote FAIR reporting and will therefore be added to the FAIRsharing database, as the database provides curation support to resource maintainers, as well as a point of contact for the standard and related support material (Wilkinson et al., 2016). Bearing in mind the diverse target group the reporting standard aims to accommodate, various methods of implementation will be investigated in the future, to provide comprehensive solutions for collaborative efforts and increase the value of research data. Education and training in the use and implementation of these standards will be important in supporting their uptake. Furthermore, additional elements will be investigated for incorporation into the standard, including various environmental factors. Ultimately, the reporting guideline has the potential to support both the H3Africa community and the stroke research community at large in current and future research.

Data Accessibility Statement
All data referenced in the article can be found in Supplementary File 2.

Additional Files
The additional files for this article can be found as follows: