Enhancing the Research Data Management of Computer-Based Educational Assessments in Switzerland

Since 2006, the education authorities in Switzerland have been obliged by the Constitution to harmonize important benchmarks in the educational system throughout Switzerland. The development of national educational objectives in four disciplines created an important basis for the implementation of this constitutional mandate. In 2013, the Swiss National Core Skills Assessment Program (in German: ÜGK – Überprüfung der Grundkompetenzen) was initiated to investigate the skills of students, starting with three of four domains: mathematics, language of teaching and first foreign language in grades 2, 6 and 9. ÜGK uses a computer-based test and a sample size of 25,000 students per year. A major challenge for computer-based educational assessment is the research data management process. Data from several different systems and tools, existing in different formats, have to be merged to obtain data products researchers can utilize. Long-term preservation has to be adapted as well. In this paper, we describe our current processes and data sources as well as our ideas for enhancing the data management.


Introduction
Good research data management poses a challenge in every field of research, and the challenges differ from discipline to discipline. In this paper, research data management in the field of educational science and computer-based assessment is discussed on the basis of a large Swiss school assessment project. Chapter 2 describes the assessment project and gives information on its political and legal background. Chapter 3 describes the current processes and data sources relating to the data management and gives an outlook on possible solutions for its enhancement.

Coordination of the Swiss Educational System
Switzerland, as a multilingual and federally structured country made up of 26 cantons, has a decentralized educational system. As a consequence, the main responsibility for education and culture lies with the cantons. For tasks that cannot be performed by the regions or cantons, such as harmonizing important structures and goals at the various educational levels or fostering mobility within the country as a whole, the cantons coordinate their work at the national level. To this end, the 26 cantonal ministers of education constitute a political body: the Swiss Conference of Cantonal Ministers of Education (in German: Schweizerische Konferenz der kantonalen Erziehungsdirektoren, EDK, http://www.edk.ch), which is not a national education ministry but a coordination body. These ministers are known as "directors of education".
The intercantonal Agreement on Education Coordination (dating from 1970) forms the legal foundation for the collaboration between the cantons in the area of education. In addition to this agreement, the work of the EDK is also based on other intercantonal agreements which are also referred to as "concordats".

Harmonization of the compulsory school
On the 21st of May 2006, 86% of Swiss voters accepted a revision of the educational article in the Swiss Federal Constitution. Since then, the education authorities have been obliged to harmonize important benchmarks in education throughout Switzerland. For the compulsory schools, the cantons have to fulfil this constitutional mandate.
The Intercantonal Agreement on the Harmonization of the Compulsory School (HarmoS concordat) contains provisions on the duration and objectives of educational levels, the language lessons as well as lesson times and day structures. At the same time, it updates the existing national provisions of the school regulations of 1970 regarding school age and compulsory schooling. The concordat took effect on the 1st of August 2009. In the years to come, the verification of the achieved national educational objectives will be one of the priorities of the HarmoS concordat.

National educational objectives
With the development of national educational objectives in four disciplines, the EDK has created an important basis for the implementation of this constitutional mandate to harmonize the objectives of educational levels. At their plenary session on the 16th of June 2011, the 26 cantonal education directors approved the educational objectives. They define the four "core skills" children and young people should acquire during school. Three of these skills (language, mathematics and natural science) are assessed after two, six and nine years of school. The fourth skill, foreign languages, is assessed after six and nine years of school, since foreign language teaching usually only takes place after the 2nd grade.
The core skills do not cover all the teaching and learning content of compulsory school or of the respective school subjects. However, they represent the "core" of school education. They include basic skills as well as basic knowledge in four subjects that are essential for further school education. The core skills have been incorporated into the new curricula of each Swiss language region. The curricula define the whole set of educational objectives.

Organization and implementation of the assessment
The EDK plenary assembly of 20th of June 2013 agreed on a moderate approach towards the verification of the core skills. By 2018, two assessments with representative student groups will have taken place. The first one took place in May/June 2016 and concerned mathematics at the end of compulsory school (9th grade). The second will take place in May/June 2017 and assesses the school language as well as the first foreign language at the end of primary school (6th grade). They will be the first national skills assessments of the compulsory school outside of PISA in Switzerland. In contrast to PISA, they are carried out with Swiss measuring instruments and will thus lead to results that are more meaningful for Switzerland than those of PISA.
All cantons will participate with a cantonal sample of about 1,000 students, 25,000 students in the whole of Switzerland. The surveys will provide information on the performance of the educational system, down to the level of the cantonal school systems. As in PISA, no statements about the performance of individual schools are made (no school rankings), and the results cannot be attributed to individual students, teachers or classes. The results will be published in the Swiss Education Report 2018. The national core skills assessments will not take place in the same years as the PISA surveys.
The EDK has commissioned various institutions with the planning, organization and implementation of the study. The assessments are conducted in German-speaking Switzerland by the University of Teacher Education St. Gallen (PHSG, http://www.phsg.ch), in the French-speaking part of Switzerland by the Service de la recherche en éducation (SRED, https://www.ge.ch/sred) and in the Italian-speaking canton of Ticino by the University of Applied Sciences and Arts of Southern Switzerland (SUPSI, www.supsi.ch). The University of Applied Sciences HTW Chur (HTW Chur, http://www.htwchur.ch) is responsible for the management of the work package "data management and IT", the Item Database at the Swiss Coordination Centre for Research in Education (SKBF-ADB, http://www.skbf-csre.ch/skbf/adb/) for the work package "item and test development". Further institutions participate in the different work packages. A prospective scientific consortium will take over the project lead.

Reasons for good research data management
Good research data management of the Swiss National Core Skills Assessment data is essential for several reasons. The main intention of the study is to gain knowledge about the core skills of Swiss students. The immediate analysis of the data after the collection is mandatory for political reasons. Therefore, researchers belonging to the national consortium have to be provided with the complete, processed and well-documented data promptly after the collection, as the basis for the national educational report to be published in 2018. Until then, the data is not publicly available (e.g. for external researchers). After this embargo, it will be available via FORSbase (https://forsbase.unil.ch) for scientific use with the purpose of replication and secondary analysis. For these purposes, sufficient documentation and metadata have to be included as well.
A further reason is that the assessments will be repeated in the upcoming years. The aim is that, after several waves, the analysis of individual learners' progress or of trends in education over longer periods becomes possible. The special challenge here is the long-term preservation of the data. Items and results from previous rounds have to be processable in later rounds as well. A problem of computer-based assessments is that the data formats of the different software versions (e.g. item production tools) used over time are not compatible with each other, which prevents the comparison and merging of data collected in different waves with different software versions. This means that, as technology progresses, a test is lost for future research and reuse. That is why improvements are necessary, especially regarding the backwards compatibility of software versions and standardized data and metadata formats.
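The incompatibility between waves can be illustrated with a minimal sketch in Python. It assumes that each software version exports response records under different field names and that a per-version mapping onto one common schema is maintained. The version labels ("v1", "v2") and all field names are hypothetical and do not describe the formats of any actual tool used in the project.

```python
# Hypothetical mapping from version-specific field names to a common schema.
# None of these names are taken from a real item production tool.
FIELD_MAPS = {
    "v1": {"item": "item_id", "resp": "response"},
    "v2": {"itemId": "item_id", "answer": "response"},
}


def to_common_schema(record, format_version, wave):
    """Translate one exported record into the common schema.

    Raises ValueError for an unknown format version and KeyError if the
    record lacks a field that the mapping expects.
    """
    mapping = FIELD_MAPS.get(format_version)
    if mapping is None:
        raise ValueError(f"no field mapping for format version {format_version!r}")
    missing = [source for source in mapping if source not in record]
    if missing:
        raise KeyError(f"record is missing expected fields: {missing}")
    common = {target: record[source] for source, target in mapping.items()}
    common["wave"] = wave  # keep the wave so that records stay traceable
    return common
```

Under these assumptions, records from two waves end up with identical keys and can be compared or merged. The sketch also shows the cost: every new software version requires an additional mapping to be written and maintained, which standardized data and metadata formats would avoid.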

Kinds of data
The Swiss National Core Skills Assessment uses a computer-based test to investigate the skills of students. As primary schools in Switzerland usually do not have a separate computer room but rather 1-2 computers in every class, two different modes of data collection have to be employed. Data collection in primary schools is performed by bringing in a special trolley containing 20 tablets and a small server, while data collection in secondary schools (where designated computer classrooms with 20-25 computers exist) is done via a cloud-based online test. The software used, and as a result the output, also differs between the two modes.
The online test in secondary schools uses SPSS Dimensions for the questionnaire and the open source TAO platform for the cognitive part, resulting in the following metadata and data formats: The management of students and schools is currently done by using a Filemaker database which is converted into Excel tables. The test administrators then have print-outs of those lists for the testing situation in the classroom.

OAIS compliance
After an embargo until 2018, the research data will be publicly available to researchers in the long term. Responsible for this task is FORS (http://forscenter.ch), the Swiss Service Center for the Social Sciences. As FORS aims at becoming a trustworthy long-term archive for the research community, it is geared to the reference model for an open archival information system (OAIS) (FORS, 2017b). In its chapter 3.1, the OAIS reference model defines some mandatory responsibilities an organization must discharge in order to operate an OAIS Archive (CCSDS, 2012). How do we perform these tasks in the Swiss National Core Skills Assessment Program? The following subsections give an overview of our plan, together with the associated OAIS responsibilities quoted in italics.

Negotiate for and accept appropriate information from information Producers.
Obtain sufficient control of the information provided to the level needed to ensure Long Term Preservation.
As described above, these tasks are very challenging in the area of computer-based assessments for various reasons. Two things are not clear yet. First, what information do researchers need for further analysis? Existing documentation standards are not suitable for educational computer-based assessments (Barkow, 2016). Second, how do we get enough control over the processes, workflows, formats and data end-products to ensure long-term preservation? Our first thoughts are described in chapter 3.4.
Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided, thereby defining its Knowledge Base.
Ensure that the information to be preserved is Independently Understandable to the Designated Community. In particular, the Designated Community should be able to understand the information without needing special resources such as the assistance of the experts who produced the information.
The designated community was defined as the research community, especially researchers from the social sciences, including the educational sciences. Our aim is to provide them with all the data and metadata they need for the purpose of replication and secondary analysis.
Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, including the demise of the Archive, ensuring that it is never deleted unless allowed as part of an approved strategy. There should be no ad-hoc deletions.
In its mission statement (FORS, 2017a), its preservation policy and further documents FORS illustrates its sustainability and guarantees the long-term usability and comprehension of its archived data. This includes documentation of the IT infrastructure as well as backup and risk management strategies for the physical security of the data.
Make the preserved information available to the Designated Community and enable the information to be disseminated as copies of, or as traceable to, the original submitted Data Objects with evidence supporting its Authenticity.
FORS will make the data available via FORSbase, its online access platform. Before receiving the data, users must accept a user contract which legally binds them, in particular, to respect the confidentiality of individual study participants' data.

Enhancement of the processes
The Swiss National Core Skills Assessment Program has a long-term vision of using only open source software and open metadata and data standards. But the realities of having to conduct the survey on a yearly basis have already led to a set of pragmatic solutions, resulting in rather manual data management processes and the use of proprietary formats. Unfortunately, as in many other research projects, data management was not initially considered during the study planning phase of the Swiss National Core Skills Assessment Program. The changes in processes described below will take effect in 2019 at the earliest.
To get a common processed dataset, e.g. for a scientific-use file, a lot of manual data merging and cleaning is currently needed. This work is currently done by the German Institute for International Educational Research (DIPF, http://www.dipf.de) for merging and by FORS for data cleaning. The question now is how we can enhance the processes and tools of the whole assessment project so that the data ingested at FORS is in a state which enables long-term preservation without extensive manual postprocessing and programming. A major challenge in the overall process is, for example, that a lot of tools have their own proprietary formats, which have to be converted into proper metadata standards (e.g. a Limesurvey questionnaire into Data Documentation Initiative Lifecycle v3.2, http://www.ddialliance.org/Specification/DDI-Lifecycle/3.2).
In the long run, our vision and expectation is to have a complete workflow within a data management platform. A first step is the development of a school survey management system, starting in the second half of 2017, which will replace the print-outs of student lists and support the management of students and schools as well as the whole field work and monitoring process. Before its development can start, a detailed analysis of the as-is state and the target state of the processes has to be done.
Every step of a computer-based assessment produces data and metadata. For that reason, we would like to use the Generic Longitudinal Business Process Model (GLBPM) (Barkow et al., 2013) as a means for the analysis. It was developed at a DDI Alliance Dagstuhl Workshop in 2011 and specifies the business processes in a longitudinal study in the social sciences, that is, a study that is repeated over time. Although the model does not include processes regarding cognitive item construction or the handling of computer-based assessments, it seems to be the most suitable model for our purposes.
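The merging step can be sketched in Python. This is an illustration only, not the procedure used by DIPF or FORS. It assumes that the questionnaire data and the cognitive test data have already been exported as lists of records sharing a student identifier. The key name student_id and all other field names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_by_student(
    questionnaire: List[Record],
    cognitive: List[Record],
    key: str = "student_id",  # hypothetical identifier shared by both sources
) -> Tuple[List[Record], List[str]]:
    """Join questionnaire and cognitive records on a shared student key.

    Returns the merged records and a sorted list of identifiers that occur
    in only one of the two sources, so that they can be reviewed manually.
    """
    cognitive_index = {row[key]: row for row in cognitive}
    merged: List[Record] = []
    unmatched: List[str] = []
    for row in questionnaire:
        student = row[key]
        match = cognitive_index.pop(student, None)
        if match is None:
            unmatched.append(student)  # questionnaire without test data
            continue
        merged.append({**row, **match})
    unmatched.extend(cognitive_index.keys())  # test data without questionnaire
    return merged, sorted(unmatched)
```

Reporting the unmatched identifiers explicitly, rather than dropping them silently, is the design choice that matters here: these are the cases that currently require manual cleaning.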
Our analysis will be conducted considering the following questions:
• Who are the stakeholders in the whole data life cycle of the Swiss National Core Skills Assessment?
• What roles, tasks and functions do they have?
• What are they doing in which phase of the GLBPM?
• Where in the GLBPM is data and metadata produced?
In a second step, we have to analyze the target state and find answers to the following questions:
• Do we have to modify the GLBPM for our purposes?
• How can we optimize the processes?
• How can the optimized processes be modelled and represented in software tools?
• Which metadata standards would be appropriate as exchange format and for the long-term preservation?
• Which metadata supports the reuse and replication of the data in the best way?
• Which fields have to be added to the most fitting metadata standard?
• Which data products do we need in which phase of the GLBPM that means during and after the data collection and for the long-term preservation?

Conclusions
The Swiss National Core Skills Assessment is of national importance. Its aim is to make statements about the achievement of the national educational goals and to further modify the curricula in Switzerland. For this reason, it is all the more important to provide researchers with fast, high-quality and long-term access to the data and its related documentation. The enhancement of the research data management, which is planned for the next two years, will contribute to these quality requirements.