CERIF: THE COMMON EUROPEAN RESEARCH INFORMATION FORMAT MODEL

With increased computing power, more data than ever are being and will be produced, stored, and (re-)used. Data are collected in databases, computed and annotated, or transformed by specific tools. The knowledge from data is documented in research publications, reports, presentations, or other types of files. The management of data and knowledge is difficult, and even more complicated is their re-use, exchange, or integration. To allow for quality analysis or integration across data sets and to ensure access to scientific knowledge, additional information – Research Information – has to be assigned to data and knowledge entities. We present the metadata model CERIF to add information to entities such as Publication, Project, Organisation, Person, Product, Patent, Service, Equipment, and Facility and to manage the semantically enhanced relationships between these entities in a formalized way. CERIF was released as an EC Recommendation to European Member States in 2000. Here, we refer to the latest version, CERIF 2008 – 1.0.


INTRODUCTION
Research is becoming more and more data intensive. With increased computing power, more data than ever are being and will be produced, stored, and (re-)used. Data are collected in databases, computed and annotated, or transformed by specific tools. The knowledge from data is documented in research publications, reports, presentations, or other types of files. The management of data and knowledge is difficult, and even more complicated is their re-use, exchange, and integration. To allow for quality analysis or integration across data sets and to ensure access to scientific knowledge, additional information – Research Information – has to be assigned to data and knowledge entities. Especially with recent developments in national assessment and performance exercises, Research Information as an asset is gaining ground. The applied evaluation methods depend upon formalized information, and information quality becomes a critical issue (Asserson & Simons, 2006; Bosnjak & Stempfhuber, 2008).
Not only at a national level but also at the European scale, Research Information is being recognized as a player alongside publication repositories to improve the access to scientific knowledge (Driver, 2008) and as an enabler for large-scale data integration and data management (Joint, 2008; Carpenter, 2008). Most European countries collect and store their Research Information in digital repositories; these may be national, regional, institutional, functional, or thematic in their range, and each system builds upon a particular format or structure to serve specific requests. In order to gain additional value from data and knowledge distributed across systems, the information assigned has to be integrated. That is, the individual information structures and information system formats have to be mapped towards an agreed format within a target system for further analysis and access. Information integration is not an easy task: it is difficult at the national level and quite a challenge at the European scale (e.g., Jörg et al., 2008) or beyond. However, analysis of and access to scientific data, knowledge, and the information assigned is an essential requirement in the European Research Area (ERA) for innovators, academics, decision makers, media, and the members of society in general. It is recognized that research and development lead to wealth creation and improvement in the quality of life. Because public funding is involved, appropriate governance is necessary, and the information has to be available to the public.
CRIS and CERIF approaches in this direction are not new (Asserson et al., 2002). In the 1970s, serious efforts for international cooperation among research information systems were made to survey a country's scientific and technological potential and to use such information in the formulation of science policy at the national level. In 1971, Unisist published a "Study report on the feasibility of a world science information system" (Unisist, 1971). In 1987, the European Working Group on Research Databases held a workshop and, as a result, recommended CERIF to be used as a standard format to permit exchange of records among different European member countries and to serve as a basis for setting up a network among research databases.
Each nation state has similar research processes: strategic planning; program announcement; call for proposals; proposal evaluation and awarding; project result monitoring; and project result exploitation. However, research is international. A research project in one country is likely based on previous research in several other countries. Many research projects are transnational. Knowledge about the research activity in one country may influence the strategy towards the research, including priorities and resources provided, in another country. Thus, there is a need to share such information across countries or even between different funding agencies in the same country. Research Information is used by researchers (to find partners, to track competitors, to form collaborations); research managers (to assess performance and research outputs and to find reviewers for research proposals); research strategists (to decide on priorities and resourcing compared with other countries); publication editors (to find reviewers and potential authors); intermediaries/brokers (to find research products and ideas that can be carried forward with knowledge/technology transfer to wealth creation); the media (to communicate results of R&D in a socio-economic context); and the general public (for interest). Research Information is relevant for actors in scientific environments as well as for decision makers to support related organization, management, and planning. We consider Research Information as the transmitter between Science and Society and, as such, as a powerful instrument for governance. Having such an impact, Research Information has to be collected carefully and preserved systematically, in order to most effectively support society and the individuals within (EuroHORCS, 2008).

CURRENT RESEARCH INFORMATION SYSTEMS (CRISs)
Research Information is managed in research information systems. They allow for a coherent view over information about research actors, their activities, and their environments (Jeffery & Asserson, 2006a).
Research Information Systems are built upon conceptual domain models that capture the meaning of the domain by structuring it into entities and their relationships (Wand & Weber, 2002). As entities we consider the objects relevant in the Research domain, such as Person, Project, Organization, Publication, Patent, Product, Funding, Equipment, and Facility. An entity can be represented by attributes and by the relationships it maintains with other entities at a given time. The relevant entities, together with their attribute and relationship descriptions, compose the model of the domain for setting up a particular information system. In the CRIS community, we preferably talk of Current Research Information Systems (CRISs) to indicate their dynamics and timeliness (Jeffery & Asserson, 2006b). Some example questions that may be answered from a CRIS are:
• Which related projects exist within the research group, organization, or scientific network that researcher X is part of?
• By which funding agencies or sponsors is research project A financed?
• How often have articles by author X been cited?
• Did author X publish with institutionally external authors?
• In how many FP7 projects does organization Z participate?
• How many publications have resulted from project Y?
• How many women have been involved in FP5 or FP6 projects?
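Questions of this kind reduce to filtering typed, time-bounded relationships between entities. The following is a minimal sketch in Python, assuming a hypothetical in-memory list of relationship records; the class name `Link`, the identifiers, the dates, and the data are illustrative and not taken from a real CRIS.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Link:
    """A typed, time-bounded relationship between two entities (hypothetical layout)."""
    source: str  # identifier of the first entity, e.g. "project:Y"
    target: str  # identifier of the second entity, e.g. "publication:p1"
    role: str    # meaning of the relationship
    start: date  # first day on which the relationship holds
    end: date    # last day on which the relationship holds


# Illustrative data only.
LINKS = [
    Link("project:Y", "publication:p1", "is originator of", date(2005, 1, 1), date(2005, 12, 31)),
    Link("project:Y", "publication:p2", "is originator of", date(2006, 1, 1), date(2006, 12, 31)),
    Link("project:Y", "funding:f1", "is funded by", date(2004, 1, 1), date(2007, 12, 31)),
    Link("project:A", "funding:f2", "is funded by", date(2006, 1, 1), date(2008, 12, 31)),
]


def targets(source, role, links=LINKS):
    """Return all entities that `source` is related to in the given role."""
    return [link.target for link in links if link.source == source and link.role == role]


# "How many publications have resulted from project Y?"
publications_of_y = targets("project:Y", "is originator of")
# "By which funding agencies or sponsors is research project A financed?"
funders_of_a = targets("project:A", "is funded by")

print(len(publications_of_y), funders_of_a)
```

Because every relationship carries its own role and validity period, the same lookup could be narrowed to a point in time by also comparing `start` and `end` with a query date.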

THE COMMON EUROPEAN RESEARCH INFORMATION FORMAT (CERIF)
CRIS activities and developments in Europe are tightly interrelated with CERIF. CERIF is considered a standard recommended by the European Union to its Member States. The physical CERIF model is a relational database model available as SQL scripts based on common ERM (Entity Relationship Model) constructs (Chen, 1976). The latest releases include a formalized, so-called "Semantic Layer," and an XML interchange format (Jörg et al., 2009b).
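To give a feel for what a relational model of this kind looks like, the sketch below builds a heavily simplified schema in an in-memory SQLite database from Python. It is not the official CERIF SQL script: the table names (`cfProj`, `cfResPubl`, `cfProj_ResPubl`) and the columns `cfAcro`, `cfTitle`, `cfStartDate`, and `cfEndDate` are assumptions made for illustration, and the role label is stored directly in `cfClassId` for brevity rather than referencing the Semantic Layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, assumed schema: two entity tables and one relationship table.
conn.executescript("""
CREATE TABLE cfProj (
    cfProjId TEXT PRIMARY KEY,
    cfAcro   TEXT
);
CREATE TABLE cfResPubl (
    cfResPublId TEXT PRIMARY KEY,
    cfTitle     TEXT
);
CREATE TABLE cfProj_ResPubl (
    cfProjId    TEXT REFERENCES cfProj(cfProjId),
    cfResPublId TEXT REFERENCES cfResPubl(cfResPublId),
    cfClassId   TEXT,  -- role of the relationship (label stored directly here)
    cfStartDate TEXT,
    cfEndDate   TEXT,
    PRIMARY KEY (cfProjId, cfResPublId, cfClassId, cfStartDate, cfEndDate)
);
""")

# Illustrative data only.
conn.execute("INSERT INTO cfProj VALUES (?, ?)", ("proj-Y", "Y"))
conn.executemany(
    "INSERT INTO cfResPubl VALUES (?, ?)",
    [("publ-1", "First example article"), ("publ-2", "Second example article")],
)
conn.executemany(
    "INSERT INTO cfProj_ResPubl VALUES (?, ?, ?, ?, ?)",
    [
        ("proj-Y", "publ-1", "is originator of", "2005-01-01", "2005-12-31"),
        ("proj-Y", "publ-2", "is originator of", "2006-01-01", "2006-12-31"),
    ],
)

# Count the publications that originate from project Y.
publication_count = conn.execute(
    "SELECT COUNT(*) FROM cfProj_ResPubl WHERE cfProjId = ? AND cfClassId = ?",
    ("proj-Y", "is originator of"),
).fetchone()[0]

print(publication_count)
```

Keeping the relationship in its own table, with role and dates in the key, lets the same pair of entities be related more than once in different roles or periods.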

Figure 1. CERIF Entities and their Relationships
Figure 1 shows the CERIF entities considered relevant to represent the research domain and some of the relationships among them.

Conceptual CERIF Entity Types and Features
The CERIF model is conceptually structured into entity types and features. Among the types, we distinguish core, result, link, and 2nd level entities. As features, we consider multilinguality and semantics.
• CERIF Core Entities (core): The core entities are Person, OrganisationUnit, and Project. They allow for the representation of scientific actors. Figure 1 shows them in the bottom center, indicating their recursive (circles) and linking relationships. Each core entity links to itself and maintains relationships with many other entities.
• CERIF Result Entities (result): The result entities are ResultPublication, ResultPatent, and ResultProduct. They allow for the representation of research output. Figure 1 shows them in the upper center, indicating their relationships.
• CERIF 2nd Level Entities (2nd): The 2nd level entities include, for example, Funding, Facility, Equipment, Prize, CV, Expertise, Qualification, Citation, Metrics, Event, PostalAddress, and ElectronicAddress. They allow for the representation of the research environment. Figure 1 shows the 2nd level entities surrounding the core and result entities.
• CERIF Link Entities (link): The link entities are considered a major strength of the CERIF model. Link entities are the reified relationships between core, result, and 2nd level entities. A link entity always connects

CERIF Example Records
The two tables below show examples of CERIF-driven database records: the first representing a Project in context, the second representing a Publication in context. Table 1 represents a CERIF project record where the common (core) and multilingual (lang) attributes are stored in the upper rows. The lower rows show some relationships (link), including their formalized contextual semantics. Linkage is physically established by ids (cfClassId, cfResPublId, cfOrgUnitId, cfFundProgId), as indicated in the Attribute column. The Type column indicates the conceptual entity type (core, link, lang); the formalized semantic values ("2004-IST-3", "is originator of", "is coordinated by", "is funded by") are stored in the Classification column, which belongs to the Semantic Layer, where each value is assigned to a predefined scheme ("FP6-IST", "PROJ-PUBL", "PROJ-ORG", "PROJ-FUND").
In the same way, Table 2 represents a CERIF publication record where the common (result) and multilingual (lang) attributes are stored in the upper rows. The lower rows again show some relationships (link), including their formalized contextual semantics. The physical linkage is again established by ids (cfClassId, cfResPublId2, cfPersId, cfOrgUnitId, cfProjId, cfEventId), as indicated in the Attribute column. The Type column indicates the conceptual entity type. The formal semantic values ("Conference Proceedings Article", "is part of", "is author 1 of", "is publisher of", "is originator of", "is presented at") are stored in the Classification column, where each value again belongs to a predefined scheme ("CERIF2008-RESPUBL-TYPES", "RESPUBL-RESPUBL-ROLES", "PERS-RESPUBL-ROLES", "ORG UNIT-RESPUBL-ROLES"). From the two examples, it becomes clear that each CERIF entity record is composed of different entity types and features. The separation of link entities from core, result, and 2nd level entities allows for a rich flexibility with respect to semantic coverage, information integration, and thus applications.
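The composition of a record from rows of different types can be sketched in Python as follows. The column layout (Type, Attribute, Classification, Scheme) and the classification and scheme values come from the description of the project record above; the pairing of each id attribute with a classification follows the order in which the text lists them and is an assumption, as are the id values, the `cfTitle` attribute, and the helper names.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One row of an example record (hypothetical in-memory layout)."""
    entity_type: str                      # "core", "lang", or "link"
    attribute: str                        # attribute name, e.g. "cfProjId"
    value: str                            # stored value (illustrative ids)
    classification: Optional[str] = None  # semantic value, link rows only
    scheme: Optional[str] = None          # scheme the semantic value belongs to


PROJECT_RECORD = [
    Row("core", "cfProjId", "proj-001"),
    Row("lang", "cfTitle", "Example project title (en)"),
    Row("link", "cfClassId", "class-001", "2004-IST-3", "FP6-IST"),
    Row("link", "cfResPublId", "publ-001", "is originator of", "PROJ-PUBL"),
    Row("link", "cfOrgUnitId", "org-001", "is coordinated by", "PROJ-ORG"),
    Row("link", "cfFundProgId", "fund-001", "is funded by", "PROJ-FUND"),
]


def group_by_type(rows):
    """Group the rows of a record by their conceptual entity type."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.entity_type].append(row)
    return dict(grouped)


grouped = group_by_type(PROJECT_RECORD)

# Map each scheme to the semantic value used in this record.
roles = {row.scheme: row.classification for row in grouped["link"]}

print(sorted(grouped), roles["PROJ-FUND"])
```

Because the semantics live in the link rows rather than in the core row, a new kind of relationship only needs a new classification value in a scheme, not a change to the entity itself.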

RELATED ACTIVITIES
A survey about standards and formats in the digital library community revealed that there are many different schemas (standards) available in the library domain. Each schema was developed individually and was not designed as part of an overall architecture to cover integrated object entities. For interoperability and networking in the digital age, issues such as duplicate information and overlapping sections of metadata need rules; these are currently being addressed by good practice guidelines. The resulting report recommends overcoming the problem by best practice guidelines and by pragmatic applications. The report proposes to structure metadata into:
• Descriptive: intellectual content
• Administrative: technical (file formats), rights management, provenance (creation, subsequent treatment, responsibility, ...)
• Structural: internal structure of items (page, order, ...)
Data Science Journal, Volume 9, 24 July 2010

The survey recognized that combining several metadata standards is messier than using a single standard, because their taxonomic powers have to be combined and potential clashes or duplications among them resolved. Furthermore, the report revealed that integration by itself would be of little consequence if a common standard fails to address the metadata needs of the digital library community (Gartner, 2008).
CERIF allows for the representation of different standards and structures and, at the same time, enables their integration and mapping towards a common format.
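As a rough illustration of mapping differently structured records towards a common format, the Python sketch below renames the fields of two hypothetical source formats to one shared set of field names. The source names (`library_a`, `library_b`), all field names, and the sample records are invented for the example; a real mapping to CERIF would target CERIF entities and link entities rather than a flat dictionary.

```python
# Per-source field mappings: source field name -> common field name.
# All names are hypothetical.
FIELD_MAPS = {
    "library_a": {"title": "title", "creator": "author", "date": "year"},
    "library_b": {"Titel": "title", "Verfasser": "author", "Jahr": "year"},
}


def to_common(record, source):
    """Rename the known fields of `record` to the common field names.

    Fields without a mapping are dropped in this sketch.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


record_a = {"title": "An example article", "creator": "Author X", "date": "2008"}
record_b = {"Titel": "An example article", "Verfasser": "Author X", "Jahr": "2008", "Signatur": "A 12"}

common_a = to_common(record_a, "library_a")
common_b = to_common(record_b, "library_b")

# Once both records share one format, they can be compared or merged.
print(common_a == common_b)
```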

CONCLUSION
The results from the above survey within the library community show that there is an increased need for an overarching format to enable quality data integration and interoperability. An overarching standard is advantageous not only for information management but also for advanced data analysis and for granting access to data, information, and knowledge. The CERIF format offers a model to structure the research domain into relevant objects and their relationships. Moreover, with the Semantic Layer it provides a powerful means for the management of contextual semantics. The current interest, usage, and applications of the CRIS concept and the CERIF model and interchange format encourage further developments. The latest release incorporates a formalization of Publication types and Publication-related links (Jörg et al., 2009c). The priority for formalizing further contexts, e.g., for Funding or Patents, will again emerge from ongoing community and task group activities.

Table 1. CERIF Project Example Record