Ontology Usability Scale: Context-aware Metrics for the Effectiveness, Efficiency and Satisfaction of Ontology Uses

Xiaogang Ma1, Linyun Fu2, Patrick West3 and Peter Fox3
1 Department of Computer Science, University of Idaho, 875 Perimeter Drive, MS 1010, Moscow, ID 83844-1010, US
2 Twitter, Inc., 1355 Market Street, Suite 900, San Francisco, CA 94103, US
3 Tetherless World Constellation, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, NY 12180, US
Corresponding author: Xiaogang Ma (max@uidaho.edu)


Introduction
Along with the rapid development of the Linked Open Data, ontologies have been increasingly built and used in various fields (Bikakis et al., 2013). Ontology usability arises as an issue of interest for many stakeholders in the Linked Open Data campaign, including both ontology builders and users. For instance, ontology builders may want to hear evaluation and feedback on the usability of their ontologies and then take actions on the revision. Such evaluation and feedback may also be beneficial to ontology users because they help identify the suitable ontologies and estimate the costs of using them for specific applications. In a broader perspective, the evaluation of ontology usability is a way of communication for developing better ontologies.
We need a set of criteria for the evaluation of ontology usability, through which people will be able to describe and assess an ontology from different aspects. The items in the criteria, however, are decided by the understandings about the meaning of usability. We adopt the definition given in the international standard ISO 9241-11 (ISO, 1998), which has received endorsements from various domains: "[Usability is] the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use." The background of this definition is Human-Computer Interaction and the definition represents a user-centered point of view (Jokela et al., 2003). The definition indicates that usability is not about the product itself (or, its quality), but about the activity of a user using it, so usability depends on the goal of the user and the context of the use. With satisfaction as one of its attributes, defined as "freedom from discomfort, and positive attitude to the use of the product" (Jokela et al., 2003), usability is inevitably subjective.
Therefore, existing criteria for assessing ontology quality do not necessarily fit the evaluation of ontology usability. Nevertheless, some of them can serve as strong usability indicators. For example, Gruber (1995) discussed several elements for ontology design: clarity, coherence, extendibility, minimal encoding bias, and minimal ontological commitment. Those elements are all good principles for ontology quality evaluation. In a recent work, Fox and Lynnes (2015) added a few other items, namely contextual relevance, maturity, intended use, and fitness for use, to Gruber's list to cover the contextual and subjective aspects of ontology usability. In this paper we propose the Ontology Usability Scale (OUS) for the evaluation of ontology usability. The evaluation metrics are reflected in a short list of statements, which were derived from an online poll in the Semantic Web community. In the remainder of the paper, we introduce the semiotic considerations that guided our preparation of a long list of statements for the poll. Using the outputs of the poll, we also analyze the community's concerns about ontology usability.

Related Work
Although related work specifically on evaluating ontology usability is hard to find, the general evaluation of ontologies received attention even before the introduction of the Resource Description Framework (RDF) (Miller, 1998) and the Web Ontology Language (OWL) (McGuinness and Van Harmelen, 2004). For example, Gruninger and Fox (1995) proposed a method to check the completeness of an ontology with respect to a set of competency questions. Competency questions are the questions that the ontology is designed to answer, so this method checks whether "the right things are done", and does not benefit people who later want to reuse the ontology.
With the major Semantic Web standards and heavily reused upper ontologies being available, representing domain knowledge with ontologies became a common practice and it became more and more likely to see several different conceptualizations of the same domain. Therefore, ontology evaluation methods for the purpose of selecting and reusing existing ontologies for new applications were greatly needed.
A popular approach to this kind of evaluation is to define several criteria for decision making, evaluate the ontology in question on each criterion by giving a numerical score, and then compute the overall score for the ontology as a weighted sum of the per-criterion scores. Such methods are called multiple-criteria approaches in Brank et al. (2005). For example, Fox et al. (1995) proposed a set of criteria including generality, completeness, perspicuity, etc. Gomez-Perez (2001) pointed out the lack of interest in evaluation issues in the ontological engineering community at that time. She also pointed out that tools, tutorials and case studies are critical for ontology engineers to assess the usability of an existing ontology. Nevertheless, the paper does not give an example of ontology usability evaluation. Instead, it evaluates the Standard-Unit Ontology in terms of consistency, completeness and conciseness, so the evaluation result is not directly related to the effectiveness, efficiency and satisfaction of the users when they reuse a certain ontology. An ontology may be consistent (i.e. without any contradictory assertion), complete (i.e. without any missing definition) and concise (i.e. without any unnecessary definition), but still be unusable or very cumbersome to use (e.g. due to bad documentation).
Lozano-Tello et al. presented the ONTOMETRIC method (Lozano-Tello and Gomez-Perez, 2004; Lozano-Tello et al., 2003), which compares ontologies with a taxonomy of 160 characteristics organized in a multilevel tree-shaped framework. The final score for an ontology is calculated as the weighted sum of the scores given to each of the leaf node characteristics, through aggregations at each of the internal nodes governing aspects and sub-aspects of the ontology characteristics. The scoring and weight assignment can be expected to take a lot of time and to be easily biased by the viewpoint of the scorer, as pointed out by Hartmann et al. (2005). Similar to ONTOMETRIC, our approach also requires a scorer to produce a single numerical score for each ontology in question, but we aimed to provide the scorer with a Likert scale (a questionnaire asking for degrees of agreement with a list of statements) consisting of around 10 items, instead of a huge scoring form with pending weights for each item. We found a small portion of the 160 characteristics directly related to usability, and drew them out as our candidate usability metrics.
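The tree-shaped weighted-sum aggregation described above can be sketched as follows. This is only a minimal illustration of the general technique, not the ONTOMETRIC implementation: the `Characteristic` class, the characteristic names, the weights and the scores are all hypothetical, and normalizing the weights among sibling nodes is an assumption made here so that the result stays on the same scale as the leaf scores.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Characteristic:
    """A node in a (hypothetical) tree of evaluation characteristics."""
    name: str
    weight: float                      # weight relative to sibling nodes
    score: Optional[float] = None      # only leaf nodes carry a score
    children: List["Characteristic"] = field(default_factory=list)


def aggregate(node: Characteristic) -> float:
    """Return the leaf score, or the weighted sum of the children's scores.

    Assumption: weights are normalized among siblings, so an internal
    node's score is a weighted average of its children.
    """
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf '{node.name}' has no score")
        return node.score
    total_weight = sum(child.weight for child in node.children)
    if total_weight <= 0:
        raise ValueError(f"children of '{node.name}' have no positive weight")
    return sum(child.weight * aggregate(child) for child in node.children) / total_weight


# Hypothetical two-level example with scores on a 1 to 5 scale.
example_tree = Characteristic("ontology", 1.0, children=[
    Characteristic("content", 0.6, children=[
        Characteristic("concepts", 0.5, score=4),
        Characteristic("relations", 0.5, score=2),
    ]),
    Characteristic("documentation", 0.4, score=5),
])
overall = aggregate(example_tree)
```

Even in this toy tree the scorer has to supply five weights and three scores, which hints at why a form with 160 characteristics is costly to fill in and sensitive to the scorer's viewpoint.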
Different from the studies mentioned above, Burton-Jones et al. (2005) proposed a set of 10 attributes that fall into four metrics suites (i.e. syntactic, semantic, pragmatic and social quality) of a semiotic framework. They tried to find objective indicators to assess each recognized metric to eliminate the need of human reviewers. For example, the Lawfulness dimension of the Syntax metric is indicated by the percentage of correct syntax per class and property. The authors suggested that the evaluation could be extended to cover application-centered assessment of the quality of an ontology for use in a specific task. We create our list of candidate usability metrics by thinking in the aspects of syntax, semantics and pragmatics. We assume that the difference of social quality (authority and history) between two ontologies would dwarf the overall impact of these three other aspects. In fact, only ontologies of similar social quality are worth comparing. If one of them has significantly more authority (i.e. more ontologies rely on it) or longer history and more usage, it would be an obvious winner. We also found it hard, if not impossible, to assess some of the important metrics with objective indicators. For example, the quality of documentation is very important to the usability of the ontology, but it is hardly possible to be measured in any objective way, so we argue that it requires the participation of human reviewers to get meaningful usability assessments.
The semiotic framework was also used in Gangemi et al. (2006). The authors organized the criteria for ontology evaluation and selection with a semiotic meta-ontology called O2 and the oQual evaluation ontology, which involves concepts and relations relevant to ontology evaluation and selection. Therefore, evaluation based on the oQual ontology goes beyond the mere calculation of a weighted sum and also involves reasoning based on the evaluation ontology. To follow their approach, the goals of reusing the ontology and the trade-off rules among conflicting goals must be defined formally and in a way that connects with O2 and oQual, which limits its application to ontology experts only. The target user group of our metrics is anyone who wants to select an ontology suitable for an application (meaning the user can apply the selected ontology to his/her use case with the most aggregated effectiveness, efficiency and satisfaction among all the candidate ontologies), so we require much less ontological expertise of our users than their approach. Casellas (2009) used the System Usability Scale (SUS; Brooke, 1996) to evaluate the usability of the Ontology of Professional Judicial Knowledge (see Chapter 2 of Casellas, 2009, for an introduction to the ontology). Our approach differs from Casellas (2009) in how we select the statements in the questionnaire. Casellas directly tailored the 10 items in SUS, but we think that the set of statements that best indicate the usability of ontologies may need to consider more aspects. We created a pool of 29 candidate statements by adapting several resources, including those used in Casellas (2009). Then we built the usability scale from those candidate statements through a community effort, i.e. we gathered preferences among the Semantic Web community through an online poll. The result of the poll supported our view, since it differed considerably from the statement set in Casellas (2009).

Approach
The intuition behind our approach to evaluating ontology usability comes from SUS, a ten-item Likert scale whose usage is recommended by the UsabilityNet project as "it is very robust and has been extensively used and adapted. Of all the public domain questionnaires, this is the most strongly recommended". We hope to have a similarly concise scale that is applicable to ontology usability evaluation.
A direct adaptation of SUS for ontology (i.e. replacing "system" with "ontology" in the questionnaire) does not cover all the aspects in our understanding of ontology usability. Therefore, besides those items adapted from SUS, we collected usability evaluation statements from ONTOMETRIC (Lozano-Tello and Gomez-Perez, 2004) and Casellas (2009) to set up a large pool of statements. When collecting those statements we also referred to the semiotic framework discussed in Burton-Jones et al. (2005) and Gangemi et al. (2006), but our understanding of syntax, semantics and pragmatics is slightly different because the focus of our work is usability. We consider syntax to be relevant to the machine-readable encoding and logic of the content of an ontology; semantics to the conceptual model and documentation; and pragmatics to the first-hand experience of using the ontology in practice. Table 1 shows the grouped statements in the large pool. Note that we changed some statements originally in negative forms to positive forms so that all the statements are desired features of a highly usable ontology. Each feature represented by one of the statements can be expressed in either the positive form or the negative form. Both forms are found in our original statement set. Two of these statements, numbered 23 and 26 in Table 1, are even the two forms of exactly the same feature. To eliminate bias caused by the form of the statements and to avoid listing the same feature twice (in both positive and negative forms) in the questionnaire, we decided to normalize every statement to its positive form.
The long list of statements can help provide a comprehensive usability evaluation, but the burden caused by the number of statements can be an issue, so we reached out to the Semantic Web community with a poll for selecting 10 representative statements. We sent out invitations to the semantic web working group of the Federation of Earth Science Information Partners, the semantic web group on Facebook, and also colleagues at the Tetherless World Constellation at Rensselaer Polytechnic Institute. To avoid confusion in reading the statements, we changed the forms of a few of them (see notes in Table 1) so that all the statements have a positive form, i.e. they describe the strengths of an ontology instead of its shortcomings. Moreover, we shuffled the order of the statements in the survey and did not show the three groups of syntax, semantics and pragmatics, in order to avoid introducing any bias to the survey participants. We received 18 valid responses in 7 days, and the top 11 statements from the poll and the votes they each received are shown in Table 2.
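The tallying step behind such a poll can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the poll: the function names and the sample responses are hypothetical, each response is assumed to be a collection of selected statement numbers, and breaking ties by statement number is an arbitrary choice made here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_votes(responses: Iterable[Iterable[int]]) -> Counter:
    """Count how many responses selected each statement number."""
    counts: Counter = Counter()
    for selected in responses:
        # A statement counts at most once per response.
        counts.update(set(selected))
    return counts


def rank_statements(counts: Counter) -> List[Tuple[int, int]]:
    """Return (statement, votes) pairs, most voted first.

    Ties are broken by statement number (an assumption for determinism).
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical responses: each inner list holds the statements one participant picked.
sample_responses = [[1, 2, 3], [2, 3], [3, 3, 4]]
ranking = rank_statements(tally_votes(sample_responses))
```

With only a small number of responses, neighbouring ranks can be separated by a single vote, so a ranking produced this way is best read as a rough ordering.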
In Table 2 we can see that 5 of the top 11 statements are about semantics, 4 about syntax and the remaining 2 about pragmatics, which is a strong indication that the usability of an ontology is mostly about semantics and syntax. This also supports our above discussion that SUS probably would not work well on ontologies because it is mostly about pragmatics. We compiled our 10-item Likert scale for ontology usability evaluation based on the above result, as shown in Table 3.

Table 1 (excerpt). Grouped statements in the large pool.
Syntax (Content structure)
1. I found the various concepts in this ontology well integrated
2. The ontology misses some important concepts - Changed to "The ontology has all the important concepts included" in the survey
3. The ontology has unnecessary concepts - Changed to "The ontology does not have unnecessary concepts"
4. I found the ontology unnecessarily complex - Changed to "I found the ontology brief but comprehensive"
5. I thought there was too much inconsistency in this ontology - Changed to "I found the various parts of this ontology well integrated"
6. I found the formal specification of concepts in this ontology coincides with their descriptions in natural language
7. I found the formal specification of relations in this ontology coincides with their descriptions in natural language
8. I think the attributes in this ontology describe the concepts well
[...]
27. I think that I would need the support of a person experienced with this ontology to be able to use it - Changed to "I do not need the support of a person experienced with this ontology to be able to use it"
28. I need some more examples than provided in the documentation to make sure how to use the ontology - Changed to "I think the documentation provides sufficient examples for me to make sure how to use the ontology"
29. I think that I would like to use this ontology frequently
30. I think that I could contribute to this ontology

Table 3 (excerpt).
I found the formal specification of concepts and relations in this ontology coincides with their descriptions in natural language.
10. I do not need the support of a person experienced with this ontology to be able to use it.
In Table 3, closely related statements about concepts and relations were merged, i.e., statements 3 and 9 in Table 2 were merged into "I found the concepts and relations in this ontology properly described in natural language", and statement 10 was changed to "I found the formal specification of concepts and relations in this ontology coincides with their descriptions in natural language". We did this based on the assumption that concepts are as important as relations in an ontology and are considered together with them.
A scale from 1 to 5 was used to indicate "strongly disagree", "disagree", "neutral", "agree" and "strongly agree" against each statement in Table 3, so higher scores mean better usability. To assess the usability of an ontology using this form, a scorer needs to give a score indicating his/her degree of agreement for each statement, denoted as $s_1, s_2, \ldots, s_{10}$; then the total score $s_t$ is calculated as:
$$s_t = \sum_{i=1}^{10} s_i \qquad (1)$$
which ranges from 10 to 50. To further improve the questionnaire, we use positive and negative forms of the statements in Table 3 alternately to make scorers more attentive when they fill out the form. The adapted and reorganized statements are shown in Table 4.
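In Table 4, the statements at odd-numbered positions are all in a positive form and those at even-numbered positions are all in a negative form. To use this form, the scores of the negative statements must be reversed, so Equation 1 needs to be changed to:
$$s_t = \sum_{i \in \{1,3,5,7,9\}} s_i + \sum_{i \in \{2,4,6,8,10\}} (6 - s_i) \qquad (2)$$
which still ranges from 10 to 50. A higher score indicates a higher usability.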
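The two scoring rules can be sketched in a few lines of code. This is a minimal sketch under stated assumptions: the function names are hypothetical, the all-positive form (Table 3) is assumed to be scored as a plain sum, and the alternating form (Table 4) is assumed to reverse each even-positioned (negative) item as 6 minus its score, which is the reading consistent with the stated range of 10 to 50 and with higher totals meaning higher usability.

```python
from typing import Sequence

SCALE_MIN, SCALE_MAX = 1, 5   # "strongly disagree" .. "strongly agree"
NUM_ITEMS = 10


def _validate(scores: Sequence[int]) -> None:
    """Check that a response has 10 scores, each on the 1 to 5 scale."""
    if len(scores) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} scores, got {len(scores)}")
    if any(not SCALE_MIN <= s <= SCALE_MAX for s in scores):
        raise ValueError("every score must be between 1 and 5")


def total_positive_form(scores: Sequence[int]) -> int:
    """Total for the all-positive form: the plain sum of the ten scores."""
    _validate(scores)
    return sum(scores)


def total_alternating_form(scores: Sequence[int]) -> int:
    """Total for the alternating form.

    Odd positions (1, 3, ...) are positive statements and count as given;
    even positions are negative statements and are reversed as 6 - score
    (an assumption, see the lead-in).
    """
    _validate(scores)
    total = 0
    for position, score in enumerate(scores, start=1):
        if position % 2 == 1:
            total += score
        else:
            total += (SCALE_MIN + SCALE_MAX) - score
    return total
```

For example, a scorer who strongly agrees with every positive statement and strongly disagrees with every negative one enters `[5, 1, 5, 1, 5, 1, 5, 1, 5, 1]` on the alternating form and obtains the maximum total of 50.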

Case Study and Evaluation
We used the developed OUS, i.e. the statements in Table 4, for a case study of ontology usability evaluation within the Tetherless World Constellation at Rensselaer Polytechnic Institute. The case study was carried out as an anonymous online survey, in which each participant was asked to choose an ontology and assign a score to each statement. The outputs of the survey are listed in Table 5. Since revisions of the ontologies of the Deep Carbon Observatory (DCO) and the Global Change Information System (GCIS) are currently under way, we will be able to apply the developed OUS to their later versions to check whether the revisions are effective in terms of usability. Comparisons among ontologies with similar intended uses and among different versions of the same ontology are made easy with OUS, since each ontology is given an overall score calculated from the answers given by each reviewer. Scoring an OUS form (Table 4) does not require much of the reviewers' time, but the collected answers provide simple yet comprehensive assessments of the usability of the ontologies in question. In addition to the total score, we can also analyze the scores on the same statement from several evaluation cases of the same ontology.
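The per-statement analysis mentioned above can be sketched as follows. This is an illustrative sketch only: the function names and the sample responses are hypothetical and are not the data in Table 5, and the scores are assumed to be already oriented so that a higher value means better usability (i.e. negative statements have been reverse-scored beforehand).

```python
from statistics import mean
from typing import List, Sequence


def per_statement_means(responses: Sequence[Sequence[int]]) -> List[float]:
    """Average the scores given to each statement across several reviewers."""
    if not responses:
        raise ValueError("at least one response is required")
    num_items = len(responses[0])
    if any(len(r) != num_items for r in responses):
        raise ValueError("all responses must score the same number of statements")
    # zip(*responses) groups the scores statement by statement.
    return [mean(column) for column in zip(*responses)]


def weakest_statements(responses: Sequence[Sequence[int]], k: int = 3) -> List[int]:
    """Return the 1-based positions of the k statements with the lowest mean."""
    means = per_statement_means(responses)
    positions = range(1, len(means) + 1)
    return sorted(positions, key=lambda pos: (means[pos - 1], pos))[:k]


# Hypothetical, shortened example: two reviewers scoring three statements.
sample = [[4, 2, 5], [2, 2, 4]]
sample_means = per_statement_means(sample)
sample_weakest = weakest_statements(sample, k=2)
```

Statements with a low mean point to the aspects of the ontology, such as its documentation or examples, where a revision is most likely to improve usability.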

Discussion
The resulting ontology usability scale of this study covers topics of syntax, semantics and pragmatics and addresses the issue of evaluating the usability of an ontology in a certain context. The list of statements in the current scale (Table 4) is based on a survey. It has a concise structure and is easy to use in practice. The scale can be used by all stakeholders who participated in the development, application, and revision of an ontology, and the result can be used to improve the ontology. Semiotics is the study of signs. It is applicable to ontologies because ontologies are sign systems to represent knowledge. The division of semiotics into semantics, syntactics and pragmatics was contributed by Morris (1938). According to Morris, semantics is the study of the relation of signs to the things they refer to (their designata); syntactics is the study of the relation of signs to one another; pragmatics studies the relation of signs to their interpreters. In the case of ontology, we define semantics as the mapping of domain knowledge to ontological elements such as classes and relations, or the meaning of these elements, conveyed through the conceptual model and documentation. Syntactics (or syntax according to Burton-Jones et al., 2005) is defined as the way the ontological elements are organized, usually with terms in RDF Schema (RDFS) (Brickley and Guha, 2014) or OWL. Pragmatics is the relation of ontologies to their users, so it is about the activity of a user using an ontology.
According to the above definitions, it may seem that only the pragmatical dimension of semiotics is relevant to ontology usability, since it is the only dimension concerned with the activity of using the ontology rather than with the ontology itself, and usability is about attributes of the activity of using a product rather than attributes of the product itself according to ISO 9241-11 (ISO, 1998). However, unlike simple signs, which their users interpret out of intuition and experience, ontological terms require users to learn their semantical and syntactical features in order to interpret them, typically through learning the term organization and reading the documentation. Therefore, all three dimensions of semiotics are relevant to ontology usability. In fact, the survey result shown in Table 2 even indicated that the semantical and syntactical aspects may be more important than the pragmatical aspect, since only 2 out of the 11 top selections fall in the pragmatics group.
ISO 9241-11 (ISO, 1998) listed the following three aspects of usability: effectiveness is about the accuracy and completeness of the result of the activity; efficiency is about the resources expended during the process, such as the time and effort of learning and creating solutions; and satisfaction is the freedom from discomfort and the positive attitude towards the use of the product. As we tried to decide which semiotical aspect impacts which usability aspect, we found that each semiotical aspect may impact every usability aspect. For example, missing or insufficiently illustrated ontological terms (semantical aspect) may leave the user unable to complete his or her tasks (effectiveness), spending more time learning the conceptualization (efficiency), and/or feeling uncomfortable (satisfaction). Inconsistent or counter-intuitive organization of the ontological elements (syntactical aspect) may have the same effect in terms of effectiveness, efficiency and satisfaction, and the pragmatical aspect aligns well with the overall usability. Therefore, the semiotical aspects and the usability aspects are closely related, so it is reasonable to classify usability criteria by semiotical aspects.
The survey result brings feedback and inspiration to ontology developers. As indicated by the top selections in Table 2, stating the purpose of the ontology explicitly, describing classes and relations in detail and providing abundant examples in the documentation will greatly help users understand and use the ontology. The purpose of an ontology is important probably because of the close relationship between usability and purpose. Usability is inherently associated with fitness, and fitness, as summed up by Terry Pratchett in his novel "Moving Pictures", means "appropriateness to a purpose" (Pratchett, 1990) (quoted in Brooke, 1996). Besides the statement pool in Table 1, in the online survey we also invited participants to write down any additional statements from their point of view.
One suggestion is about the provenance of components in an ontology, such as the cited source in the definition of a class or property, and the person who asserts the definition, etc. Another issue is about the serialization language of ontologies. It was mentioned that if an ontology is not serialized in a simple format such as Turtle, it can be difficult for a user to read and understand. Another participant proposed the issue of ontology maintenance/sustainability, such as the stability of the ontology over time and the level of maintenance support that the ontology has. In a previous publication (Ma and Fox, 2013) we discussed that to achieve better ontology applications, people need to balance the expressivity, implementability and maintainability of the ontology. The maintainability or sustainability is relevant to the usability of an ontology in a long-term period.
In the online survey we also received active feedback about the organization and form of the statements. In the survey preparation we tried to group the statements into three categories following a semiotic framework, while in the survey we did not show the groups and listed the statements in a random order. Our intention is to avoid any bias that may be caused by those pre-defined groups. It was interesting to see that several participants suggested the statement pool should be categorized, especially from the point of view of an ontologist. Another participant suggested that there can be a separation of statements for subject matter experts and end users.
Survey participants also commented on the positive and negative forms of the statements. From the comments we could see that it is okay to use both forms in a questionnaire, but we should avoid duplicated statements, such as "I thought the ontology was easy to use" and "I found the ontology very cumbersome to use." Considering the feedback, we organized two sets of statements for ontology usability evaluation in Tables 3 and 4, one with only statements in positive form and the other with both positive and negative forms.
Besides the 11 most voted statements in Table 2, we also reviewed the least voted statements in the online poll (Table 6). Most of them have either an extreme or a vague meaning. Statement 1 in Table 6 is adapted from its negative form "I needed to learn a lot of things before I could get going with this ontology"; statement 2 was originally "The ontology has unnecessary concepts"; statement 3 overlaps a lot in meaning with "I found the concepts/relations in this ontology properly described in natural language" but is less clear; statement 4 is very similar to "I would imagine that most domain experts would understand this ontology very quickly"; and statement 5 is similar to "I found the various concepts in this ontology well integrated" (which was selected 4 times).
We also collected information about which ontologies the survey participants had worked with; the top ones are shown in Table 7.
We found that users of domain ontologies such as SWEET have different preferences on statements from users of general ontologies such as Dublin Core. The top 12 selections for SWEET users, listed in Table 8, are very similar to those in Table 2, with only a few minor differences. For example, statement 6 in Table 2 ("I would imagine that most domain experts would understand this ontology very quickly") is missing from Table 8, but it ranks 13th among SWEET users; statements 10 and 12 in Table 8 are missing from Table 2, but they rank 12th and 13th among all the statements and so did not make it into the top 11 in Table 2. Among the top three selections by Dublin Core users shown in Table 9, however, two ("I think that I would like to use this ontology frequently" and "It is clear to me how to use this ontology") are missing from Table 2, so for ontologies designed to be used across different domains, statements 2 and 3 in Table 9 could be considered for the usability scale.
There are several issues that can be explored in future work. The first is to carry out more case studies using the statements in Table 3 and Table 4. In this study we only carried out case studies of the ontology usability scale in the GCIS and the DCO projects. Although the feedback is positive, we want to hear more feedback and suggestions on the statements themselves, including both their forms and their order, as well as the topics covered. The second potential work is relevant to the ontology types. As Table 7 shows, among the ontologies that the survey participants had worked with, there are both upper ontologies (e.g. Dublin Core, PROV-O) and domain ontologies (e.g. SWEET, GCIS). The former are applicable across a range of domains and the latter are only used for a specific application or domain. Therefore, we may organize corresponding statements for the usability evaluation of those two types of ontologies, and we can conduct further surveys to see if there are any differences between the community's concerns on the two ontology types. In the current survey we did not ask the participants to specify their roles and experience in ontology work. The third potential work is therefore that, in any new survey, we can ask people about their roles (e.g. ontology developer, database curator, application developer, etc.) and their experience with ontology use (e.g. number of years). Last, the fourth potential work is to enrich, update and reorganize the statement pool from the point of view of the expressivity, implementability and maintainability of ontologies. This framework is slightly different from the semiotic framework of syntax, semantics and pragmatics and covers new aspects of ontology usability.

Conclusions
In this paper, we followed the approach presented by Brooke (1996) to create a usability scale for ontologies. We considered candidate statements for this Likert scale from the syntactical, semantical and pragmatical aspects and conducted an online poll among the Semantic Web community to decide which statements are the most important. The usability scale was then used to evaluate domain ontologies under revision, so that the results can be compared with those of their updated versions in the future. The goal of our work is to create a robust ontology usability scale. The evaluation of the scale itself, however, requires it to be used and even adapted extensively across different domains. In this sense, this work is preliminary and does not yet have much data to validate its effectiveness, but we expect much more usage data from ontology users, since the proposed usability scale is easy to score and is applicable to any domain ontology.