1. Introduction: Digital Heritage and the Semantic Web

The care and availability of Cultural Heritage (CH), as officially managed by the memory organisations that largely comprise the GLAM sector (Galleries, Libraries, Archives and Museums), are increasingly shifting toward more interdisciplinary, post-custodial and data-driven approaches. Advancing CH assets by digital means has become a strategic necessity for the GLAM sector, in order to keep pace with the digital transition worldwide and across disciplines.

Cultural data is nevertheless considered challenging because of its specific features. In particular, CH data is distinguished as multi-formatted (content exists in various media, such as audio or video records, text documents, images, physical and digital objects), multi-topical (topics include art, anthropology, archaeology, literature), multi-lingual (content exists in different languages, including extinct ones), multicultural (content relates to and is interpreted by different cultures) and multi-targeted (content is targeted at laypeople as well as experts, different age groups and social classes) (Hyvönen, 2012).

Going beyond an object-centered CH conceptualisation, CH can be understood as collective actualities that connect diverse aspects of social organisation, systems of thought and actions. These actualities form a shared landscape of smaller or larger communities, shaping a certain material, mental or experiential space, that is being formed and transmitted over time through cultural mechanisms (Bouchenaki, 2003; Carboni & de Luca, 2016; Lemonnier, 2012).

Memory institutions are progressively complying with up-to-date international standards, good practices and legislation in their digital transformation. IT-informed frameworks and administrative processes currently applied in the Digital Cultural Heritage (DCH) field take into account the evolving Semantic Web. The Semantic Web designates a data-centred processing of web information that upgrades the previous “Web of Documents” by merging the web of human-readable documents with the web of machine-understandable data, also known as the “Web of Data” (Sack & Koutraki, 2017).

In this light, the GLAM sector is urged to manage CH content by applying models that express the semantic power of CH data. The idea of the Semantic Web has intensified the application of computational methods in the management and study of CH and has communicated the value of providing open-access, structured, interoperable data. However, the multi-perspective nature of CH content management and the multi-modality of CH data make it challenging to interlink and integrate heterogeneous content and data. In order to further develop the knowledge field of CH and advance the output to end-users within the Semantic Web, Ruben Verborgh suggests new keywords such as cleaning, reconciliation, enrichment and linking (Van Hooland & Verborgh, 2014). Thus, good practices in the CH field should ensure semantic interoperability and facilitate the reuse of the applied models, so that a global cultural Semantic Web can be built in an optimal way.

2. Conceptualisation of Linked Data

The Web of Data is conceptualised as a large decentralised knowledge infrastructure of both human- and machine-accessible data, operating through information exchange that is explicit (well-defined concepts), formal (standardised) and distributed (interconnected nodes), with the use of open, interoperable and shared metadata models. Linked Data is based on the Resource Description Framework (RDF), the principal data model of the Semantic Web, which is machine-readable and encodes exchangeable web data as simple triples of the form subject-predicate-object.
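The subject-predicate-object form can be illustrated with a minimal sketch that holds triples as plain Python tuples rather than in a dedicated RDF library. The `ex:` identifiers and the shoemaking record are hypothetical and serve illustration only; the `rdf:` and `dc:` prefixes merely stand in for the RDF and Dublin Core vocabularies.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements about a video record; all "ex:" names are illustrative.
TRIPLES: list[Triple] = [
    ("ex:video42", "rdf:type", "ex:VideoRecord"),
    ("ex:video42", "dc:title", "Traditional shoemaking"),
    ("ex:video42", "dc:subject", "ex:Shoemaking"),
    ("ex:Shoemaking", "rdf:type", "ex:Craft"),
]


def match(triples: list[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return every triple that agrees with the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything stated about the video record.
about_video = match(TRIPLES, s="ex:video42")
```

Because every statement has the same three-part shape, statements from different sources could in principle be pooled into one list and queried with the same pattern-matching function.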

From an information science perspective, metadata is defined as structured data associated with a designated entity (Greenberg, 2003). A classification of metadata proposed by the Library of Congress distinguishes three metadata types: descriptive, structural and administrative (Giannoulakis, Tsapatsoulis, & Grammalidis, 2018). In particular, descriptive metadata contains information that identifies the referred entity, structural metadata defines the relations of the entity to other entities or to parts of its own data, and administrative metadata helps manage the referred resource through such information as data formats, access rights and the history of legacy data.

Linked Data forms the basis of the Semantic Web and can be further classified into five distinct levels of representation: the real world, data, metadata, ontology and meta-ontology levels (Hyvönen, 2012). Specifically, a “real world” consists of a referred area of concern with co-existing elements, which is not necessarily linked to the physical world but can refer to imaginary or literary worlds as well. Data is the informational representation of the elements of the real world, and metadata is structured, encoded data about these representations. Ontologies define the terms by which metadata can be classified and designated through vocabularies. Ontologies can thus classify “real world” entities with logic-based axioms. In this view, ontologies can define associative and partitive relations among elements. A key ontology standard for Linked Data and the Semantic Web is OWL (Web Ontology Language), a language with high expressive power for associating metadata, which enables rich semantic features and reasoning possibilities.

Finally, meta-ontologies designate logic and rule systems that apply a certain reasoning to vocabularies and ontologies, extending knowledge capacity. Rules linked to meta-ontologies can thus be applied to knowledge bases, which are constituted by vocabularies (RDF) and their related ontologies (OWL). SKOS (Simple Knowledge Organization System) can be designated as a common meta-ontology. In this context, reasoning models can express either closed or open systems. In open systems new entries are possible and thus new relationships can be created, resembling the function of a container. In closed systems no extension is possible and there is a limitation to a set of predefined relationships, resembling the function of a collection (Decourselle, Vennesland, Aalberg, Duchateau, & Lumineau, 2015). Moreover, open systems are connected with the Open World Assumption, a system of logic in which statements that are not explicitly stated are treated as unknown rather than false, so that they may still turn out to be true. In contrast, the Closed World Assumption regards as false all statements that are not explicitly stated. Although an open solution can be more interesting, it can also be less intuitive, more difficult to implement and consequently error-prone, as anything can be assumed to be possibly valid unless disjointness is stated (Hitzler, Krötzsch, & Rudolph, 2010).
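The practical difference between the two assumptions can be shown with a minimal sketch. It assumes a knowledge base that contains only positive assertions, so the open-world function can answer either "true" or "unknown"; a full reasoner would also derive statements and could conclude "false" from explicit negation or disjointness axioms. All facts and `ex:` names are hypothetical.

```python
from typing import Optional

Fact = tuple[str, str, str]

# Hypothetical knowledge base of explicitly asserted facts.
FACTS: set[Fact] = {
    ("ex:dance1", "rdf:type", "ex:FolkDance"),
    ("ex:dance1", "ex:performedIn", "ex:RegionX"),
}


def holds_closed_world(fact: Fact) -> bool:
    """Closed World Assumption: whatever is not asserted is taken to be false."""
    return fact in FACTS


def holds_open_world(fact: Fact) -> Optional[bool]:
    """Open World Assumption: whatever is not asserted is unknown (None), not false."""
    return True if fact in FACTS else None


# A statement the knowledge base does not mention.
question = ("ex:dance1", "ex:performedIn", "ex:RegionY")
closed_answer = holds_closed_world(question)  # False: treated as not the case
open_answer = holds_open_world(question)      # None: simply not known
```

Under the closed reading the dance is not performed in the second region; under the open reading the question stays undecided until a new statement is added.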

3. Linked Data for DCH

Extending Linked Data and the aforementioned levels of representation to the DCH field, some practical paradigms considered good practices can be outlined. In this light, the “real world” level can refer to the field of cultural heritage in general as the domain of interest. The next level, CH-related “data”, can comprise a diverse set of elements such as cultural objects, textual records and images, as well as born-digital art, creative software, video recordings of performances and transcribed community-interpreted songs. The last two examples of CH data can be linked to the subdomain of Intangible Cultural Heritage.

The “metadata” level for DCH comprises data that describes cultural items in GLAM collections. Memory institutions are working to provide semantic-based data access in larger or smaller units through APIs and digital collection datasets respectively. They aim for good-quality metadata by balancing professional metadata management, professional-amateur (pro-am) collaborations and semantic-aware automated technologies for dealing with bulk data (Mosca, Remesal, Rezk, & Rull, 2015), such as data mining and information extraction. Furthermore, CH data is described through metadata schemas, which refer to the specific formats in which metadata is presented. In particular, web schemas are metadata models developed for describing CH data on the web. A web schema that is widely used among organisations and is also suitable for the GLAM sector is Dublin Core (DC), which consists of 15 metadata elements but can be extended to include more.
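As an illustration of such a web schema, the sketch below serialises a small descriptive record with four of the 15 Dublin Core elements, using only the Python standard library. The namespace URI is that of the Dublin Core element set; the wrapping `record` element and the described item are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # namespace of the 15 core DC elements
ET.register_namespace("dc", DC_NS)


def dc_record(fields: dict[str, str]) -> ET.Element:
    """Build a flat record whose children are Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper element, not part of DC
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical description of a video recording of a craft demonstration.
record = dc_record({
    "title": "Traditional shoemaking",
    "type": "MovingImage",
    "language": "fi",
    "subject": "craftsmanship",
})
xml_text = ET.tostring(record, encoding="unicode")
```

The flat list of elements keeps the schema lightweight, which also indicates its limits: relations between processes, people and objects would need a richer model than a single record of this kind.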

On the ontology level, CH applications make use of extensive related terminologies and concepts that are organised via classes, relations and instances. As argued by the W3C (World Wide Web Consortium), ontologies and vocabularies within the Semantic Web can be regarded as synonymous, although ontologies may denote more complex conceptualisations (W3C, 2015). Major types of ontologies specified for the CH domain include ULAN (Union List of Artist Names), an actor-type ontology that represents relations and groupings of artists, culture-related organisations and other important figures in art. ULAN is part of the Getty Vocabularies, together with the AAT (Art & Architecture Thesaurus) and the TGN (Getty Thesaurus of Geographic Names), three semantically rich and encompassing ontologies of cultural and art-related concepts, managed by the Getty Research Institute. With regard to the overlapping terms and the various levels of complexity when describing formal models as vocabularies, ontologies, thesauri and dictionaries, it can be argued that expressive intensity is a factor for designating formal models as ontologies. In this respect, TEI (Text Encoding Initiative), a metadata standard for the representation of texts, can be considered close to an ontology when the encoding is more formalised and semantically rich, although XML schemas, such as the ones used in TEI, are usually less expressive (Eide, 2014).

Regarding the application of meta-ontologies to CH data, a compilation of generic rules based on reasoning systems can be applied. CIDOC-CRM, the Conceptual Reference Model of ICOM’s International Committee for Documentation (Comité International pour la Documentation), is a widely used meta-ontology, designed for the museological and cultural heritage sector. The Closed World Assumption, as outlined before, is usually preferred for handling CH data. Rules can be applied in order to produce semantic recommendations when content is being queried (through a SPARQL endpoint). In addition, recommendation systems can become explainable by conveying algorithmic reasoning as human-readable output (Antoniou, Groth, van Harmelen, & Hoekstra, 2012).
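A rule that produces an explainable recommendation can be sketched in a few lines. The example does not use SPARQL or CIDOC-CRM; it applies a single hand-written rule ("items linked to the same concept are related") to hypothetical triples and returns each suggestion together with a human-readable justification.

```python
Triple = tuple[str, str, str]

# Hypothetical item descriptions; every identifier is illustrative.
TRIPLES: list[Triple] = [
    ("ex:film1", "ex:depicts", "ex:Shoemaking"),
    ("ex:tool7", "ex:usedIn", "ex:Shoemaking"),
    ("ex:song3", "ex:depicts", "ex:Harvest"),
]


def recommend(item: str, triples: list[Triple]) -> list[tuple[str, str]]:
    """Rule: recommend items that are linked to a concept the given item is linked to.

    Returns (recommended_item, explanation) pairs so the reasoning stays visible.
    """
    concepts = {o for s, _, o in triples if s == item}
    results = []
    for s, p, o in triples:
        if s != item and o in concepts:
            explanation = (f"{s} is recommended because {s} {p} {o}, "
                           f"a concept also linked to {item}")
            results.append((s, explanation))
    return results


recommendations = recommend("ex:film1", TRIPLES)
```

Returning the explanation alongside the result is one simple way of keeping the applied rule inspectable by curators and end-users.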

4. Toward a Conceptualisation of Digital Intangible Cultural Heritage

Intangible Cultural Heritage (ICH) has been introduced as a cultural concept and Convention by the United Nations Educational, Scientific and Cultural Organisation (UNESCO, 2003), designating expressions and forms of everyday culture known as tradition or “Living Heritage”. The term had been previously reified along international forums for applied sustainable development and ideas of ‘dematerialisation’ in the 1980s and 1990s. ICH may be understood as one of the three major parts that complement CH (together with tangible heritage and natural heritage).

ICH encompasses immaterial cultural actualities as ephemeral, collective expressions of the everydayness, embodied and expressed by smaller or larger communities. The UNESCO Convention attempts to safeguard, and thus raise awareness of, ensure respect for and promote, practices, knowledge and techniques, performing arts and ceremonies, as well as tools, processes and places that are associated with and recognised by communities. Furthermore, the World Intellectual Property Organization has recently pointed out the value and benefits of traditional cultures, on an individual as well as a collective level (World Intellectual Property Organization, 2017). ICH describes living cultural expressions and practices that are community-based and collectively experienced, ranging from performative acts and mythological systems to communal compositions and biocultural sensibilities. Its multivalent manifestations may combine i.a. sound, movement, spatial densities, synergies and radial properties that often deviate from object-centred approaches, allowing the mapping of more processual, affective and technical ensembles.

Over the last two decades, ICH has been gaining momentum through official institutional support (transnational, governmental and local agencies, i.a. ICOM, the World Intellectual Property Organization, national ICH inventories, union directorates). In addition, ICH is being revitalised through a notable effervescence in artistic and cultural practice and proactive community participation. However, many forms of living heritage are in urgent need of safeguarding, as a result of factors like decontextualisation, environmental degradation or community loss. At the same time, media archives of ICH are at risk or have already gone missing, as a result of analog/digital media obsolescence, as well as technical, conceptual and systemic impediments that hinder their proper integration in memory institutions.

ICH can be further linked with digital cultural heritage through its documentation, representation and preservation by digital means. However, ICH poses growing conceptual and technological challenges with regard to its theoretical modelling and digital documentation (Carboni & de Luca, 2016; Giannoulakis et al., 2018; Kettula & Hyvönen, 2012; Wijesundara & Sugimoto, 2018). The dichotomous view of CH as a concept of distinct tangible and intangible assets is often pointed out as problematic, since “real world” cultural entities typically incorporate both elements in varying proportions.

Moreover, CH has been interpreted on the basis of an object-centred approach in both traditional cataloguing practices and digital collections management. Although ICH can be strongly connected to tangible artifacts as well, more often it deviates from materialities and manifests through event- and process-based structures. From an information perspective that transforms features of “real world” entities into formal models and symbol structures, such manifestations can be challenging because more abstract, implicit, performative, processual and symbolic elements are attached.

From a theoretical and curatorial point of view, ICH is linked to such concepts as transient community memory and imaginary, re-interpretation, re-creation and re-enactment of cultural legacies, participatory storytelling and cross-country narratives, anonymous improvisation based on popular motifs and shared aesthetics (Ziku, 2018). However, the limited ability of formal and logic systems to interpret the many nuances of a “real world” needs to be taken into account, as more expressivity comes at the expense of more complexity, which might lead to ambiguity. In this respect, the focus in the formalisation of ICH is on providing a useful, practical approach.

The objective in the case of ICH and its semantically aware digital documentation is the convergence between critically informed contemporary theory and technical practice. A further objective is to move away from object-centred structures toward process-based models, with a focus on semantic interoperability that is reinforced by standardising the documentation of ICH as a whole rather than through case-based singular solutions. In addition, web schemas should be lightweight yet versatile enough to capture valuable inferences, probably adopting the logic of an open system that is able to capture and define new information.

5. Linked Metadata Schemas for ICH

As pointed out, a dichotomous interpretation of tangible and intangible heritage can become problematic for digital documentation, as tangible artifacts can encompass intangible elements to varying degrees, as in the case of symbolic paintings, while intangible assets can be highly attached to material objects, as in the case of traditional craftsmanship. In addition, the dynamics of cultural objects overall are expressed to a great extent through relationships, processes, tacit knowledge and performative acts. Hence, a constitutive metadata schema for ICH should be capable of integrating a combination of metadata standards in order to encode the multitude of its elements, which a single schema could not cover successfully.

The linked metadata schemas for ICH reviewed below compile the existing academic literature on the topic, which has had limited contributions so far. Discussed from a digital documentation perspective, the models are either applied solutions or conceptual frameworks. In general, the proposed linked data models of ICH adopt event-centred rather than object-centred conceptualisations and show more versatility in their demonstrated applications. This is particularly evident for traditional expressions that manifest spatial and time-dependent narratives, as in dance, rituals and craftsmanship. In this context, Kettula and Hyvönen (2012) use as their working model a video record of traditional shoemaking, proposing a process-oriented analysis and cataloguing of video documentation, where parts of the film can be indexed as annotated video with regard to the sub-processes that successively take place within the video.1 This approach uses the Finnish Semantic Web publication system CultureSampo, based on the FinnONTO ontology, which allows interrelationships between records in ICH. According to Carboni and de Luca (2016), CultureSampo is an outstanding example of data integration and harmonisation in CH, which brings forward the concept of collective semantic memory through a collaborative infrastructure (Hyvönen et al., 2008).
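The idea of indexing parts of a video by the sub-processes they show can be sketched as a simple data structure. This is not the data model of CultureSampo or of Kettula and Hyvönen (2012); the segment boundaries and process labels are hypothetical and only illustrate process-oriented annotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    start_s: float  # segment start, in seconds from the beginning of the video
    end_s: float    # segment end, in seconds (exclusive)
    process: str    # label of the sub-process shown in the segment


# Hypothetical sub-processes and timings, for illustration only.
SEGMENTS = [
    Segment(0, 95, "cutting the leather"),
    Segment(95, 240, "stitching the upper"),
    Segment(240, 410, "attaching the sole"),
]


def process_at(segments: list[Segment], t: float) -> Optional[str]:
    """Return the sub-process label at time t, or None if t is not annotated."""
    for seg in segments:
        if seg.start_s <= t < seg.end_s:
            return seg.process
    return None


def segments_for(segments: list[Segment], process: str) -> list[Segment]:
    """Return all segments annotated with the given sub-process."""
    return [seg for seg in segments if seg.process == process]
```

In a linked data setting each process label would be replaced by a reference to an ontology concept, so that segments of different videos showing the same sub-process could be retrieved together.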

An in-depth introduction to Linked Data by Oldman, Doerr and Gradmann (2015) addresses the importance of the micro level and the capacity of Linked Data under specific contextual models to encode lineage of knowledge provenance, as well as stratification layers of knowledge abstraction. The concept of the micro level would be particularly valuable for the documentation of ICH, as it can build more sophisticated representations of the local context and trace historical and location-based routes of documentations. CIDOC-CRM (Conceptual Reference Model) is a suitable meta-ontology that can express micro level sensitivity. Since it is designed specifically for CH data, the framework of CIDOC-CRM can reasonably deal with some of the domain’s challenges, such as describing the explicit and implicit relationships of CH.

However, as stated above, multi-faceted expressions of ICH may require several ontologies to define all their metadata elements, dealing for example simultaneously with movement, musical motifs and related lyrics, location-based performance, wearable artifacts and other cultural objects used. Movement can be encoded with MovementXML, which encodes dance annotation and can create larger units by arraying smaller movement parts. Moreover, MusicXML can encode musical motifs, whereas lyrics can be encoded as text with the TEI. Wearable artefacts such as costumes can be described by VRA standards, as proposed by Giannoulakis et al. (2018), based on a fashion-objects cataloguing project that marked VRA as a suitable metadata schema for this domain.

Although a compilation of ontologies such as the aforementioned can be utilised in order to document the diverse data of ICH in a more optimal way, many important aspects are still missing. Such aspects include facial expressions, the environment as a meaningful scenography and geographical regions beyond country borders, extending to encodings of binaural and visual 3D recordings. Since existing metadata schemas do not cover these elements, it is vital to develop new, interoperable and integrative ontologies. In recent years a few intriguing studies have proposed novel approaches to the documentation of ICH, suggesting computing methodologies to deal with its complexities. Aalberg, Vennesland and Farrokhnia (2015) present a pattern-based framework inspired by the use of design patterns in software engineering. This approach could be very useful for organising the compilation of different metadata schemas as a catalogue of schema patterns that can be arrayed in recombinant ways.

In a similar direction, probabilistic ontological frameworks deal with the documentation and decoding of repeatable patterns in a compound manner. Chantas, Karavarsamis, Nikolopoulos and Kompatsiaris (2018) proposed a probabilistic ontological framework based on Multi-Entity Bayesian Networks (MEBNs) as a way to document and decode folk dance patterns and their relation to rhythm. In their examples, Chantas, Nikolopoulos and Kompatsiaris (2014) and Chantas et al. (2018) demonstrate a working model for representing the non-deterministic quality of successive pattern formation in ICH. An ontological mapping of ICH concepts can thus be based on MEBNs, which are capable of consolidating the multimodality of ICH to a great extent, including different styles (e.g. dance, rhythm, singing) and a variety of formats (e.g. audio, video). In this way certain modules of a particular style, for example a series of steps from a traditional dance, can be modelled as arrangeable blocks on the basis of probabilistic inference. To this end, the affordances of more adaptable ICH knowledge representation schemas can be further explored in Semantic Web ontologies.
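The non-deterministic succession of pattern blocks can be illustrated with a much simpler model than an MEBN. The sketch below uses a first-order Markov chain over dance steps, which is a deliberate simplification and not the framework of Chantas et al.; the step names and transition probabilities are invented for illustration.

```python
# Hypothetical transition probabilities between dance steps; each row sums to 1.
TRANSITIONS: dict[str, dict[str, float]] = {
    "step_a": {"step_a": 0.1, "step_b": 0.6, "turn": 0.3},
    "step_b": {"step_a": 0.5, "step_b": 0.1, "turn": 0.4},
    "turn":   {"step_a": 0.7, "step_b": 0.3, "turn": 0.0},
}


def sequence_probability(steps: list[str],
                         transitions: dict[str, dict[str, float]] = TRANSITIONS) -> float:
    """Probability of the observed transitions, conditional on the first step."""
    prob = 1.0
    for current, following in zip(steps, steps[1:]):
        prob *= transitions[current][following]
    return prob


def most_likely_next(step: str,
                     transitions: dict[str, dict[str, float]] = TRANSITIONS) -> str:
    """Return the step with the highest transition probability after `step`."""
    options = transitions[step]
    return max(options, key=options.get)


# Probability of step_a -> step_b -> turn is 0.6 * 0.4.
p = sequence_probability(["step_a", "step_b", "turn"])
```

An MEBN generalises this idea by attaching such probabilistic dependencies to ontology entities and their relations, so that rhythm, singing and movement can condition one another instead of each step depending only on the previous one.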

In addition, administrative issues related to a critically informed documentation of ICH need to be considered. ICH is typically catalogued in local, national and international repositories, and many ICH expressions have been catalogued simultaneously but separately in several countries, resulting in overlapping or conflicting conceptualisations. Thus, rules, i.e. meta-ontology systems, for smooth data alignment and linking of the datasets must be provided for, in order to overcome the merging problems that can typically occur.
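The merging problem can be made concrete with a minimal sketch of record alignment. It assumes that two inventories describe the same expression and that an explicit equivalence link between their identifiers exists, in the spirit of an `owl:sameAs` statement; the records, identifiers and field names are hypothetical.

```python
# Hypothetical records for the same ICH expression held in two national inventories.
RECORD_A = {"id": "invA:123", "name": "Example dance", "region": "Region X", "type": "dance"}
RECORD_B = {"id": "invB:987", "name": "Example dance", "region": "Region Y", "type": "dance"}

# Hypothetical equivalence links between identifiers of the two inventories.
SAME_AS = {("invA:123", "invB:987")}


def align(a: dict, b: dict, same_as: set) -> tuple[dict, dict]:
    """Merge two linked records; keep agreeing values and report conflicting ones."""
    if (a["id"], b["id"]) not in same_as and (b["id"], a["id"]) not in same_as:
        raise ValueError("records are not declared equivalent")
    merged: dict = {"ids": [a["id"], b["id"]]}
    conflicts: dict = {}
    for key in sorted((a.keys() | b.keys()) - {"id"}):
        va, vb = a.get(key), b.get(key)
        if va is None or vb is None or va == vb:
            merged[key] = va if va is not None else vb
        else:
            conflicts[key] = (va, vb)  # left for a curator or a further rule to resolve
    return merged, conflicts


merged, conflicts = align(RECORD_A, RECORD_B, SAME_AS)
```

Here the two inventories agree on name and type but disagree on region, so the conflict is reported rather than silently resolved, which leaves room for community or curatorial judgement.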

6. Discussion

The essay reviewed current working models and conceptualisations of the semantic documentation for cultural heritage, focusing on the interdisciplinary field of ICH and Linked Data, a field with limited contributions so far. Six assorted studies have been included that introduce technical practice and working models of Linked Data for ICH. A process-oriented analysis has been developed for ICH as a good example of data integration using the publication system and collaborative infrastructure CultureSampo. Next, the concept of the micro-level to encode lineage of knowledge provenance for cultural data has been stressed by Oldman et al. (2015), pointing to CIDOC-CRM. The need for more ontologies to capture the multi-faceted expressions of ICH has been stressed by Giannoulakis et al. (2018), in particular referring to MovementXML, MusicXML and VRA Standards for the case of semantically documenting folk dances. Furthermore, computing methodologies are being introduced, as new media are used for documentation (binaural and visual 3D recordings) and nuanced aspects are still missing from documentation (environment, expressions, geographical regions beyond borders). Aalberg et al. (2015) propose a pattern-based framework, while Chantas et al. (2018) describe a probabilistic ontological model.

Further research is needed in these directions to extend ontology representations toward more flexible, nuanced and accessible models. The development of integrated research infrastructures specifically designed to deal with the semantic documentation of ICH can be proposed for supporting the working processes. Apart from the semantic modelling of data, integrating open practices is critical for the sustainability and affordances of CH data management. Research institutions and the GLAM sector are extending their ongoing digital transformation toward data-driven, open use of their content. In the previous decade cultural organisations have been working on long-term digitisation projects grounded in high-end imaging techniques, preservation and documentation of their collections. These are substantial precedents for forwarding the use of heritage assets by digital means. However, the promotion of post-custodial open access publishing models, the enhancement of digital literacy among research cultures and progress within digital humanities have resulted in an imperative for more interdisciplinary and elaborative interactions between users and GLAM institutions. Furthering access to their collections in ways that allow their computational and creative reuse is becoming a growing need for the emerging data-driven scholarship and for the development of more versatile projects (Warwick, 2017).

An indicative overview of the current open cultural data publishing landscape is provided in the “Survey of GLAM open access policy and practice”2 (McCarthy & Wallace, 2018). A data analysis and visualisation3 of selected survey facets revealed that museums are prevalent over all other memory institutions in providing open data (205 museums), whereas libraries come next (148 libraries). The countries with the most participating institutions to date are Germany, the U.S.A. and Sweden. However, certain limitations are acknowledged: the survey is an actively running project that does not represent a full-fledged scholarly mapping of the field. In addition, the listed links to open data platforms give no indication of data quality (i.a. level of digital curation, indication of linked data), whereas policy statements and terms of use are often characterised as “caveat emptor” due to their generic formulation and data. The survey has been uploaded to the open platform Copyright Cortex4 (June 2019), which functions as a resource for issues related to digital cultural heritage and copyright, with expert commentary and information targeted for use by memory institutions.

The focus on openness standards brings forth a new inventory of participatory practices, stressing in particular the concepts of decentralised curation, radical user orientation and enhanced contextualisation of archival processes (Huvila, 2008). In this light, decentralised curatorial practices are linked to broader networks of specialists and pro-ams who collectively share responsibilities. Radical user orientation treats user involvement and usability of the archive as a priority, whereas the conventional approach treats preservation as the primary concern.

However, platform insularity may remain a major challenge, resulting in data silos that fail to pull together corpora in coalescing structures. For example, GLAM collections in most cases represent a particular subject or artist only partly, failing to aggregate the entire corpus in one place. An initiative that tried to deal with this issue was the Open Culture Data API,5 which proposed a digital infrastructure to pull together cultural resources from across many institutions, although it was limited to Dutch organisations. Linked Open Data can overcome data silos to a greater extent, further reinforcing the ethics of open data sharing by taking into account the FAIR principles6 (Findable, Accessible, Interoperable, Reusable), a set of four ground rules that aim to enhance knowledge discovery in data repositories by both humans and machines. A set of recommended guidelines that are specific to the GLAM sector and that FAIRify data management in this context can be found in the PARTHENOS consortium publications (PARTHENOS, 2018).

In the case of ICH’s digital documentation, successful community engagement is valuable on account of the critical role communities play in conceptualising, remediating and utilising their content. It is thus meaningful to integrate the complex actors’ network of ICH as part of the documentation process, as it can provide the necessary domain knowledge. Moreover, solutions that are acknowledged by a community are more likely to be accepted (Aalberg et al., 2015). In this context, mindful integration of curatorial practices for digital documentation that take place “in the wild” can be critical (Dallas, 2016), encompassing curatorial practices outside of supervised professional environments and specialised infrastructures, i.a. content curation, web curation, crowdsourcing, professional-amateur (pro-am) digitisation, pro-am curators and communities, community-based curation, user-generated content, personal digital archiving and more. Although automated processes could be used, ICH rather requires a detailed process of annotation and modelling; human resources are therefore essential to this endeavour.