Challenge Entry: ReLOAD – Repository for Linked Open Archival Data


Voting closed 15 Dec. 2012. 227 Liked

Title: ReLOAD – Repository for Linked Open Archival Data

Team: Reload

Short Description:
The ReLOAD (Repository for Linked Open Archival Data) project aims to experiment with Semantic Web standards and LOD technologies in the archival domain, in order to share archival resources in the web of data.
ReLOAD is sponsored by Archivio Centrale dello Stato, Istituto Beni Culturali Regione Emilia Romagna, and Regesta.exe.

Long Description:
ReLOAD Project: an overview
ReLOAD (Repository for Linked Open Archival Data) is a project of Archivio Centrale dello Stato (ACS), Istituto per i beni artistici culturali e naturali della Regione Emilia-Romagna (IBC) and Regesta.exe.
– ACS (http://www.acs.beniculturali.it/) is a body of the Ministry of Cultural Heritage and Activities endowed with special autonomy, and it is the depository institute for the documentary heritage of the unified Italian State. The institutional tasks of ACS are: to preserve and make accessible to researchers the archives of the central bodies and offices of the Italian State since its unification (1861); to supervise the archives produced by the State’s central bodies; and to deliver educational workshops on archival research and records management.
– IBC (http://ibc.regione.emilia-romagna.it) was founded in 1974 and is the scientific and technical instrument for Emilia-Romagna regional planning in the field of artistic, cultural and environmental heritage. IBC develops the IT facilities that deliver archive, library and museum data to institutions and the general public, promotes and coordinates the census and description of archival, book and museum material, and makes specific databases readable on the web. At present, IBC is working on standards for interoperability through the use of Semantic Web technologies. All activities are carried out in cooperation with national and international public bodies and organisations when necessary, and IBC has well-established, long-running experience in managing European projects in this field, funded or co-funded by the EU. The Soprintendenza regionale per i beni librari e documentari has been part of IBC since 1983, with the specific task of coordinating regional policy on libraries and archives.
– Since 1996, Regesta.exe (http://www.regesta.com) has provided cultural institutions of any type with services and tools for creating, managing, retrieving and accessing documentary, iconographic and audiovisual resources online. Regesta.exe has extensive experience in developing multimedia, multi-user and multi-archive systems. Its technological skills (i.e. Java standards as the development platform, and XML for encoding and storing data) and methodological know-how (i.e. archival description standards) were combined to build xDams (http://www.xdams.org), an open source, web-based platform for document management and delivery.

The goal of this project is to experiment with Semantic Web technical standards and Linked Open Data methods in order to share data among a broad range of archives and other cultural institutes. In detail, the project is designed to verify whether a “web of archival data” can be created, by exploring Semantic Web technologies to link common resources such as places, persons, organizations and themes. It aims to apply Semantic Web technology to create Linked Open Data from archival descriptions and from the entities related to them. The project takes into consideration:
• the need to describe the resources in a format that can be shared and approved by the international scientific community;
• the fact that choosing such standards allows data to be processed, integrated and handled according to standardized rules supported by a large number of communities;
• the opportunity to connect with other web resources, in compliance with other standard vocabularies.

The project is divided into different phases.

The first phase concerns the definition of an RDF data model for the description of archival resources.

The second phase consists of the transformation of resources into LOD, and the last phase deals with the semantic alignment of the archival resource datasets with other national and international ones.

The first experimental phase makes use of W3C Semantic Web standards, mash-up techniques, and specific software to link data in selected databases and define their semantics. Since 2011, IBC and Regesta.exe have published an ontology for the EAC-CPF (Encoded Archival Context – Corporate Bodies, Persons, and Families) standard, and another ontology to represent EAC-CPF records containing descriptions of archival creators. These two ontologies are complementary and closely related: the first provides the basis for devising and using the second. The first ontology is a different formalization of the XML Schema of the EAC-CPF standard, and it helps Italian archivists better understand its structure and properties. The second ontology makes it possible to “open” descriptions of corporate bodies, persons and families as entities related to archives, whether as creators or through any other relationship, essentially because authority records are, by their nature, connection points between different resources.

In 2012 Archivio Centrale dello Stato, IBC and Regesta.exe developed an Ontology for Archival Description (OAD) using the Web Ontology Language (OWL). Based on an analysis of data and description standards, OAD is a synthesis of the metadata elements commonly used in archival description. The first step was to identify the “things” of archival description, in order to define them as classes in a single standard ontology language. That analysis was then extended to define the necessary descriptive properties of each class. This ontology represents the classes and properties needed to expose the archival resources as linked data, and it is integrated with another ontology of the ISAD(G) standard, in SKOS format.

Furthermore, to cover the knowledge organization aspect of the archival resources under investigation, some of the subject classification schemes and lists used in the databases selected for this test were analysed: that of the Office of Agriculture, which has been in use since 1960 in the province of Piacenza; the Astengo scheme from 1897, which is often used to classify the papers of city administrations; and some other classification tools used by city administrations to describe document content. These subject lists have also been coded in the standard Semantic Web language for thesauri and subject lists, the Simple Knowledge Organization System (SKOS). The SKOS-defined terms were then used in an automated extraction of key concepts from the available databases, which made it possible to discover common topics and themes within the archives.
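The concept extraction described above can be illustrated with a minimal sketch. It assumes that each SKOS concept is reduced to a URI with a few labels and that extraction is simple whole-word matching; the concept URIs, labels and sample description are invented for illustration and are not ReLOAD data, and the project's actual extraction method is not documented here.

```python
import re

# Hypothetical SKOS-style concepts: URI -> preferred and alternative labels.
# The URIs and labels are invented for illustration; they are not ReLOAD data.
CONCEPTS = {
    "http://example.org/subject/agriculture": ["agriculture", "farming"],
    "http://example.org/subject/public-works": ["public works", "roads"],
    "http://example.org/subject/taxation": ["taxation", "taxes"],
}


def extract_concepts(text, concepts=CONCEPTS):
    """Return the sorted URIs of concepts whose labels occur in the text."""
    normalized = text.lower()
    found = set()
    for uri, labels in concepts.items():
        for label in labels:
            # Match whole words only, so "roads" does not match "railroads".
            if re.search(r"\b" + re.escape(label) + r"\b", normalized):
                found.add(uri)
                break
    return sorted(found)


description = "Correspondence on farming subsidies and repair of provincial roads"
matches = extract_concepts(description)
for uri in matches:
    print(uri)
```

Two descriptions that yield the same concept URI share a topic, which is the kind of cross-archive theme the paragraph above refers to.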
At present, ReLOAD activities are continuing with the use of the EAC-CPF Ontology for the institutions that maintain historical archives, and with its integration with other classes and properties where necessary.

The second and third phases consist of the transformation of resources into LOD and the semantic alignment of the archival resource datasets with other national and international ones. The partner institutions have made portions of their data available, and these will be used to test methodologies for exposing the resources as Linked Open Data.
XSLT files have been created to convert the archival data mentioned above from XML into RDF, while the SILK framework (http://lod2.eu/Project/Silk.htm) has been used to set explicit RDF links between data items in different data sources. Finally, STANBOL (http://stanbol.apache.org/) components have been applied to add semantic annotation, and LODLIVE (http://en.lodlive.it/) is the tool used to connect RDF resources based solely on SPARQL endpoints, allowing the user to pass from one endpoint to another by making use of LOD interconnection capacities.
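To make the XML-to-RDF step more concrete, here is a minimal sketch of the idea. It is not the project's XSLT: it uses Python's standard library instead, works on a simplified EAD-like fragment invented for illustration, and assumes a hypothetical URI base and a mapping of unittitle and unitdate to Dublin Core terms.

```python
import xml.etree.ElementTree as ET

# Simplified, EAD-like fragment invented for illustration (real EAD is richer).
EAD_SAMPLE = """
<archdesc>
  <did>
    <unitid>fonds-001</unitid>
    <unittitle>Municipal correspondence</unittitle>
    <unitdate>1861-1916</unitdate>
  </did>
</archdesc>
"""

BASE = "http://example.org/archive/"   # hypothetical URI base
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def literal(value):
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def ead_to_ntriples(xml_text):
    """Map unitid/unittitle/unitdate of one description to N-Triples lines."""
    did = ET.fromstring(xml_text).find("did")
    # The identifier becomes the subject URI (an assumption of this sketch).
    subject = "<" + BASE + did.findtext("unitid").strip() + ">"
    mapping = {"unittitle": "title", "unitdate": "date"}
    triples = []
    for element, term in mapping.items():
        value = did.findtext(element)
        if value:
            triples.append(
                f"{subject} <{DCTERMS}{term}> {literal(value.strip())} ."
            )
    return triples


triples = ead_to_ntriples(EAD_SAMPLE)
for line in triples:
    print(line)
```

An XSLT stylesheet expresses the same mapping declaratively, as templates that match XML elements and emit RDF; the sketch only shows the shape of the transformation.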

FAQ SECTION
1. What is ReLOAD?
ReLOAD (Repository for Linked Open Archival Data) is an experimental project sponsored by Archivio Centrale dello Stato, Istituto Beni Culturali Regione Emilia Romagna, and Regesta.exe. It aims to test Semantic Web standards and Linked Open Data technologies on archival assets.

2. What are its purposes?
The project is designed
– to verify whether a “web of archival data” can be created, by exploring in detail how far Semantic Web technologies facilitate the integration of diverse archival collections into a single web of data, and to link common resources (places, persons, organizations, themes, etc.) as well;
– to set up procedures and technologies that transform archival description information and provide it with explicit links (URIs), as Linked Open Data.
This approach aims at sharing detailed archival descriptions; in this phase there is no plan to create an access “portal” to archival resources.

3. Who are its partners?
This project is supported by the following organizations: Archivio Centrale dello Stato, Istituto Beni Culturali Regione Emilia Romagna, and Regesta.exe.
Archivio Centrale dello Stato ACS (State Central Archives, http://www.acs.beniculturali.it/), a body of the Ministry of Cultural Heritage and Activities endowed with special autonomy, is the depository institute for the documentary heritage of the unified Italian State. The institutional tasks of ACS are:
– to preserve and make accessible to researchers the archives of the central bodies and offices of the Italian State since its unification (1861);
– to supervise the archives produced by the State’s central bodies;
– to deliver educational workshops on archival research and records management.
Istituto Beni Culturali Regione Emilia Romagna IBC (http://ibc.regione.emilia-romagna.it) was founded in 1974 and is the scientific and technical instrument for Emilia-Romagna regional planning in the field of artistic, cultural and environmental heritage. IBC develops the IT facilities that deliver archive, library and museum data to institutions and the general public, promotes and coordinates the census and description of archival, book and museum material, and makes specific databases readable on the web. At present, IBC is working on standards for interoperability through the use of Semantic Web technologies. All activities are carried out in cooperation with national and international public bodies and organisations when necessary, and IBC has well-established, long-running experience in managing European projects in this field, funded or co-funded by the EU. For further information please visit http://ibc.regione.emilia-romagna.it/en/the-institute/about-us.
Since 1996, Regesta.exe (http://www.regesta.com) has provided cultural institutions of any type with services and tools for creating, managing, retrieving and accessing documentary, iconographic and audiovisual resources online. Regesta.exe has extensive experience in developing multimedia, multi-user and multi-archive systems. Its technological skills (i.e. Java standards as the development platform, and XML for encoding and storing data) and methodological know-how (i.e. archival description standards) were combined to build xDams (http://www.xdams.org), an open source, web-based platform for document management and delivery.

4. Which archival data are available in ReLOAD? Which archives do they come from?
The following archival data are available in ReLOAD:
– the finding aid of the archives of the General Direction of Agriculture (Ministry of Agriculture, Industry and Commerce) (1861 to 1916), provided by Archivio Centrale dello Stato (ACS);
– the authority records of the creators of the IBC archives listed below;
– the authority records of the institutions holding the IBC archives listed below.
The procedure is also being tested on the finding aids of the Historical Archives of Alfonsine municipality, Piacenza Province, Mr. Andrea Costa, and Mr. Giovanni Codronchi Jr., provided by Istituto Beni Culturali Regione Emilia Romagna (IBC).
All of these resources are stored in XML repositories: the archives use the EAD standard to encode their information, the authority records of creators use EAC-CPF elements, and the information about the institutions with archival holdings is encoded with the EAG standard.

5. Under which licence is ReLOAD publishing its data?
All data are published under a CC BY licence.

6. Is ReLOAD an archival portal?
ReLOAD hopes to become the central point for storage of and access to distributed archival resources, using LOD as its technology. This initial phase of the project emphasizes the development of a shared space for archival description metadata, and does not address the creation of a “portal” for access to archival materials.

7. Is ReLOAD a SPARQL endpoint of archival data in a LOD format?
ReLOAD is not itself an endpoint, but it provides a SPARQL endpoint for querying archival resources.
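As a rough illustration of how a client might address such an endpoint, the sketch below builds a SPARQL Protocol GET request URL without sending it. The endpoint address is a placeholder rather than ReLOAD's actual endpoint, and the use of dcterms:title in the query is an assumption about how titles might be modelled.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical; not the ReLOAD endpoint

# Illustrative query: list up to ten resources with their titles.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource ?title
WHERE { ?resource dcterms:title ?title }
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query as a parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again shows that the query text survives the encoding.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

Sending the request (for example with urllib.request.urlopen and an Accept header for a SPARQL results format) is left out so that the sketch runs without network access.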

8. Which technology and tools have been used?
ReLOAD was designed to experiment with semantic technologies on archival assets, using W3C standards and other tools (all of them open source) for mash-up, alignment and automatic recognition of entities.
The first step of this project was to define an ontology for the description of archival data (OAD, “Ontologia per la Descrizione Archivistica”) using the Web Ontology Language (OWL). This ontology represents the classes and properties needed to expose the archival resources as linked data. Starting from the EAD and ISAD(G) standards for archival description, OAD refers to the following RDF vocabularies: SKOS, FOAF, DC, BIO, VIAF, GN, and the EAC-CPF Ontology.
RDF is the language used to describe structured information; OWL to define ontologies; XSLT to convert data into RDF; a triplestore and SPARQL endpoint to store and query RDF resources; the SILK Framework to align different datasets; STANBOL for semantic annotation; and LODLIVE to display graphs.
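The dataset alignment step can be pictured with a small sketch. It does not use the SILK Framework or its link specification language; it swaps in a plain string-similarity comparison from Python's standard library to show the general idea of generating owl:sameAs links between two sources. The datasets, URIs, names and the 0.9 threshold are all invented for illustration.

```python
import re
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two invented datasets mapping URI -> name label (not ReLOAD data).
DATASET_A = {
    "http://example.org/a/person/1": "Rossi, Mario",
    "http://example.org/a/person/2": "Bianchi, Anna",
}
DATASET_B = {
    "http://example.org/b/agent/10": "Mario Rossi",
    "http://example.org/b/agent/11": "Giulia Verdi",
}


def normalize(label):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    tokens = re.findall(r"\w+", label.lower())
    return " ".join(sorted(tokens))


def discover_links(source, target, threshold=0.9):
    """Return (source_uri, owl:sameAs, target_uri) for similar labels."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(
                None, normalize(s_label), normalize(t_label)
            ).ratio()
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


links = discover_links(DATASET_A, DATASET_B)
for subject, predicate, obj in links:
    print(f"<{subject}> <{predicate}> <{obj}> .")
```

Comparing every pair of labels is quadratic in the dataset sizes, which is acceptable for a sketch; dedicated link discovery tools add blocking and richer comparison rules to scale beyond that.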

Comments

  1. Gimme

    December 16, 2012

    More likes than views? Now that’s strange ;)

  2. Silvia

    December 17, 2012

    We hope that votes are not only for the video, but for the project!
    Maybe people vote the project and not the video ;-)
