Information Extraction and Semantic Annotation
Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence, but at the same time it can be a hindrance to the accurate indexing of documents. Information Extraction, a Natural Language Processing technique, can unlock and surface such information by analysing textual input and producing structured output that is suitable for further manipulation. In this process, Semantic Annotation links ontological definitions to natural language text by providing class information for textual instances. Described as a mediator platform between concepts and their worded representations, semantic annotation as metadata can automate the identification of concepts and their relationships in documents. It is proposed as a mechanism for connecting natural language and formal conceptual structures, both to enable new information access methods and to enhance existing ones. The annotation process enriches documents and enables access on the basis of a conceptual structure. This aids information retrieval from heterogeneous data sources, empowering users to search across resources for entities and relations instead of words.
The seminar will present OPTIMA, a semantic annotation system that performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation over archaeological excavation reports (grey literature). The system employs rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC Conceptual Reference Model (CRM) and relevant Cultural Heritage thesauri.
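As a rough illustration of what rule-based semantic annotation involves, the sketch below matches terms from a small gazetteer, attaches a CRM class label to each match, and flags a match as negated when a negation cue appears shortly before it. This is a toy example and not the OPTIMA implementation: the gazetteer entries, the term-to-class mapping, the cue list, the window size, and the names `annotate` and `Annotation` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer: term -> CIDOC CRM class label.
# The mapping is an illustrative assumption, not OPTIMA's actual vocabulary.
GAZETTEER = {
    "pottery": "E22 Man-Made Object",
    "ditch": "E19 Physical Object",
    "flint": "E57 Material",
    "roman": "E4 Period",
}

# Assumed negation cues and look-back window (in tokens).
NEGATION_CUES = {"no", "not", "without"}
WINDOW = 3


@dataclass
class Annotation:
    text: str        # matched surface form
    start: int       # character offset where the match begins
    end: int         # character offset just past the match
    crm_class: str   # class label taken from the gazetteer
    negated: bool    # True if a negation cue precedes the match


def annotate(sentence: str) -> list[Annotation]:
    """Annotate gazetteer terms in one sentence with class and negation."""
    # Tokenise into alphabetic words, keeping character offsets.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"[A-Za-z]+", sentence)]
    annotations = []
    for i, (word, start, end) in enumerate(tokens):
        crm_class = GAZETTEER.get(word.lower())
        if crm_class is None:
            continue
        # Look at up to WINDOW tokens immediately before the match.
        preceding = {w.lower() for w, _, _ in tokens[max(0, i - WINDOW):i]}
        negated = bool(preceding & NEGATION_CUES)
        annotations.append(Annotation(word, start, end, crm_class, negated))
    return annotations


for a in annotate("No Roman pottery was found in the ditch."):
    print(a)
```

In this sentence "Roman" and "pottery" are flagged as negated while "ditch" is not, which is the kind of distinction that lets a search for finds exclude reports stating that nothing was found. A real system would need far richer rules, vocabulary, and disambiguation than this fixed-window heuristic.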
Dr. Andreas Vlachidis is a Research Associate at the UCL Department of Information Studies. He currently contributes to the cultural heritage data modelling and semantic enrichment aims of the EU Horizon 2020 CROSSCULT project. He holds a PhD on Semantic Indexing of Archaeological Grey Literature, and he is a certified text analyst of the General Architecture for Text Engineering (GATE), a fellow of the Higher Education Academy (FHEA) and a member of the British Computer Society (BCS). In the past, as a member of the Hypermedia Research Group (USW), he worked with Prof. Douglas Tudhope on the AHRC-funded project STAR and on the EU FP7-funded project Ariadne. He has also received a grant from the Welsh Government for developing a suite of open-source natural language processing modules for the Welsh language, and he worked with Prof. Hamish Fyfe in the Digital R&D fund for the arts in Wales and in the Creative Wales Exchange Network, providing research and managerial support to knowledge exchange activities.