Bonn, 22-27 October 2011.
Report by Mike Jackson, The Software Sustainability Institute, EPCC, The University of Edinburgh
I attended the Terra Cognita workshop and the International Semantic Web Conference 2011 (ISWC2011) at the request of Arif Shaon of the GeoTOD-II project team. ISWC2011 is the major international Semantic Web and linked data conference, dealing with research into Semantic Web and linked data access, integration, inferencing and reasoning. ISWC2011 brings together delegates at all levels from academia, industry, standards bodies and the public sector. The Terra Cognita – Foundations, Technologies and Applications of the Geospatial Web workshop was a full day event which brought together 50 researchers in geolocation and linked data. It was motivated both by the proliferation of geospatial data available on the web (via various technologies from dedicated map servers to social networks, and provided by everyone from professionals to the general public) and by the increasing number of linked open data resources with geospatial properties.
Funded via OMII-UK, GeoTOD-II ran from July to December 2010 and developed an open-source linked data framework for publishing environmental data under the UK Government’s Location Strategy. The GeoTOD-II team had submitted a paper to the ISWC2011 Terra Cognita workshop but the team members all had prior commitments. As the institute had provided consultancy on the use of the OGSA-DAI distributed data management framework within their framework, Arif asked if I could present their paper. This would be of benefit to both GeoTOD-II, participating in a cross-disciplinary workshop, and the institute, promoting a project that had benefited from the institute’s consultancy. So, never one to turn down a trip to Germany, I escaped wintry Edinburgh for bonny Bonn’s autumnal climes and, fuelled with coffee, pretzels and cake, attended the workshop and conference.
Terra Cognita at ISWC2011 - Highlights
Interest in GeoTOD-II focused not only on the technical aspects of the GeoTOD-II framework, but also on the concepts underlying it (especially the UK Cabinet Office recommendations on URIs for location) and legal aspects around making data open in the UK. I forwarded a Q/A transcript to Arif who was happy with my answers.
Sven Tschirner gave a presentation on the EU INSPIRE spatial information directive. The UK Cabinet Office recommendations on URIs for location, implemented by GeoTOD-II, aim to conform to INSPIRE. Sven was interested in GeoTOD-II’s tool for converting INSPIRE UML to RDFS. I introduced Sven to Arif via e-mail and forwarded him the link to the online tool.
It would be useful, for both GeoTOD-II and the Software Sustainability Institute, to do a blog article on GeoTOD-II’s papers at this workshop and their best papers strand presentation at AHM 2011, and to mention the use of the GeoTOD-II framework in STFC’s ACRID project.
Terra Cognita at ISWC2011 - Workshop report
Work on linked data and geolocation falls into a number of areas:
Exposing existing geospatial data sets and ontologies as linked data.
Modelling textual geospatial information e.g. the concept “close to”.
Aligning schemas from distributed data sets that represent geolocation information in different ways.
Analysing existing resources (e.g. travel blogs or Wikipedia pages), extracting location information and then linking these resources to related resources (e.g. Flickr images) or augmenting these resources (e.g. colour coding a map according to visitor opinions or showing locations, events or places from movies, books and music, that are near to a specified location).
Work on linked data and geolocation is very much application-centric. The community makes extensive use of established online geospatial and linked data resources e.g. DBpedia, GeoNames and Yahoo! Placemaker, and social media e.g. Twitter, Flickr, and blogs.
Social media e.g. Twitter, Flickr, and blogs provide a wealth of information of use in geospatial applications. This includes extraction and exploitation of data about opinions (“sentiment”).
Disaster management is an area which demonstrates the power of combining social media, news resources, online reporting and geospatial information to deliver applications that integrate data from myriad sources in a manageable way.
Technical questions on GeoTOD-II included:
The overhead of converting data from legacy formats to linked data on a per-request basis, and when it might become preferable to use a snapshot of the data, pre-converted and dumped into a triple store. Whether the overhead is acceptable is very much user- and application-specific.
Whether the framework could expose data about collections of points rather than points, whether it used triple stores and SPARQL processors optimised for geospatial data e.g. geospatial indices and geospatial extensions to SPARQL, and whether PostGIS geospatial extensions were exploited when querying legacy relational data. At present, GeoTOD-II does not augment the data it exposes with any geospatial-specific features. One could imagine extending it or layering tools on top of it to provide more advanced geospatial-specific functionality.
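The trade-off between per-request conversion and a pre-converted snapshot can be illustrated with a minimal sketch. This is not GeoTOD-II code: the records, the example.org URIs and the function names are all hypothetical, and a plain dictionary stands in for a triple store.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical legacy records, standing in for rows in a relational table.
LEGACY_ROWS: Dict[int, Dict[str, str]] = {
    1: {"name": "Station A", "lat": "55.95", "lon": "-3.19"},
    2: {"name": "Station B", "lat": "50.73", "lon": "7.10"},
}

# Placeholder URI prefixes (assumptions, not real GeoTOD-II URIs).
ID_BASE = "http://example.org/id/station/"
DEF_BASE = "http://example.org/def/"


def row_to_triples(row_id: int, row: Dict[str, str]) -> List[Triple]:
    """Turn one legacy record into subject-predicate-object triples."""
    subject = f"{ID_BASE}{row_id}"
    return [(subject, f"{DEF_BASE}{column}", value) for column, value in row.items()]


def get_on_demand(row_id: int) -> List[Triple]:
    """Per-request conversion: always current, but converts on every call."""
    return row_to_triples(row_id, LEGACY_ROWS[row_id])


def build_snapshot() -> Dict[int, List[Triple]]:
    """Convert everything once; later lookups are cheap but can go stale."""
    return {row_id: row_to_triples(row_id, row) for row_id, row in LEGACY_ROWS.items()}
```

In this sketch the snapshot answers lookups without reconverting anything, but it no longer matches the legacy data once a record changes, whereas the per-request route always reflects the current record. Which matters more depends on how often the data changes and how often it is requested.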
A question arose as to whether there is a legal framework in the UK for making data open. At present, there are recommendations but no legal requirement. Likewise, public bodies are encouraged to, but not required to, expose data as linked data conforming to guidelines published by the UK Cabinet Office.
A conceptual question arose from GeoTOD-II’s implementation of the UK Cabinet Office recommendations on URIs for location. It was felt that the notion of spatial things (e.g. “river Thames”) with associated collections of spatial objects (e.g. “Ordnance Survey’s representation of the river Thames”), and the implied management of the relationship of the latter with the former by “some agency”, went against the spirit of linked data. This could impose a barrier to publishing (as either spatial object publishers need to notify spatial thing publishers, or spatial thing publishers need to be aware of spatial object publishers). I didn’t know enough of the details but agree in principle that this could pose such a barrier.
A delegate from Helsinki, working on science museum data, liked the notion (from the UK Location Strategy) of taking a location (e.g. a road) and, from this, accessing information about that location. He imagined exploring a landscape and discovering the history of features in that landscape.
For future workshops, the attendees agreed that it would be useful to see more work relating to efficiency, performance, reasoning and visualisation.
This workshop would have been a good target for King’s College London’s PELAGIOS and SPQR projects, with which I was involved. These focused on converting ancient world data to linked data with a particular emphasis on location. Both SPQR and PELAGIOS made extensive use of GeoNames, a popular online gazetteer cited many times during this workshop. The joy of hindsight.
The workshop proceedings, including the GeoTOD-II paper, are temporarily available. They will be published in the CEUR Workshop Proceedings.
ISWC2011 - Highlights
It can be useful for researchers producing software to have a ready answer to the question “is your software available and open source?” Licensing should be considered by projects at the outset - see our guide.
Similarly, it is necessary for publishers of linked data to consider the sustainability and longevity of any fresh URIs they mint, having a ready answer to the question “how sustainable is your domain name?”
The authors of FedX cited SPARQL-DQP (under development by Universidad Politécnica de Madrid as an extension to OGSA-DAI) as a “competitive system” to their own but could not obtain it for a performance comparison. I forwarded the details to Carlos Buil Aranda, SPARQL-DQP developer, who commented that he was aware of FedX, that a SPARQL-DQP release is planned for March 2012 and that a full performance comparison will be done (one concern is that FedX, unlike SPARQL-DQP, is not scalable to large data sets).
An “outrageous ideas” strand allows for the presentation of more “out there” ideas, to be considered without the standard empirical justifications. Might this be something the institute could suggest for conferences it plays a part in organising?
Alex Pentland’s keynote described “The New Deal on Data” (http://citysense.com/press/wef_globalit.pdf) which promotes personal data as liquid assets, under the full control of individuals, shared via distributed, trusted personal data stores and licensed for use, by the individuals themselves, to companies. An interesting example of technology being used to realise a philosophy, driven by the non-profit organisation, ID3 (http://idcubed.org).
ISWC2011 - Conference report
ISWC has a “10 year award” for the most influential/cited paper from 10 years ago. This year the winner was “DAML-S: Semantic Markup for Web Services”.
As with the Terra Cognita workshop, there was an application-centric focus to the community (though maybe that was the sessions I attended). However, Frank van Harmelen’s keynote did comment that their community was focused on engineering and building and posed the question “has any science been done?”, or, as a cartoon put it, “very impressive, but does it work in theory?” Van Harmelen’s conclusion was “yes”, citing examples including: factual knowledge is a graph; terminological knowledge is a low-complexity hierarchy; the latter is significantly smaller than the former; heterogeneity is unavoidable but solvable primarily socially, culturally, and economically; publication should be distributed, computation should be centralised; the web is a good data publication platform but distribution makes it difficult to consume; parallelisation doesn’t always work but it does for the Semantic Web.
The winner of the “outrageous ideas” strand was Christophe Guéret, Stefan Schlobach, Victor De Boer, Anna Bon and Hans Akkermans with “Is data sharing the privilege of a few? Bringing Linked Data to those without the Web”. This was motivated by the fact that 4.5 billion people have no, or cannot afford, web access (which we often don’t consider). They propose a move towards decentralised data served by multiple peers in “micro-grids”, developing applications that can run on the One Laptop Per Child XO-1 laptop and supporting vocal interfaces to address the fact that not everyone can read or write.
Christian Seitz and René Schönfelder presented “Rule-based OWL Reasoning for specific Embedded Devices” in which a product would maintain information across its complete supply chain lifecycle, interacting with its environment (e.g. air-conditioning systems) if necessary. This has potential application from electronics products to perishable goods.
During a presentation of work by Matthias Hert, Giacomo Ghezzi, Michael Würsch and Harald Gall on “How to ‘Make a Bridge to the New Town’ using OntoAccess” the presenter was asked if it was open source and available to download. The reply was that a closed source prototype was available with a full release next year, licence to be determined. An argument for projects to consider licensing at the outset? They have developed a framework to expose legacy relational data as linked data which, unlike existing, popular, read-only solutions such as Virtuoso, D2R or R2RML, delivers a wrapper that allows updates to the legacy data sources.
Similarly, during Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi and Yong Yu’s “Zhishi.me - Weaving Chinese Linking Open Data” a question arose as to whether their zhishi.me domain name was sustainable. The authors faced a similar challenge to the SPQR project I was involved with: how to use non-ASCII Unicode in URIs. They adopted IRIs (Internationalised Resource Identifiers) but, like us, faced the issue that IRIs are not HTML4-compatible.
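One common workaround for the IRI problem is to map an IRI to a plain ASCII URI by percent-encoding the UTF-8 bytes of its non-ASCII characters, the mapping described in RFC 3987. The following is a minimal sketch of that idea only, not what either Zhishi.me or SPQR implemented. The function name `iri_to_uri` is hypothetical, example.org is a placeholder, and the sketch assumes an ASCII host name (internationalised host names would need IDNA handling, which is omitted).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Assumes the scheme and host are already ASCII. "%" is treated as safe so
    that sequences which are already percent-encoded are not encoded twice.
    """
    parts = urlsplit(iri)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    iri = "http://example.org/resource/知识"
    uri = iri_to_uri(iri)
    print(uri)
    # Decoding the path recovers the original characters.
    print(unquote(urlsplit(uri).path))
```

The resulting URI contains only ASCII characters, so it can be used where an IRI cannot, at the cost of being unreadable to a human.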
Two challenges are open every year: an Open Track challenge for an application that must access and manipulate distributed, heterogeneous, substantive real-world linked data, and a billion triples challenge in which competitors must do “something” with 2 billion triples. The Open Track challenge was won by Irene Celino, Daniele Dell’Aglio, Emanuele Della Valle, et al. for BOTTARI: Location based Social Media Analysis with Semantic Web. This provides geocontextual information for smart phone users about points of interest based on their location and what they are pointing their phone at. It mines tweets and blogs (“reality mining”) and uses machine learning to try and customise its information depending on what the user has tweeted or blogged about in the past. The billion triples challenge was won by Mathias Konrath, Thomas Gottron and Ansgar Scherp for SchemEX – Web-Scale Indexed Schema Extraction of Linked Open Data. This focused on creating a Yellow Pages for the linked open data cloud which uses stream-based schema extraction while crawling data, to avoid having to pull it all in locally, and delivers a means to find data sources to answer queries that can run on a single CPU with 4GB memory.
Igor Popov, mc schraefel, Wendy Hall and Nigel Shadbolt presented “Connecting the Dots: A Multi-pivot Approach to Data Exploration”, an example of a Semantic Web browser (Visor - http://visor.psi.enakting.org/) which attempts to provide a more manageable way of navigating large volumes of data. Single-pivot exploration involves searching and then viewing instances. Multi-pivot exploration involves selecting multiple collections and viewing the connections and intermediate nodes between them. This was subject to a usability evaluation (10 participants of whom 2 were Semantic Web aware).
I spoke to Anja Jentzsch about Berlin’s LDIF (Linked Data Integration Framework). This is for existing RDF data sources with heterogeneous ontologies. It, therefore, is not as flexible as GeoTOD-II in this respect, though it has a vocabulary matcher which GeoTOD-II doesn’t. This would not have been of use in SPQR, since we wrote relational-to-linked-data converters, though it might have been applicable to King’s College’s PELAGIOS project, which had an ontology with which heterogeneous data providers had to comply.
I was surprised by the close relation to AI (knowledge and rule-based systems, inference, reasoning) but shouldn’t have been! Linked data and its subject-predicate-object triples are, after all, not dissimilar to Prolog facts, and SPARQL, for example, looks not unlike Prolog rules.
ISWC2012 will be held in Boston, ISWC2013 in Sydney and ISWC2014 in Europe.
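The resemblance can be made concrete with a toy sketch in which triples are stored as tuples and a pattern with unbound positions plays the role of a Prolog goal or a SPARQL triple pattern. The data, predicate names and functions are invented for illustration, and this is neither real RDF nor a real SPARQL engine.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical facts. In Prolog these might read:
#   flows_through(thames, london).   type(thames, river).
TRIPLES: List[Triple] = [
    ("thames", "flows_through", "london"),
    ("thames", "type", "river"),
    ("rhine", "flows_through", "bonn"),
    ("rhine", "type", "river"),
]


def match(pattern: Pattern, triples: List[Triple] = TRIPLES) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as an unbound variable."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def rivers_through(city: str) -> List[str]:
    """Join two patterns on a shared variable.

    Roughly the SPARQL: SELECT ?r WHERE { ?r type river . ?r flows_through <city> }
    or the Prolog rule: river_through(R, C) :- type(R, river), flows_through(R, C).
    """
    rivers = {s for s, _, _ in match((None, "type", "river"))}
    return sorted(s for s, _, _ in match((None, "flows_through", city)) if s in rivers)
```

Calling `rivers_through("bonn")` joins the two patterns on the shared subject, much as a Prolog rule body or a SPARQL basic graph pattern would.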
The conference proceedings are available online via the conference program.