Semantic Modelling of Smart City Data

Stefan Bischof¹, Athanasios Karapantelakis², Cosmin-Septimiu Nechifor³, Amit Sheth⁴, Alessandra Mileo⁵ and Payam Barnaghi⁶

¹ Siemens AG Österreich, Vienna, Austria
² Ericsson Research, Stockholm, Sweden
³ Siemens SRL, Romania
⁴ Kno.e.sis Center, Wright State University, Dayton, OH, USA
⁵ Digital Enterprise Research Institute, National University of Ireland, Galway
⁶ University of Surrey, Guildford, Surrey, United Kingdom

1 Introduction

Recent advancements in communication technologies for providing ubiquitous Internet access, as well as advancements in reducing the cost and form factor of mobile devices and sensors, are seen as an enabler for the Internet of Things (IoT). The industry predicts an interconnected world of 50 billion devices by 2020. The Web of Things (WoT) relies on the connectivity service of IoT to create services and applications exploiting the IoT data [1].

Cities present an opportunity for rendering WoT-enabled services. According to the World Health Organization, the population in cities will double by the middle of this century², while cities deal with increasingly pressing issues such as environmental sustainability, economic growth and citizen mobility. In this paper, we propose a discussion around the need for common semantic descriptions for smart city data to facilitate future services in “smart cities”. We present examples of data that can be collected from cities, discuss issues around this data and put forward some preliminary thoughts for creating a semantic description model to describe and help discover, index and query smart city data.

² ...health/situation trends/urban population growthtext/en/

2 Description of Smart City Data

Table 1: Information that could be potentially retrieved about cities.

Data Category | Owner (Data Publisher) | Data Description | Sampling
Transport | Traffic Authority | Maps of cities (roads, street names, POIs, subway and bus stations, etc.)³ | Static
 | Municipality | Public transport schedules⁴ | Semi-dynamic
 | Traffic Authority | Transport authority updates (roadwork, traffic status, etc.)⁵ | Dynamic
Air Quality | Env. Agency | Particle concentration⁶ |
Traffic | Traffic Authority | Number of vehicles passing between two points, speed⁷ | Dynamic
City Events | Cultural Groups | Entertainment [...] |
 | Municipality | Library data⁸ | Dynamic
Citizen data | | Health data | Dynamic
 | | Waste collection data⁹ | Dynamic
 | Private Company | Parking meters¹⁰ | Dynamic
 | Private Individuals | Social media information: tweets, status updates and blog posts, popular places (“check-ins”) | Semi-dynamic
 | | Household energy consumption | Semi-dynamic
 | Private and Public | Relevant information about potential or confirmed sources of health threats | Dynamic

³ OpenStreetMap offers geographical data under the Open Data Commons Open Database License (ODbL).
⁴ The Danish authority “Rejseplanen” offers an API for timetables of public transport such as buses and trains: ...documentation latest.pdf
⁵ The Swedish transport authority Trafikverket issues updates on the status of the transport network as RSS feeds: ...RSS-floden/
⁶ The city of Brasov in Romania offers precipitation measurements as open data.
⁷ The city of Aarhus in Denmark offers traffic data measurements as open data.
⁸ Books on loan measurement from the city of Aarhus: ...ommunes-biblioteker
⁹ Waste collection data from households, provided from the city of Aarhus: ...ingen-af-affald
¹⁰ Parking meter data from the city of Aarhus.
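
The rows of Table 1 lend themselves to a simple typed record. The following is a minimal sketch in Python and not part of any CityPulse model: the names Sampling, CityDataSource, SOURCES and by_sampling are hypothetical, and the three entries are illustrative rows taken from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Sampling(Enum):
    """Periodicity of incoming data, following the Sampling column of Table 1."""
    STATIC = "static"              # reference dataset, any updates are manual
    SEMI_DYNAMIC = "semi-dynamic"  # updated periodically
    DYNAMIC = "dynamic"            # updated continuously


@dataclass(frozen=True)
class CityDataSource:
    """One row of the table (hypothetical record layout)."""
    category: str
    owner: str
    description: str
    sampling: Sampling


# Illustrative entries only; not a complete transcription of Table 1.
SOURCES = [
    CityDataSource("Transport", "Traffic Authority",
                   "Maps of cities", Sampling.STATIC),
    CityDataSource("Transport", "Municipality",
                   "Public transport schedules", Sampling.SEMI_DYNAMIC),
    CityDataSource("Traffic", "Traffic Authority",
                   "Number of vehicles passing between two points, speed",
                   Sampling.DYNAMIC),
]


def by_sampling(sources, sampling):
    """Return the sources whose sampling matches the given value."""
    return [s for s in sources if s.sampling is sampling]


if __name__ == "__main__":
    for source in by_sampling(SOURCES, Sampling.DYNAMIC):
        print(source.category, "-", source.description)
```

Selecting sources by sampling in this way gives a quick view of which feeds would need stream processing; with the entries above, by_sampling(SOURCES, Sampling.DYNAMIC) returns only the vehicle-count feed.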

Table 1 shows an example of the type of data that can be collected from cities. This data is currently collected from the cities and companies participating in the EU FP7 CityPulse project. The Sampling column relates to the periodicity of the incoming data. Static indicates that the data is not updated automatically and the dataset is used as a reference (any updates are manual). Semi-dynamic sampling means that the data is updated periodically, whereas dynamic means continuous updating.

3 Challenges and Issues of Smart City Data

Smart city data sources offer a variety of data to be processed. The observation and measurement data is usually aggregated and filtered after collection. The data is then transferred, often over several systems, and transformed into a data representation suitable for interoperable publication. These published data sources are then made discoverable and become accessible via query and/or publish/subscribe facilities. Over these access interfaces the data is eventually integrated into higher-level services and applications [2]. The heterogeneity created at the different levels of processing this data gives rise to challenges in several dimensions.

Sensory devices measure different types of observations such as light, temperature, or sound. The different sensors and devices will provide data of different and even changing quality. The data is often continuous, and over time data quality, data validity and device availability can change, thus resulting in highly dynamic data streams. Applications may select data sources by data quality or device reliability, but also ignore data sources which are untrustworthy.

Since sensory devices will also record sensitive or private data, the issues of privacy and security need to be considered and addressed over the whole processing pipeline. Encryption of sensitive data is the accepted best practice to protect data from unauthorized access during transmission and storage.
Another means of protecting the privacy of individuals is data aggregation.

Even devices of the same type will deliver data in heterogeneous formats or different units of measurement. The heterogeneity issues can partly be addressed by meta-data and semantics. Semantic annotations can also help with mapping data between differing schema models at a higher level. When semantically annotating data streams, one has to carefully weigh expressiveness against complexity, as well as the sheer volume of the generated data which has to be processed. In typical scenarios the meta-data is larger than the actual measurement data. The data volume makes early aggregation and filtering necessary. Additionally, the whole data processing pipeline has to be designed with scalability in mind.

Users and services will often want to get data from some specific (spatial) area and a certain period of time. In a large-scale distributed environment with highly dynamic resources such as sensors delivering a large amount of data, the usual steps of discovering, indexing, and efficiently querying data are complex tasks. Semantic Web technologies can help to some extent and are currently being extended in this direction by the CityPulse project.

Eventually the data provided by semantically annotated streams, which is now integrated, aggregated, filtered, and combined via querying, usually needs to be interpreted, combined with other data sources and analyzed. In this step the usual data integration issues arise in another incarnation: we have to integrate data with meta-data and also with different types of data from other sources such as static databases, Semantic Web knowledge bases or social web APIs. Again, a semantic model can help to create an interoperable representation of data that is provided by various heterogeneous resources.

4 Semantic Interoperability

As described in the previous section, smart city data are heterogeneous in nature (delivery format, point of origin, periodicity, etc.) and have different privacy, security and quality requirements. To realise the potential of a smart city, several of these data sources have to be combined. To address the issues of interoperability at sensor level, a W3C incubator group developed the Semantic Sensor Network (SSN) ontology [3].

SSN describes not only sensor device capabilities, but also organises the sensors into systems and describes processes that model sensor operations and can work across multiple domains. The goal is to correlate measurement data with capabilities of sensors (and sensor systems); however, the descriptions of observation and measurement data are generic and cannot be used to annotate the data with application-specific domain knowledge. Therefore, SSN by itself cannot be used to describe smart city services (scenarios) in detail, as each service has its own quality requirements, relies on its own set of sensors, has different demands on data ownership (security, privacy concerns), etc.

Previous research has suggested building a linked-data approach for stream annotation [4].
According to this approach, external domain knowledge about the data can be provided on request and can be specific to each service rendered (e.g. quality description, sensor capabilities, etc.). The model proposed in [4] describes some basic, common attributes of the data stream but delegates details about the specific streams to other models (linked-data models).

5 Suggestions for Discussion

We are currently building a linked-data model for semantic annotation of data streams in smart city environments. The EU FP7 CityPulse project is working on linked-data descriptions for smart city data. The project also provides a set of smart city data access and processing scenarios. This can help to identify a set of common properties among smart city data (see Table 1) that can be used for semantic modelling and description of multi-modal data in smart city applications and services. Going beyond the details of the model design, the common properties identified can help contextualise smart city data and simplify the connection between the descriptions in the model and data stream operations such as discovery, indexing and querying from applications, services or systems using these data. We therefore suggest the following topics of discussion around the design of a model for smart cities:

• Smart city data stream annotation: descriptions for data privacy, security and quality. Flexibility to support heterogeneity in observations, dynamicity of data streams, and scalability.

• Data contextualization for optimised data stream discovery, indexing and querying. To start with, we may consider categorization of smart city data in hierarchical form, from the general domain of observation (transportation, events, healthcare, municipal services, etc.) to observed physical phenomena (traffic, theatre plays, hospitals, water level, etc.) down to units of measurement (cars per minute, event timestamp, hospital capacity, percentage of water reserves, etc.). The outcome of the discussion will also influence the design of the model in the previous step. Refer to CityPulse (and other sources) for smart city scenarios to understand requirements from a smart-city service perspective (top-down approach).

• Use of linked data for enriched processing of the annotated data. Consider extending similar approaches such as the SECURE system, which uses background data for event detection [5].

Acknowledgments. This work is supported by the EU FP7 CityPulse Project under grant No. 603095. http://www.ict-citypulse.eu

References

[1] S. Gustafson and A. Sheth, “The web of things,” Computing Now, vol. 7, no. 3, 2014. [Online]. Available: ...chive/march2014

[2] P. Barnaghi, A. Sheth, and C. Henson, “From data to actionable knowledge: Big data challenges in the web of things,” Intelligent Systems, IEEE, vol. 28, no. 6, pp. 6–11, Nov 2013.

[3] M. Compton, P. Barnaghi, L. Bermudez, R. García-Castro, O. Corcho, S. Cox, J. Graybeal, M. Hauswirth, C. Henson, A. Herzog, V. Huang, K. Janowicz, W. D. Kelsey, D. L. Phuoc, L. Lefort, M. Leggieri, H. Neuhaus, A. Nikolov, K. Page, A. Passant, A. Sheth, and K. Taylor, “The SSN ontology of the W3C semantic sensor network incubator group,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 17, no. 0, 2012.

[4] P. Barnaghi, W. Wang, L. Dong, and C. Wang, “A linked-data model for semantic sensor streams,” in Green Computing and Communications (GreenCom), 2013 IEEE and Internet of Things (iThings/CPSCom), IEEE International Conference on and IEEE Cyber, Physical and Social Computing, Aug 2013, pp. 468–475.

[5] P. Desai, C. Henson, P. Anantharam, and A. Sheth, “SECURE: Semantics empowered rescue environment (demonstration paper),” 4th International Workshop on Semantic Sensor Networks 2011 (SSN 2011), pp. 115–118, 2011.
