Business Scenario: Open Public Sector Data – Views of Environments and Processes

 

Business Environment and Process

The overall business environment and process for re-use of public sector information is shown in the figure.

Figure: Business Environment and Process

Public sector bodies collect the information in the course of their activities, primarily for their own use, and make it available for use by consumers, including other public sector bodies, commercial product and service providers, social enterprises, political interest groups, and citizens. The commercial product and service providers may use the information in their products and services, which they make available to other consumers.

Public Sector Bodies

Public sector bodies perform a range of services for their communities and the people in them. These services involve the collection of information of various kinds.

The information topics in question can include, for example: agriculture; businesses and companies; crime and justice; demographics; education; energy, resources, and utilities; environment; geospatial and mapping; government operations, spending, and business; health and social care; housing; personal finance; social and community; and transport [Deloitte].

For the purposes of this Business Scenario, public sector bodies are as defined in EU Directive 2003/98/EC on the re-use of public sector information [EU PSI Directive]. They are state, regional, or local authorities, bodies governed by public law, and associations formed by one or several such authorities or one or several such bodies governed by public law.

The boundary between public and private sector activity is not well defined in functional terms. It can vary from time to time and from place to place. For example, 50 years ago, most post and telecommunications services worldwide were performed by public sector bodies, but they are now mostly performed by companies in the private sector. Today, healthcare is largely carried out by the public sector in some countries, and by the private sector in others. Also, it can sometimes be hard to determine whether a particular body is in the public or the private sector. For example, the UK Ordnance Survey is a government agency responsible for producing maps, but it operates in many respects as a commercial organization.

The fact that information is collected in the course of public sector activities has implications for how access to it is governed. The information can be regarded as public property. Access to it can be charged for, but should be offered on fair and equal terms. A private sector company can arbitrarily make information that it collects available to some parties but not others; a public sector body should not do this.

Commercial Product and Service Providers

These are enterprises that provide products or services to make a profit.

Provision of a product or service typically requires an initial investment in its development. The enterprise concerned assesses the investment required and the expected return, and makes a risk-based decision. The expected return depends on a number of factors, including the value of the product or service to consumers, and the possible competition from similar products or services.

There are many kinds of product and service that can use public sector information. They include journalism, insurance, credit-rating, market analysis, sales lead generation, freight logistics, and journey planning.

A study of companies developing such products and services in Spain in 2012 [Infomediary] showed that the most frequently used kinds of public sector information were geographic/cartographic information (used by 51% of companies), business/financial information (47%), social demographic/statistical information (30%), and legal information (28%). Note that over 70% of the companies combined two or three types of information in their products. The companies were of a mixture of sizes. Only about a quarter of them classed re-use of public sector information as their main activity. Use of data no doubt varies between countries, but this is probably a reasonably representative pattern. For comparison, a UK study found that the most popular, and potentially most valuable, datasets include geo-spatial, environmental, transport, health, and economic data [Deloitte].

Many information-based products take the form of computer applications, and many information-based services take the form of web services.

The income that a company obtains from its products or services traditionally comes from straightforward sales, but there are a number of other commercial models that can apply to computer applications and web services. (See, for example, Free [Anderson].) The trend for these models to replace the traditional sales model is particularly marked for applications that run on mobile devices (apps). An app typically makes little money through direct sales [Louis]. The freemium model, involving free download with in-app purchases, is the most successful monetization strategy for app developers today, followed by in-app advertising [App Annie].

Social Enterprises

A social enterprise is an organization that applies commercial strategies to maximize improvements in human and environmental well-being, rather than maximizing profits for external shareholders. Its activities thus fall somewhere between public services and commercial product and service providers.

There are many such enterprises today. It is estimated, for example, that there are 70,000 of them in the UK, employing nearly a million people, and with a combined turnover of over £50 billion [UK Social Enterprises].

Such enterprises are important users of public sector information, and can be considered as contributing to the economic benefits of making that information openly available.

Political Interest Groups

Political interest groups also fall between public services and commercial product and service providers. A political interest group is an organization that applies political strategies to bring about (or in some cases to prevent) social or environmental change. Such organizations include political parties, pressure groups, and parliamentary lobbyists. They also include groups formed to address local issues; for example, to object to particular road or housing developments.

Such a group has a particular interest in public sector information that relates to its specialist topic. A group that objects to a road development, for example, may wish to research traffic levels in the area, noise levels associated with similar roads elsewhere, and the interests and voting records of the politicians who will make the decision.

Such a group may also re-publish the information (generally with added opinion), in the form of newsletters, leaflets, etc.

Citizens

Citizens are in theory responsible for taking the decisions in a democracy. In practice, many of them appear to take little interest in direct political activity. For example, the average turnout in the 2014 EU parliamentary elections was less than 45% [EU 2014]. The typical citizen does not engage in significant study of public sector data.

Most citizens do, however, follow public affairs through newspapers, broadcast media, and, increasingly, online. They rely on these channels to filter and organize their input information, and to bring to their attention anything that they might feel to be of major importance.

When they do engage in significant political activity, it is likely to be as members of political interest groups.

Technical Environment and Process

Public sector information can be published on printed paper, and historically it usually was, but digital technology is expected to be the main means of publication for the foreseeable future. This Business Scenario considers only the use of digital technology.

The overall technical environment and process is shown in the next figure.

Figure: Technical Environment and Process

The public sector body responsible for the information holds it as data in a data store, using provider applications to create, manage, and process it, and makes it available for re-use through a portal. The consumers, including other public sector bodies, commercial product and service providers, social enterprises, political interest groups, and citizens, access it using browsers and consumer applications. Commercial product and service providers may use such applications to implement products and services, or may supply them as products to other consumers.

Provider Applications

The provider applications can be implemented using any of the wide variety of programming languages and environments available today. They can run on systems operated by the public sector body itself, or using cloud computing. They can gather information in a number of ways, including by direct user input, by transcription of written input (possibly using optical character recognition), and by using sensors in the Internet of Things.

Data Stores

Data that is primarily intended for machine processing is most commonly held in relational databases that can be queried using SQL. Such data is often referred to as structured data.
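
As an illustration of querying structured data with SQL, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and readings are hypothetical and are not taken from any real public sector dataset.

```python
import sqlite3

# Hypothetical table of air-quality readings; all names and values
# are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (station TEXT, year INTEGER, no2_ugm3 REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("North", 2013, 41.0),
        ("North", 2014, 39.0),
        ("South", 2013, 28.0),
        ("South", 2014, 30.0),
    ],
)


def mean_no2_by_station(connection, year_from):
    """Return (station, mean NO2) rows for readings from year_from onwards."""
    cursor = connection.execute(
        "SELECT station, AVG(no2_ugm3) FROM readings "
        "WHERE year >= ? GROUP BY station ORDER BY station",
        (year_from,),
    )
    return cursor.fetchall()


rows = mean_no2_by_station(conn, 2013)
```

Because the data has a fixed schema, the grouping and averaging are expressed declaratively in the query rather than in application code.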

Another kind of data store that is sometimes used is the triple store, which can support semantic processing based on W3C’s Resource Description Framework (RDF) [RDF] and Web Ontology Language (OWL) [OWL], and can be queried using SPARQL, the RDF query language [SPARQL].
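
The subject-predicate-object model that underlies a triple store can be sketched in a few lines of plain Python. This is not SPARQL and does not use an RDF library; it only illustrates the kind of triple-pattern matching on which such queries are built. The "ex:" identifiers are hypothetical, and a real store would use full IRIs.

```python
# Each fact is a (subject, predicate, object) triple.
# The "ex:" identifiers are hypothetical placeholders for full IRIs.
TRIPLES = {
    ("ex:station1", "ex:locatedIn", "ex:Leeds"),
    ("ex:station2", "ex:locatedIn", "ex:York"),
    ("ex:station1", "ex:measures", "ex:NO2"),
    ("ex:station2", "ex:measures", "ex:PM10"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Which stations are located where?
located = match(TRIPLES, predicate="ex:locatedIn")
```

A full query language adds variables shared between patterns, joins, and filters on top of this basic matching step.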

Data that is primarily intended to be used by people can be held in the fields of relational databases or triple stores, but is most commonly held in flat files, using a variety of text, graphic, audio, and video formats. Such data is classed as unstructured data.

Portals

A portal is a computer system through which the public sector body makes data available to consumers. The web is the normal way of making such data available today, and its use is assumed in this Business Scenario.

A portal will generally act as an HTTP server [HTTP]. HTTP is the generic data access protocol of the web. Other data access protocols such as FTP [FTP] are sometimes used instead.

A portal may support a web service API that provides for specific data items to be retrieved in response to specific requests, in a defined way.
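
The idea of retrieving specific data items in response to specific requests can be sketched as a function that maps a request URL to a status code and a JSON body. The URL scheme (/datasets/<name>?district=<value>), the dataset, and the response layout are all assumptions made for illustration; they do not describe any particular portal's API.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Hypothetical data held by the portal, keyed by dataset name.
DATASETS = {
    "bus-stops": [
        {"id": 1, "name": "Market Square", "district": "Centre"},
        {"id": 2, "name": "Mill Lane", "district": "North"},
    ],
}


def handle_request(url):
    """Map /datasets/<name>?district=<value> to a (status, JSON body) pair.

    The URL scheme is an assumption made for this sketch.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "datasets":
        return 400, json.dumps({"error": "bad request"})
    records = DATASETS.get(segments[1])
    if records is None:
        return 404, json.dumps({"error": "unknown dataset"})
    # Optional filter: keep only records in the requested district.
    district = parse_qs(parts.query).get("district", [None])[0]
    if district is not None:
        records = [r for r in records if r["district"] == district]
    return 200, json.dumps({"count": len(records), "items": records})


status, body = handle_request("/datasets/bus-stops?district=North")
```

Keeping the request-to-response mapping in one defined place is what allows consumer applications to rely on it without human interpretation.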

Documents containing unstructured data that are published on the web in HTML can be linked together and can have added metadata that facilitates their categorization and discovery. Search engines can use this metadata, but are also able to discover and index unstructured data in other formats, such as Portable Document Format (PDF), by analyzing its content.
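
One simple form of such metadata is the HTML meta element. The sketch below uses Python's standard html.parser module to collect name/content pairs from a page, much as an indexing program might. The page content is invented for illustration.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # The parser supplies tag and attribute names in lower case.
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]


# Hypothetical published document.
PAGE = """<html><head>
<title>Annual Air Quality Report</title>
<meta name="description" content="Nitrogen dioxide levels by monitoring station">
<meta name="keywords" content="air quality, environment, open data">
</head><body><p>Report text.</p></body></html>"""

extractor = MetaExtractor()
extractor.feed(PAGE)
metadata = extractor.meta
```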

The term linked data refers to a set of best practices for publishing and interlinking structured data on the web [HEATH]. These practices facilitate the processing of such data by computers, and provide for the addition of metadata so that the data can more easily be discovered and integrated.

A portal may use syndication protocols to advise consumers of updates. Atom [ATOM] and RSS [RSS] are the principal standard web syndication protocols.
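
A consumer application might check a feed for updates along the following lines. This sketch parses an Atom document with Python's standard xml.etree.ElementTree module and lists the entries updated after a given time. The feed content and the urn:example identifiers are invented, and the timestamp parsing assumes UTC times written with a trailing "Z", which is only one of the forms that Atom permits.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed published by a portal.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Dataset updates</title>
  <id>urn:example:feed</id>
  <updated>2014-06-02T09:00:00Z</updated>
  <entry>
    <title>Bus stops</title>
    <id>urn:example:bus-stops</id>
    <updated>2014-06-02T09:00:00Z</updated>
  </entry>
  <entry>
    <title>Air quality</title>
    <id>urn:example:air-quality</id>
    <updated>2014-05-20T12:30:00Z</updated>
  </entry>
</feed>"""


def parse_time(text):
    """Parse a UTC timestamp of the form 2014-06-02T09:00:00Z (assumed form)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def entries_updated_since(feed_xml, last_seen):
    """Return the titles of entries updated after the last_seen timestamp."""
    root = ET.fromstring(feed_xml)
    cutoff = parse_time(last_seen)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        updated = entry.findtext("atom:updated", namespaces=ATOM_NS)
        if parse_time(updated) > cutoff:
            titles.append(entry.findtext("atom:title", namespaces=ATOM_NS))
    return titles


new_titles = entries_updated_since(FEED, "2014-06-01T00:00:00Z")
```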

CKAN is an open source data portal platform used by many public-service data publishers [CKAN]. It supports publishing, sharing, finding, and using data.
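
As a rough sketch of how a consumer application might find data on a CKAN portal, the code below builds a request URL for CKAN's package_search action and extracts resource URLs from a response. The portal address (data.example.org) and the sample response are invented, and only a few fields of the response are shown; the response layout should be checked against the CKAN API documentation for the version in use. No network request is made.

```python
import json
from urllib.parse import urlencode


def package_search_url(portal_base, query, rows=10):
    """Build a URL for the package_search action of a CKAN portal."""
    base = portal_base.rstrip("/")
    return f"{base}/api/3/action/package_search?" + urlencode(
        {"q": query, "rows": rows}
    )


def resource_urls(response_text, wanted_format="CSV"):
    """Return URLs of resources in the wanted format from a search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("the portal reported a failed request")
    urls = []
    for dataset in payload["result"]["results"]:
        for resource in dataset.get("resources", []):
            if resource.get("format", "").upper() == wanted_format:
                urls.append(resource["url"])
    return urls


# Invented response, reduced to the fields used above.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{
            "name": "air-quality",
            "resources": [
                {"format": "CSV", "url": "https://data.example.org/aq.csv"},
                {"format": "PDF", "url": "https://data.example.org/aq.pdf"},
            ],
        }],
    },
})

search_url = package_search_url("https://data.example.org/", "air quality", rows=5)
csv_urls = resource_urls(SAMPLE_RESPONSE)
```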

Browsers

A web browser is a program that enables its users to read documents on the web. It is designed particularly to read documents in HTML, but can typically cope with most formats in which unstructured data is stored on the web by invoking other programs that are designed specifically for those formats. It can also cope with some structured data, particularly spreadsheets in CSV format, in this way.

Web browsers are universally available, and are used by consumers of all kinds, including citizens who have no other data reading or processing capabilities.

Consumer Applications

These are applications that consumers use to retrieve and process data that is available from portals.

They are most frequently used to process structured data. Through sophisticated analysis techniques, they are sometimes able to interpret and process unstructured data.

Like provider applications, they can be implemented using any of the wide variety of programming languages and environments available today. They can run on systems operated by the consumers, or using cloud computing.

They can incorporate internal data stores. For structured data, these typically use relational database technology.
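
A minimal sketch of such an internal data store follows. It loads CSV data, of the kind a portal might publish, into an in-memory SQLite database and then summarizes it with SQL. The column names and figures are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content retrieved from a portal.
CSV_TEXT = """district,year,households
Centre,2013,1200
Centre,2014,1250
North,2013,800
North,2014,790
"""


def load_csv(connection, csv_text):
    """Create the households table and fill it from CSV text."""
    connection.execute(
        "CREATE TABLE households "
        "(district TEXT, year INTEGER, households INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert numeric fields explicitly rather than relying on the database.
    rows = [
        (r["district"], int(r["year"]), int(r["households"])) for r in reader
    ]
    connection.executemany("INSERT INTO households VALUES (?, ?, ?)", rows)


def totals_by_year(connection):
    """Return (year, total households) rows in year order."""
    return connection.execute(
        "SELECT year, SUM(households) FROM households "
        "GROUP BY year ORDER BY year"
    ).fetchall()


store = sqlite3.connect(":memory:")
load_csv(store, CSV_TEXT)
totals = totals_by_year(store)
```

Holding a local copy in this way lets the application combine and re-query the data without returning to the portal for each request.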

They can use “big data” technology to process datasets that are large or that change frequently.

In some cases they, or components of them, run on mobile devices.