Business Scenario: Open Public Sector Data – Business Scenario Problem Description
Governments worldwide are making much of their data publicly available. They have two main reasons for this: to improve the functioning of government and society as a whole, and to stimulate economic development.
There are directives on this topic from many governments, including the European Parliament and Council [EU PSI Directive] and the US Government [US 2009]. There are also economic analyses from such august bodies as the OECD [OECD] and the World Bank [World Bank].
The goals have so far been achieved only partially. The problem is to find ways of improving the process of publishing public sector data so that they can be achieved more completely.
The Currency of Democracy
The idea that “information is the currency of democracy” is often attributed to Thomas Jefferson, although there is no evidence that he actually said or wrote those words. Whoever originated them, they capture perfectly the importance of information as an underpinning of democratic society.
People use economic and social statistics to measure the performance of national governments. At a local level, they want to know about proposals that affect them; for example, for new roads or for building developments. At all levels, they want to see how the actual performance of politicians compares with their promises.
Publication of information about how public money is used helps ensure that it is used properly. For example, Supervizor is an online application that provides information on business transactions of public sector bodies in Slovenia. In 2012, it brought to light 68 cases in which a restriction on doing business between public institutions and private entities had been violated. This led to effective enforcement, and there were no new cases in 2013 [Supervizor].
Transparency of government operation leads to improved services and service delivery. It can also cut costs; for example, through reduced personal interaction with citizens, including fewer freedom of information requests, and through data sharing within government.
When people lack information about matters that affect them, they feel disenfranchised and excluded. When they have that information, they feel involved, and are more likely to express their views.
The guiding principle of democracy is that the citizens take the decisions. To do this, they must understand the important factors and underlying considerations. Public availability of good information enables a well-functioning, transparent, and democratic society, with increased citizen participation in government, and more efficient administration.
The New Oil
The second major goal of making public data openly available is to stimulate economic activity.
This goal was well expressed by Neelie Kroes [Kroes] when, as the European Commissioner for the Digital Agenda, she said:
“Data really is the new oil. Data is a raw material for information businesses, just as oil is a raw material for fuel and plastics businesses. Data is also everywhere, it is cheap, and it can deliver huge rewards both in terms of services and financial returns.”
There are some notable examples of economic activity enabled by data. The Global Navigation Satellite System (GNSS) market is estimated to be worth €100 billion per year, growing at 13% annually [GNSS]. This market depends in part on the availability of geographic map data, provided in many cases by government or government-sponsored bodies. A second example is postcode data, which routinely simplifies online purchasing. The economic benefit of this is hard to quantify, but is clearly very large.
Open public sector information should lead to new business opportunities and jobs in the private sector, through stimulation of existing businesses (e.g., increased tourism), and through companies making new products and services (e.g., information-based “apps”).
Information-based products and services are increasingly important in today’s digital societies. When public sector information is openly available, private sector companies can provide products and services that use it and add value, contributing to the growth of national economies.
There are, however, a number of problems that limit the realization of these goals.
Cost of Acquisition and Processing
It costs money to acquire data, just as it costs money to extract oil from the earth.
Much public sector data is obtained through people filling in forms, in the course of administrative transactions, or in response to censuses and surveys. The forms have to be designed, printed, and distributed. The input has to be checked. Often, manual input is transcribed to machine-readable format. Action may be needed to ensure that everyone who should supply the information actually does so.
Online input is increasingly possible. This is often cheaper, but there is still the cost of developing data collection applications, and of processing and storing the data.
In some cases, specialized people or equipment are needed to collect the information. Geographic data for maps, for example, requires both.
Quality of Data
As with data of any kind, ensuring the quality of public sector data can be difficult. Problems described at the Samos workshop [Samos] included duplication, lack of updates, lack of completeness, and incorrect manual input.
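Three of these problems (duplication, lack of updates, and lack of completeness) can at least be detected mechanically. The following Python sketch assumes a hypothetical register whose records carry an identifier, a name, a postcode, and a last-updated date, and flags duplicated identifiers, records with missing required fields, and records not updated within an assumed one-year limit. It does not try to catch incorrect manual input, which generally needs domain-specific validation rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical fields for one entry in an administrative register.
    entity_id: str
    name: Optional[str]
    postcode: Optional[str]
    last_updated: date


def quality_report(records, *, today, max_age_days=365,
                   required=("name", "postcode")):
    """Flag duplicated, incomplete, and out-of-date records.

    Incorrect manual input is not detected here; that needs
    domain-specific validation rules.
    """
    seen = set()
    report = {"duplicates": [], "incomplete": [], "stale": []}
    for r in records:
        # Duplication: the same identifier appears more than once.
        if r.entity_id in seen:
            report["duplicates"].append(r.entity_id)
        seen.add(r.entity_id)
        # Lack of completeness: a required field is missing or empty.
        if any(not getattr(r, f) for f in required):
            report["incomplete"].append(r.entity_id)
        # Lack of updates: not touched within the (assumed) allowed age.
        if (today - r.last_updated).days > max_age_days:
            report["stale"].append(r.entity_id)
    return report
```

The one-year limit and the choice of required fields are illustrative; stricter checks catch more problems but cost more to define and run.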
Data is collected for administrative purposes; collection is not an end in itself, so there is often no reason to make records complete. Complete records are available only where some legal entity is the subject of inspections.
Measures can be taken to improve quality, but they are often expensive. There is a trade-off to be made between quality and cost.
Quantity of Data
While the quantity of published open public sector data is currently not large, a data explosion is in prospect. For instance, in the public administration of Trentino, Italy, data volumes are estimated to grow by up to a hundredfold per year, easily reaching terabytes by 2018 [Trentino].
This expected growth, not only in the amount but also in the variety and rate of change of open public sector data, presents challenges in using traditional data processing systems and applications.
Ability to Find Data
Data is published by the responsible administrative departments, usually acting independently. Descriptions of what is published are not always produced. Where they are produced, they are not co-ordinated, so similar data published by different administrations may be described in quite different terms. There are generally no catalogs or indexes showing the totality of what is available.
This makes it hard to establish whether the data required for a particular purpose is available, and to obtain it where it is available.
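A combined catalog is one way to address this. The Python sketch below, with hypothetical class and field names, keeps a single keyword index over datasets registered by any department, so that a re-user can ask one question instead of visiting each publisher in turn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    publisher: str
    title: str
    keywords: frozenset
    url: str  # hypothetical download location


class Catalog:
    """A single index over datasets published by many departments."""

    def __init__(self):
        self._entries = []
        self._index = {}  # lower-cased keyword -> set of entry positions

    def register(self, entry):
        pos = len(self._entries)
        self._entries.append(entry)
        for kw in entry.keywords:
            self._index.setdefault(kw.lower(), set()).add(pos)

    def search(self, *keywords):
        """Return the entries tagged with every requested keyword."""
        if not keywords:
            return []
        hits = set.intersection(
            *(self._index.get(k.lower(), set()) for k in keywords))
        return [self._entries[i] for i in sorted(hits)]
```

A keyword search like this only finds data when publishers describe similar data in the same terms. The W3C Data Catalog Vocabulary (DCAT) is one existing standard for describing datasets in catalogs consistently.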
Lack of Confidence in Data Providers
Some potential data re-users fear that the data will not be truly open, and that the providers will apply discrimination in various subtle ways; for example, in the speed with which they notify different users of updates, or by endorsing some apps but not others. There are particular concerns where the data provider is also involved in downstream service provision.
Also, it is difficult to bind governments to behaving in certain ways. A government may set up commercial expectations by following a particular policy, but cannot guarantee not to change the policy, with unfortunate consequences for companies that rely on the expectations. The desire to serve the best interests of the voters often overrides the desire to treat business partners consistently and fairly.
So, for example, a policy of providing a particular kind of data free of charge, or at a low cost, could be changed, and this could destroy the commercial viability of products and services that use the data. This is a risk that a commercial re-user must assess before investing in product or service development.
Ability of Computers to Process Data
Much public sector data is published in formats that are designed to be interpreted by people (for example, as text, graphics, or video) rather than formats that are designed for machine processing (for example, as database tables). Even where it is provided in machine-processable form, there is a lack of metadata that would enable computer applications to interpret the data.
Different administrations may make machine-processable data available in different ways. For example, one might supply it in CSV files, while another makes the same data available in JSON through a web service API.
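To use both, a re-user has to write a separate adapter for each publication. A minimal Python sketch, assuming hypothetical bus-stop data published once as CSV with columns stop_id and name, and once as JSON with a "stops" array of objects carrying id and stopName fields, converts both into the same records:

```python
import csv
import io
import json


def stops_from_csv(text):
    """Read a CSV publication with hypothetical columns stop_id, name."""
    return [{"stop_id": row["stop_id"], "name": row["name"]}
            for row in csv.DictReader(io.StringIO(text))]


def stops_from_json(text):
    """Read a JSON publication with a hypothetical {"stops": [...]} layout."""
    # JSON ids arrive as numbers; convert to str to match the CSV values.
    return [{"stop_id": str(item["id"]), "name": item["stopName"]}
            for item in json.loads(text)["stops"]]
```

Each additional publisher format needs another adapter like these, which is part of the cost of integrating data across administrations.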
A particular issue is that location information is represented differently in different datasets; for example, as street addresses, as postcodes, or as map coordinates in differing formats.
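As an illustration, assume (hypothetically) that one dataset gives locations as decimal-degree text such as "46.05, 14.5", while another uses degrees, minutes, and seconds such as 46°03'00"N 14°30'00"E. A Python sketch that brings both to decimal (latitude, longitude) pairs might look like this:

```python
import re

# Hypothetical degrees-minutes-seconds format, e.g. 46°03'25"N 14°30'21"E
DMS = re.compile(r"(\d+)°(\d+)'(\d+(?:\.\d+)?)\"([NSEW])")


def _to_decimal(deg, minutes, seconds, hemisphere):
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # South latitudes and west longitudes are negative.
    return -value if hemisphere in "SW" else value


def parse_dms(text):
    """Parse degrees-minutes-seconds text into decimal (lat, lon)."""
    parts = DMS.findall(text)
    if len(parts) != 2 or parts[0][3] not in "NS" or parts[1][3] not in "EW":
        raise ValueError(f"unrecognized DMS location: {text!r}")
    return _to_decimal(*parts[0]), _to_decimal(*parts[1])


def parse_decimal(text):
    """Parse "lat, lon" decimal-degree text into (lat, lon)."""
    lat, lon = (float(p) for p in text.split(","))
    return lat, lon
```

Real datasets may also use projected national grid coordinates or different geodetic datums, which need a proper coordinate transformation rather than text parsing.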
As noted above, similar data published by different administrations may be described in quite different terms, and this also applies to any supplied metadata. This makes it hard for computer applications to integrate and use machine-processable data published by different bodies.
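One common mitigation is a crosswalk that maps each publisher's terms to a shared vocabulary. In the Python sketch below, the publishers, field names, and mapping are all hypothetical; the point is that someone has to write and maintain the mapping, and that unmapped terms should be reported rather than guessed.

```python
# Hypothetical crosswalk from each publisher's own field names to one
# shared vocabulary. In practice this mapping must be agreed and maintained.
CROSSWALK = {
    "city_a": {"haltestelle": "stop_name", "linie": "route"},
    "city_b": {"stop": "stop_name", "line_no": "route"},
}


def harmonize(publisher, record):
    """Rename a record's fields into the shared vocabulary."""
    mapping = CROSSWALK[publisher]
    unknown = set(record) - set(mapping)
    if unknown:
        # Fail loudly rather than guess at terms nobody has mapped yet.
        raise ValueError(f"{publisher}: unmapped fields {sorted(unknown)}")
    return {mapping[field]: value for field, value in record.items()}
```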
The variety of publication formats and metadata makes it hard to develop large-scale applications that can be used across administrative areas. For example, it is easy to develop an application to plan bus journeys for a particular city, but hard to develop one that can be used in any city in Europe. It also makes it hard to integrate data produced by different bodies (as, for example, in an application that planned bus journeys between any two points in Europe).
Existing Administration Practices and Systems
Open data implies a radical change in the way public administrations approach and work with data. Public sector bodies today often lack a culture of making data available, publish data independently of each other, and have “stovepipe” systems that do not easily share data.
When publication reveals inaccuracies in the data, or inconsistencies in administrative procedures, this may be seen as reflecting badly on the people concerned, rather than as an opportunity for improvement.
When administrations publish independently, they are likely to use different formats and to describe their data in different ways, with the consequences noted above.
Use of “stovepipe” systems makes it costly to share or integrate data, even though modern public services are complex and rely on interoperability. This is a general problem, not restricted to the public sector. The failure to get the right information to the right person at the right time, which prevents organizations from achieving their business objectives, is described in The Open Group Business Scenario: Interoperable Enterprise [Interoperable Enterprise].
Obtaining the desired benefits from open public sector data also implies some changes to the attitudes of citizens towards their governments and administrations. They are often unaware of public decision-making processes, of the possibilities for their involvement in these processes, and of the desirability of being involved.
Defining Commercial Products and Services
While data is a raw material for information businesses, just as oil is a raw material for fuel and plastics businesses, it is not easy for those businesses to define commercial information-based products and services.
This is reflected in a point made at the Samos SHARE-PSI meeting [Samos]: even when valuable data is released, additional effort is required to energize external stakeholders to create something with it. For commercial companies, there is little point in trying to energize them through explanations or requests; they will be energized only if they are paid directly to do something, or if they see that a profit is to be made.
While the released public sector data may have value, it is the value added by the information business that is the basis for a viable commercial product or service. Just re-publishing the data cannot be the basis for a product that costs money, because the customers can obtain the data for free from its original source. Also, because the data is equally available to other companies that can create competitive products, a company is unlikely to invest in developing a product that uses open data unless there are factors that make it hard for other companies to develop similar products.
In fact, the perceived value of the original data is often low. Few people, for example, are interested in reading the minutes of the meetings of their local authority. They might, however, be very interested in a newspaper story about a local politician breaking election promises. The story is based on the politician’s actions recorded in the minutes, but it is because of the value added by the journalist in writing the story that people read the article.
In the case of satellite navigation systems, the products are based not only on the geographical information, but also on GPS reception hardware and routing algorithms. Also, the geographical information is often obtained from a variety of sources, including commercial ones as well as the public sector. The product is not based on a simple addition of value to public sector information, but on a combination of inputs and processes. Its value is defined in simple terms related to the user (“find your way from place to place with ease”) rather than in terms of the original information.
These characteristics are shared by many of the successful information-based products and services. Defining such products and services is not an easy matter. Looking at what public sector information is available and trying to work out what products and services it could be used for may not be the best approach. It is more likely that the successful companies start by looking at market needs, and happen to notice that public sector information could help to satisfy them.
Because of these problems, the aims of making public sector data openly available have not been fully achieved. It is difficult to quantify how far functioning of government and society as a whole could be improved by more effective publication, but there is some indication of how far the expected economic benefits have failed to materialize.
The economic benefits of open government data could be huge. The McKinsey Global Institute estimates a potential of between three and five trillion dollars annually [McKinsey]. Yet the direct impact of open data on the EU economy in 2010, seven years after the EU PSI Directive was issued, is estimated at only about 1% of that [Capgemini], although the EU accounts for nearly a quarter of world GDP. This suggests that only a small fraction of the economic potential has been realized.
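The size of the gap can be made concrete with these figures. Writing P for the McKinsey estimate of the annual potential, and assuming (as a simplification not made in the cited sources) that the potential would be spread in proportion to GDP, the EU's share would be about a quarter of P, against a realized direct impact of about 1% of P:

```latex
\[
P \approx \$3 \text{ to } 5 \times 10^{12}, \qquad
0.01\,P \approx \$30 \text{ to } 50 \times 10^{9}, \qquad
\frac{0.01\,P}{0.25\,P} = 0.04
\]
```

On that assumption, the EU realized only about 4% of its proportional share; because the EU's share of world GDP is slightly under a quarter, the true ratio would be slightly higher.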
There are a number of reasons why the potential has not yet been realized, including:
- Incomplete availability of data, due to the costs of acquisition and publication, provision of data in unusable formats, and inability to find data even when it is available in a usable format
- Lack of confidence in the data, due to actual or perceived lack of quality, or lack of confidence in government as a provider
- Fragmented markets due to lack of data, metadata, and interface standards
- Difficulty of identifying unique value that a commercial company can add
These reasons explain why there are currently so few commercial products and services using public sector data, compared with the large expected potential.