Business Scenario: Open Public Sector Data – Requirements

 

A common set of standards and best practices for publication of public sector information will contribute significantly to the goals of improving the functioning of government and society as a whole, and stimulating economic development. It will do this by making it easier for political interest groups, social enterprises, commercial companies, and citizens generally to understand their social and political environments, and to interact with governments, and by making it easier for commercial companies to define, develop, and supply products and services that use public sector information.

The following requirements are for the form that those standards and best practices should take in order to optimize the achievement of these goals.

Human Readability and Machine Processing

The standards and best practices should cover publication of information for human readers and for machine processing.

Human readership and machine processing are both important, but they require different publication formats: typically unstructured formats for human readers and structured data formats for machine processing.

Ease of Publication

The standards and best practices should facilitate the publication of information by public sector bodies in the forms in which it is available to them, rather than requiring significant transformation.

It is difficult for public sector bodies to estimate the commercial or other value of their information. The risk of deciding what publication form will best deliver that value, and the work of converting the information to that form, should be left to commercial product and service providers, and other consumers.

Cost of acquisition and processing is one of the identified problems that limit achievement of the goals.

Interaction with Subjects and Subject Owners

The standards and best practices should cover not only publication but also input of information, input of requests for the visibility of information (particularly for privacy), and input of feedback on the quality of published information.

Visibility of information is a concern of subjects and subject owners. Quality of information is a concern of all stakeholders, and is one of the identified problems that limit achievement of the goals. Improving quality and addressing concerns over visibility will also mitigate another of these problems: lack of confidence in the information. Finally, increased interaction will contribute directly to the quality of government and the health of democratic society.

Rights and Concerns of Stakeholders

The standards and best practices should cover storing, together with the information itself, the rights of subjects, subject owners, creators, and other stakeholders, and the requested levels of visibility.

For systems to take account of stakeholders’ rights and concerns, they must be aware of what the rights and concerns are.
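As an illustration only, the following Python sketch shows one way that rights and requested visibility could be stored with an item of information so that a system can act on them. The class, field, and function names, and the two visibility levels, are hypothetical assumptions made for this example; they are not taken from any standard.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    # Hypothetical visibility levels, chosen only for this illustration.
    PUBLIC = "public"
    RESTRICTED = "restricted"


@dataclass
class PublishedItem:
    content: str
    subject: str                      # who or what the information is about
    creator: str                      # who created the information
    license: str                      # rights statement attached to the item
    requested_visibility: Visibility  # visibility requested by the subject or owner


def publishable(items):
    """Return only the items whose requested visibility allows publication."""
    return [item for item in items if item.requested_visibility is Visibility.PUBLIC]


items = [
    PublishedItem("Council budget 2012", "Example Council", "Finance Office",
                  "open license", Visibility.PUBLIC),
    PublishedItem("Resident case file", "A. Resident", "Housing Office",
                  "no reuse", Visibility.RESTRICTED),
]
visible = publishable(items)
```

Because the rights and the visibility request travel with each item, the publishing step does not need a separate source of information to decide what may be shown.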

Optimization for Search Engines

Publication formats should facilitate the operation of search engines on public sector data.

Ability to find data is one of the identified problems that limit achievement of the goals. Discovery by human readers can best be achieved through search engines.

Publication with Common Metadata

Information should be published with common metadata to facilitate discovery and integration.

Common metadata means that public sector bodies use the same terms, or terms that can be translated to each other directly, to describe the same information. It may not be possible to achieve a single standard terminology, because different bodies have historically used different terms, and because different countries use different languages, but it should at least be possible to map the terminologies in use to each other so that translation is possible.
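The idea of mapping terminologies to each other can be sketched in a few lines of Python. This is a minimal illustration, assuming two hypothetical public sector bodies whose local field names each map to a shared set of terms; all field names, shared terms, and function names are invented for the example and do not come from any real vocabulary.

```python
# Hypothetical mappings from each body's local terms to shared terms.
AGENCY_A_TO_COMMON = {
    "postcode": "postal_code",
    "dob": "date_of_birth",
}
AGENCY_B_TO_COMMON = {
    "code_postal": "postal_code",
    "date_naissance": "date_of_birth",
}


def to_common(record, mapping):
    """Rename the keys of a record to the shared terms; unmapped keys are kept."""
    return {mapping.get(key, key): value for key, value in record.items()}


def translate(record, source_mapping, target_mapping):
    """Translate a record from one body's terms to another's via the shared terms."""
    common_to_target = {common: local for local, common in target_mapping.items()}
    common = to_common(record, source_mapping)
    return {common_to_target.get(key, key): value for key, value in common.items()}


record_a = {"postcode": "RG1 1AX", "dob": "1970-01-01"}
common_record = to_common(record_a, AGENCY_A_TO_COMMON)
record_b = translate(record_a, AGENCY_A_TO_COMMON, AGENCY_B_TO_COMMON)
```

Mapping each terminology to a shared set of terms, rather than to every other terminology, means that each body needs to maintain only one mapping.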

The Open Group UDEF standard [UDEF] provides a framework for the development of common metadata of this kind.

Search engines can use metadata, so its inclusion will assist optimization for them.

Most discovery of data for machine processing is currently performed by people, but it is desirable that computer applications should be able to discover the data that they process. Machine-processable metadata makes this more feasible. Improved discoverability will help to address the identified problem that it is hard to define commercial products and services.

As well as enabling computer applications to discover data, machine-processable metadata improves their ability to process data and to integrate data from different sources. Inability of computers to process data is another of the identified problems that limit achievement of the goals.

Catalogs and Indexes

Publication on the web of catalogs and indexes of public sector information can make a significant contribution to discoverability, and should be considered a recommended best practice.

The catalogs and indexes should include metadata for the documents that they reference. This will facilitate discovery of those documents even when they themselves do not include metadata.
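The following Python sketch illustrates how a catalog that carries metadata can make documents discoverable even when the documents themselves carry none. It is a minimal example under stated assumptions: the entry structure, the keyword-based lookup, and the example.org addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    url: str       # where the referenced document is published
    title: str     # human-readable title held in the catalog
    keywords: set = field(default_factory=set)  # descriptive metadata terms


def find(catalog, keyword):
    """Return the entries whose keywords include the given term, ignoring case."""
    wanted = keyword.lower()
    return [entry for entry in catalog
            if wanted in {k.lower() for k in entry.keywords}]


catalog = [
    CatalogEntry("https://example.org/data/air-quality-2012.csv",
                 "Air quality readings", {"environment", "air quality"}),
    CatalogEntry("https://example.org/data/school-places.pdf",
                 "School places report", {"education"}),
]
matches = find(catalog, "Education")
```

The lookup uses only the metadata held in the catalog, so it works regardless of the format or content of the referenced documents.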