SOA Reference Architecture – Information Layer

 

Overview

Context and Typical Flow

The Information Layer is responsible for manifesting a unified representation of the information aspect of an organization, as provided by its IT services, applications, and systems, in a way that enables business needs and processes and is aligned with the business vocabulary (glossary and terms). A number of capabilities are associated with this primary objective. The layer covers information architecture, business analytics and intelligence, and metadata considerations. It also ensures the inclusion of key information-architecture considerations that can serve as the basis for creating business analytics and business intelligence through data marts and data warehouses, including the metadata content stored in this layer. The layer also supports an information services capability, enabling a virtualized information data layer. Together, these enable the SOA to support data consistency and consistency in data quality.

In particular, this layer can be thought of as supporting multiple categories of capabilities of the SOA RA:

  • Ability to support information services capability, critical to support a shared, common and consistent expression of data
  • Ability to integrate information across the enterprise in order to enable information services capability
  • Ability to define metadata that is used across the SOA RA and in particular the metadata that is shared across the layers
  • Ability to secure and protect information
  • Ability to support business analytics and business activity monitoring, critical to the usage of the SOA RA and its realization

In particular, an information virtualization and information service capability typically involves the ability to retrieve data from different sources, transform it into a common format, and expose it to consumers using different protocols and formats.
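The three steps just described (retrieve from different sources, transform into a common format, expose in different formats) can be sketched in Python. This is a minimal illustration, not part of the reference architecture: the two source shapes (a CSV feed with `cust_id`/`cust_name` columns and relational-style `(id, name)` tuples), the common record format, and all function names are hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]  # assumed common format: {"id", "name", "source"}

def from_csv(text: str) -> List[Record]:
    """Retrieve from a CSV feed with (assumed) columns cust_id,cust_name."""
    return [{"id": row["cust_id"], "name": row["cust_name"], "source": "csv"}
            for row in csv.DictReader(io.StringIO(text))]

def from_rows(rows: Iterable[Tuple[int, str]]) -> List[Record]:
    """Retrieve from a relational-style source returning (id, name) tuples."""
    return [{"id": str(cid), "name": name, "source": "db"} for cid, name in rows]

def unified_view(loaders: List[Callable[[], List[Record]]]) -> List[Record]:
    """Call every source adapter and merge the results in the common format."""
    merged = [record for load in loaders for record in load()]
    return sorted(merged, key=lambda r: r["id"])  # ids compared as strings

def expose(records: List[Record], fmt: str) -> str:
    """Serve the same unified view in the format the consumer requests."""
    if fmt == "json":
        return json.dumps(records)
    if fmt == "csv":
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=["id", "name", "source"])
        writer.writeheader()
        writer.writerows(records)
        return out.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


view = unified_view([lambda: from_csv("cust_id,cust_name\n2,Globex\n"),
                     lambda: from_rows([(1, "Acme")])])
print(expose(view, "json"))
# [{"id": "1", "name": "Acme", "source": "db"}, {"id": "2", "name": "Globex", "source": "csv"}]
```

Consumers see one view and one set of formats regardless of where each record originated; validation, quality, and access control would wrap around this core in a fuller design.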

Capabilities

There are multiple categories of capabilities that the Information Layer needs to support in the SOA RA. These categories are:

  • Information Services: This category of capabilities addresses the support of information services. Information services provide a uniform way of representing, accessing, maintaining, managing, analyzing, and integrating data and content across heterogeneous information sources. There are two primary approaches to achieving this. The first approach focuses on building a single view of business-critical data for customers, products, locations, and other entities, delivered in context; this is the single view of enterprise, or Master Data Management (MDM), approach. The second approach focuses on integrating the appropriate information in a timely and consistent manner, analyzing and attempting to improve the quality of data, and ensuring the consistency and integrity of business-critical data and facts across the enterprise. This approach is known as the Information as a Service (IaaS) approach.
  • Information Integration: This category of capabilities addresses the support of information integration and enables capabilities for information services.
  • Basic Information Management: This category of capabilities addresses basic information management concerns such as metadata and unstructured data management.
  • Information Security and Protection: This category of capabilities addresses the support of information security and protection concerns.
  • Business Analytics: This category of capabilities addresses the support of business analytics and business activity monitoring. It enables organizations to leverage information to better understand and optimize business performance. It supports entry points ranging from reporting to deep analytics and visualization, planning, aligned strategic metrics, role-based visibility, search-based access and dynamic drill-through, and in-time alerting and detection actions.
  • Information Definition and Modeling: This category of capabilities defines fundamental constructs of SOA information and events.
  • Information Repository: This category of capabilities addresses support of the information repository in order to persist data such as metadata, master data, analytical data, operational data, and unstructured data.

This layer features the following capabilities:

Information Services

  1. Ability to expose data as services, to add/remove/manipulate data entries in different services or service components, and to disable some data from outside access
  2. Ability to interface with the Integration Layer in multiple ways, such as message-based, service-call, and batch interfaces
  3. Ability to handle representing data from various data sources in a unified data format; ability to transform and map data from one format to another and align data from different resources
  4. Ability to manage the lifecycle of business entities
  5. Ability to manage the hierarchy and relationship among data
  6. Ability to validate records against defined business rules
  7. Ability to validate and enforce data quality rules
  8. Ability to notify and trigger actions based upon events detected within the data

Information Integration

  9. Ability to perform Extract-Transform-Load (ETL) of data from one source to another; ability to extract relevant information from sources, transform the information into the appropriate integrated form, and load the information into the target repository
  10. Ability to perform Enterprise Information Integration (EII), such as federated query access to structured and unstructured data
  11. Ability to virtualize data, representing actual data from data repositories of various types, such as a DB2 database in the Operational Systems Layer or an Excel file
  12. Ability to handle data transformation (including transformation of data types and contents) and to aggregate data from multiple data sources
  13. Ability to perform data standardization and data reconciliation, including semantic reconciliation
  14. Ability to cleanse and match inbound records to existing data
  15. Ability to cache data in support of the data virtualization/information services capability

Basic Information Management

  16. Ability to manage and maintain metadata in a common metadata repository for the enterprise
  17. Ability to capture, aggregate, and manage unstructured content in a variety of formats such as images, text documents, web pages, spreadsheets, presentations, graphics, email, video, and other multimedia
  18. Ability to author, configure, manage, customize, and extend metadata

Information Security and Protection

  19. Ability to handle the access privileges of various participants to data
  20. Ability to control access to individual data items
  21. Ability to monitor and manage data usage using a log-like facility; a typical traceability log records who has accessed the data, when, and what part of the data has been accessed

Business Analytics

  22. Ability to analyze data access history and provide optimization algorithms and business intelligence for data optimization
  23. Ability to query and search enterprise information
  24. Ability to interactively visualize the results of business analytics and data analysis
  25. Ability to interface with the Integration Layer and obtain events from it; ability to analyze this event information both in real time/near real time and as stored (warehoused) events
  26. Ability to review and assess inbound service activity in the form of event information and determine responses or issue alerts/notifications

Information Definition and Modeling

  27. Ability to define business vocabulary (glossary, terms, business entities)
  28. Ability to define a common information model as leveraged by IT such as entity relationships, logical data model for information repositories, and message model for service definition and specification
  29. Ability to define business events

Information Repository

  30. Ability to store operational and reshaped information (structured and unstructured) that adds business value, including common data models; the stored canonical forms (common data models) are shared between SOA Integration Layer elements and other SOA layer elements, and the repository is typically invoked by other components (including information virtualization capabilities)
  31. Ability to store instance and definition of master data and historical data that records changes to master data
  32. Ability to store analytical data

Architecture Building Blocks (ABBs)

The ABBs responsible for providing these sets of capabilities in the Information Layer are:

Capability Category                  ABB Name                                                              Supported Capabilities
-----------------------------------  --------------------------------------------------------------------  ----------------------
Information Services                 Information Services Gateway                                          1, 2
                                     Data Aggregator                                                       3
                                     Data Validator                                                        6
                                     Information Lifecycle Manager                                         4
                                     Hierarchy and Relationship Manager                                    5
                                     Data Quality Manager                                                  7
                                     Quality of Service Layer: Event Manager                               8

Information Integration              Information Integrity Manager
                                     Data Cleanser                                                         14
                                     Data Rationalization Manager                                          13
                                     Data Matcher                                                          14
                                     Data Virtualization Manager
                                     Data Representation Manager                                           11
                                     Data Sourcing Manager                                                 11
                                     Data Cache                                                            15
                                     Integration Layer: Data Transformer                                   12
                                     Data Consolidator                                                     9
                                     Data Federator                                                        10

Basic Information Management         Information Metadata Manager                                          16
                                     Content Manager                                                       17
                                     Master Data Authoring Environment                                     18

Information Security and Protection  Quality of Service Layer: Access Controller                           19
                                     Quality of Service Layer: Data-Driven Access Controller               20
                                     Traceability Enabler/Auditor                                          21

Business Analytics                   Data Miner                                                            22
                                     Query, Search, Reporting Engine                                       23
                                     Analytics Visualization Engine                                        24
                                     Quality of Service Layer: Business Activity Monitor                   25
                                     Quality of Service Layer: Business Activity Manager                   25
                                     Quality of Service Layer: Activity Correlation Manager                26

Information Definition and Modeling  Business Information: Business Glossary and Terms, Business Entities  27
                                     Common Information: Entities and Data, Messages                       28
                                     Business Events                                                       29

Information Repository               Data Repository                                                       30-32

Details of ABBs and Supported Capabilities

Details of ABBs

This section describes each of the ABBs in the Information Layer in terms of their responsibilities.

Information Service Gateway

This ABB can be thought of as a service container that enforces and supports the exposure of services, with all the associated supporting capabilities. In particular, it has three main responsibilities:

  • To expose Information as a Service (IaaS)
  • To manipulate data entries in different services and service components
  • To control access to selected aspects of data, disabling some parts of the data from outside access

This ABB acts as the gateway to the Information Layer. This ABB enables the hosting and exposure of information services by the SOA RA, forming a virtual data layer. It thus supports interfacing between the Information Layer and consumers of information services and is critical to expose Information as a Service (IaaS). It provides a consistent entry point to the Information Layer through multiple mechanisms such as messaging, service calls, and batch processing. This ABB leverages capabilities and ABBs from the Integration Layer.
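A minimal Python sketch of the gateway's three responsibilities, assuming a simple in-memory store: one entry point exposes records, lets callers add, replace, or remove them, and strips fields that are disabled for outside access, reachable in service-call, message, or batch style. The class name, the message format (`{"op": ..., "key": ...}`), and the hidden-field mechanism are illustrative assumptions; in the SOA RA the gateway would delegate to other ABBs and to the Integration Layer rather than hold data itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

@dataclass
class InformationServiceGateway:
    """Hypothetical single entry point to the Information Layer (in-memory store for illustration)."""
    store: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    hidden_fields: Set[str] = field(default_factory=set)  # fields disabled from outside access

    def _public(self, record: Dict[str, Any]) -> Dict[str, Any]:
        # Remove every field that has been disabled from outside access.
        return {k: v for k, v in record.items() if k not in self.hidden_fields}

    # Service-call style entry points.
    def get(self, key: str) -> Dict[str, Any]:
        return self._public(self.store[key])

    def put(self, key: str, record: Dict[str, Any]) -> None:
        self.store[key] = dict(record)  # add or replace a data entry

    def remove(self, key: str) -> None:
        self.store.pop(key, None)

    # Message style entry point (assumed message format).
    def handle_message(self, msg: Dict[str, Any]) -> Any:
        handlers = {
            "get": lambda: self.get(msg["key"]),
            "put": lambda: self.put(msg["key"], msg["record"]),
            "remove": lambda: self.remove(msg["key"]),
        }
        if msg["op"] not in handlers:
            raise ValueError(f"unknown operation: {msg['op']}")
        return handlers[msg["op"]]()

    # Batch style entry point: many messages in one call.
    def handle_batch(self, msgs: List[Dict[str, Any]]) -> List[Any]:
        return [self.handle_message(m) for m in msgs]


gateway = InformationServiceGateway(hidden_fields={"ssn"})
gateway.put("c1", {"name": "Acme", "ssn": "123-45-6789"})
print(gateway.get("c1"))  # {'name': 'Acme'}
```

All three access styles funnel through the same `_public` filter, which is the point of having a single gateway: access rules are applied once, at the boundary.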

Data Aggregator

This ABB is responsible for efficiently joining information – for example, structured and unstructured data – from multiple sources without creating data redundancy to help form a unified data view/model supported by the Data Virtualization Manager ABB.

Its responsibilities include:

  • Dispatching requests to other ABBs in the Information Layer
  • Invoking the Data Virtualization Manager ABB for handling data transformation (including transformation of data types and contents); and aggregating data from multiple data sources to provide a unified format and model to other ABBs and consumers of information services
  • Invoking the Data Validator ABB to validate against business rules
  • Invoking the Data Quality Manager ABB for enforcing data quality rules
  • Invoking the Event Manager ABB in the Quality of Service Layer for triggering event notification based on data

Data Validator

This ABB is responsible for validating records against defined business rules.
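One simple way to represent defined business rules is as named predicates evaluated against each record. The Python sketch below assumes that representation; the three rules, the field names, and the idea of loading rules from a rules repository are invented for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Rule = Tuple[str, Callable[[Record], bool]]  # (rule name, predicate that must hold)

# Illustrative business rules; a real deployment might load these from a rules repository.
CUSTOMER_RULES: List[Rule] = [
    ("customer must have a name", lambda r: bool(r.get("name"))),
    ("credit limit must be non-negative", lambda r: r.get("credit_limit", 0) >= 0),
    ("country must be a two-letter code", lambda r: len(r.get("country", "")) == 2),
]

def validate(record: Record, rules: List[Rule] = CUSTOMER_RULES) -> List[str]:
    """Return the names of the rules the record violates; an empty list means valid."""
    return [name for name, holds in rules if not holds(record)]


print(validate({"name": "Acme", "credit_limit": 500, "country": "US"}))  # []
print(validate({"name": "", "credit_limit": -1, "country": "USA"}))
```

Returning the names of violated rules, rather than a single boolean, lets a caller such as the Data Aggregator report exactly why a record was rejected.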

Information Lifecycle Manager

This ABB is responsible for providing lifecycle management support for data, such as create, read, update, and delete (CRUD) operations, and for applying business logic based upon the context of that data.

Hierarchy and Relationship Manager

This ABB is responsible for managing data hierarchies, groupings, and relationships, such as parent-child relationships and relationships between enterprise data. The Data Virtualization Manager ABB leverages this ABB to build those relationships.
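Parent-child relationships (for example, a corporate family of customer legal entities) can be sketched as a tree with ancestor and descendant lookups. This Python sketch is an assumption-based illustration; the class and method names are hypothetical, and it rejects re-adding an existing node so that no cycles can form.

```python
from collections import defaultdict
from typing import Dict, List, Optional

class HierarchyManager:
    """Hypothetical parent-child hierarchy of enterprise entities (a forest of trees)."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.children: Dict[str, List[str]] = defaultdict(list)

    def add(self, node: str, parent: Optional[str] = None) -> None:
        if node in self.parent:
            raise ValueError(f"{node} already exists")  # prevents duplicates and cycles
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        self.parent[node] = parent
        if parent is not None:
            self.children[parent].append(node)

    def ancestors(self, node: str) -> List[str]:
        """Parent, grandparent, and so on up to the root."""
        chain: List[str] = []
        current = self.parent[node]
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain

    def descendants(self, node: str) -> List[str]:
        """All nodes below `node`, depth first, in insertion order."""
        result: List[str] = []
        stack = list(reversed(self.children[node]))
        while stack:
            current = stack.pop()
            result.append(current)
            stack.extend(reversed(self.children[current]))
        return result


org = HierarchyManager()
org.add("Acme Corp")
org.add("Acme EMEA", parent="Acme Corp")
org.add("Acme UK", parent="Acme EMEA")
org.add("Acme US", parent="Acme Corp")
print(org.ancestors("Acme UK"))      # ['Acme EMEA', 'Acme Corp']
print(org.descendants("Acme Corp"))  # ['Acme EMEA', 'Acme UK', 'Acme US']
```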

Data Quality Manager

This ABB is responsible for validating and enforcing data quality rules, standardizing data in both value and structure, and performing data reconciliation, including semantic reconciliation. It leverages the Information Integrity Manager ABB to fulfill its responsibilities.
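Standardization of both structure and value can be illustrated with a small Python function: field names are normalized to one naming convention (structure), and country, phone, and name values are normalized to canonical forms (value). The alias table and the chosen conventions are assumptions for illustration only.

```python
import re
from typing import Dict

# Illustrative reference data; real rules would come from the enterprise's data quality rules.
COUNTRY_ALIASES = {"usa": "US", "united states": "US", "u.s.": "US",
                   "uk": "GB", "united kingdom": "GB"}

def standardize(record: Dict[str, str]) -> Dict[str, str]:
    """Standardize structure (field names) and values (country, phone, name)."""
    # Structure: trim, lower-case, and replace runs of non-word characters with '_'.
    out = {re.sub(r"\W+", "_", key.strip().lower()): value.strip() for key, value in record.items()}
    # Values: map country aliases to ISO-style codes, keep phone digits, tidy name casing.
    if "country" in out:
        out["country"] = COUNTRY_ALIASES.get(out["country"].lower(), out["country"].upper())
    if "phone" in out:
        out["phone"] = re.sub(r"\D", "", out["phone"])
    if "name" in out:
        out["name"] = " ".join(out["name"].split()).title()
    return out


print(standardize({" Name ": "  acme   corp ", "Country": "United States", "Phone": "(555) 010-2000"}))
# {'name': 'Acme Corp', 'country': 'US', 'phone': '5550102000'}
```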

Quality of Service Layer: Event Manager

See Event Manager ABB in the Quality of Service Layer.

Information Integrity Manager

This ABB is responsible for data profiling, analysis, cleansing, data standardization, and matching. Data profiling and analysis services are critical for understanding the quality of data across enterprise systems, and for defining data validation, data cleansing, matching, and standardization logic required to improve data quality and consistency.
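Data profiling typically starts with per-column statistics. The following Python sketch computes a null count, a distinct-value count, and the most common value for each column of a list of records; treating both missing keys and empty strings as nulls is an assumption of this sketch.

```python
from collections import Counter
from typing import Any, Dict, List

def profile(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Per-column statistics: null count, distinct non-null values, most common value."""
    columns = sorted({column for row in rows for column in row})
    report: Dict[str, Dict[str, Any]] = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None and v != ""]  # missing or empty = null
        counts = Counter(present)
        report[column] = {
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


customers = [{"id": 1, "country": "US"}, {"id": 2, "country": ""},
             {"id": 3, "country": "US"}, {"id": 4}]
print(profile(customers)["country"])  # {'nulls': 2, 'distinct': 1, 'most_common': 'US'}
```

A profile like this is what informs the validation, cleansing, matching, and standardization logic: here, half of the `country` values are missing, which would prompt a cleansing or enrichment rule.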

Data Cleanser

This ABB is responsible for cleansing and applying data quality rules. It enables detection and correction of corrupted or incorrect data.

Data Rationalization Manager

This ABB is responsible for performing data rationalization and reconciliation.

Data Matcher

This ABB is responsible for matching inbound records to existing data. It supports deterministic matching and probabilistic matching of records.
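The difference between deterministic and probabilistic matching can be shown in a short Python sketch. Here, deterministic matching is an exact comparison on a trusted identifier (`tax_id`, assumed unique), and probabilistic matching is a weighted string-similarity score using the standard library's `difflib.SequenceMatcher`. The field weights and the 0.85 threshold are illustrative assumptions; production matching engines often use more sophisticated, statistically tuned scoring.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
WEIGHTS = {"name": 0.5, "city": 0.2, "phone": 0.3}  # assumed field weights (sum to 1)

def deterministic_match(inbound: Record, existing: List[Record]) -> Optional[Record]:
    """Exact agreement on a trusted identifier (tax_id, assumed unique)."""
    key = inbound.get("tax_id")
    return next((rec for rec in existing if key and rec.get("tax_id") == key), None)

def similarity(a: str, b: str) -> float:
    # Note: two empty strings score 1.0, so a field missing on both sides counts as agreement.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def probabilistic_match(inbound: Record, existing: List[Record],
                        threshold: float = 0.85) -> Optional[Tuple[Record, float]]:
    """Weighted similarity across fields; best candidate at or above the threshold."""
    best: Optional[Tuple[Record, float]] = None
    for rec in existing:
        score = sum(w * similarity(inbound.get(f, ""), rec.get(f, "")) for f, w in WEIGHTS.items())
        if score >= threshold and (best is None or score > best[1]):
            best = (rec, score)
    return best

def match(inbound: Record, existing: List[Record]) -> Optional[Record]:
    """Try the deterministic rule first, then fall back to probabilistic scoring."""
    exact = deterministic_match(inbound, existing)
    if exact is not None:
        return exact
    fuzzy = probabilistic_match(inbound, existing)
    return fuzzy[0] if fuzzy else None


existing = [
    {"tax_id": "T1", "name": "Acme Corporation", "city": "Boston", "phone": "5550102000"},
    {"tax_id": "T2", "name": "Globex", "city": "Springfield", "phone": "5550103000"},
]
print(match({"tax_id": "T2", "name": "Globex Inc"}, existing)["name"])  # Globex
print(match({"name": "ACME Corporation", "city": "boston", "phone": "5550102000"}, existing)["name"])
```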

Data Virtualization Manager

This ABB is responsible for providing virtual access to, and a unified representation of, enterprise data sources.

Data Representation Manager

This ABB is responsible for representing data from various data sources in a unified data format and for creating unified views of data. In other words, this ABB hides the various data sources and presents data in uniform formats to other ABBs for data handling. It may link to various data sources and handle the relationships between them. This “virtualization” of the data makes consumers of information services (exposed through the Information Services Gateway) and other ABBs independent of the source, and it supports data consistency.

Data Sourcing Manager

This ABB is responsible for enabling access to different data sources using different protocols. It provides unified access to data in files, databases, etc. It uses an Adapter ABB from the Integration Layer to provide the ability to integrate with data sources in different solution platforms (external data sources).

Examples include relational sources (e.g., DB2, Oracle, or SQL Server databases), other structured data (e.g., Excel or .CSV files, web service responses in XML format, and hierarchical stores on mainframes such as IMS), and unstructured data stores (such as images and documents). It manages interactions with the data sources in the Solution Platform and other SOA RA layers, but it is not responsible for data and protocol transformation. This ABB represents the actual data repositories of various types, such as a DB2 database in the Operational Systems Layer or an Excel file. Note that, in the Information Layer, this ABB refers to high-level links, associated with metadata, to real data sources in the Operational Systems Layer. This ABB enables optimization of data access through lazy loading or on-demand access to information. For example, instead of containing (e.g., attaching) a huge document, this ABB typically contains an on-demand link to the original document, together with some metadata describing it (e.g., goals, purposes, and a short description) that helps users decide whether they need to access the original (a CEO may decide not to download a detailed design document, while a project architect may decide to download and review it). Note also that this ABB typically represents industry-specific data structures; therefore, transformation may be needed for further processing.
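The on-demand link described above is essentially lazy loading: metadata is available immediately, and the document body is fetched only when it is first requested. In this Python sketch, the `fetch` callable stands in for whatever adapter actually retrieves content from the Operational Systems Layer; the class name, the `docs://` URI scheme, and the metadata keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class SourceLink:
    """Hypothetical on-demand link: metadata is held eagerly, content is fetched lazily."""
    uri: str
    metadata: Dict[str, str]
    fetch: Callable[[str], bytes]  # stands in for an Integration Layer adapter call
    fetch_count: int = 0
    _content: Optional[bytes] = field(default=None, repr=False)

    def content(self) -> bytes:
        # Fetch from the source on first access only; later calls reuse the result.
        if self._content is None:
            self._content = self.fetch(self.uri)
            self.fetch_count += 1
        return self._content


def fake_fetch(uri: str) -> bytes:
    return b"full text of the detailed design document"

link = SourceLink("docs://designs/billing-v2",
                  {"purpose": "Detailed billing design", "audience": "architects"},
                  fake_fetch)
print(link.metadata["purpose"], link.fetch_count)  # decide from metadata; nothing fetched yet
```

In the CEO/architect example, the CEO reads only `metadata` and never triggers a fetch, while the architect calls `content()` once.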

Data Cache

This ABB is responsible for caching data in support of the data virtualization/information services capability. It addresses variations in the temporal availability of data and improves performance. Variance in temporal availability arises because different data sources make their data available on different schedules; for example, one data source could be a time-based file feed, another a mainframe batch program, and a third a real-time relational database. In such a scenario, caching the data in some form helps keep updates and availability consistent. The data cache may use persistent or non-persistent caching; this is an implementation choice.
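A cache that accounts for different source schedules can give each source its own freshness window (time to live, TTL). This in-memory, non-persistent Python sketch is one possible approach, not a prescribed design; the source names, TTL values, and the injectable clock (used so the behavior can be shown without waiting) are assumptions.

```python
import time
from typing import Any, Callable, Dict, Tuple

class DataCache:
    """Hypothetical non-persistent cache with a per-source time to live (seconds)."""

    def __init__(self, ttl_seconds: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, source: str, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        cached = self.entries.get((source, key))
        # Serve from cache only while the entry is younger than this source's TTL.
        if cached is not None and now - cached[0] < self.ttl.get(source, 0.0):
            return cached[1]
        value = loader()  # cache miss or stale entry: read the underlying source
        self.entries[(source, key)] = (now, value)
        return value


# Usage with a fake clock: the nightly feed is cached for a day, the real-time DB is never cached.
fake_now = [0.0]
loads = {"nightly_feed": 0, "realtime_db": 0}

def loader_for(source: str) -> Callable[[], str]:
    def load() -> str:
        loads[source] += 1
        return f"{source} snapshot {loads[source]}"
    return load

cache = DataCache({"nightly_feed": 24 * 3600, "realtime_db": 0.0}, clock=lambda: fake_now[0])
for _ in range(2):
    cache.get("nightly_feed", "orders", loader_for("nightly_feed"))
    cache.get("realtime_db", "orders", loader_for("realtime_db"))
print(loads)  # {'nightly_feed': 1, 'realtime_db': 2}
```

A persistent variant would store `entries` in a database or file instead of a dictionary; the freshness logic would stay the same.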

Integration Layer: Data Transformer

See Data Transformer ABB in the Integration Layer.

Data Consolidator

This ABB is responsible for extracting relevant information from sources, transforming the information into the appropriate integrated form, and loading the information into the target repository. This ABB supports Extract-Transform-Load (ETL) from one or more source systems into a target system. It is also responsible for supporting real-time ETL, as well as initial or incremental ETL of large volumes of data into a target repository (e.g., a data warehouse or master data repository).
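Incremental ETL is commonly implemented with a high-water mark: each run extracts only the rows changed since the last successful load. The Python sketch below uses in-memory SQLite databases as stand-ins for a source system and a target repository; the table names, the columns, and the assumption that `updated_ts` strictly increases with each change are all illustrative.

```python
import sqlite3

def incremental_etl(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Load rows changed since the last run from source.orders into target.fact_orders."""
    # Read the high-water mark left by the previous run (0 means initial load).
    row = target.execute("SELECT last_ts FROM etl_watermark WHERE name = 'orders'").fetchone()
    last_ts = row[0] if row else 0
    # Extract: only the delta since the watermark.
    delta = source.execute(
        "SELECT id, amount_cents, currency, updated_ts FROM orders "
        "WHERE updated_ts > ? ORDER BY updated_ts",
        (last_ts,),
    ).fetchall()
    # Transform: cents to currency units, currency codes to upper case.
    rows = [(oid, cents / 100, cur.upper(), ts) for oid, cents, cur, ts in delta]
    # Load: upsert the rows and advance the watermark in a single transaction.
    with target:
        target.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
        if rows:
            target.execute("INSERT OR REPLACE INTO etl_watermark VALUES ('orders', ?)", (rows[-1][3],))
    return len(rows)


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER, currency TEXT, updated_ts INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [(1, 1250, "usd", 100), (2, 999, "eur", 200)])
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, amount REAL, currency TEXT, updated_ts INTEGER)")
tgt.execute("CREATE TABLE etl_watermark (name TEXT PRIMARY KEY, last_ts INTEGER)")

initial = incremental_etl(src, tgt)       # initial load picks up both rows
src.execute("UPDATE orders SET amount_cents = 1500, updated_ts = 300 WHERE id = 1")
incremental = incremental_etl(src, tgt)   # only the changed row is reloaded
print(initial, incremental)  # 2 1
```

Writing the data and the new watermark in one transaction means a failed run leaves the watermark unchanged, so the next run simply retries the same delta.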

Data Federator

This ABB is responsible for providing Enterprise Information Integration (EII) capabilities for federated query access to structured and unstructured data.
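A federated (EII) query answers one logical request by querying several sources in place and combining the partial results, without first copying the data into a single store. In this Python sketch, an in-memory SQLite table stands in for a structured source and a list of note documents stands in for an unstructured source; both shapes and the function name are assumptions.

```python
import sqlite3
from typing import Dict, List

def federated_customer_view(db: sqlite3.Connection, notes: List[Dict[str, str]],
                            country: str) -> List[Dict[str, object]]:
    """One logical query spanning a relational source and a document source."""
    # Push the filter down to the structured source so only matching rows are returned.
    customers = db.execute(
        "SELECT id, name FROM customers WHERE country = ? ORDER BY id", (country,)
    ).fetchall()
    # Index the unstructured source by customer id.
    notes_by_customer: Dict[int, List[str]] = {}
    for doc in notes:
        notes_by_customer.setdefault(int(doc["customer_id"]), []).append(doc["text"])
    # Join the partial results into a single answer for the consumer.
    return [{"id": cid, "name": name, "notes": notes_by_customer.get(cid, [])}
            for cid, name in customers]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Acme", "US"), (2, "Globex", "DE"), (3, "Initech", "US")])
notes = [{"customer_id": "1", "text": "Renewal due in Q3"},
         {"customer_id": "2", "text": "Prefers email contact"}]
print(federated_customer_view(db, notes, "US"))
```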

Quality of Service Layer: Access Controller

See Access Controller ABB in the Quality of Service Layer.

Quality of Service Layer: Data-Driven Access Controller

See Data-Driven Access Controller ABB in the Quality of Service Layer.

Traceability Enabler/Auditor

This ABB is responsible for monitoring and managing data usage using a log-like facility. It interprets log information and stores it in databases in order to analyze the data and initiate threat alerts. This ABB supports the ability to know who has accessed data, when it was accessed, and what was accessed; it also supports data privacy through the obfuscation of sensitive data.
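A minimal Python sketch of the traceability log and obfuscation described above: every read records who, when, and what, and sensitive fields are masked before data is returned. The set of sensitive field names and the keep-last-four masking rule are illustrative assumptions; threat analysis over the stored log is not shown.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List

@dataclass(frozen=True)
class AccessEvent:
    who: str
    when: datetime
    what: str  # record id plus the fields that were read

class Auditor:
    """Hypothetical Traceability Enabler/Auditor: access log plus masking of sensitive data."""
    SENSITIVE_FIELDS = {"ssn", "card_number"}  # assumed list

    def __init__(self) -> None:
        self.log: List[AccessEvent] = []

    def read(self, user: str, record_id: str, record: Dict[str, str]) -> Dict[str, str]:
        # Record who accessed the data, when (UTC), and what was accessed.
        fields = ",".join(sorted(record))
        self.log.append(AccessEvent(user, datetime.now(timezone.utc), f"{record_id}:{fields}"))
        # Obfuscate sensitive values before they leave the Information Layer.
        return {k: self.mask(v) if k in self.SENSITIVE_FIELDS else v for k, v in record.items()}

    @staticmethod
    def mask(value: str) -> str:
        # Keep the last four characters; replace other letters and digits with '*'.
        return re.sub(r"[A-Za-z0-9]", "*", value[:-4]) + value[-4:]

    def accesses_by(self, user: str) -> List[AccessEvent]:
        return [event for event in self.log if event.who == user]


auditor = Auditor()
print(auditor.read("alice", "cust-1", {"name": "Acme", "ssn": "123-45-6789"}))
# {'name': 'Acme', 'ssn': '***-**-6789'}
```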

Information Metadata Manager

This ABB is responsible for managing and maintaining metadata in a common metadata repository for the enterprise, including structured and unstructured data; for example, metadata that describes the master data taxonomies and XML schemas and rules for business logic and data validation. It stores information regarding the transformation of data types and content and the ability to aggregate data from multiple sources. It is used to share canonical forms (common data models) between SOA Integration Layer elements and other layers of the SOA RA. In particular, it supports the ability to store, retrieve, and translate metadata into forms that can be effectively consumed by repositories local to other layers in the SOA RA. It facilitates reuse of metadata assets, semantics, models, templates, rules, etc. across the enterprise. Information integration capabilities are used to support the replication of changes to metadata contained in systems across the enterprise.

Content Manager

This ABB is responsible for capturing, aggregating, and managing unstructured content in a variety of formats such as images, text documents, web pages, spreadsheets, presentations, graphics, email, video, and other multimedia. It provides the ability to search, catalog, secure, manage, and store unstructured content to support the creation, revision, approval, and publication of content. It provides the ability to identify new categories of content and create taxonomies for classifying enterprise content. This ABB is also responsible for managing the retention, access control and security, auditing and reporting, and ultimate disposition of business records. It provides for the policy-driven movement of content throughout the storage lifecycle and the ability to map content to the storage media type based on the overall value of the content and context of the business content.

Master Data Authoring Environment

This ABB is responsible for authoring, configuring, approving, managing, customizing, and extending master data, as well as for adding or modifying instance master data, such as product, vendor, and supplier data. These services support the MDM collaborative style of use and may be invoked as part of a collaborative workflow to complete the creation, updating, and approval of definition or instance master data.

Data Miner

This ABB is responsible for analyzing data access history as well as providing optimization algorithms and business intelligence for data optimization. It enables building of descriptive and predictive models by uncovering previously unknown trends and patterns in vast amounts of data from across the enterprise, in order to support decision-making.

Query, Search, Reporting Engine

This ABB is responsible for supporting ad hoc queries, search, reporting, slicing/dicing/drill-downs, and Online Analytical Processing (OLAP) capabilities for enterprise information.
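Slicing, dicing, and drilling down can all be expressed as a filter on some dimension values followed by aggregation grouped by other dimensions. The Python sketch below applies that idea to a handful of made-up revenue facts; the fact layout and dimension names are assumptions, and a real engine would run these operations over a cube or data warehouse.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Made-up fact rows: (region, country, quarter, revenue).
FACTS: List[Tuple[str, str, str, float]] = [
    ("EMEA", "UK", "Q1", 120.0), ("EMEA", "DE", "Q1", 80.0), ("EMEA", "UK", "Q2", 100.0),
    ("AMER", "US", "Q1", 300.0), ("AMER", "US", "Q2", 250.0), ("AMER", "CA", "Q2", 50.0),
]
DIMENSIONS = {"region": 0, "country": 1, "quarter": 2}

def aggregate(by: List[str], where: Optional[Dict[str, str]] = None,
              facts: List[Tuple[str, str, str, float]] = FACTS) -> Dict[Tuple[str, ...], float]:
    """Keep facts matching `where` (slice/dice), then sum revenue grouped by `by`."""
    where = where or {}
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for fact in facts:
        if all(fact[DIMENSIONS[d]] == v for d, v in where.items()):
            totals[tuple(fact[DIMENSIONS[d]] for d in by)] += fact[3]
    return dict(totals)


print(aggregate(["region"]))                                 # revenue per region
print(aggregate(["region", "country"], {"region": "EMEA"}))  # drill down into EMEA
print(aggregate(["country"], {"quarter": "Q2"}))             # slice on Q2
```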

Analytics Visualization Engine

This ABB is responsible for providing interactive visualization of analytics results and data analysis, leading to better analyses, faster decisions, and more effective presentation of analytic results. It provides charting and graphing functionality, dashboard reporting (for example, scorecard reporting), spatial analysis, and rendering for interaction with components that provide user presentation.

Quality of Service Layer: Business Activity Monitor

See Business Activity Monitor ABB in the Quality of Service Layer.

Quality of Service Layer: Business Activity Manager

See Business Activity Manager ABB in the Quality of Service Layer.

Quality of Service Layer: Activity Correlation Manager

See Activity Correlation Manager ABB in the Quality of Service Layer.

Business Information

This ABB represents business vocabularies (the business glossary and terms) and the key business entities of the organization, together with their definitions.

Common Information

This ABB represents the definition of entities and their relationships, the logical data definition for database design, and the message model for service definition and specification. Information is one of the fundamental constructs of an SOA solution and of analysis and design based on the service-oriented paradigm.

Business Event

This ABB represents the definition of business events. The event is one of the fundamental constructs of an SOA solution and of analysis and design based on the service-oriented paradigm.

Data Repository

This ABB provides the essential foundation for storing operational and reshaped information that adds business value. The core data repositories are Analytical Data, Operational Data, Master Data, Unstructured Data, and Metadata.

  • Analytical Data Repository: Includes operational data stores, data warehouse, data mart, staging areas, and ad hoc workspaces.
  • Operational Data Repository: Includes transactional hub, ERP, Supply Chain, CRM, etc.
  • Master Data Repository: Stores instance and definition of master data and historical data that records changes to master data.
  • Unstructured Data Repository: Includes textual objects, graphical objects, multi-media objects, etc.
  • Quality of Service Layer: Metadata Repository: Enables storage of metadata. It stores metadata that describes the master data taxonomies and XML schemas and rules for business logic and data validation. It stores information regarding transformation of data types and content and the ability to aggregate data from multiple sources.

Structural Overview of the Layer

The ABBs in the Information Layer can be thought of as being logically partitioned into the following categories which support:

  • Ability to support information services capability, critical to support a shared, common, and consistent expression of data
  • Ability to integrate information across the enterprise in order to enable information services capability
  • Ability to define metadata and master data that is used across the SOA RA and in particular the metadata that is shared across the layers
  • Ability to secure and protect information
  • Ability to support business analytics and business activity monitoring critical to the usage of the SOA RA and its realization
  • Ability to define information and events that are some of the fundamental constructs of SOA
  • Ability to persist and store data

ABBs in the Information Layer

Inter-Relationships between the ABBs

The relationships among these ABBs are shown for three scenarios.

The first scenario is for Information as a Service (IaaS), where information is retrieved from multiple sources.

Key Interactions among ABBs in the Information Layer in an IaaS Query Scenario

The second scenario relates to adding and updating information in the context of master data management.

Key Interactions among ABBs in the Information Layer for an Add/Update in an MDM Scenario

The third scenario is updating MDM by extracting deltas from source systems.

Key Interactions among ABBs in the Information Layer for a Delta Extract and Update in an MDM Scenario

Significant Intersection Points with other Layers

Certain relationships exist between the ABBs in the Information Layer and those in other cross-cutting and horizontal layers:

  • The Information Service Gateway ABB interacts with the Consumer Layer, Business Process Layer, Services Layer, Service Component Layer, Operational Systems Layer, Integration Layer, Quality of Service Layer, and Governance Layer.
  • The Traceability Enabler/Auditor ABB interacts with the Quality of Service Layer.
  • The Data Sourcing Manager ABB interacts with the Operational Systems Layer and the Governance Layer.

Interaction with Cross-Cutting Layers

  • This layer leverages the Event Manager ABB in the Quality of Service Layer for notifying and triggering actions based upon events detected within the data. Events can be defined to support data governance policies, based upon business rules, or can be time and date scheduled.
  • This layer leverages the Data Transformer ABB in the Integration Layer for transforming and mapping of data from one format to another and aligning data from different resources.
  • This layer leverages the Access Controller ABB in the Quality of Service Layer to enforce security policies and access privileges.
  • This layer leverages the Data-Driven Access Controller ABB in the Quality of Service Layer to enforce access privileges on individual data items.
  • This layer leverages the Business Activity Monitor ABB, Business Activity Manager ABB, and Activity Correlation Manager ABB in the Quality of Service Layer to monitor events, business activities, and Key Performance Indicators (KPIs); to interface with the Integration Layer for event notification and propagation; to analyze event information both in real time/near real time and as stored (warehoused) events; and to decide responses to triggered events.
  • The Policy Enforcer ABB from the Quality of Service Layer is leveraged by the Governance Layer to enforce governance policies, by all other layers to enforce security policies, by the Integration Layer to enforce policies during mediation, by the Services Layer to enforce service policies, and by the Business Process Layer to enforce policies on business processes.

Key Interactions of the Information Layer with Cross-Cutting Layers

Interaction with Horizontal Layers

The four horizontal layers in the SOA RA that are logically more functional in nature (namely, the Consumer Layer, Business Process Layer, Services Layer, and Service Component Layer) require information (structured and unstructured data, metadata, and messages) to fulfill their respective responsibilities. They therefore depend on the ABBs of the Information Layer to access information and meet their information needs.

Key Interactions of the Information Layer with Horizontal Layers

Usage Implications and Guidance

For industry-specific SOA solutions in particular, this layer captures all the common cross-industry and industry-specific data structures, XML-based metadata architectures (e.g., XML schema), and business protocols for exchanging business data. Some discovery, data mining, and analytic modeling of data are also covered in this layer. These common structures may be standardized for the industry or organization.