Tuesday, November 3, 2009
GoldenGate Architecture
Components of GoldenGate :-
1) Extract Process :-
- The Extract process runs on the source system and is the capturing mechanism of GoldenGate. It captures all the changes that are made to the objects it is configured to synchronize.
- It sends only the data from committed transactions to the trail for propagation to the target system.
- When Extract captures the commit record of a transaction, all the log records for that transaction are written to the trail as a sequentially organized transaction unit. This maintains both speed and data integrity.
- Multiple Extract processes can operate on different objects at the same time. For example, one process could continuously extract transactional data changes and stream them to a decision-support database, while another process performs batch extracts for periodic reporting.
Or, two Extract processes could extract and transmit in parallel to two Replicat processes (with two trails) to minimize target latency when the databases are large. To differentiate among different processes, we can assign each one a group name.
Extract can be configured in two ways :-
- Initial Loads :- For initial data loads, Extract extracts a current set of data directly from the source objects.
- Change Synchronization :- To keep source data synchronized with another set of data, Extract extracts transactional changes made to data (inserts, updates, and deletes) after the initial synchronization has taken place. DDL changes and sequences are also extracted, if supported for the type of database being used. A sketch of this mode follows below.
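As an illustration, a change-synchronization Extract is driven by a parameter file. A minimal sketch follows; the group name (exthr), credentials, trail path, and hr schema are hypothetical, so adjust them to your own environment :-

  EXTRACT exthr
  -- Hypothetical database credentials for the GoldenGate user
  USERID ggsuser, PASSWORD ggspass
  -- Local trail that this Extract writes to
  EXTTRAIL ./dirdat/lt
  -- Capture changes for all tables in the hr schema
  TABLE hr.*;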
Extract obtains the data from a data source in one of the following ways :-
- Database Transaction Logs (such as the Oracle redo logs or SQL/MX audit trails) :- This method is known as log-based extraction. When Extract is installed as a log-based implementation, it can read the transaction logs directly.
- GoldenGate Vendor Access Module (VAM) :- VAM is a communication layer that passes data changes and transaction metadata to the Extract process. The database vendor provides the components that extract the data changes and pass them to Extract.
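For a log-based implementation, the Extract group is typically registered in GGSCI with the TRANLOG option so that it reads the transaction logs directly. The group name and trail prefix here are hypothetical :-

  GGSCI> ADD EXTRACT exthr, TRANLOG, BEGIN NOW
  GGSCI> ADD EXTTRAIL ./dirdat/lt, EXTRACT exthr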
2) Data Pump :-
- A data pump is a secondary Extract group within the source GoldenGate configuration. If a data pump is not used, Extract must send data to a remote trail on the target.
- If we are using a configuration that includes a data pump, the primary Extract group writes to a trail on the source system. The data pump reads this trail and sends the data through the network to a remote trail on the target.
- A data pump adds storage flexibility and helps to offload the TCP/IP activity from the primary Extract process.
Like a primary Extract group, a data pump can be configured in two modes :-
- Online or Batch processing :- Can perform data filtering, mapping, and conversion.
- Pass-through mode :- Data is passively transferred as is, without any manipulation. This increases the throughput of the data pump, because the functionality that looks up object definitions is bypassed.
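A minimal sketch of a pass-through data pump parameter file, assuming a hypothetical group pmphr, target host tgthost, and hr schema :-

  EXTRACT pmphr
  -- No filtering or mapping, so object definitions need not be looked up
  PASSTHRU
  -- Target system and Manager port that receive the data
  RMTHOST tgthost, MGRPORT 7809
  -- Remote trail on the target that this pump writes to
  RMTTRAIL ./dirdat/rt
  TABLE hr.*;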
In most business cases, it is best practice to use a data pump.
Some reasons for using a data pump include the following :-
- Protection against network and target failures :- In a basic GoldenGate configuration, with only a trail on the target system, there is no place on the source system to store the data that the Extract process continuously extracts into memory. If for any reason the network or the target system becomes unavailable, the primary Extract could run out of memory and abend.
However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend. When connectivity is restored, the data pump extracts the data from the source trail and sends it to the target system(s).
- Several phases of data filtering or transformation :- When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system or on the target system, and then use another data pump or the Replicat group to perform the second transformation.
- Consolidating data from many sources to a central target :- When synchronizing multiple source databases with a central target database, you can store extracted data on each source system and use data pumps on each of those systems to send the data to a trail on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.
- Synchronizing one source with multiple targets :- When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets.
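For the one-source-to-multiple-targets case, for example, two data pumps could read the same local trail and each write to its own remote trail. The group names and trail prefixes in this sketch are hypothetical :-

  GGSCI> ADD EXTRACT pmp1, EXTTRAILSOURCE ./dirdat/lt
  GGSCI> ADD RMTTRAIL ./dirdat/r1, EXTRACT pmp1
  GGSCI> ADD EXTRACT pmp2, EXTTRAILSOURCE ./dirdat/lt
  GGSCI> ADD RMTTRAIL ./dirdat/r2, EXTRACT pmp2

Each pump's parameter file would point its RMTHOST entry at a different target, so a failure of one network path does not stop delivery to the other target.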
3) Replicat Process :-
- The Replicat process runs on the target system. Replicat reads extracted data changes and DDL changes (if supported) that are specified in the Replicat configuration, and then it replicates them to the target database.
Replicat process can be configured in one of the following ways:-
- Initial loads: For initial data loads, Replicat can apply data to target objects or route it to a high-speed bulk-load utility.
- Change synchronization: To maintain synchronization, Replicat applies extracted transactional changes to target objects using native database calls, statement caches, and local database access. Replicated DDL and sequences are also applied, if supported for the type of database that is being used. To maintain data integrity, Replicat applies the replicated changes in the same order as those changes were committed to the source database.
- You can use multiple Replicat processes with multiple Extract processes in parallel to increase throughput. Each set of processes can handle different objects. To differentiate among processes, you assign each one a group name.
- You can delay Replicat so that it waits for a specific amount of time before applying data to the target database. A delay may be useful for such purposes as preventing the propagation of bad SQL, controlling data arrival across different time zones, and allowing time for other planned events to occur. The DEFERAPPLYINTERVAL parameter can be used to control the length of the delay, as in the sketch below.
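A minimal sketch of a Replicat parameter file with a 10-minute apply delay, assuming a hypothetical group rephr and an identical hr schema on source and target :-

  REPLICAT rephr
  USERID ggsuser, PASSWORD ggspass
  -- Source and target definitions are identical, so no definitions file is needed
  ASSUMETARGETDEFS
  -- Wait 10 minutes before applying each transaction to the target
  DEFERAPPLYINTERVAL 10 MINS
  MAP hr.*, TARGET hr.*;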
4) Trails :-
- A trail is a series of files that temporarily stores the supported database changes on disk to support continuous extraction and replication.
- It can exist on the source or target system, or on an intermediary system, depending on how you configure GoldenGate.
- On the local system it is known as an extract trail (or local trail). On a remote system it is known as a remote trail.
- By using a trail for storage, GoldenGate ensures data accuracy and fault tolerance.
- Using a trail allows extraction and replication activities to occur independently of each other.
- With these processes separated, you have more choices for how data is delivered. For example, instead of extracting and replicating changes continuously, you could extract changes continuously but store them in the trail for replication to the target later, whenever the target application needs them.
- Only one Extract process can write to a trail.
Processes that read the trail are :-
- Data-pump Extract: Extracts data from a local trail for further processing, and transfers it to the target system or to the next GoldenGate process downstream in the GoldenGate configuration.
- Replicat: Reads a trail to apply change data to the target database.
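To tie these together, a local trail is bound to the primary Extract and a remote trail to the data pump in GGSCI. The two-character trail prefixes (lt, rt) and group names here are hypothetical; GoldenGate appends a six-digit sequence number to each prefix to name the individual trail files :-

  GGSCI> ADD EXTTRAIL ./dirdat/lt, EXTRACT exthr
  GGSCI> ADD RMTTRAIL ./dirdat/rt, EXTRACT pmphr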
For information on other components, please read another topic.