Tuesday, November 3, 2009

GoldenGate Architecture

Overview of GoldenGate Components Continued

5) Extract Files :-

- For a one-time processing run, such as an initial load or a batch run that synchronizes transactional changes, GoldenGate stores the extracted changes in an extract file instead of a trail.
- The extract file is typically a single file, but it can be configured to split into multiple files in anticipation of limits on file size that are imposed by the operating system. In this sense, it is similar to a trail, except that checkpoints are not recorded.
- The file or files are created automatically during the run. The versioning features that apply to trails also apply to extract files.
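
As a rough sketch of such a one-time run (all group, schema, host, and path names and the sizes below are illustrative assumptions, not from this post), an initial-load Extract can be added with SOURCEISTABLE, and the MAXFILES and MEGABYTES options of RMTFILE split the output before an operating system file-size limit is reached:

GGSCI> ADD EXTRACT initext, SOURCEISTABLE

-- initext.prm (parameter file for the one-time run)
EXTRACT initext
USERID ggs_admin, PASSWORD ggs_pwd
RMTHOST tgthost, MGRPORT 7809
-- split the extract file into up to 10 files of 2000 MB each
RMTFILE ./dirdat/ie, MAXFILES 10, MEGABYTES 2000
TABLE hr.*;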

6) Checkpoints :-

- Checkpoints store the current read and write positions of a process to disk for recovery purposes. Checkpoints ensure that database changes marked for synchronization are extracted by Extract and replicated by Replicat, and they prevent redundant processing.
- They provide fault tolerance by preventing the loss of data when the system, the network, or a GoldenGate process needs to be restarted. For advanced synchronization configurations, checkpoints enable multiple Extract or Replicat processes to read from the same set of trails.
- The read checkpoint of a process is always synchronized with the write checkpoint. Thus, if GoldenGate needs to re-read something that it already sent to the target system (for example, after a process failure), checkpoints enable accurate recovery to the point where a new transaction starts, and GoldenGate can resume processing.
- Checkpoints work with inter-process acknowledgments to prevent messages from being lost in the network. Extract creates checkpoints for its positions in the data source and in the trail. Replicat creates checkpoints for its position in the trail.
- A checkpoint system is used for Extract and Replicat processes that operate continuously, but it is not required for Extract and Replicat processes that run in batch mode. A batch process can be re-run from its start point, whereas continuous processing requires the support for planned and unplanned interruptions that checkpoints provide.
- Checkpoint information is maintained in checkpoint files within the dirchk sub-directory of the GoldenGate directory. Optionally, Replicat's checkpoints can be maintained in a checkpoint table within the target database, in addition to a standard checkpoint file.
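
A minimal sketch of the optional checkpoint table (the ggs_admin schema, the table name, and the group and trail names are assumptions for illustration):

GGSCI> DBLOGIN USERID ggs_admin, PASSWORD ggs_pwd
GGSCI> ADD CHECKPOINTTABLE ggs_admin.ggs_checkpoint
-- bind the Replicat group to the table when the group is created
GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/rt, CHECKPOINTTABLE ggs_admin.ggs_checkpoint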

7) Manager :-

- Manager is the control process of GoldenGate. Manager must be running on each system in the GoldenGate configuration before Extract or Replicat can be started, and Manager must remain running while those processes are running so that resource management functions are performed.

Manager performs the following functions:
- Monitor and restart GoldenGate processes.
- Issue threshold reports, for example when throughput slows down or when synchronization latency increases.
- Maintain trail files and logs.
- Allocate data storage space.
- Report errors and events.
- Receive and route user requests from the user interface.

One Manager process can control many Extract or Replicat processes. On Windows systems, Manager can run as a service.
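
A minimal Manager parameter file illustrating these functions might look as follows; the port numbers, retry counts, and lag thresholds here are illustrative assumptions, not recommendations:

-- mgr.prm
PORT 7809
DYNAMICPORTLIST 7810-7820
-- monitor and restart GoldenGate processes
AUTOSTART EXTRACT *
AUTORESTART EXTRACT *, RETRIES 3, WAITMINUTES 5
-- maintain trail files to manage disk space
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
-- threshold reports when synchronization latency increases
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
LAGREPORTHOURS 1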

8) Collector :-

- Collector is a process that runs in the background on the target system. Collector receives extracted database changes that are sent across the TCP/IP network, and it writes them to a trail or extract file.
- Manager starts Collector automatically when a network connection is required. When Manager starts Collector, the process is known as a dynamic Collector, and GoldenGate users generally do not interact with it. However, you can run Collector manually. This is known as a static Collector.
- Not all GoldenGate configurations use a Collector process. When a dynamic Collector is used, it can receive information from only one Extract process, so there must be a dynamic Collector for each Extract that you use. When a static Collector is used, several Extract processes can share one Collector. However, a one-to-one ratio is optimal.
- The Collector process terminates when the associated Extract process terminates. By default, Extract initiates TCP/IP connections from the source system to Collector on the target, but GoldenGate can be configured so that Collector initiates connections from the target. Initiating connections from the target might be required if, for example, the target is in a trusted network zone, but the source is in a less trusted zone.
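
As a sketch of the static Collector case (the host name and port are assumptions): start Collector manually on the target with a fixed port, then point the sending process at that port with PORT rather than MGRPORT:

-- on the target system: start a static Collector on a fixed port
$ server -p 7819

-- in the sending Extract or data pump parameter file on the source
RMTHOST tgthost, PORT 7819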

GoldenGate Architecture



Components of GoldenGate :-

1) Extract Process :-

- The Extract process runs on the source system and is the capture mechanism of GoldenGate. It captures all the changes that are made to the objects that it is configured to synchronize.
- It sends only the data from committed transactions to the trail for propagation to the target system.
- When Extract captures the commit record of a transaction, all the log records for that transaction are written to the trail as a sequentially organized transaction unit. This maintains both speed and data integrity.
- Multiple Extract processes can operate on different objects at the same time. For example, one process could continuously extract transactional data changes and stream them to a decision-support database, while another process performs batch extracts for periodic reporting.
Or, two Extract processes could extract and transmit in parallel to two Replicat processes (with two trails) to minimize target latency when the databases are large. To differentiate among processes, you assign each one a group name.

Extract can be configured in two ways :-

- Initial Loads :- For initial data loads, Extract extracts a current set of data directly from the source objects.

- Change Synchronization :- To keep source data synchronized with another set of data, Extract extracts transactional changes made to data (inserts, updates, and deletes) after the initial synchronization has taken place. DDL changes and sequences are also extracted, if supported for the type of database being used.

Extract obtains the data from a data source in one of the following ways :-

- Database Transaction Logs :- Logs such as the Oracle redo logs or SQL/MX audit trails. This method is known as log-based extraction. When Extract is installed as a log-based implementation, it can read the transaction logs directly.

- GoldenGate Vendor Access Module (VAM) :- VAM is a communication layer that passes data changes and transaction metadata to the Extract process. The database vendor provides the components that extract the data changes and pass them to Extract.
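
A minimal change-synchronization sketch for a log-based Extract on an Oracle source (the group, schema, and trail names are assumptions):

GGSCI> ADD EXTRACT ext1, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/lt, EXTRACT ext1

-- ext1.prm
EXTRACT ext1
USERID ggs_admin, PASSWORD ggs_pwd
-- committed changes are written to the local trail
EXTTRAIL ./dirdat/lt
TABLE hr.employees;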

2) Data Pump :-

- A data pump is a secondary Extract group within the source GoldenGate configuration. If a data pump is not used, Extract must send data to a remote trail on the target.
- In a configuration that includes a data pump, the primary Extract group writes to a trail on the source system. The data pump reads this trail and sends the data over the network to a remote trail on the target.
- A data pump adds storage flexibility and helps to offload the overhead of TCP/IP activity from the primary Extract process.

Like a primary Extract group, a data pump can be configured in two modes :-

- Online or Batch processing :- Can perform data filtering, mapping, and conversion.

- Pass-through mode :- Data is passively transferred as-is, without any manipulation. This increases the throughput of the data pump because the functionality that looks up object definitions is bypassed.
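
A minimal pass-through data pump sketch that reads the local trail written by the primary Extract and ships it to the target (the host, port, and trail names are assumptions):

GGSCI> ADD EXTRACT pump1, EXTTRAILSOURCE ./dirdat/lt
GGSCI> ADD RMTTRAIL ./dirdat/rt, EXTRACT pump1

-- pump1.prm
EXTRACT pump1
-- PASSTHRU bypasses object-definition lookups for higher throughput
PASSTHRU
RMTHOST tgthost, MGRPORT 7809
RMTTRAIL ./dirdat/rt
TABLE hr.*;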

In most business cases, it is best practice to use a data pump.

Some reasons for using a data pump include the following :-

- Protection against network and target failures :- In a basic GoldenGate configuration, with only a trail on the target system, there is no place on the source system to store the data that the Extract process continuously extracts into memory. If, for any reason, the network or the target system becomes unavailable, the primary Extract could run out of memory and abend.
However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend. When connectivity is restored, the data pump extracts the data from the source trail and sends it to the target system(s).

- Several phases of data filtering or transformation :- When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system or on the target system, and then use another data pump or the Replicat group to perform the second transformation.

- Consolidating data from many sources to a central target :- When synchronizing multiple source databases with a central target database, you can store extracted data on each source system and use data pumps on each of those systems to send the data to a trail on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.

- Synchronizing one source with multiple targets :- When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets.

3) Replicat Process :-

- The Replicat process runs on the target system. Replicat reads the extracted data changes and DDL changes (if supported) that are specified in the Replicat configuration, and then it replicates them to the target database.

The Replicat process can be configured in one of the following ways :-

- Initial loads: For initial data loads, Replicat can apply data to target objects or route them to a high-speed bulk-load utility.

- Change synchronization: To maintain synchronization, Replicat applies extracted transactional changes to target objects using native database calls, statement caches, and local database access. Replicated DDL and sequences are also applied, if supported for the type of database that is being used. To maintain data integrity, Replicat applies the replicated changes in the same order as those changes were committed to the source database.


- You can use multiple Replicat processes with multiple Extract processes in parallel to increase throughput. Each set of processes can handle different objects. To differentiate among processes, you assign each one a group name.
- You can delay Replicat so that it waits for a specific amount of time before applying data to the target database. A delay may be useful for such purposes as preventing the propagation of bad SQL, controlling data arrival across different time zones, and allowing time for other planned events to occur. The DEFERAPPLYINTERVAL parameter controls the length of the delay, as in the sketch below.
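
A minimal Replicat sketch with a delayed apply (the group, trail, and mapping names, and the 10-minute delay, are assumptions):

GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/rt

-- rep1.prm
REPLICAT rep1
USERID ggs_admin, PASSWORD ggs_pwd
-- source and target table structures are identical
ASSUMETARGETDEFS
-- wait 10 minutes before applying data to the target
DEFERAPPLYINTERVAL 10 MINS
MAP hr.*, TARGET hr.*;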

4) Trails :-

- A trail is a series of files in which GoldenGate temporarily stores the supported database changes on disk to support continuous extraction and replication.
- It can exist on the source or target system, or on an intermediary system, depending on how you configure GoldenGate.
- On the local system it is known as an extract trail (or local trail). On a remote system it is known as a remote trail.
- By using a trail for storage, GoldenGate ensures data accuracy and fault tolerance.
- Using a trail also allows extraction and replication activities to occur independently of each other.
- With these processes separated, you have more choices for how data is delivered. For example, instead of extracting and replicating changes continuously, you could extract changes continuously but store them in the trail for replication to the target later, whenever the target application needs them.
- Only one Extract process can write to a trail.

Processes that read the trail are :-

- Data-pump Extract: Extracts data from a local trail for further processing, and transfers it to the target system or to the next GoldenGate process downstream in the GoldenGate configuration.

- Replicat: Reads a trail to apply change data to the target database.
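
As a sketch (the paths and sizes are illustrative), the local and remote trails are registered in GGSCI, each bound to the single Extract process that writes it:

-- local (extract) trail on the source, written by the primary Extract
GGSCI> ADD EXTTRAIL ./dirdat/lt, EXTRACT ext1, MEGABYTES 100

-- remote trail on the target, receiving data sent by the data pump
GGSCI> ADD RMTTRAIL ./dirdat/rt, EXTRACT pump1, MEGABYTES 100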


For information on the other components, please read the other topic.

Monday, November 2, 2009

GoldenGate Supported Topologies



1) Uni-Directional :- This topology can be used to create a reporting instance out of an OLTP database, or to offload reporting from an OLTP or production database.

2) Bi-Directional :- Can be used for active-active configurations with instant failover. For example, if we want to apply a patch to one database, the application can use the other one in the meantime, much like a rolling upgrade in Oracle Real Application Clusters. (Loop prevention for this case is sketched after this list.)

3) Peer-To-Peer :- Used for high availability and load balancing. High availability means that if any one of the databases fails for any reason (hardware failure, disk failure, or instance failure), the application is not affected because the other databases remain up and running. Another benefit is that the load is shared equally among the databases.

4) Broadcast :- Used for data distribution. If we have a production database and want to use the stored data in different ways (reporting, testing, and development), we can distribute the data to the same or different databases, each serving one of those purposes.

5) Consolidation :- Used when we want to create a common data warehouse by consolidating the data from the data marts of various departments in the enterprise.

6) Cascading :- Can be used for scalability and database tiering, for example, to move production data to a testing database and then move the results of testing into various other departmental databases. Any of the above topologies can be combined in any manner to form a cascade, such as uni-directional replication followed by broadcast.
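
For the bi-directional case mentioned above, the main extra concern is loop prevention: each side's Extract must skip the changes that the local Replicat applies, or those changes would be captured again and sent back. A minimal sketch for an Oracle source (the ggs_admin user name is an assumption):

-- in each side's primary Extract parameter file:
-- ignore transactions made by the local Replicat's database user
TRANLOGOPTIONS EXCLUDEUSER ggs_admin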

GoldenGate Features and Capabilities

Oracle acquired GoldenGate on September 3, 2009.
GoldenGate enables the exchange and manipulation of data at the transaction level among multiple, heterogeneous platforms across the enterprise.

GoldenGate Features and Capabilities :-
1) Filtering :- Gives the flexibility to select and replicate the desired amount of data from the source(s), which could be database(s), flat files, file systems, a data warehouse, etc.

2) Transformation :- Gives the capability to move data between two different kinds of databases, such as between SQL Server and Oracle, or IBM DB2 and Oracle, by transforming it.

3) Custom Processing :- Allows you to perform complex mappings and to use built-in functions for selective processing. Also gives the option of scheduling.

4) Real-Time :- The log-based capture capability allows GoldenGate to move thousands of transactions per second with very low impact.

5) Heterogeneous :- Allows moving changed data across different databases and platforms.

6) Transactional :- Works in transactional environments as well, by maintaining transactional integrity.

7) Performance :- Higher performance because of log-based capture.

8) Extensibility and Flexibility :- Its open and modular architecture allows the extraction and replication of selected data records, transactional changes, and changes to DDL across a variety of topologies.

9) Reliability :- It is resilient against failures and interruptions by maintaining checkpoints, trails, and logs.