What is a reference link in DataStage?

A reference link represents a table lookup operation. You can use a reference link as an input link to a Lookup stage and as an output link from other types of stages, such as the Db2 Connector stage.

How do I add a link in DataStage?

To add a link between stages, click the Link object in the General palette group, and then click and drag the cursor from one stage to another. Alternatively, right-click one stage and drag the link to another stage. By default, new links are named automatically.

What is reject link in DataStage?

Reject links output rows that have not been written to any other output link from the Transformer stage, either because they failed the constraints or because a write failure occurred. To define a constraint or specify an otherwise link, select an output link and click the Constraints button.
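As an illustration of the routing described above, here is a minimal Python model (not DataStage's engine or expression syntax). Each output link is assumed to have a constraint written as a predicate. A row is written to every link whose constraint it satisfies, and rows that satisfy no constraint end up on the reject list. Write failures are not modelled. The function name `route_rows` and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Constraint = Callable[[Row], bool]


def route_rows(
    rows: List[Row], links: Dict[str, Constraint]
) -> Tuple[Dict[str, List[Row]], List[Row]]:
    """Write each row to every output link whose constraint it satisfies.

    Rows that satisfy no constraint are collected on the reject list.
    Write failures are not modelled in this sketch.
    """
    outputs: Dict[str, List[Row]] = {name: [] for name in links}
    rejects: List[Row] = []
    for row in rows:
        targets = [name for name, constraint in links.items() if constraint(row)]
        for name in targets:
            outputs[name].append(row)
        if not targets:
            rejects.append(row)
    return outputs, rejects


# Hypothetical rows and link constraints for illustration.
rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}, {"id": 3, "amount": 500}]
links = {
    "small": lambda r: 0 <= r["amount"] < 100,
    "large": lambda r: r["amount"] >= 100,
}
outputs, rejects = route_rows(rows, links)
```

Here the row with a negative amount matches neither constraint, so it is the one that lands on the reject list.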

What is a parallel job in DataStage?

A DataStage parallel job is a program created graphically in the DataStage Designer. It is run and monitored from the DataStage Director. A parallel job consists of individual stages, each of which represents a different process.

What are the types of views in DataStage director?

DataStage Director has three view options:

  • The Status view displays the status, date and time started, elapsed time, and other run information about each job in the selected repository category.
  • The Schedule view displays job scheduling details.
  • The Log view displays all of the events for a particular run of a job.

What is a node in DataStage?

In a grid environment, a node is a place where jobs are executed. There is a conductor node and multiple grid nodes that execute in parallel to make processing faster. Increasing the number of nodes increases the degree of parallelism of the job and, in general, its performance.

How do I capture a rejected record in DataStage?

To capture rejected duplicates, use a Transformer stage. Partition and sort the data on your primary key. In the Transformer, keep the primary key of the previous row in a stage variable, and compare each incoming primary key to the value stored in that stage variable.
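The comparison logic can be sketched in Python, with a local variable standing in for the stage variable. This is a model of the idea rather than Transformer code. It assumes the input is already partitioned and sorted on the key so that duplicates are adjacent. The function name `split_duplicates` and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_duplicates(rows: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Separate first occurrences from duplicates in key-sorted input.

    `prev_key` plays the role of the stage variable holding the previous
    row's primary key. Input must already be sorted on `key`.
    """
    sentinel = object()  # never equal to a real key, so the first row is kept
    prev_key: object = sentinel
    unique: List[Row] = []
    duplicates: List[Row] = []
    for row in rows:
        current = row[key]
        if current == prev_key:
            duplicates.append(row)  # would go down a reject/duplicates link
        else:
            unique.append(row)
        prev_key = current  # update the stored key after the comparison
    return unique, duplicates


rows = [
    {"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"},
    {"id": 3, "v": "d"}, {"id": 3, "v": "e"},
]
unique, duplicates = split_duplicates(rows, "id")
```

Updating the stored key only after the comparison mirrors the top-to-bottom evaluation order of stage variables. Without the sort on the key, duplicates would not be adjacent and this check would miss them.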

What is the Lookup stage in DataStage?

The Lookup stage is a processing stage that is used to perform lookup operations on a data set read into memory from any other Parallel job stage that can output data.
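To show the in-memory idea, here is a rough Python sketch. The reference link's rows are loaded into a dictionary keyed on the lookup key, and each stream row is enriched from it. The `on_not_found` parameter is a hypothetical simplification of what happens to unmatched rows: "continue" keeps them with null (None) reference columns, and anything else rejects them. The function name and data are illustrative only.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def lookup(
    stream: List[Row], reference: List[Row], key: str, on_not_found: str = "reject"
) -> Tuple[List[Row], List[Row]]:
    """Enrich stream rows with matching reference rows held in memory."""
    # Read the whole reference data set into an in-memory table keyed on `key`.
    table = {ref[key]: ref for ref in reference}
    ref_columns = {col for ref in reference for col in ref if col != key}
    output: List[Row] = []
    rejects: List[Row] = []
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            output.append({**row, **match})
        elif on_not_found == "continue":
            # Keep the row, filling the reference columns with None (null).
            output.append({**row, **{col: None for col in ref_columns}})
        else:
            rejects.append(row)
    return output, rejects


orders = [{"cust_id": 1, "amount": 10}, {"cust_id": 2, "amount": 20}]
customers = [{"cust_id": 1, "name": "Acme"}]
matched, unmatched = lookup(orders, customers, "cust_id")
kept, _ = lookup(orders, customers, "cust_id", on_not_found="continue")
```

Holding the reference data in memory is what makes each lookup a constant-time dictionary access. It also means this approach assumes the reference data set fits in memory.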

What is DSODB in DataStage?

The DataStage Operations Console can track link and stage job run metrics and store that data in the operations console database (DSODB). In some cases, however, the tables that store this data remain empty even after multiple job runs have completed.

How do you collect operational metadata in DataStage?


  1. Open the InfoSphere DataStage and QualityStage Administrator client.
  2. On the Projects page, select the project that you want to generate operational metadata for, and click Properties to open the Project Properties window.
  3. Select Generate operational metadata.
  4. Click OK.

How do I find duplicate records in DataStage?

You can capture the duplicate records based on keys using Transformer stage variables.

  1. Sort and partition the input data of the Transformer on the key(s) that define the duplicate.
  2. Define two stage variables, for example StgVarPrevKeyCol (same data type as KeyCol) and StgVarCntr (Integer, default value 0).
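The steps above stop at defining the two stage variables. The derivations in this sketch are an assumption about how they are commonly completed: the counter is incremented when the key repeats and reset to 1 otherwise, then the previous key is updated. Rows with a counter above 1 are treated as duplicates. The logic is modelled in Python rather than in Transformer expression syntax, and the function name `count_within_key` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def count_within_key(rows: List[Row], key: str) -> List[Tuple[Row, int]]:
    """Pair each row with its occurrence number within its key group.

    Assumed stage-variable derivations, evaluated top to bottom per row:
      StgVarCntr       = StgVarCntr + 1 if KeyCol == StgVarPrevKeyCol else 1
      StgVarPrevKeyCol = KeyCol
    Input must be sorted and partitioned on `key`.
    """
    stg_var_prev_key_col: object = None
    stg_var_cntr = 0  # default value 0
    result: List[Tuple[Row, int]] = []
    for row in rows:
        key_col = row[key]
        # stg_var_cntr is 0 only before the first row, so the first row starts at 1.
        if stg_var_cntr > 0 and key_col == stg_var_prev_key_col:
            stg_var_cntr += 1
        else:
            stg_var_cntr = 1
        stg_var_prev_key_col = key_col
        result.append((row, stg_var_cntr))
    return result


rows = [{"KeyCol": k} for k in ["A", "A", "B", "C", "C", "C"]]
numbered = count_within_key(rows, "KeyCol")
# A constraint such as StgVarCntr > 1 would route these rows to a duplicates link.
duplicates = [row for row, cntr in numbered if cntr > 1]
```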

What is the use of transformer in DataStage?

The Transformer stage is a processing stage. It appears under the processing category in the tool palette. Transformer stages allow you to create transformations to apply to your data. These transformations can be simple or complex and can be applied to individual columns in your data.

What is partitioning in DataStage?

Partitioning divides data into smaller segments (partitions), each of which is processed independently by a node in parallel. This lets jobs take advantage of parallel architectures such as SMP, MPP, grid computing, and clusters.
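A small Python sketch of the general idea follows: split the records into partitions, then process each partition independently and in parallel. Round-robin is used here only as one simple way to split the data. Worker threads stand in for nodes, so this shows the structure rather than real multi-node speedup (CPU-bound Python threads are limited by the GIL). Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def round_robin_partition(records: List[T], num_nodes: int) -> List[List[T]]:
    """Deal records out in turn: record i goes to partition i % num_nodes."""
    partitions: List[List[T]] = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        partitions[i % num_nodes].append(record)
    return partitions


def run_partitioned(
    records: List[T], num_nodes: int, process: Callable[[List[T]], R]
) -> List[R]:
    """Process each partition independently, one worker per partition."""
    partitions = round_robin_partition(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        # pool.map returns results in partition order, whatever order they finish in.
        return list(pool.map(process, partitions))


per_node_totals = run_partitioned(list(range(7)), 3, sum)
```

Because each partition is processed without reference to the others, adding workers adds parallelism. This only works if the processing does not need records that ended up in another partition.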

What is hash partitioning in DataStage?

The hash partitioner assigns records to partitions based on a function of one or more columns (the hash partitioning keys) in each record. It examines one or more fields of each input record (the hash key fields), and records with the same values for all hash key fields are assigned to the same processing node.
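The key property, that equal key values always land on the same node, can be shown with a short Python sketch. The hash function here is `zlib.crc32` over the key tuple's repr. That is an arbitrary, deterministic choice for illustration and not the function DataStage uses. Python's built-in `hash()` is avoided because it is randomized per process for strings. The function name `hash_partition` is hypothetical.

```python
import zlib
from typing import Dict, List, Sequence

Row = Dict[str, object]


def hash_partition(
    rows: List[Row], key_fields: Sequence[str], num_nodes: int
) -> List[List[Row]]:
    """Assign each row to a partition chosen from a hash of its key fields."""
    partitions: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        node = zlib.crc32(repr(key).encode("utf-8")) % num_nodes
        partitions[node].append(row)
    return partitions


rows = [{"cust": c, "n": i} for i, c in enumerate(["A", "B", "A", "C", "B"])]
partitions = hash_partition(rows, ["cust"], 3)
```

Keeping all rows for a key on one node is what allows key-based operations (such as the duplicate checks earlier) to run per partition. The trade-off is that skewed key values can make some partitions much larger than others.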

What is a surrogate key in DataStage?

A surrogate key is a unique primary key that is not derived from the data it represents; therefore, changes to the data do not change the primary key. In a star schema database, surrogate keys are used to join a fact table to a dimension table.
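The idea can be sketched in Python with sequential integers allocated once per distinct natural key. The fact table then stores only the surrogate key used for the join. The class `SurrogateKeyGenerator`, its methods, and the customer and sales data are hypothetical and do not represent a DataStage API.

```python
from itertools import count
from typing import Dict


class SurrogateKeyGenerator:
    """Assign sequential integer surrogate keys, one per distinct natural key."""

    def __init__(self, start: int = 1) -> None:
        self._next = count(start)
        self._keys: Dict[str, int] = {}

    def assign(self, natural_key: str) -> int:
        """Return the existing surrogate key, or allocate the next one."""
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]

    def lookup(self, natural_key: str) -> int:
        """Return an existing surrogate key; raises KeyError if none was assigned."""
        return self._keys[natural_key]


# Hypothetical dimension and fact data.
customers = [
    {"customer_code": "C-100", "name": "Acme"},
    {"customer_code": "C-200", "name": "Globex"},
]
keys = SurrogateKeyGenerator()
dim_customer = [{"customer_sk": keys.assign(c["customer_code"]), **c} for c in customers]

sales = [{"customer_code": "C-200", "amount": 75}, {"customer_code": "C-100", "amount": 30}]
# The fact table keeps only the surrogate key used to join to the dimension.
fact_sales = [
    {"customer_sk": keys.lookup(s["customer_code"]), "amount": s["amount"]} for s in sales
]

# Changing a descriptive attribute does not change the surrogate key.
customers[0]["name"] = "Acme Corporation"
```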

What is extraction and transformation metadata?

Extraction and transformation metadata contain data about the extraction of data from the source systems, namely, the extraction frequencies, extraction methods, and business rules for the data extraction.

What is operational metadata?

Operational metadata describes the events and processes that occur and the objects that are affected when you run a job that was created in IBM® InfoSphere® DataStage® and QualityStage®.

What are the different types of links in HTML?

The different types of links are:

  1. Links. A link creates a hyperlink with an a href attribute stating the link's destination, as well as the anchor text, which is the text shown for the link.
  2. Image links.
  3. JavaScript links.
  4. Rel links.
  5. Nofollow links.

Why is it important to know the different types of links?

It’s important to have a good understanding of the types of links that you can build, as well as understanding the technical aspects of links. This will give you a solid grounding so that you can assess the value of different types of links, and work out which ones are the best for you to pursue.

What are the different data sources in DataStage?

The data sources might include sequential files, indexed files, relational databases, external data sources, archives, enterprise applications, etc. DataStage facilitates business analysis by providing quality data to help in gaining business intelligence.

What is a stage in DataStage?

A stage describes the flow of data from a data source to a data target. Usually, a stage has at least one data input and/or one data output. However, some stages can accept more than one data input and output to more than one stage.