Thursday, July 21, 2016

Understanding SQL*Loader, Oracle export/import and Oracle Data Pump


Oracle SQL Loader
Oracle SQL*Loader is a utility for loading data into an Oracle database and is often used for transporting data from a non-Oracle source system to an Oracle data warehouse. It is a versatile utility that can load data in almost any format, can load multiple files at the same time into multiple tables, and can load data from files on disk, on tape or from a named pipe.

It runs in one of 3 modes:

conventional load.
direct-path load.
external-path load.

The conventional load is the default method and has fewer restrictions (see below) than the direct-path load, which is generally much faster but less flexible. The direct-path load is faster for large data sets because it doesn't generate any undo data and bypasses the database buffer cache, but it can only be used on heap tables (see below for the other restrictions). The external-path load creates an external table for the specified data file and then executes SQL INSERT statements to load the data into the target table.

The external-path load has two advantages over direct-path and conventional loads (a command-line sketch follows this list):

1. If a data file is large enough, it can be loaded in parallel;
2. The source data can be modified by SQL and PL/SQL functions as it is being loaded.
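For example, an external-path load can be requested directly from the command line (a minimal sketch; the credentials and control file name are placeholders, not from the original post):

    sqlldr scott/tiger control=emp.ctl external_table=execute

Setting EXTERNAL_TABLE=EXECUTE tells SQL*Loader to create the external table and run the INSERT statements itself instead of loading the file directly.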

SQL*Loader enables you to:

Load data from multiple files (from disk, tape or named pipe) into one or more target tables in the same load
Load data in any character set supported by Oracle
Load or discard records depending on values in the input fields
Transform the data before loading using SQL functions
Generate unique sequential keys for specified columns
Append to existing data or replace existing data
Load large objects (LOBs), collections (nested tables and VARRAYs) and object-relational data

How It Works

Oracle SQL*Loader processes the input data files according to the directions in a text file called the control file, which specifies the names and locations of the source data files, the format of the data to be loaded, and the data transformations to be performed during the load.
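As an illustration, a minimal control file might look like the sketch below (the table, column and file names are assumptions used for illustration). It appends comma-separated records to the emp table, keeps only rows for department 10 and upper-cases the name field as it is loaded:

    LOAD DATA
    INFILE 'emp.dat'
    BADFILE 'emp.bad'
    DISCARDFILE 'emp.dsc'
    APPEND
    INTO TABLE emp
    WHEN (deptno = '10')
    FIELDS TERMINATED BY ','
    (empno, ename "UPPER(:ename)", deptno)

Rows failing the WHEN clause would go to the discard file, while malformed rows would go to the bad file.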

As the input files are processed, any records that do not pass format checks are written to the bad file and any records that do not meet the specified selection criteria are written to the discard file.

Records that pass both the format checks and the selection criteria are written to the specified target tables, but they may still be rejected, for example because of constraint violations, in which case they are written to the bad file along with the records rejected as invalid.

A log file is also produced, containing a detailed summary of the load including a description of any errors that occurred.

How to Use Oracle SQL Loader

Oracle SQL*Loader is initiated from the command line, and the various parameters such as the name of the control file and the userid can be specified there, but it is generally much easier to put all these parameters into a parameter file, saving on typing and the frustration caused by typos. Which of the conventional, direct-path and external-path load types to use is a trade-off between performance and flexibility.
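For instance, the parameters could be gathered into a parameter file and referenced with PARFILE (a sketch; the file names are placeholders):

    userid=scott/tiger
    control=emp.ctl
    log=emp.log
    bad=emp.bad
    errors=50

Saving this as load.par, the load is then started with:

    sqlldr parfile=load.par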

The default load type is the conventional load, which creates and executes SQL INSERT statements to load the data into the target tables.

This method is better when:

other users need to be able to update data in the target tables while new data is being loaded into them;
loading data into clustered tables;
loading a relatively small number of rows into a large indexed table, as the direct-path load process makes a copy of the original index before merging in the new keys (this is a relatively slow process for a large table);
loading data into a large table with referential or column check integrity constraints, as these constraints are disabled during a direct-path load and re-enabled when the load finishes, which requires the whole table to be checked;
you want to ensure that a record will be rejected if it causes an Oracle error, is formatted incorrectly or violates a constraint on the target table;
insert triggers must be fired.
The direct-path load is initiated by specifying DIRECT=TRUE when starting Oracle SQL*Loader. This method writes formatted data blocks directly to the target tables, bypassing the SQL layer, which makes loading faster but imposes the restrictions mentioned above. It should be used when you need to load a large amount of data and want to maximise performance.
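Continuing the parameter file sketch above, a direct-path load needs only the extra flag:

    sqlldr parfile=load.par direct=true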

When using the external-path load, the source file has to be in a location accessible to the database and specified by an Oracle directory object, and the user must have been granted read and write access to that directory object.
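Setting that up might look like this (a sketch; the directory name, path and user are assumptions):

    CREATE DIRECTORY load_dir AS '/u01/app/loads';
    GRANT READ, WRITE ON DIRECTORY load_dir TO scott;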

Oracle export and import

The Oracle export and import utilities are provided by Oracle as a way of making logical backups of the database, complementing the physical backups of the data files taken with RMAN or an operating system copy utility, and as a way of copying data from one database to another. The latter may be done for testing and/or training purposes, or when upgrading to a new release of Oracle.

In Oracle 10g and later releases, these utilities are superseded by Oracle Data Pump, which acts in much the same way as the original export and import utilities but is designed to make the process of transferring data from one database to another faster and more secure. Oracle Data Pump also provides a new network mode which removes the need to create intermediate files.

How do the Export and Import utilities work and what are they used for?

The Oracle export and import utilities are used to provide logical backups of objects in the database, to supplement the physical backups made, for example, with RMAN (the Recovery Manager).

These utilities can work at several different levels: (full) database, tablespace, user (schema) or table. They can only be used for logical backup and recovery (for example, to recover the data of a table accidentally truncated by a user), because the export utility takes a snapshot of the data at the time it is run; unlike objects restored from physical backups, exported data can't be rolled forward by applying the redo logs.

Backups made using the Oracle export utility are performed with the database open, so in this sense they are hot (online) backups. However, recovered data can't be rolled forward, as this would require the redo logs from the same point in time as the export. This means exports must be performed when there are no updates occurring on the objects being exported, to ensure that the backup is consistent and can therefore be used for recovery.

Use of the export utility is also a good way to capture the metadata of the database/schemas/tables, as the export process generates the SQL statements needed to recreate the database objects (with or without the associated data) and stores them in the export dump file (called expdat.dmp by default).
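A metadata-only export can be taken by suppressing the row data with ROWS=N (a sketch; the credentials and file name are placeholders):

    exp system/manager full=y rows=n file=metadata.dmp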

The exported file is in a proprietary (binary) format and can only be read by the import utility, not by other utilities such as RMAN or SQL*Loader.

A feature added to the Oracle export utility in Oracle 9i enables the export of just a subset of the table data, using a query to specify the rows to be exported.
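For example, the following sketch exports only the emp rows for department 10 (the quotes around the query usually need escaping for the operating system shell):

    exp scott/tiger tables=emp query=\"where deptno=10\" file=emp10.dmp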

Summary

Although no longer supported from Oracle 11g onwards (except for downgrading to a previous release), the Oracle export and import utilities were for many years the primary tools used for supplemental (logical) backups and for transferring data between databases. They are relatively easy to use and understand, and reasonably quick to run for small datasets.

The main disadvantage of using them has been that an export dump file containing unencrypted data from a production database is a security risk - anyone with access to the file can copy it and load it into another database. For this and other reasons, these utilities have been replaced by Oracle Data Pump export and import.

Oracle Data Pump

Oracle Data Pump is a server-side utility that replaces the venerable export and import utilities. Those utilities were provided by Oracle for many years as a way of making logical backups of the database and of copying data from one database to another, and Oracle Data Pump does exactly the same job.

So why were they replaced?

Well, one of the biggest problems with the old export and import utilities was that there was no security. These tools ran on the client, and the export dump file was stored on the client, so it was accessible to anyone who could log on to that machine.

Because Data Pump runs on the server, the export dump files can be stored only in directory objects created by the DBA, who therefore controls who has access to the dump files. Oracle Data Pump jobs also run asynchronously, which enables you to start a job (to export or import data), disconnect, and then re-connect later to monitor its progress.
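For example (a sketch; the directory object, schema and job name are placeholders):

    expdp scott/tiger directory=dp_dir dumpfile=scott.dmp logfile=scott.log schemas=scott job_name=scott_exp

After disconnecting, you can re-attach to the still-running job to monitor it:

    expdp scott/tiger attach=scott_exp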

How does Oracle Data Pump work?

Oracle Data Pump consists of 3 distinct components:

1. expdp and impdp (the command-line interface tools)
2. the PL/SQL package DBMS_DATAPUMP (the Data Pump API)
3. the PL/SQL package DBMS_METADATA (the metadata API)

expdp and impdp are used to export data from and import data into the database respectively, by invoking subroutines in DBMS_DATAPUMP.
Metadata is extracted and loaded by the DBMS_METADATA package.
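To see how the pieces fit together, here is a minimal PL/SQL sketch of a schema export driven directly through the DBMS_DATAPUMP API (the directory object, dump file and schema names are assumptions):

    DECLARE
      h NUMBER;
    BEGIN
      -- create a schema-mode export job
      h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'SCHEMA');
      -- write the dump file to a DBA-created directory object
      DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'scott.dmp', directory => 'DP_DIR');
      -- restrict the job to the SCOTT schema
      DBMS_DATAPUMP.METADATA_FILTER(handle => h, name => 'SCHEMA_EXPR', value => 'IN (''SCOTT'')');
      -- start the job and detach; it carries on running on the server
      DBMS_DATAPUMP.START_JOB(h);
      DBMS_DATAPUMP.DETACH(h);
    END;
    /

This is essentially what expdp does on your behalf when invoked from the command line.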

As with the export and import utilities, Data Pump can work at several different levels: (full) database, tablespace, transportable tablespace, user (schema) or table. Unlike export/import, however, Data Pump can use several different methods to export or import data.

These methods are:

1. data file copying
2. direct path
3. external tables
4. conventional path
5. network link

Data file copying

The quickest way to copy data from one database to another is simply to transfer the metadata (the information about the structure of the database, such as tables, partitions and sub-partitions) using Oracle Data Pump and then physically copy the relevant data files using operating system commands. This method is most commonly used for transferring a whole tablespace (when the TRANSPORT_TABLESPACES parameter is specified) but can also be used at the table level when TRANSPORTABLE=ALWAYS is specified. The character set must be the same on both databases for this method to be usable.
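At the tablespace level the sequence might look like this (a sketch; the tablespace, directory object and file names are placeholders). First make the tablespace read only and export its metadata on the source:

    ALTER TABLESPACE users READ ONLY;
    expdp system directory=dp_dir dumpfile=tts.dmp transport_tablespaces=users

Then copy the tablespace's data files to the target with operating system commands and plug them in:

    impdp system directory=dp_dir dumpfile=tts.dmp transport_datafiles='/u01/oradata/targ/users01.dbf'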

Direct path transfer

This is the second fastest method of transferring data, as it bypasses the SQL layer of the database, and it is used automatically by Data Pump when the structure of the tables being exported or imported allows it. There are different restrictions on using this method for export and for import - the full set of conditions is available in the Oracle utilities documentation.

External tables

When direct path transfer can't be used, the next best method is the external table mechanism, which is similar to (but not the same as) the SQL*Loader external table mechanism. With this method the dump file is regarded as a table (just like any other, except that it resides outside the Oracle database) and the database SQL engine is then used to import or export the relevant data. Oracle Data Pump selects this method automatically if it can't use direct path load or unload, and data may be loaded using the external table mechanism having been unloaded using direct path, and vice versa.

Conventional path

The conventional path is used when none of the other methods can be used for loading/unloading data, because of the way the tables have been built. The database has to do more work to load or unload data this way, so performance is generally slower with this method.

Network link

The Data Pump import and export utilities can also use a database link to load or unload data from a remote database. However, this is the slowest method, as all the data has to be transferred over the network, and it is not recommended for large volumes of data. This method has to be used for read-only databases.
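A network-mode import might look like this (a sketch; the database link, schema and directory object are placeholders). No dump file is written - the rows travel straight over the link, with the directory object needed only for the log file:

    CREATE DATABASE LINK src_link CONNECT TO scott IDENTIFIED BY tiger USING 'SRCDB';
    impdp system directory=dp_dir network_link=src_link schemas=scott logfile=net_imp.log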

NOTE: Data Pump is available only from Oracle 10g Release 1 onwards, and its dump files cannot be read by the original export and import utilities.
