Archive - migration of historical databases from Oracle to PostgreSQL

Image Description

Peter Humaj

January 10 2023, 4 min read

About historians and depositories

A standard feature of archival subsystems (historians) in SCADA systems is that their historical data is stored only for a configuration-defined time (month, year, 5 years ..). Practice required the introduction of "unlimited" archiving, even at a time when disks were small and expensive. How to resolve such a discrepancy? The authors of the D2000 Archive implemented a feature called depository databases or depositories. The archive with active depositories stores data in a standard archive database and in parallel in the so-called depository. Storing in the depository can be enabled or disabled for each archive object. Once created, the depository is being filled for a defined time (usually a month, but it can also be a week or 10 days), and then it is disconnected and a new depository begins to be filled. After it is disconnected, it is possible to copy the depository away, or burn it to a CD-ROM and insert it into a real depository, as was the case with the first customer who required this functionality - hence the name. Old depositories can be reconnected and data can be read.

.. and about their migration

In February 2022, we implemented the migration phase of depositories as part of a larger project that concerns the migration of the entire MES system.
The original MES runs on D2000 version 9.0.20 on the Windows platform. One of the redundant archives has depositories enabled, every depository contains 1 month of data. Depositories 1-230 are for the period January 1, 2003 to December 31, 2022, i.e. for 18 years. Oracle 11g is used as the archive/depository database.
The new MES system will run on D2000 version 21.1.72 on the Linux platform. One of the redundant archives has depositories turned on, again with a period of 1 month, the depositories are in the PostgreSQL 13 database.
As this is a relatively large application, the depositories are divided into so-called depository segments. This means that 8 depository segments (0-7) are filled in parallel. On the Oracle platform, depositories are implemented as separate tablespaces. Figure 1 shows depository segments 230 and 231, each tablespace being represented on a disk by a single file. Depository numbers (230-231), depository segment numbers (S00-S07) and depository creation date (YYYYMMDD) are part of the file names for easier orientation.

Figure 1 - The disk with Oracle depositories on the original server contains 4.7 TB of data. 

The D2000 Sysconcole administration tool allows system administrators to connect to the archive and query the status of individual depositories. Figure 2 shows that all older depositories are connected for reading and the last depository (231) is currently being filled.

Figure 2 - Start and end of the depository listing on the original MES system.

The migration of 4.7 TB of depository data took almost 2 weeks. The migration used the arcsynchro utility, which was running on a new MES server. Up to 4 arcsynchros were run in parallel, each copying data from one source depository segment. Figure 3 shows the end of the synchronization listing of one depository segment. The report shows that 231 million values were transferred.

Figure 3 - End of the arcsynchro listing for one depository segment.

Because Oracle has more efficient data storage (supports index-organized tables and has less overhead per row), depositories with 4.7 TB of data would grow to about 15 TB in a PostgreSQL database. On the other hand, the implementation of depository compression on the PostgreSQL platform has significantly reduced the space used. How significantly? Figure 4 shows the final state after the migration is complete. All PostgreSQL depositories take up 755 GB of data. This is a very nice disk space saving compared to 4700 GB of Oracle data!

Figure 4 - Disks on the new Mes server. Depositories on disk /trezors use 755 GB.

The compression of the depositories was performed continuously - also due to the fact that we had "only" 8 TB of disk space available on the target system. Unlike the archive database, standard HDD disks were used for depositories. What was the depository compression speed? This is shown in the last picture.

Figure 5 - Listing from the compression of segments 7 of depositories 222 to 224

Compression of each of 21 GB depository segments on the PostgreSQL platform took about 3 minutes and 5 seconds. The result was compressed depository segments with sizes of 1024 - 1043 GB, i.e. the compression ratio was about 20: 1.

Conclusion

In my opinion, depository compression on the PostgreSQL platform is a "killer feature" for anyone who uses depositories. Significant reduction of needed disk space is another important argument for migrating to the new version of D2000.

February 28, 2022, Ing. Peter Humaj, www.ipesoft.com

Subscription was successful

Thank you for submitting form.

Image Description

Your message was successfully sent.

Thank you for submitting the form.

Image Description

Your message was successfully sent.

Thank you for submitting the form.

Image Description

Your message was successfully sent.

Thank you for submitting the form.

Image Description