SCADA system upgrade done quickly and efficiently
A few weeks ago, my colleagues and I returned from a several-day on-site job: we upgraded a customer's SCADA dispatching system. The control system was originally built on redundant Alpha DS25 servers running the OpenVMS operating system (version 7.3.1) and on IPESOFT D2000 OpenVMS version 7.1.
Configuration, logging, and archive data were stored in an Oracle 9.2 database. The dispatching center was built for redundant 24 x 7 operation with High Availability / Disaster Recovery features: 2 application servers and 2 archive servers at the main site, plus 1 application server and 1 archive server at a backup site several kilometers away. Network segments were also duplicated. Communication with the technology ran over both serial lines and a WAN. Some of the connected technological equipment and local control systems were as far as 200 km away.
After about ten years of operation, the obsolete hardware was replaced by HP servers based on Itanium processors. The operating system remained OpenVMS, only in the newer version 8.4. The archive, logging, and configuration databases were migrated to Oracle 10.2.0.4 and later updated to Oracle 10.2.0.5.
This year, because the Intel Itanium infrastructure and the OpenVMS operating system had become obsolete, a hardware replacement was planned together with the move of the servers to new premises. This time it was a complete replacement of the technology platform: apart from the D2000, nothing stayed the same. The customer pushed for DELL servers and agreed to move to the Linux operating system, specifically RedHat. As the database platform, we chose PostgreSQL version 11 for the archive, logging, and configuration databases alike. On top of that, the SCADA itself was migrated to the latest IPESOFT D2000 Linux v12.1. So, how did it go?
Thorough and consistent preparation is the key to a successful migration, especially when the entire technology stack changes. As part of the preparation, the new servers were installed and connected to the technology network using temporary addresses from which it was not possible to communicate with the technology. We performed the initial migration of the configuration database from IPESOFT D2000 v9.2 to version 12.1 and imported it into PostgreSQL. Similarly, the archive and depository databases were migrated from Oracle to PostgreSQL. This took more than a month, as it involved about 1.56 TB of data going back to August 2005. At the same time, we changed the time span of the depositories: the original 10-day depositories were migrated to monthly ones.
Because processes are started differently on OpenVMS (using startup DCL scripts), the configuration of automatically started processes had to be adjusted manually and the appropriate parameters set. Similarly, the IPESOFT SysProf prophylactic system had to be modified and extended with modules for monitoring the RedHat operating system and the DELL servers.
We were ready for the migration in the first half of August. The new servers were then connected to the production system using a transparent D2000 Gateway. The D2000 Gateway was configured to transfer the values of all communication objects, structured variables, user variables, and switches. This connection also ensured that the D2000 Archives of the new system continuously stored object values.
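To put the archive migration into perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch under stated assumptions: 1 TB is taken as 10**12 bytes and "more than a month" is taken as at least 31 days, so the result is an upper bound on the average rate. The function names are illustrative and not part of any D2000 tooling.

```python
# Rough estimate of the average data rate of the archive migration.
# Assumptions (not from the article): 1 TB = 10**12 bytes, and
# "more than a month" is treated as a minimum of 31 days.

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(total_bytes: float, days: float) -> float:
    """Average rate in MB/s (1 MB = 10**6 bytes) over the given number of days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


def days_needed(total_bytes: float, rate_mb_per_s: float) -> float:
    """Number of days needed to move total_bytes at a sustained rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return total_bytes / (rate_mb_per_s * 1e6) / SECONDS_PER_DAY


ARCHIVE_BYTES = 1.56e12  # about 1.56 TB, as stated in the article
MIN_DAYS = 31            # assumed lower bound for "more than a month"

# Because the migration took at least MIN_DAYS, the real average rate
# was at most this value.
rate_upper_bound = average_rate_mb_per_s(ARCHIVE_BYTES, MIN_DAYS)
print(f"average rate was at most {rate_upper_bound:.2f} MB/s")
```

Under these assumptions the average comes out below 0.6 MB/s. The figure says nothing about where the time actually went (conversion, validation, or throttling to protect the production database), but it shows why a migration of this kind has to start long before the switchover date.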
Dispatchers and other users of the control system could then open schemes with live values and test the functionality of the new system (except for controlling it). The real data from the communication also helped to test the performance of the new servers.
During testing, neither the D2000 KOM processes (measured point values were transferred via the D2000 Gateway) nor the EVH processes were started, the latter so as not to affect the production system, e.g. by writing to an external database.
The administrators configured and started a separate KOM process with several sample communications of various types (e.g. IEC-104). In this way, they verified the functionality of the communication protocols in the new version. The one thing we could not verify was the ICCP/TASE-2 protocol. In the production system, this communication ran with only one partner, a foreign one at that, who could not provide a device for a test connection.
The day before the switchover itself, the configuration database was re-imported, because configuration changes had been made in the production environment during August and September. Subsequently, all necessary objects (communication processes, etc.) were modified using XML import.
On the day of the switchover, the production system was put into a state where the application was running on only one node (the ScadaB server). The new ScadaA server was set to use the production IP addresses and the application was launched, for the time being without the D2000 KOM and D2000 EVH processes. Through the transparent D2000 Gateway, the new application was connected to the production system, so that all objects had current values right up to the switchover.
Dispatchers also launched a new HI (Human Interface) and connected to the new system through it.
The switchover itself was straightforward and took about a minute. A colleague turned off the D2000 KOM and D2000 EVH processes on the original system (and disabled their automatic start). I switched on these processes on the new system and set them to start automatically.
We also left the old ScadaB server running with the application as a fallback, in case we needed to go back quickly. It also served as a reference for diagnosing any "unplanned" problems that we would have to solve immediately after the switch. So what problems did we run into?
Some of the archive objects did not have any data. We found out that these were archives filled from a script. Since the EVH processes had not been running in the new environment, there was nothing to execute these scripts. The solution? Synchronization of the relevant objects. Thanks to a well-designed naming convention, these objects could be identified using several name masks. The D2000 arcsynchro utility, which is used to fill gaps in the archive, was then run with these name masks - three times, as we needed to synchronize script-filled objects into three redundant archives.
One or two communication paths dropped out for several minutes. By analyzing the communication logs, we found out that the network connection had been interrupted. The D2000 KOM process tried unsuccessfully to re-establish it for a few minutes, and only then did it succeed. So the error was not on our side, but somewhere in the WAN.
The values of some objects in the schemes were displayed slightly differently than in the old system (e.g. the value 1 was displayed instead of 1.00). The analysis showed that when the application configuration file (an equivalent of the Windows registry) was converted, the value of the OldMasks application parameter had not been set to True. A subsequent detailed inspection revealed several other marginal differences in the configuration, which were corrected. For example, the dynamGraphDescTable parameter, which controls the display of the descriptive table when an ad-hoc dynamic chart is opened, was turned off.
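The idea of selecting objects by name masks can be sketched in a few lines of Python. This is only an illustration: the object names, the masks, and the archive labels below are invented for the example, and the sketch does not call arcsynchro or reproduce its actual mask syntax. It only shows how a consistent naming convention turns "all script-filled archives" into a short list of patterns.

```python
from fnmatch import fnmatchcase


def select_by_masks(names, masks):
    """Return the names that match at least one mask, in order, without duplicates."""
    selected = []
    seen = set()
    for name in names:
        if name in seen:
            continue
        if any(fnmatchcase(name, mask) for mask in masks):
            selected.append(name)
            seen.add(name)
    return selected


# Hypothetical archive object names; a real application has its own convention.
ARCHIVE_OBJECTS = [
    "H.Line1_Voltage",
    "H.Calc_Line1_Power",
    "H.Calc_Line2_Power",
    "H.Script_DailySum",
    "H.Line2_Current",
]

# Hypothetical masks that identify the script-filled archives.
SCRIPT_FILLED_MASKS = ["H.Calc_*", "H.Script_*"]

# Placeholder labels for the three redundant archives mentioned in the article.
REDUNDANT_ARCHIVES = ["archive-1", "archive-2", "archive-3"]

to_sync = select_by_masks(ARCHIVE_OBJECTS, SCRIPT_FILLED_MASKS)

# One synchronization run per redundant archive, each over the same objects.
plan = [(archive, name) for archive in REDUNDANT_ARCHIVES for name in to_sync]

for archive, name in plan:
    print(f"{archive}: synchronize {name}")
```

Without such a convention, the same selection would have meant going through the archive objects one by one to find out which of them are written by scripts.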
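A detailed inspection of this kind amounts to comparing two sets of parameters. The following Python sketch shows one simple way to do it, assuming both configurations have already been loaded into dictionaries. The parameter names OldMasks and dynamGraphDescTable come from the text above; the values, the Language parameter, and the loading step are assumptions made for the example, not the actual D2000 file format.

```python
def diff_parameters(old: dict, new: dict) -> dict:
    """Map each differing parameter to (old_value, new_value).

    A parameter missing on one side is reported with None on that side.
    """
    differences = {}
    for key in sorted(set(old) | set(new)):
        old_value = old.get(key)
        new_value = new.get(key)
        if old_value != new_value:
            differences[key] = (old_value, new_value)
    return differences


# Hypothetical values for illustration only.
OLD_CONFIG = {
    "OldMasks": "True",
    "dynamGraphDescTable": "True",
    "Language": "en",  # invented parameter, identical on both sides
}
NEW_CONFIG = {
    # OldMasks was not carried over during the conversion
    "dynamGraphDescTable": "False",
    "Language": "en",
}

differences = diff_parameters(OLD_CONFIG, NEW_CONFIG)
for name, (old_value, new_value) in differences.items():
    print(f"{name}: old={old_value!r} new={new_value!r}")
```

Running a comparison like this before the switchover, rather than after users notice a display difference, is a cheap addition to the migration checklist.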
Once we had verified that the control system worked well and reliably after the migration, we switched the second new server, ScadaB, to the production IP addresses, which gave the new system application redundancy. To compare data between the old and new applications, we started the application on the old ScadaC server in the backup dispatching center. For the time being, the new ScadaC server kept running on non-production IP addresses with its redundancy priority set to 0, so that it could not automatically become active (its communication would not work from those addresses); only ScadaA and ScadaB, which had production addresses, could. We also planned to assign non-production IP addresses to the old ScadaC server so that it could be kept running just in case: if a problem appeared, it would be easier to check whether it had existed before the migration or was a result of the upgrade.
For the next few days, we tested redundancy switching from ScadaA to ScadaB and trained the administrators on D2000 Linux V12.1, covering operations such as starting and shutting down an application or archive, checking the status of applications and running processes, performing database backups and restores, etc.
Given that the upgrade involved a generational replacement of the control system after many years, the replacement of the hardware platform, operating system, and database platform, and the upgrade of IPESOFT D2000 from version 9.2 to version 12.1, I rate the replacement process as very smooth. Several things certainly contributed to this result: quality project preparation, good cooperation between our team and the customer, thorough repeated testing of the individual migration steps, and a dry run of the whole migration process. As this was a critical SCADA system, the emphasis throughout was on minimizing risks, accidents, and surprises. More importantly, the customer's application administrators, who have almost 20 years of experience with the IPESOFT D2000 platform, assessed it similarly.
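The effect of a redundancy priority of 0 can be illustrated with a small Python sketch. This is not the D2000 redundancy algorithm, which the article does not describe. It only models the rule stated above (a node with priority 0 never becomes active automatically) together with one assumed rule: among the remaining reachable nodes, the one with the highest priority wins. The priority values for ScadaA and ScadaB are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    priority: int    # 0 = never becomes active automatically
    reachable: bool  # simplified stand-in for "node is up and healthy"


def pick_active(nodes: Iterable[Node]) -> Optional[Node]:
    """Pick the reachable node with the highest non-zero priority, if any."""
    candidates = [n for n in nodes if n.reachable and n.priority > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority)


# Hypothetical priorities; only the 0 for ScadaC comes from the article.
cluster = [
    Node("ScadaA", priority=2, reachable=True),
    Node("ScadaB", priority=1, reachable=True),
    Node("ScadaC", priority=0, reachable=True),
]

active = pick_active(cluster)
print(active.name if active else "no node eligible")
```

In this model, even if both ScadaA and ScadaB were unreachable, ScadaC would stay passive until an administrator raised its priority, which matches the intent of keeping a server on non-production addresses out of automatic failover.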
The replacement brought the customer all the benefits of the latest version of the D2000. At the same time, they no longer have to service obsolete Itanium servers and also save on energy consumption.
The migration path from IPESOFT D2000 OpenVMS and IPESOFT D2000 HP-UX to IPESOFT D2000 Linux version 12.1 is also available to our other customers who operate critical control dispatching centers on the IPESOFT D2000 technology.
Ing. Peter Humaj, www.ipesoft.com