Prio 1 Use case for SDI is the Table Replication from a non-SAP Data- source into SAP HANA tables. For some Data- sources it is also supported to not only read data from it, it is also supported to write back to it.
Meaning having SAP HANA content as source and writing the Data into the 3rd Party Data- source. Source could be a table or a, even BW generated, Calculation View.
I will have a look into this feature called “Write Back” with the usage of a SAP HANA Database as source and a MS SQL Server as target.
Which 3rd Party Databases are supported for Write Back is documented in the SDI PAM.
Main Use-case for SDI is Data replication from a non-SAP source into SAP HANA. For a couple of releases now it is possible, for certain 3rd party Databases, to not only read from them but to also write back into them.
This Blog shows a little the setup of this Use-case as well as some obstacles you are facing.
As a conclusion we can determine that the default parameter setup shipped with the installation of SAP HANA is sufficient enough to write into a 3rd party Database in general.
When it comes to certain performance requirements this default setup is maybe not sufficient enough. It that case you can apply this Blog to your Environment to fulfill them. As Write Back queues the Data occasionally inside the DP Agent, main memory consumption of the DP Agent is foreseen.
2.1 History
Meaning having SAP HANA content as source and writing the Data into the 3rd Party Data- source. Source could be a table or a, even BW generated, Calculation View.
I will have a look into this feature called “Write Back” with the usage of a SAP HANA Database as source and a MS SQL Server as target.
Which 3rd Party Databases are supported for Write Back is documented in the SDI PAM.
1. Summary and Conclusion
Main Use-case for SDI is Data replication from a non-SAP source into SAP HANA. For a couple of releases now it is possible, for certain 3rd party Databases, to not only read from them but to also write back into them.
This Blog shows a little the setup of this Use-case as well as some obstacles you are facing.
As a conclusion we can determine that the default parameter setup shipped with the installation of SAP HANA is sufficient enough to write into a 3rd party Database in general.
When it comes to certain performance requirements this default setup is maybe not sufficient enough. It that case you can apply this Blog to your Environment to fulfill them. As Write Back queues the Data occasionally inside the DP Agent, main memory consumption of the DP Agent is foreseen.
2. Introduction
2.1 History
Version | Date | Adjustments |
0.1 | 2018.12.21 | Initial creation |
2.2 Architecture
I am using the following architecture as test Environment:
As you can see this is pretty straight forward. A SAP HANA 2.0 Database will act as the source Database. Inside my Tenant i have the “Sale” Table with ~ 12 Mill. records.
I will use a Flow Graph as Object for the Data transfer because Replication Tasks are currently not supported to replicate data out of SAP HANA into a 3rd Party Database.
Target is a MS SQL Server 2017 hosting also the DP Agent.
2.3 Versioning
I am using the following versions of Software Applications.
Source System
◈ SAP HANA Database 2.0 SPS 03
Target System
◈ Microsoft SQL Server 2017 Developer Edition
◈ Using the WideWorldImportersDW sample Data
◈ SAP HANA SDI DP Agent in Version 2.0 SPS 03 Patch 4 (2.3.4)
2.4 Scope
This Blog will cover the following areas in this manner:
◈ Architecture challenges
◈ Flow Graph design
◈ Parameter Hints
◈ Performance
Everything else is out of scope in this Blog.
3. Initial Position
The initial Position looks as follows:
1. My Data source transform has the “TBL01_FactSale” table as source. This is a SAP HANA column store table containing the ~ 12 Mill. records. It is also supported to use Calculation Views as data source. This is especially useful wen you face a BWonHANA / BW/4HANA scenario. Here you can use the generated Calculation Views as your Data source.
In that case you maintain the the Calc. View centrally via the BW layer.
2. A standard Projection. Just a 1:1 Field Mapping.
3. My Data target transform has a Virtual Table as its target. This is the only way to be able to write back into a supported 3rd party database.
This means that you create your virtual Table on top of your physical table as you do when reading from a 3rd Party datasource into SAP HANA. In our special case we will use the virtual Table not in or Data Source transform, we will use it in our Data Target transform.
To have greater performance i configured the automatic Flow Graph partitioning having eight parallel partitions in use.
3.1 Target Table Setup
When writing into a supported 3rd Party database we are not able to use Template Tables for table generation. In the first place we have to know the structure of our target Table. Afterwards we have to create the table manually in the target database.
In my case this was pretty easy. Source and target structure are the same. For easier handling we can use the “Generate Create” function in our Database explorer.
We take this SQL command, adjust it in order to remove the SAP HANA specifics and paste it in a SQL Console of our target database. In this case the MS SQL Management Studio.
Finally we are able to create a Virtual Table on top of this physical Table. This Virtual Table will be used as target in our Data Target Transform.
3.2 Permissions on the target Database
In the usual setup, when we read data out of the source, it is sufficient enough to have read permissions. Write permissions is always required in the target. In a standard SDI scenario, SAP HANA is the target. In that case we need the write permissions here.
As, in our use case, the target is not a SAP HANA Database, it is the MS SQL Server, guess what?! Exactly! We need write permissions in our target.
At least on table level or, for easier maintenance, on Database level.
4. Write Back
4.1 Initial Discoveries
Having everything set up as described in chapter three i made these initial discoveries.
Beside the fact that i have configured eight partitions, the run- time was (from my point of view) quite long.
I went ahead and took a look into the Query Monitor were i saw the following picture:
1. Execution time for 1000 rows was between 192 – 255 ms.
2. Per INSERT Statement, via the Virtual Table, the adapter wrote exactly 1000 rows. Never more, never less. This is somewhat a very different behavior with respect to SELECT Statements on a Virtual Table.
3. The INSERT statements are generated based on my Partition settings. Same as when reading. In that case we have eight INSERT Statements each writing 1000 rows.
4.2 Findings
The Note states that the 1000 row fetch situation was by design in older versions of SAP HANA. It has been fixed with SAP HANA 2.0 SPS 02 rev. 024.01.
On top of this the behavior of the Virtual Table, when used for Write Back, is the following (Statement out of the above Note)
“[…]Fixed an issue, that writing to a remote virtual table was slower than reading from it, because on one hand the batch size was hard coded to 1000 and each batch was doing a prepare statement instead of caching it, and on the other hand no streaming was present, so each batch waited for reply before sending the next one. See SAP Note 2421290 for details.[…]”
As a matter of fact my deployment is on a higher release but i still faced this issue.
After getting in contact with some colleagues i got to know that even with newer versions of SAP HANA the behavior observed is still the default.
Nevertheless three parameters have been introduced with SAP HANA 2.0 SPS 02 rev. 024.01 which need to be activated.
4.3 Parameter settings
The three Parameter mentioned in 4.2 are:
‘useWritebackStreaming’
This parameter guarantees that the data will be send in a stream to the Virtual Table for Write Back instead as in batches.
‘cache_prepared_statements’
When activating this Parameter SAP HANA will not prepare the SQL Statement each time again.
‘prepared_statement_fetch_size’
Here we can adjust the 1000 row behavior in order to have more or even less.
4.3.1 Setting useWritebackStreaming
First i set the ‘useWritebackStreaming‘ parameter on my SAP HANA Database.
ALTER SYSTEM ALTER CONFIGURATION ('dpserver.ini', 'SYSTEM')
SET ('framework', 'useWritebackStreaming') = 'true'
WITH RECONFIGURE;
After setting this parameter the behavior of my Flow Graph looks as follows:
In the first place it doesn’t look like a big difference but…
1. Beside i configured eight partitions in the Flow Graph, i observed that sometimes i had less parallel INSERT Statements running.
I believe that this is due to the fact that the adapter is now continuously streaming the data and this is not ideally reflected in the Query Monitoring of the SAP HANA Studio.
At the end of the Day we have a slight improvement with regards to the run- time:
4.3.2 Setting cache_prepared_statements
Now i set the second parameter – ‘cache_prepared_statements‘ on my SAP HANA Database.
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('smart_data_integration', 'cache_prepared_statements') = 'true'
WITH RECONFIGURE;
After setting this parameter the behavior of my Flow Graph looks as follows:
1. Check the Execution Time for each 1000 row chunk. Much faster right now. Beforehand we had Execution Times up to 290 ms. Now we have them narrowed down to something between 39 – 58 ms.
This is due to the fact that HANA doesn’t prepare the statement each time. It does it ones and caches it afterwards.
2. Same situation as in chapter 4.3.1.
Now we end up much faster:
4.3.3 Setting prepared_statement_fetch_size
Now i set the third parameter – ‘prepared_statement_fetch_size‘ on my SAP HANA Database.
I will set it in four waves to check the results. First with 2000 rows, 5000 rows, 7000 rows and 10000 rows. Previous we used the default with 1000 rows.
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('smart_data_integration', 'prepared_statement_fetch_size') = 'XXXXX'
WITH RECONFIGURE;
The results are
Fetch size in rows | Execution Time per chunk of rows | Total execution Time of the FG |
1000 | 28 – 45 ms | 376754 ms |
2000 | 62 – 107 ms | 377363 ms |
5000 | 105 – 212 ms | 374468 ms |
7000 | 206 – 347 ms | 383563 ms |
10000 | 290 – 425 ms | 412557 ms |
We can see that with a higher Fetch Size we are getting higher Execution Times per chunk as well as a higher overall execution time.
In our test we can see that with a fetch size of 5000 rows we have the best performance. Even better as with the default of 1000 rows.
This shows us that there is no general rule of thumb when adjusting this parameter. You have to play a bit with it in your Environment.
4.4 Target database Write speed
4.4.1 Write speed
The write speed should be below the roundtrip speed that we see in chapter 4.3.2. Outerwise we will end up in queueing events (4.4.2) and wait events (4.4.3).
I checked the average write speed for my database file using the MS SQL Server Management Studio.
4.4.2 DP Agent queueing events
After setting the Write back optimized Parameter we have the situation that SDI send`s the data pretty fast to the target Database. If the target Database is not capable to write the records accordingly fast, it will end up in queueing situations inside the DP Agent.
2018-12-04 03:24:21,198 [INFO ] DPFramework | ResponseQueue.run – AHRQ(176): Stopped
2018-12-04 03:24:21,198 [INFO ] DPFramework | ResponseQueue.run – AHRQ(177): Stopped
2018-12-04 03:24:21,198 [INFO ] DPFramework | ResponseQueue.close – AHRQ(176,177): [Pool: max=1.71gb,cur=1.18kb,peak=3.56kb,tot=6.01mb; Queue: cur=0(0b),peak=2(1.76kb),tot=501(293.55kb)]
What you see above is a snipset from the framework trace File. In our case we dont have any queueing events because the target Database was able to write the records fast enough.
If you face queueing events check the framework trace file as well as the DP Agent process. Ideally with the tool “Process Explorer” on Windows or the “top” command on Unix. While queueing the DP Agent spawns a lot of processes and the main memory consuption increases.
4.4.3 MS SQL Wait events
As discussed in 4.2.1 wait events happen when the target Database is not capable of writing the data fast enough. In case of the MS SQL Server you can check the Activity Monitor out of the Management Console.
1. Here you see an overall view of all Wait Events througout the whole SQL Server instance. The less – the better.
2. Running processes of the SQL Server. Here you see our DP Agent connections
3. Database I/O files overview. Here you can see the read/write speed of the database Files. Here you also find the Response time of the Database files. The less – the better.
5. Appendix
Additional Information related to this Topic.
5.1 Enhance performance
By applying these Write Back optimized parameters we were able to increase the performance. There are still feature and functions available to even further increase the performance.