Overview
The SAP HANA JSON Document Store (also known as DocStore or Document Store) is a feature introduced with SAP HANA 2.0 SPS 01. It combines a relational and a document-oriented database into a hybrid technology. What sets it apart is that it is ACID compliant and fully integrated with SAP HANA in terms of access, querying, and administration.
The embedded Document Store belongs to the group of “NoSQL” databases, more precisely to the document-oriented ones. This type of storage technology stores semi-structured documents (mostly JSON or XML) in collections without an explicit structure, which offers high flexibility and compactness.
Terms
Besides familiar terms such as tables and schemas, this blog and the Document Store documentation use some new terms, which are explained below:
Semi-structured data: Data whose structure is not fixed in advance but is carried by the data itself. In contrast, structured data such as tables has a fixed structure which must be defined before data is inserted.
Collection: A collection holds multiple documents and is assigned to a schema. This is comparable to a table with the difference that a collection doesn’t have a predefined structure (column definition).
Document: A document in the Document Store is a semi-structured document in the JSON format. Such a document is like a row in a table. In this analogy the keys of the JSON document are the columns of the table.
Statement Examples
Since the Document Store is used in a relational database context, SQL is the query language. To that end, some new expressions and keywords were introduced to extend SQL for the needs of the Document Store. The following sections illustrate the most commonly used statements.
Enablement of the Document Store
Since the document store is implemented as an additional store in SAP HANA that comes with its own process, it has to be enabled by the administrator in the SYSTEMDB for a specific tenant.
ALTER DATABASE <database> ADD 'docstore';
Create a collection
This statement creates a new collection called MyCollection in the current schema. This is like CREATE TABLE, but without defining any columns. Users can create as many collections as needed.
CREATE COLLECTION MyCollection;
Drop collection
The DROP COLLECTION statement deletes the whole collection, including all of its documents. It behaves like the other DROP statements known from SQL.
DROP COLLECTION MyCollection;
Insert
The insert statement of the Document Store takes one JSON document as its argument; unlike a table insert, it has no optional column list. The new document must be valid JSON, but documents in the same collection may have different keys or a different structure.
INSERT INTO MyCollection VALUES({
  "name": 'John Doe',
  "address": {
    "city": 'Berlin',
    "street": 'Street 22'
  }
});
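To illustrate that documents in one collection need not share a structure, the following sketch inserts a second document with a different set of keys. The name and the "email" field are hypothetical and chosen only for illustration; the syntax simply mirrors the statement above.

```sql
-- Hypothetical second document: no "address" object, but an extra "email" key.
INSERT INTO MyCollection VALUES({
  "name": 'Jane Roe',
  "email": 'jane.roe@example.com'
});
```

No change to the collection is needed beforehand, since a collection has no column definition to adjust.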
Select
Selecting values from a collection is similar to the selection from a table. Furthermore, it is possible to access nested fields via a path by using the dot operator. The statement is tolerant to non-existing fields.
SELECT "name", "address"."city" AS "city" FROM MyCollection WHERE "name" = 'John Doe';
This returns a result set with the columns name and city where the name equals John Doe.
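The tolerance to non-existing fields can be tried with a path that does not occur in the John Doe document from the Insert section. This is only a sketch: "address"."zip" is a hypothetical field that the document does not contain. Going by the description above, the statement should still execute; the value returned for the missing field (presumably NULL) is an assumption.

```sql
-- "address"."zip" is a hypothetical field that is not present in the stored document.
SELECT "name", "address"."zip" AS "zip" FROM MyCollection WHERE "name" = 'John Doe';
```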
Update
To change existing data, use the update statement. Besides simply updating values, it can also be used to add or delete fields or to replace whole documents.
UPDATE MyCollection SET "address"."city" = 'Munich' WHERE "name" = 'John Doe';
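Adding a field might look like the following sketch. It assumes that SET on a path which does not yet exist in a document creates that field, which is how the description above reads; "address"."zip" and its value are hypothetical.

```sql
-- Assumption: SET on a not-yet-existing path adds the field to the matching documents.
UPDATE MyCollection SET "address"."zip" = '80331' WHERE "name" = 'John Doe';
```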
Delete
As the name of the statement implies, it deletes documents from a collection.
DELETE FROM MyCollection WHERE "name" = 'John Doe';
Conclusion
SAP HANA already provides capabilities for graph, spatial, and hierarchical data, and of course for relational tables. The Document Store adds to this set. Applications built on SAP HANA can use the best of each database technology and, in particular, mix document storage with the well-known relational world in an intuitive way. The advantages include a flexible and dynamic way of storing data and the ability to use both technologies at the same time. It also reduces administration overhead, since only one database needs to be maintained, and it opens up new options for application development.