The OJAI Document Lifecycle in MapR-DB – Whiteboard Walkthrough

In this week's Whiteboard Walkthrough, Bharat Baddepudi, engineer on the MapR-DB team, explains how documents in MapR-DB are inserted and updated.

Here's the transcription:

Hello folks, welcome to Whiteboard Walkthrough. My name is Bharat Baddepudi. I'm an engineer on the MapR-DB team at MapR. Today, I'll be talking about OJAI document processing and the new feature that we have come up with. OJAI stands for Open JSON Application Interface, and we'll be seeing how it can be used to build document-database applications with a flexible schema and flexible data types.

Let's start at the user application. The user application will integrate with our APIs to create a JSON document of this form, which has a bunch of key-value pairs with data types. For example, there's an integer, there's a string. It can have complex data types like an array or a nested map. Once the document is created using the OJAI interfaces, it will be sent to the server and stored in the backend. We have the OJAI interfaces, which provide the insert or creation of a document. There are two ways a document like this can be inserted into the database. One is the synchronous mode, in which each row or document is sent to and acknowledged by the server, and the other is where the client batches them for you and sends them in one big batch to the server, where they get stored.
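To make the two insert modes concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the OJAI API: `ToyServer`, `ToyClient`, `insert_sync`, `insert_buffered`, and the sample customer fields are all hypothetical names chosen for illustration. It only shows how client-side batching reduces the number of requests the server sees.

```python
# Toy model only: ToyServer and ToyClient are hypothetical, not the OJAI API.
class ToyServer:
    def __init__(self):
        self.rows = {}        # _id -> stored document
        self.round_trips = 0  # number of requests received from clients

    def put_batch(self, docs):
        # One request may carry one or many documents.
        self.round_trips += 1
        for doc in docs:
            self.rows[doc["_id"]] = doc


class ToyClient:
    def __init__(self, server, buffer_size=3):
        self.server = server
        self.buffer_size = buffer_size
        self.buffer = []

    def insert_sync(self, doc):
        # Synchronous mode: each document is its own request.
        self.server.put_batch([doc])

    def insert_buffered(self, doc):
        # Batched mode: hold documents and send them together.
        self.buffer.append(doc)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.server.put_batch(self.buffer)
            self.buffer = []


def make_customer(i):
    # A document with an integer, a string, an array, and a nested map.
    return {
        "_id": f"cust{i}",
        "age": 20 + i,
        "name": f"Customer {i}",
        "phones": ["555-0100"],
        "address": {"city": "San Jose"},
    }


sync_server = ToyServer()
sync_client = ToyClient(sync_server)
for i in range(6):
    sync_client.insert_sync(make_customer(i))

batch_server = ToyServer()
batch_client = ToyClient(batch_server, buffer_size=3)
for i in range(6):
    batch_client.insert_buffered(make_customer(i))
batch_client.flush()  # send anything still held in the buffer

print(sync_server.round_trips, batch_server.round_trips)
```

In this sketch both servers end up holding the same six documents; the only difference is how many requests it took to get them there.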

Once the document has been created, there could be additional information that comes up for the document. For example, in this case, the customer's zip code might become known later on, and they might want to add it in. That operation is performed as an update using a mutation. You can actually update multiple fields within the same update operation, and they get sent to the server. There are two ways of implementing this. Some of the databases out there take the updated object and send it to the server, where the existing row is read, modified, and written back. That is a read-modify-write, which is a somewhat costly operation.

For us, we have optimized it so that when an update comes to the server, the server makes a note of it, saves it on the backend, and then returns immediately. There is no read whatsoever for the update, and when a read comes, the data is merged back and sent to the client. That does improve the performance of the update because there is no read-modify-write there, and the read will take care of creating and maintaining the full document on demand.
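The update path described above can be sketched as a small in-memory model in Python. This is an illustration under stated assumptions, not MapR-DB's implementation or the OJAI API: `ToyTable`, its methods, and the sample zip code are hypothetical, and the merge here only handles top-level fields. The point it shows is that an update is recorded without reading the stored row, and the full document is assembled when a read arrives.

```python
# Toy model only: ToyTable is hypothetical and merges top-level fields only.
class ToyTable:
    def __init__(self):
        self.base = {}     # _id -> document as originally stored
        self.pending = {}  # _id -> list of recorded mutations, oldest first

    def insert(self, doc):
        self.base[doc["_id"]] = dict(doc)

    def update(self, doc_id, mutation):
        # Record the mutation and return; the stored row is not read here.
        self.pending.setdefault(doc_id, []).append(dict(mutation))

    def find_by_id(self, doc_id):
        # Merge on read: apply recorded mutations over the stored row.
        merged = dict(self.base[doc_id])
        for mutation in self.pending.get(doc_id, []):
            merged.update(mutation)
        return merged


table = ToyTable()
table.insert({"_id": "cust1", "name": "Ann", "age": 30})

# One update operation can set several fields at once.
table.update("cust1", {"zip": "95134", "city": "San Jose"})

doc = table.find_by_id("cust1")
print(doc)
```

In this sketch the stored row stays untouched after the update; only the read pays the cost of combining the row with the recorded changes.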

Beyond those, of course, you can delete fields or delete the whole row based on the ID. To find a document, the user can request it using the OJAI APIs too, finding the document by the given ID. This will return the full document that has been stored in the backend, but there are operations which require only a certain portion of the fields, from documents which also satisfy a given condition. For example, in this case, you want to find all the documents in which the customer's age is greater than or equal to 25 and the city is San Jose. Then you would send that as a condition, which is executed purely on the server side, and only the subset of the data that is needed will be returned to the client, thus reducing network and other resource usage.
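The condition from this example (age greater than or equal to 25 and city equal to San Jose) can be sketched in Python as follows. This is a toy model, not the OJAI query API: `server_find`, `matches`, the tuple-based condition format, and the sample rows are assumptions made for illustration. It shows the filter and the field projection both running where the data lives, so only the matching subset is returned.

```python
import operator

# Toy model only: the condition format and function names are hypothetical.
OPS = {">=": operator.ge, "==": operator.eq}


def matches(doc, condition):
    # A document matches when every (field, op, value) clause holds.
    # Documents missing a referenced field do not match.
    return all(
        field in doc and OPS[op](doc[field], value)
        for field, op, value in condition
    )


def server_find(rows, condition, fields):
    # Filter and project on the "server", returning only requested fields.
    results = []
    for doc in rows:
        if matches(doc, condition):
            results.append({f: doc[f] for f in fields if f in doc})
    return results


rows = [
    {"_id": "c1", "name": "Ann", "age": 30, "city": "San Jose"},
    {"_id": "c2", "name": "Bob", "age": 22, "city": "San Jose"},
    {"_id": "c3", "name": "Cy", "age": 41, "city": "Austin"},
    {"_id": "c4", "name": "Dee", "city": "San Jose"},  # no age recorded
]

condition = [("age", ">=", 25), ("city", "==", "San Jose")]
result = server_find(rows, condition, ["_id", "name"])
print(result)
```

Of the four sample rows, only the one that satisfies both clauses comes back, and it carries just the two requested fields rather than the whole document.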

Those are some of the changes we have made, keeping in mind that doing work on the server side in a more controlled fashion, to reduce resource consumption, will give better performance to end users and help them reach their target user base. Thank you for taking the time to watch the video. Thanks.


