Monday, June 27, 2016

Rise of NoSQL

In the age of digital transformation, the last couple of years have seen a proliferation of social media, mobile, cloud and IoT (Internet of Things) technologies. People and businesses use these technologies to stay connected with each other and with their customers, and to drive their business. These platforms are expected to be always available, deliver a good customer experience and support a large number of concurrent users. Users perform billions of interactions on them, generating enormous volumes of data. This data is usually unstructured and heterogeneous in nature.

Traditional database systems (RDBMS) find it difficult to store and process such large amounts of heterogeneous data, because businesses want (near) real-time data management, which raises the scalability and speed requirements. This is why users and organizations are turning to NoSQL databases, which provide:
  • Better application development productivity through a more flexible data model
  • Greater ability to scale dynamically to support more users and data
  • Improved performance, to satisfy users who expect highly responsive applications and to allow more complex processing of data


What is NoSQL?
  • Some call it "Not only SQL", some call it "non-SQL". The term NoSQL gained popularity in the early 21st century
  • It represents a class of non-relational data storage systems
  • These databases usually do not require a fixed table schema, nor do they use joins
  • NoSQL offerings typically relax one or more of the ACID properties (the famous RDBMS acronym: Atomicity, Consistency, Isolation, Durability) and instead follow the BASE properties and the trade-offs described by the CAP Theorem
BASE properties
  • Basically Available: individual parts of the system may fail, but the system as a whole does not
  • Soft state: copies of a data item may be inconsistent for a time
  • Eventually consistent: copies become consistent at some later time, provided there are no more updates to that data item
CAP Theorem (Brewer's Theorem)
Computer scientist Eric Brewer proposed that a distributed system can guarantee at most two of the following three properties simultaneously:
  • Consistency: all nodes of a system are in a consistent state after the execution of an operation and see the same data at the same time
  • Availability: clients can always read and write data within a specific period of time
  • Partition Tolerance: the ability of the system to continue operating in the presence of network partitions
Most large systems will partition at some point, so the practical decision is between consistency and availability. Traditional databases (RDBMS) prefer consistency over availability and partition tolerance, whereas most web applications choose availability.
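
To make "soft state" and "eventually consistent" a little more concrete, here is a toy sketch in Java. It is not how any particular NoSQL product replicates data. The Replica class and its syncFrom() method are hypothetical names invented for this illustration, and the explicit sync call stands in for whatever background replication a real system performs.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: two replicas that agree only after a replication step has run. */
final class EventualConsistencyDemo {

    /** A replica is just an in-memory map in this sketch (hypothetical class). */
    static final class Replica {
        private final Map<String, String> data = new HashMap<>();

        void write(String key, String value) {
            data.put(key, value);
        }

        String read(String key) {
            return data.get(key);
        }

        /** Copies every entry from another replica; stands in for background replication. */
        void syncFrom(Replica other) {
            data.putAll(other.data);
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();

        // The write is accepted by replica a only.
        a.write("user:1", "Alice");

        // Soft state: replica b has not seen the write yet, so a read returns null.
        System.out.println("b before sync: " + b.read("user:1"));

        // Once replication has run and no new updates arrive, both copies agree.
        b.syncFrom(a);
        System.out.println("b after sync: " + b.read("user:1"));
    }
}
```

A system that prefers availability would answer the read on replica b straight away, even with stale data. A system that prefers consistency would make that read wait or fail until the replicas agree.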





NoSQL - Different Data models 

NoSQL databases leverage different data models based on the target functionality/use case. Some of the popular ones are:
  • Key-Value store: Redis, MemcacheDB, BerkeleyDB
  • Column Store: Cassandra, HBase
  • Document Store: MongoDB, CouchDB, Terrastore
  • Graph Database: OrientDB, Neo4j, InfiniteGraph

  


Benefits of NoSQL
  • Easy to implement
  • Can scale horizontally and vertically
  • Quickly process large amounts of data
  • Flexibility due to schema-less design
  • Relax the data consistency requirements (CAP)
  • Can easily handle large web scale heterogeneous data

Cons of NoSQL
  • Data is generally duplicated, with potential for inconsistency
  • No standardized schema
  • No standard format for queries
  • Difficult to impose complicated structures
  • Depends on the application layer to enforce data integrity









Saturday, June 18, 2016

This code is crappy!

Have you ever found yourself looking at code and thinking, "what crappy code"? Why can't people write good code and follow best practices? Some of the common issues you may observe are unused methods, unused imports, unused variables, poor formatting, static members accessed through instances etc. Ask the dev folks and some of the common statements you may hear are "it is difficult enough to complete the functionality, when do I look at best practices?" and "the code has undergone changes so many times, it is difficult to take care of such things". Most of you have probably been through this situation.

How do we tackle this situation? 

Here comes Eclipse to our rescue. There are many configuration options in Eclipse that can take care of such things for us. Let's figure out some of those.

Removing Unused Imports: press Ctrl + Shift + O (Organize Imports). Simple enough!

Removing Unused private / local variables, methods etc.

1 - Go to Window > Preferences > Java > Code Style > Clean Up

2 - Create a new Clean Up profile. Enter a name in the Profile Name text box and click OK.

3 - In the "Unnecessary code" tab, select the cleanup actions that you want to use and click OK. In addition to "Unnecessary code", actions from other tabs can also be configured based on project needs/best practices.




4 - Your newly created profile will show up in the Active Profile drop-down. If not, select the profile that you created and click OK.

5 - Now you can use this newly created Eclipse profile. During development, select a project in "Project Explorer" and click Source > Clean Up. Eclipse will apply your profile and perform the appropriate code cleanup/formatting.
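
To get a feel for what such a profile targets, here is a small made-up Java class containing the kinds of issues mentioned at the start of this post. The class and its member names are purely illustrative. Which of these Eclipse actually changes depends on the actions enabled in your profile, so treat the comments as what you would typically expect, not as a guarantee.

```java
import java.util.ArrayList; // unused import
import java.util.List;      // unused import

/** Hypothetical class that compiles fine but is full of clean-up candidates. */
final class CleanUpCandidate {
    private static final int LIMIT = 10;

    private int unusedField;            // unused private field

    private void unusedHelper() {       // unused private method
    }

    int clamp(int value) {
        int scratch = 0;                // unused local variable
        // Static field accessed through an instance; LIMIT alone would be cleaner.
        return Math.min(value, this.LIMIT);
    }
}
```

After a cleanup run with the matching "Unnecessary code" and static access actions enabled, you would expect only the LIMIT constant and a tidier clamp() method to remain.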


Isn't it easy to keep such issues at bay by simply configuring Eclipse? Happy coding!



Memcached for Java Enterprise Application

Caching is a mechanism to store a copy of some piece of data/information so that the application can reuse it later without making an expensive service or backend call. The main goal of caching is to increase the performance of the portal/web application. Data can be cached in different forms (e.g. query results, objects) and can be stored in different mediums (e.g. file, database, memory). Some known and commonly used Java caching frameworks are:
  • EHCache
  • OSCache
  • Gemfire
Java caching frameworks like EHCache & OSCache are essentially HashMap objects in application code. Every new object added to the cache is added to application memory. This strategy works fine for storing small amounts of data, but not so well for storing large amounts of data (a few gigabytes), because of the performance hit.
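
The "HashMap in application code" idea can be sketched in a few lines of Java. This is a simplified illustration, not the actual implementation of EHCache or OSCache. The InProcessCache class and its loader function are hypothetical names, with the loader standing in for the expensive service or backend call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal in-process cache: every cached value lives on the application's own heap. */
final class InProcessCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the expensive backend call

    InProcessCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only on the first request for a key. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        InProcessCache<String, String> cache = new InProcessCache<>(key -> {
            System.out.println("loading " + key + " from backend");
            return key.toUpperCase();
        });
        cache.get("alice"); // triggers the loader
        cache.get("alice"); // served from the map, loader is not called again
        System.out.println("entries held in application memory: " + cache.size());
    }
}
```

Nothing here ever removes entries, so the map grows with every new key. That is the memory pressure described above once the cached data reaches gigabytes.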

Memcached is an open-source, distributed memory caching system aimed at reducing heavy database loads and improving application performance by adding a scalable object-caching layer. Its distributed architecture lets the cache scale across machines, so a huge amount of data can be stored in it. Memcached:
  • Significantly reduces the number of retrieval requests to database.
  • Uses RAM for storage.
  • Acts as a dictionary of stored data with key/value pairs.

The Memcached client (Client) takes an object to be cached, serializes it, and sends a byte array to the Memcached server (Server) for storage. To fetch a cached object, the application calls the client's get() method. The client sends the request for that key to the server, which looks up the object in its cache. The server then returns the stored byte array to the client, which deserializes it and hands the object back to the requesting application.
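
The serialize/deserialize round trip described above can be sketched with plain Java serialization. This is only an illustration of the idea. Real Memcached client libraries ship their own serializers (often called transcoders), and the CacheSerialization class below is a hypothetical helper, not part of any client API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper showing what a client does before sending and after receiving. */
final class CacheSerialization {

    /** Object -> byte array, as done before the value is sent to the server. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Byte array -> object, as done after the server returns the stored bytes. */
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of cached object not on classpath", e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = serialize("user profile for alice");
        System.out.println("bytes sent to server: " + stored.length);
        System.out.println("object returned to application: " + deserialize(stored));
    }
}
```

Anything you want to cache this way has to be Serializable, and the serialized form is what counts against the cache's size limit for a value.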

Memcached Server
  • is a process that handles data storage on the machine where it is running
  • can run on the same machine as the application or on a different machine
  • uses a hashing algorithm to look up a key and return the corresponding object value
  • doesn't share data with other servers
Memcached Client
  • is programming language dependent
  • uses a hash algorithm to determine the server where data needs to be stored
  • can use keys that are at most 250 bytes long
  • accepts only keys with no spaces in the key name
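
Here is a rough Java sketch of the client-side rules above: validate the key, then hash it to pick a server. It assumes a simple "hash modulo number of servers" scheme for illustration. Real client libraries may use other strategies (for example consistent hashing), and the ServerSelector class and the server addresses are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: key validation plus hash-based server selection. */
final class ServerSelector {
    static final int MAX_KEY_BYTES = 250;

    /** A key must be non-empty, at most 250 bytes, and free of whitespace and control characters. */
    static boolean isValidKey(String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        if (key.getBytes(StandardCharsets.UTF_8).length > MAX_KEY_BYTES) {
            return false;
        }
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                return false;
            }
        }
        return true;
    }

    /** The same key always maps to the same server as long as the server list is unchanged. */
    static String pickServer(String key, List<String> servers) {
        if (!isValidKey(key)) {
            throw new IllegalArgumentException("invalid key: " + key);
        }
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        // Made-up server addresses for the example.
        List<String> servers = Arrays.asList("cache1:11211", "cache2:11211", "cache3:11211");
        System.out.println("user:42 -> " + pickServer("user:42", servers));
        System.out.println("is \"user 42\" a valid key? " + isValidKey("user 42"));
    }
}
```

Because the client alone decides where a key lives, the servers never need to talk to each other. With plain modulo hashing, though, adding or removing a server remaps most keys.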
Memcached provides the following advantages:
  • Supports multiple languages e.g. C/C++, PHP, Java, Python, Ruby, Perl, .Net, Erlang etc.
  • Stores data in hash tables, thereby improving fetch speeds
  • Uses an LRU (least recently used) algorithm to purge stale data and retain frequently used data automatically
  • Allows storing up to 1 MB of data (the default limit) as an object value in the cache, making it ideal for objects like user profile, social graph etc.
  • Uses a hash of the key to determine the server on which the client stores/fetches an object value
  • Provides constant fetch time, i.e. O(1), for each Memcached operation
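
The LRU idea from the list above can be illustrated with Java's LinkedHashMap in access order. This shows the eviction policy only and is not Memcached's internal implementation. The LruCache class name and the tiny capacity are choices made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Small LRU cache: when full, the least recently accessed entry is evicted. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true makes get() move an entry to the "most recently used" end.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true drops the least recently used entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "profile A");
        cache.put("b", "profile B");
        cache.get("a");              // "a" is now the most recently used entry
        cache.put("c", "profile C"); // cache is full, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

Frequently read entries keep moving to the "recent" end and survive, while entries nobody asks for drift to the other end and are purged first.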

Is Memcached a good fit for every project? The short answer is NO. Every project has its own specific use cases, and only some of them are well served by a Memcached solution.
When to use?

  • To store application data that is constant system-wide
  • To store a large number of small chunks of data (up to 1 MB each)

When not to use?
  • If there is a need for data replication
  • If there is a need to backup cached data
Some of the organizations using Memcached are