Mistakes are proof that you are trying: 2016

Monday, June 27, 2016

Rise of NoSQL

In the age of digital transformation, social media, mobile, cloud, and IoT (Internet of Things) have proliferated over the last couple of years. People and businesses use these technologies to stay connected with others and with their customers, and to drive their business. These platforms are always available, deliver a good customer experience, and support large numbers of concurrent users. Users perform billions of interactions on them, generating enormous amounts of data. This data is usually unstructured and heterogeneous.

Traditional database systems (RDBMS) find it difficult to store and process such large amounts of heterogeneous data, because businesses want (near) real-time data management, which raises the scalability and speed requirements. This is why users and organizations are turning to NoSQL databases, which provide:

Better application development productivity through a more flexible data model
Greater ability to scale dynamically to support more users and data
Improved performance to satisfy expectations of users wanting highly responsive applications and to allow more complex processing of data

What is NoSQL?

Some call it "Not only SQL", some call it "non-SQL". The term NoSQL gained popularity in the early 21st century
It represents a class of non-relational data storage systems
These databases usually do not require a fixed table schema, nor do they usually use joins
Most NoSQL offerings relax one or more of the ACID properties (the famous RDBMS acronym: Atomicity, Consistency, Isolation, Durability), follow the BASE properties instead, and are designed around the trade-offs of the CAP Theorem

BASE properties

Basically Available: individual faults are possible, but they do not bring down the whole system
Soft state: copies of a data item may be inconsistent for a while
Eventually consistent: copies become consistent at some later time if there are no more updates to that data item
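
To make soft state and eventual consistency a bit more concrete, here is a minimal Java sketch. The Replica class, its syncTo() method and the version counter are hypothetical names made up for illustration (a simple last-write-wins rule); they are not the API of any real NoSQL product.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replica for illustration only; not the API of any real NoSQL product.
class Replica {
    // Each key maps to a value plus the version at which it was written.
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Accept a write only if it is newer than what this replica already holds.
    void write(String key, String value, long version) {
        Long current = versions.get(key);
        if (current == null || version > current) {
            values.put(key, value);
            versions.put(key, version);
        }
    }

    String read(String key) {
        return values.get(key);
    }

    // Replication step: push every entry to another replica (last write wins).
    void syncTo(Replica other) {
        for (Map.Entry<String, String> e : values.entrySet()) {
            other.write(e.getKey(), e.getValue(), versions.get(e.getKey()));
        }
    }
}

class EventualConsistencyDemo {
    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();

        a.write("cart:42", "1 item", 1L);   // the update reaches replica a only
        // Soft state: the two copies disagree for a while.
        System.out.println(a.read("cart:42") + " vs " + b.read("cart:42"));

        a.syncTo(b);                        // replication catches up later
        // Eventually consistent: both copies now agree.
        System.out.println(a.read("cart:42").equals(b.read("cart:42")));
    }
}
```

Between the write and the sync, a reader hitting replica b sees stale data. That window is the price paid for staying available.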

CAP Theorem (Brewer's Theorem)
Computer scientist Eric Brewer proposed that a distributed system can guarantee at most two of the following three properties simultaneously

Consistency: all nodes of a system are in a consistent state after the execution of an operation and see the same data at the same time
Availability: clients can always read and write data within a bounded period of time
Partition Tolerance: the ability of the system to continue operation in the presence of network partitions

Most large systems will partition at some point, so the real decision is between consistency and availability. Traditional databases (RDBMS) favor consistency over availability and partition tolerance, whereas most web applications choose availability.

NoSQL - Different Data models

NoSQL databases leverage different data models depending on the target functionality or use case. Some of the popular ones are:

Key-Value store: Redis, MemcacheDB, BerkeleyDB
Column Store: Cassandra, HBase
Document Store: MongoDB, CouchDB, Terrastore
Graph Database: OrientDB, Neo4j, InfiniteGraph
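
As a rough way to picture how these four models differ, the sketch below uses plain Java collections as stand-ins. This is only an analogy under simplifying assumptions: the DataModelShapes class, the keys and the sample values are all made up, and real products add indexing, distribution and query languages on top.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java collections standing in for the four data models.
// The user records and their values are made-up sample data.
class DataModelShapes {

    // Key-value store: an opaque value looked up by one key.
    static Map<String, String> keyValue() {
        Map<String, String> store = new HashMap<>();
        store.put("user:1", "{\"name\":\"Asha\",\"city\":\"Pune\"}");
        return store;
    }

    // Document store: the value is a structured document whose fields are visible to the store.
    static Map<String, Map<String, Object>> documents() {
        Map<String, Object> user = new HashMap<>();
        user.put("name", "Asha");
        user.put("tags", Arrays.asList("admin", "beta"));
        Map<String, Map<String, Object>> collection = new HashMap<>();
        collection.put("user:1", user);
        return collection;
    }

    // Column store: row key -> column name -> value; each row may carry different columns.
    static Map<String, Map<String, String>> columns() {
        Map<String, String> row1 = new HashMap<>();
        row1.put("name", "Asha");
        row1.put("city", "Pune");
        Map<String, String> row2 = new HashMap<>();
        row2.put("name", "Ravi"); // no "city" column in this row
        Map<String, Map<String, String>> table = new HashMap<>();
        table.put("user:1", row1);
        table.put("user:2", row2);
        return table;
    }

    // Graph database: nodes connected by edges, here an adjacency list of "follows" edges.
    static Map<String, List<String>> graph() {
        Map<String, List<String>> follows = new HashMap<>();
        follows.put("user:1", Arrays.asList("user:2"));
        follows.put("user:2", Arrays.asList("user:1"));
        return follows;
    }
}
```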

Benefits of NoSQL

Easy to implement
Can scale horizontally and vertically
Can quickly process large amounts of data
Flexible due to schema-less design
Relaxes data consistency requirements (CAP)
Can easily handle large, web-scale heterogeneous data

Cons of NoSQL

Data is generally duplicated, which creates potential for inconsistency
No standardized schema
No standard format for queries
Difficult to impose complicated structures
Depends on the application layer to enforce data integrity

Saturday, June 18, 2016

This code is crappy!

Have you ever looked at a piece of code and thought, "What crappy code! Why can't people write good code and follow best practices?" Some of the common issues you may observe are unused methods, unused imports, unused variables, poor formatting, static access, etc. Ask the dev folks and you may hear statements such as "it is difficult enough to complete the functionality, when do I look at best practices?" or "the code has changed so many times that it is difficult to take care of such things". Most of you have probably been in this situation.

How do we tackle this situation?

Eclipse comes to our rescue here. It has many configuration options that can take care of such things for us. Let's look at some of them.

Removing unused imports: press Ctrl + Shift + O (Organize Imports). Simple enough!

Removing Unused private / local variables, methods etc.

1 - Go to Window > Preferences > Java > Code Style > Clean Up

2 - Create a new Eclipse profile. Enter a name in the Profile name text box and click OK.

3 - On the "Unnecessary code" tab, select the cleanup actions that you want to use and click OK. In addition to "Unnecessary code", actions from other tabs can also be configured based on project needs and best practices.

4 - Your newly created profile will show up in the Active profile drop-down. If it is not already selected, select the profile that you created and click OK.

5 - Now you can use this newly created Eclipse profile. During development, select a project in "Project Explorer" and click Source > Clean Up. Eclipse will apply your profile and perform the appropriate code cleanup and formatting.
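
To give a feel for the result, here is a small hypothetical class as it might look after a cleanup run, with the removed lines kept as comments. OrderService and its members are made up for illustration, and the option names in the comments are quoted from memory of the "Unnecessary code" tab, so they may be worded differently in your Eclipse version.

```java
import java.util.ArrayList;
import java.util.List;
// import java.util.Date;              // unused import: removed ("Remove unused imports")

// Hypothetical class for illustration; the commented-out lines show what the cleanup dropped.
class OrderService {

    // private int retryCount;         // unused private field: removed ("Remove unused private members")

    // Returns only the non-empty order ids.
    List<String> activeOrders(List<String> orders) {
        // String temp = "";           // unused local variable: removed ("Remove unused local variables")
        List<String> active = new ArrayList<>();
        for (String order : orders) {
            if (!order.isEmpty()) {
                active.add(order);
            }
        }
        return active;
    }

    // private void legacyHelper() { } // unused private method: removed ("Remove unused private members")
}
```

The behavior of activeOrders() is unchanged; only the dead code is gone.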

Isn't it easy to keep such issues at bay by simply configuring Eclipse? Happy coding!

Memcached for Java Enterprise Application

Caching is a mechanism for storing a copy of some piece of data so that the application can reuse it later without making an expensive service or backend call. The main goal of caching is to improve the performance of the portal or web application. Data can be cached in different forms (e.g. query results, objects) and stored in different media (e.g. file, database, memory). Some well-known and commonly used Java caching frameworks are:

EHCache
OSCache
Gemfire

Java caching frameworks like EHCache and OSCache are essentially HashMap objects in application code. Every new object added to the cache is added to application memory. This strategy works fine for storing small amounts of data, but not so well for large amounts of data (a few gigabytes) because of the performance hit.
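
The "HashMap in application code" idea can be sketched as follows. LocalCache is a hypothetical class written for this post, not the EHCache or OSCache API; it assumes the expensive backend call can be passed in as a function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-process cache for illustration; not the EHCache or OSCache API.
class LocalCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;   // stands in for the expensive backend call
    private int loads = 0;

    LocalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Return the cached value, calling the backend only on a miss.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);       // every entry lives on this JVM's heap
        }
        return value;
    }

    // How many times the backend was actually called.
    synchronized int loadCount() {
        return loads;
    }
}
```

Because every entry sits on the heap of the JVM that created it, the cache grows with the application's memory and is not shared with other application instances.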

The Memcached server takes a distributed architectural approach, which allows the system to scale. As a result, a huge amount of data can be stored in Memcached. It is an open-source, distributed memory caching system aimed at reducing heavy database load and improving application performance by adding a scalable object-caching layer. It:

Significantly reduces the number of retrieval requests to database.
Uses RAM for storage.
Acts as a dictionary of stored data with key/value pairs.

The Memcached client (Client) takes an object to be cached, serializes it, and sends a byte array to the Memcached server (Server) for storage. To fetch a cached object, we call the client's get() method. The client receives the get request and sends the key to the server, which looks up the object in the cache. After the lookup, the byte array is sent back to the client, which deserializes it and returns the object to the requesting application.
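
The round trip described above can be sketched in plain Java. This is not a real Memcached client: CacheRoundTrip is a hypothetical class, a HashMap plays the role of the server, and standard Java serialization is assumed (real clients may use their own transcoders).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the set/get round trip; a HashMap plays the role of the server.
class CacheRoundTrip {
    private final Map<String, byte[]> fakeServer = new HashMap<>();

    // Client side of "set": serialize the object and hand the byte array to the server.
    void set(String key, Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fakeServer.put(key, bytes.toByteArray());
    }

    // Client side of "get": fetch the byte array by key and deserialize it.
    Object get(String key) {
        byte[] stored = fakeServer.get(key);
        if (stored == null) {
            return null; // cache miss
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stored))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that get() returns a fresh copy of the object each time, not the instance that was stored, because the value travels as bytes.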

Memcached Server

is a process that handles data storage on the machine where it is running
can run on the same machine as the application or on a different machine
uses a hashing algorithm on the key to look up and return the appropriate object value
does not share data with other servers

Memcached Client

is programming language dependent
uses hash algorithm to determine server where data needs to be stored
can use keys up to 250 bytes long
accepts only keys with no spaces in the key name
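
The client-side rules above (pick a server from a hash of the key, reject keys that are too long or contain spaces) might look roughly like this. ServerSelector is a hypothetical helper, and the simple hash-modulo scheme is an assumption for illustration; real clients often use more elaborate schemes such as consistent hashing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; real clients often use consistent hashing instead.
class ServerSelector {
    static final int MAX_KEY_BYTES = 250;
    private final List<String> servers;

    ServerSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);
    }

    // A key must be non-empty, at most 250 bytes, and free of whitespace and control characters.
    static boolean isValidKey(String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        if (key.getBytes(StandardCharsets.UTF_8).length > MAX_KEY_BYTES) {
            return false;
        }
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                return false;
            }
        }
        return true;
    }

    // The same key always maps to the same server, so servers never need to talk to each other.
    String serverFor(String key) {
        if (!isValidKey(key)) {
            throw new IllegalArgumentException("invalid key: " + key);
        }
        int index = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(index);
    }
}
```

With plain hash-modulo, adding or removing a server changes the mapping for most keys, which is one reason real clients tend to prefer consistent hashing.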

Memcached provides the following advantages:

Supports multiple languages, e.g. C/C++, PHP, Java, Python, Ruby, Perl, .NET, Erlang, etc.
Stores data in hash tables, thereby improving fetch speeds
Uses an LRU algorithm to purge stale data and retain frequently used data automatically
Allows storing up to 1 MB of data as an object value in the cache, making it ideal for objects like user profiles, social graphs, etc.
Uses a hash of the key to determine the server where the client stores/fetches an object value
Provides constant fetch time, i.e. O(1), for each Memcached operation
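
The LRU (least recently used) idea mentioned above can be demonstrated with a few lines of Java. This is only a sketch of the eviction policy using the JDK's LinkedHashMap; LruCache is a made-up class and says nothing about how Memcached implements LRU internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the LRU policy only; not how Memcached implements it internally.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insert; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}

class LruDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");        // "a" is now the most recently used entry
        cache.put("c", "3");   // over capacity: "b", the least recently used, is evicted
        System.out.println(cache.keySet());
    }
}
```

Lookups and inserts on the underlying hash table take constant time on average, which matches the O(1) fetch time listed above.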

Is Memcached a good fit for every project? The short answer is NO. Every project has specific use cases, and only some of them are well served by a Memcached solution.

When to use?

To store application data that is constant system-wide
To store a large number of small chunks of data (up to 1 MB each)

When not to use?

If there is a need for data replication
If there is a need to backup cached data

Some of the organizations using Memcached are
