Architecture pattern for real-time processing in Big Data


The flow is as follows

1) Real-time data is ingested into the system via stream ingestion tools like Flume, Kafka or Samza. The choice among these depends on factors such as whether we need guaranteed event delivery and whether we care about the order of events. That is a topic for another post.

2) Apply the required processing to the incoming data via the Spark Streaming API and make it ready for consumption by third-party apps.

3) Using the REST gateway of the Spark job server, third-party apps trigger further Spark jobs to process the data and return the results to the applications.
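
The three stages can be sketched with a toy, in-memory model. This is only an illustration of the flow, assuming Python 3; it is not Kafka or Spark code, and ToyPipeline, ingest, process_batch and query are hypothetical names invented for this sketch.

```python
from collections import Counter, deque


class ToyPipeline:
    """In-memory stand-in for the three stages; not Kafka or Spark code."""

    def __init__(self, batch_size=3):
        self.buffer = deque()    # stage 1: events ingested but not yet processed
        self.counts = Counter()  # stage 2 output: processed, query-ready data
        self.batch_size = batch_size

    def ingest(self, event):
        """Stage 1: accept one event; process once a micro-batch is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.process_batch()

    def process_batch(self):
        """Stage 2: drain the buffer and update per-user counts."""
        while self.buffer:
            event = self.buffer.popleft()
            self.counts[event["user"]] += 1

    def query(self, user):
        """Stage 3: what a third-party app would request on demand."""
        return self.counts[user]


pipeline = ToyPipeline(batch_size=2)
for user in ["ann", "bob", "ann", "ann"]:
    pipeline.ingest({"user": user})
print(pipeline.query("ann"))
```

In a real deployment the buffer would be the ingestion tool, process_batch a Spark Streaming job, and query a job triggered through the job server.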

Sync folder between Android Tablet or Phone and computer



Goal: keep the contents of a folder on my Android tablet or phone synchronized with a folder on my computer.

Software used


Some FTP server

On Windows I used FileZilla Server

See guide here

Read about different sync modes

In Android

Download the application

I was able to set up the FTP server and sync the folders in 10 minutes.

Please follow the tutorial here

Now my two folders synchronize with each other every 20 seconds.

How to see total folder size in Windows


Download folder size software

In the lower right corner you will have an icon that toggles the display of folder sizes.

It will appear something like this





For each folder you will get a window like the one shown below



Install protobuf 2.5 on Windows


Download protobuf from

Extract it to some location

Add it to your path

After that you can verify that it is installed:

C:\Users\jj\Desktop>which protoc
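
The same check can be scripted. A minimal sketch, assuming Python 3.7 or later is installed; find_tool and tool_version are hypothetical helper names, while --version is protoc's own flag.

```python
import shutil
import subprocess


def find_tool(name, path=None):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name, path=path)


def tool_version(name):
    """Run `<name> --version` and return its output, or None if not found."""
    exe = find_tool(name)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()


# Prints the location of protoc, or None if it is not on PATH yet.
print(find_tool("protoc"))
```

If find_tool returns None, the extracted folder has not been added to the path yet (or the shell needs to be reopened).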

Tip: I use Rapid Environment Editor to manage all my environment variables. You will like it; read about it here.

Set Windows environment variables via script or command line


When using Windows, one feature I miss is the ability to script the many things we can do in Linux.

To set up a deterministic development environment quickly, I use Rapid Environment Editor.

You can use a .bat script to manage all environment variables.

Here is an example:


rem This is a comment
rapidee -C JAVA_HOME
rem rapidee -S JAVA_HOME "C:\Program Files\Java\jdk1.8.0"
rapidee -S JAVA_HOME "C:\Program Files\Java\jdk1.7.0_51"
rapidee -S MVN_HOME G:\dev\tools\apache-maven-3.1.1
rapidee -S SCALA_HOME G:\dev\tools\scala\scala-2.10.4
rapidee -S Path
rem rapidee -A -E Path "C:\Program Files\Java\jdk1.8.0\bin"
rapidee -A -E Path "C:\Program Files\Java\jdk1.7.0_51\bin"
rapidee -A -E Path G:\dev\tools\apache-maven-3.1.1\bin
rapidee -A -E -C Path G:\dev\tools\protoc-2.5.0-win32
rapidee -A -E -C Path G:\dev\tools\0.5.1_windows_amd64_packer
rapidee -A -E -C Path C:\cygwin64\bin
rapidee -A -E -C Path "C:\Program Files (x86)\Git\bin"
rapidee -A -E -C Path G:\dev\tools\scala\scala-2.10.4\bin


Save the above file as set_env.bat

You can do lots of things via batch scripting.

Download the Rapid Environment editor from

Install it and add it to your path.

Now whenever you want to set a Windows environment variable, you can just add it to the set_env.bat script above and run it.
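
If the list of tools grows, the script itself can be generated from data. A rough sketch, assuming Python 3.6 or later; quote and build_set_env are hypothetical helpers, and the only rapidee invocations used are the two forms that already appear in the script above (-S NAME value, and -A -E -C Path dir).

```python
def quote(value):
    """Wrap a value in double quotes if it contains a space."""
    return f'"{value}"' if " " in value else value


def build_set_env(variables, path_dirs):
    """Return the lines of a set_env.bat that calls rapidee.

    Only the rapidee invocations that appear in the script above are used:
    `-S NAME value` for plain variables and `-A -E -C Path dir` for PATH.
    """
    lines = ["rem Generated file, edit the Python data instead"]
    for name, value in variables.items():
        lines.append(f"rapidee -S {name} {quote(value)}")
    for directory in path_dirs:
        lines.append(f"rapidee -A -E -C Path {quote(directory)}")
    return lines


script = build_set_env(
    {"JAVA_HOME": r"C:\Program Files\Java\jdk1.7.0_51",
     "MVN_HOME": r"G:\dev\tools\apache-maven-3.1.1"},
    [r"G:\dev\tools\apache-maven-3.1.1\bin", r"C:\cygwin64\bin"],
)
print("\n".join(script))
```

Write the printed lines to set_env.bat and run it as before.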

Here is what it looks like on my system



Does not contain a valid host:port authority: logicaljt


Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: logicaljt


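Check that your Hadoop version supports JobTracker HA. logicaljt is a logical JobTracker name rather than a host:port address, so a client without HA support cannot parse it.
Check the Hadoop conf files, in particular where the JobTracker address is set.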

Windows 8 icon location


Can not remove eclipse.exe Backup failed

While updating Eclipse I got the error below:

Backup of file G:\dev\tools\eclipse\eclipse.exe failed.
Can not remove : G:\dev\tools\eclipse\eclipse.exe

To resolve this:

Start eclipse
Rename eclipse.exe to eclipse_bkup
Start Update again
Restart eclipse

Java 8 framework and Tools support


The following post is a draft; I am going to update it on an ongoing basis as I learn.

Updated 26 March 2014

Development tools

Eclipse Kepler SR2 supports Java 8


Application servers


People have been running Java 8 with Tomcat

JBoss WildFly

Comes with basic Java 8 support



Spring Framework 4.0 provides support for several Java 8 features. You can make use of lambda expressions and method references with Spring’s callback interfaces. There is first-class support for java.time (JSR-310), and several existing annotations have been retrofitted as @Repeatable. You can also use Java 8’s parameter name discovery (based on the -parameters compiler flag) as an alternative to compiling your code with debug information enabled.


Graphing libraries


The following post is a draft; I am going to update it on an ongoing basis as I learn.

A JavaScript-based library that is great for working with graphs and social-network-style data structures.


Change cygwin username


Go to


Open the /etc/passwd file

Change the username to the required value

Set up IPython notebook over SSH tunnel


IPython is running on a remote machine that has no graphical (GUI) access, and we want to use IPython notebook from a local browser.

Solution:

Use an SSH tunnel to reach the notebook server.
On the remote machine, start IPython with the following command when it will be used through a tunnel:

ipython notebook --no-browser --port=7500

Now configure the local system to use the SSH tunnel as a proxy

If you are using Windows as the local machine, you can configure PuTTY to use an SSH tunnel:

Go to Tunnel
Enter Source port say 9999
Select Destination Dynamic

Click Add
It should look like the screenshot below

Open your PuTTY session and log in to the remote server with your username and password, using the newly created tunnel settings.

In the browser (use Firefox)

Set the proxy as
This should allow you to access IPython as

Or you can use the command-line option instead
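
For the command-line route, PuTTY's "Dynamic" destination corresponds to OpenSSH's -D option (a local SOCKS proxy). A small sketch, assuming Python 3.8 or later and an OpenSSH client; socks_tunnel_command is a hypothetical helper, and the user and host names are placeholders rather than values from this post.

```python
import shlex


def socks_tunnel_command(user, host, socks_port=9999):
    """Build an OpenSSH command that opens a dynamic (SOCKS) tunnel.

    -N: do not run a remote command, only forward ports.
    -D: dynamic application-level port forwarding on the given local port.
    """
    return ["ssh", "-N", "-D", str(socks_port), f"{user}@{host}"]


# "user" and "remote.example.com" are placeholders, not values from the post.
command = socks_tunnel_command("user", "remote.example.com")
print(shlex.join(command))
```

Run the printed command in a terminal, then point the browser's SOCKS proxy at the local port (9999 here, matching the PuTTY source port above).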

Few links

How to install Anaconda python unattended in linux at custom location?


Use commands

$bash -b -p /home/vagrant/simple_hadoop/anaconda

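By default the installer asks for:

Location confirmation
Path env confirmation

I want to install unattended, without any confirmation, from a Linux bash script and at a custom location.

If you pass an option the installer does not support (for example -y), it fails and points you to the help: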


bash illegal option -- y
Error: did not recognize option, please try -h
vagrant@precise32:/vagrant/binaries/anaconda$ bash -h
usage: [options]

Installs Anaconda 1.9.1

    -b           run install in batch mode (without manual intervention),
                 it is expected the license terms are agreed upon
    -h           print this help message and exit
    -p PREFIX    install prefix, defaults to /home/vagrant/anaconda
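
The unattended install can also be driven from a script. A minimal sketch, assuming Python 3; anaconda_install_command and install are hypothetical helpers, the installer file name is a placeholder, and -b and -p are the options listed in the help output above.

```python
import subprocess


def anaconda_install_command(installer, prefix):
    """Build the unattended install command from the installer's own options.

    -b: batch mode, no prompts (the license is assumed to be accepted).
    -p: install prefix, i.e. the custom location.
    """
    return ["bash", installer, "-b", "-p", prefix]


def install(installer, prefix):
    """Run the installer and raise CalledProcessError if it fails."""
    subprocess.run(anaconda_install_command(installer, prefix), check=True)


# "anaconda-installer.sh" is a placeholder file name, not the real one.
print(anaconda_install_command(
    "anaconda-installer.sh", "/home/vagrant/simple_hadoop/anaconda"))
```

Calling install(...) with the real installer path performs the installation; the print line only shows the command that would run.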

Making oozie hbase work with Kerberos enabled cluster

At the top of the workflow, add the credential definition:

  <credential name='hbaseauth' type='hbase'>

Then reference the credential from any action that needs it:

<action name="process" cred="hbaseauth">

Also add details about hbase-site.xml


Complete example

<action name="process" cred="hbaseauth">

        <ok to="success"/>
        <error to="failed"/>


Installing software from source in (Ubuntu) Linux

Download the tar.gz tarball for the software

Extract it

Read the Readme for build related instructions


tar -xzf wget-1.11.4.tar.gz
cd wget-1.11.4
./configure
make
sudo make install

Find which , where is software or library installed in (Ubuntu)Linux

$which gcc

$whereis gcc

$locate signal.h

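locate searches an index that is rebuilt periodically by cron, so very recent files may be missing until the index is refreshed (updatedb does this manually).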


Find which version of an rpm package is installed on Red Hat or CentOS

$rpm -q python

Find which version of a deb package is installed on Debian or Ubuntu

$dpkg -s python

How to install software in Linux

How to install software in Ubuntu (Linux)


Debian / Ubuntu

$apt-get install python

Red Hat / CentOS

$yum install python

SUSE

$yast --install python

Solaris (OpenCSW)

$/opt/csw/bin/pkgutil --install python
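
A script that must work across these systems can pick the command by checking which package manager is present. A minimal sketch, assuming Python 3.7 or later; INSTALL_COMMANDS and install_command are hypothetical names, and only the four commands listed above are used.

```python
import shutil

# Install command prefix for each package manager named above.
INSTALL_COMMANDS = {
    "apt-get": ["apt-get", "install"],
    "yum": ["yum", "install"],
    "yast": ["yast", "--install"],
    "pkgutil": ["/opt/csw/bin/pkgutil", "--install"],
}


def install_command(package, which=shutil.which):
    """Return the install command for the first package manager found.

    `which` is injectable so the lookup can be tested without touching
    the real system. Returns None when no known manager is present.
    """
    for tool, prefix in INSTALL_COMMANDS.items():
        if which(tool) or which(prefix[0]):
            return prefix + [package]
    return None


# Prints the command for this machine, or None if no manager was found.
print(install_command("python"))
```

The sketch only builds the command; running it still needs root privileges, as with the manual commands.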