Jugnu Life :-): March 2014

Architecture pattern for real time processing in Bigdata

The flow is as follows

1) Real-time data is ingested into the system via stream ingestion tools such as Flume, Kafka or Samza. The choice among these tools depends on factors such as whether we need guaranteed event delivery and whether the order of events matters. This is a topic for another post.

2) Apply the required processing to the incoming data via the Spark Streaming API and make it ready for consumption by third-party apps.

3) Using the Spark job server REST gateway, third-party apps trigger further Spark jobs to process the data and return the results to the applications.
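
To make the three stages concrete, here is a toy in-memory sketch in plain Python. It is not Spark code: the deque stands in for the ingestion tool, the micro-batch function stands in for the streaming job, and the query function stands in for a job triggered through the REST gateway. All names and the sample events are made up for illustration.

```python
from collections import Counter, deque

def ingest(events, buffer):
    """Stage 1: append incoming events to the ingestion buffer."""
    buffer.extend(events)

def process_micro_batch(buffer, store, batch_size=3):
    """Stage 2: drain up to batch_size events and update per-page counts."""
    batch = [buffer.popleft() for _ in range(min(batch_size, len(buffer)))]
    store.update(event["page"] for event in batch)
    return len(batch)

def query_top_pages(store, n=1):
    """Stage 3: an on-demand query a third-party app might trigger."""
    return store.most_common(n)

buffer, store = deque(), Counter()
ingest([{"page": "/home"}, {"page": "/cart"},
        {"page": "/home"}, {"page": "/help"}], buffer)

# Keep processing micro-batches until the buffer is empty.
while process_micro_batch(buffer, store):
    pass

top = query_top_pages(store)
print(top)  # [('/home', 2)]
```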

Sync folder between Android Tablet or Phone and computer

Requirement

To synchronize the contents of a folder on my Android tablet or phone with a folder on my computer.

Software used

Computer

Some FTP server

On Windows I used FileZilla Server

See guide here

http://www.addictivetips.com/windows-tips/how-to-setup-personal-ftp-server-using-filezilla-step-by-step-guide/

Read about different sync modes

https://sites.google.com/a/sergey-bogdanov.info/pcfilesync/tutorial-1/syncmodes

In Android

Download the application

https://play.google.com/store/apps/details?id=org.pcfilesync&hl=en

I was able to set up the FTP server and sync the folders in 10 minutes

Please follow the tutorial here

https://sites.google.com/a/sergey-bogdanov.info/pcfilesync/tutorial

Now the two folders synchronize with each other every 20 seconds.

How to see total folder size in Windows

Download the Folder Size software

http://sourceforge.net/projects/foldersize/

On the lower right side you will have an icon that toggles the display of folder sizes.

It will look something like this

For each folder you will get a window like the one shown below

Details

http://www.intowindows.com/how-to-view-folder-size-in-windows-8-1-explorer/
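
If you only need the number for one folder and have Python at hand, the same total can be computed with a few lines of standard library code. This is a minimal sketch, not what the Folder Size tool does internally; it skips symbolic links and will raise an error on files it is not allowed to read.

```python
import os

def folder_size_bytes(path):
    """Sum the sizes of all regular files below `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total

# Example: size of the current directory in bytes.
print(folder_size_bytes("."))
```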

Install protobuf 2.5 on Windows

Download protobuf from https://code.google.com/p/protobuf/downloads/detail?name=protoc-2.5.0-win32.zip&can=2&q=

Extract it to some location

Add it to your path

After that you can verify that it is installed

C:\Users\jj\Desktop>which protoc
/cygdrive/g/dev/tools/protoc-2.5.0-win32/protoc
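
The same check can be scripted. The sketch below assumes Python 3.7 or later; the helper name tool_version is mine. It looks a tool up on the path and asks it for its version, returning None when the tool is not found.

```python
import shutil
import subprocess

def tool_version(tool, version_flag="--version"):
    """Return (path, version output) for `tool`, or None if it is not on the path."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, version_flag], capture_output=True, text=True)
    # Some tools print their version on stderr instead of stdout.
    return path, (result.stdout or result.stderr).strip()

print(tool_version("protoc"))
```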

Tip: I use Rapid Environment Editor to manage all my environment variables. You will like it; read about it here.

Set windows environment variables via script command line

When using Windows, one of the features I miss is the use of scripts to do the many things we can do in Linux.

To set up a deterministic development environment rapidly I use the Rapid Environment Editor.

You can use a bat script to manage all environment variables.

Here is an example

set_env.bat

rem This is a comment
rapidee -C JAVA_HOME
rem rapidee -S JAVA_HOME "C:\Program Files\Java\jdk1.8.0"
rapidee -S JAVA_HOME "C:\Program Files\Java\jdk1.7.0_51"
rapidee -S MVN_HOME G:\dev\tools\apache-maven-3.1.1
rapidee -S SCALA_HOME G:\dev\tools\scala\scala-2.10.4
rapidee -S Path
rem rapidee -A -E Path "C:\Program Files\Java\jdk1.8.0\bin"
rapidee -A -E Path "C:\Program Files\Java\jdk1.7.0_51\bin"
rapidee -A -E Path G:\dev\tools\apache-maven-3.1.1\bin
rapidee -A -E -C Path G:\dev\tools\protoc-2.5.0-win32
rapidee -A -E -C Path G:\dev\tools\0.5.1_windows_amd64_packer
rapidee -A -E -C Path C:\cygwin64\bin
rapidee -A -E -C Path "C:\Program Files (x86)\Git\bin"
rapidee -A -E -C Path G:\dev\tools\scala\scala-2.10.4\bin

Save the above file as set_env.bat

You can do lots of things via batch scripting. http://en.wikibooks.org/wiki/Windows_Batch_Scripting

Download the Rapid Environment editor from http://www.rapidee.com/

Install it and add it to your path.

Now whenever you want to set a Windows environment variable, you can just add it to the above set_env.bat script and execute it

Here is what it looks like on my system
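
After running the script it is handy to confirm that the variables really are visible. A minimal sketch, assuming Python is installed; the helper name and the list of required variables are my own choice based on the script above. Run it from a newly opened console, since already running processes normally keep their old environment.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names from `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

required = ["JAVA_HOME", "MVN_HOME", "SCALA_HOME"]
print(missing_env_vars(required))
```

An empty list means every variable in the list is set.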

Does not contain a valid host:port authority: logicaljt

Error

Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: logicaljt

Solution

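Here logicaljt appears to be a logical JobTracker name (as used with JobTracker HA) rather than a host:port address.

Check that your Hadoop version supports JobTracker HA

Check the Hadoop conf files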
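
To see why the value is rejected, here is a rough sketch of a host:port check. It is not Hadoop's own parsing code, only an illustration of the shape the client expects; the host name and port 8021 in the example are placeholders.

```python
def is_host_port(address):
    """Return True if `address` looks like host:port with a valid port number."""
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdecimal():
        return False
    return 0 < int(port) < 65536

print(is_host_port("logicaljt"))                 # False: no port, just a name
print(is_host_port("jt-host.example.com:8021"))  # True
```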

Windows 8 icon location

%windir%\System32\imageres.dll

Can not remove eclipse.exe Backup failed

While updating Eclipse I got the error below

Backup of file G:\dev\tools\eclipse\eclipse.exe failed.

Can not remove : G:\dev\tools\eclipse\eclipse.exe

To resolve this:

Start eclipse

Rename eclipse.exe to eclipse_bkup

Start Update again

Restart eclipse

Done

Java 8 framework and Tools support

This post is a draft; I will update it on an ongoing basis as I learn more.

Updated 26 March 2014

Development tools

Eclipse Kepler SR2 supports Java 8

Application servers

Tomcat

People have been running Java 8 with Tomcat

http://tomcat.apache.org/whichversion.html

JBoss WildFly

Comes with basic Java 8 support

http://www.wildfly.org/

Frameworks

Spring

Spring Framework 4.0 provides support for several Java 8 features. You can make use of lambda expressions and method references with Spring’s callback interfaces. There is first-class support for java.time (JSR-310), and several existing annotations have been retrofitted as @Repeatable. You can also use Java 8’s parameter name discovery (based on the -parameters compiler flag) as an alternative to compiling your code with debug information enabled.

http://docs.spring.io/spring/docs/current/spring-framework-reference/html/new-in-4.0.html

Links

http://spring.io/blog/2014/03/21/java-8-in-enterprise-projects

Graphing libraries

This post is a draft; I will update it on an ongoing basis as I learn more.

A JavaScript library that is awesome for working with graphs and social-network types of data structures.

http://sigmajs.org/

#draft

Change cygwin username

Go to

C:\cygwin64\etc

Open the passwd file (/etc/passwd inside Cygwin)

Change the username to the one you need
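
The edit amounts to changing the first colon-separated field of the matching line. A minimal sketch of that change on a string; the sample line is hypothetical, and only the login name is touched (the home directory field stays as it was).

```python
def rename_user(passwd_text, old_name, new_name):
    """Replace the login name (first field) on lines belonging to `old_name`."""
    lines = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if fields[0] == old_name:
            fields[0] = new_name
        lines.append(":".join(fields))
    return "\n".join(lines) + "\n"

# Hypothetical passwd entry, for illustration only.
sample = "jj:unused:1000:513:Example User:/home/jj:/bin/bash\n"
print(rename_user(sample, "jj", "jagat"))
```

Back up the original file before writing any change to it.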

Setup Ipython notebook over ssh tunnel

Problem:

IPython is running on a remote machine which has no GUI access. We want to use the IPython notebook from a local browser.

Solution:

Use an SSH tunnel to reach the notebook from the local machine

Use the following command to start IPython on the remote machine when going through a tunnel

ipython notebook --no-browser --port=7500

Now configure the local system to use the SSH tunnel as a proxy

If you are using Windows as the local machine, you can configure PuTTY to use an SSH tunnel

Go to Connection > SSH > Tunnels

Enter a source port, say 9999

Select Dynamic as the destination type

Click Add

It should look as shown below

Open your PuTTY session and log in to the remote server with your username and password, using the newly created tunnel settings.

In the browser (I used Firefox), set the proxy to

127.0.0.1:9999

SOCKS5

This should let you access the IPython notebook at

http://127.0.0.1:7500

Alternatively, you can set up the tunnel from the command line
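
For the command-line route, a common approach is plain local port forwarding with ssh -L, which is a different technique from the dynamic SOCKS tunnel configured in PuTTY: the notebook port is forwarded directly, so no browser proxy is needed. The sketch below only builds and prints the command (Python 3.8 or later for shlex.join); user@remote-server is a placeholder.

```python
import shlex

def port_forward_command(user_at_host, port=7500):
    """Build an ssh command that forwards local `port` to the same port on the remote host."""
    # -N: do not run a remote command, only forward ports.
    # -L: local_port:destination_host:destination_port, resolved on the remote side.
    return ["ssh", "-N", "-L", f"{port}:localhost:{port}", user_at_host]

command = port_forward_command("user@remote-server")
print(shlex.join(command))  # ssh -N -L 7500:localhost:7500 user@remote-server
```

With that command running, http://127.0.0.1:7500 in the local browser should reach the notebook.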

Few links

http://wisdomthroughknowledge.blogspot.com.au/2012/07/accessing-ipython-notebook-remotely.html

https://coderwall.com/p/ohk6cg

http://waterprogramming.wordpress.com/2013/02/14/connecting-to-an-ipython-html-notebook-on-the-cluster-using-an-ssh-tunnel/

How to install Anaconda python unattended in linux at custom location?

Use commands

$bash Anaconda-1.x.x-Linux-x86.sh -b -p /home/vagrant/simple_hadoop/anaconda

By default it asks

Location confirmation
Path env confirmation

I wanted to install it unattended, without any confirmation, from a Linux bash script at a custom location

If location is not possible

Passing an unsupported option such as -y fails; -h lists the supported options:

Anaconda-1.9.1-Linux-x86.sh: illegal option -- y
Error: did not recognize option, please try -h
vagrant@precise32:/vagrant/binaries/anaconda$ bash Anaconda-1.9.1-Linux-x86.sh -h
usage: Anaconda-1.9.1-Linux-x86.sh [options]

Installs Anaconda 1.9.1

    -b           run install in batch mode (without manual intervention),
                 it is expected the license terms are agreed upon
    -h           print this help message and exit
    -p PREFIX    install prefix, defaults to /home/vagrant/anaconda

Making oozie hbase work with Kerberos enabled cluster

At the top of workflow add

<credentials>
<credential name='hbaseauth' type='hbase'>
</credential>
</credentials>

In any action that needs HBase access, reference the credential by name.

<action name="process" cred="hbaseauth">

Also add details about hbase-site.xml

<job-xml>${hbaseSite}</job-xml>
<file>${hbaseSite}#hbase-site.xml</file>

Complete example

<action name="process" cred="hbaseauth">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${hbaseSite}</job-xml>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <main-class>${process_classname}</main-class>
        <file>${hbaseSite}#hbase-site.xml</file>
        <capture-output/>
    </java>
    <ok to="success"/>
    <error to="failed"/>
</action>
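
A typo in the cred attribute is easy to miss, so here is a small sketch that lists actions referring to a credential that was never declared. The sample workflow is a simplified, hypothetical one: it has no XML namespace (real workflows declare one, so the tag lookups would need it) and each cred attribute is assumed to hold a single name.

```python
import xml.etree.ElementTree as ET

# Simplified sample workflow, for illustration only.
SAMPLE_WORKFLOW = """
<workflow-app name="demo">
  <credentials>
    <credential name="hbaseauth" type="hbase"/>
  </credentials>
  <action name="process" cred="hbaseauth"/>
  <action name="export" cred="hiveauth"/>
</workflow-app>
"""

def undeclared_credentials(workflow_xml):
    """Return (action name, cred) pairs whose cred is not declared under <credentials>."""
    root = ET.fromstring(workflow_xml.strip())
    declared = {cred.get("name") for cred in root.iter("credential")}
    return [
        (action.get("name"), action.get("cred"))
        for action in root.iter("action")
        if action.get("cred") and action.get("cred") not in declared
    ]

print(undeclared_credentials(SAMPLE_WORKFLOW))  # [('export', 'hiveauth')]
```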

Installing software from source in (Ubuntu) Linux

Download the tar.gz tarball for the software

Extract it

Read the Readme for build related instructions

Example

tar -xzf wget-1.11.4.tar.gz

cd wget-1.11.4

./configure

make

make install

Find where a program or library is installed in (Ubuntu) Linux

Example

$which gcc

$whereis gcc

$locate signal.h

locate searches an index that is rebuilt periodically by a cron job, so very recent files may be missing

/usr/include/signal.h

Find which version of an rpm package is installed on Red Hat or CentOS

$rpm -q python

python-2.4.3-21.el5
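
The output follows the name-version-release pattern, so the parts can be split from the right. A minimal sketch; the function name is mine, and it raises ValueError if the string has fewer than two hyphens.

```python
def split_rpm_name(package):
    """Split 'name-version-release' from the right, since names may contain hyphens."""
    name, version, release = package.rsplit("-", 2)
    return name, version, release

print(split_rpm_name("python-2.4.3-21.el5"))  # ('python', '2.4.3', '21.el5')
```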

Find which version of a deb package is installed

Ubuntu

$dpkg -s python

How to install software in Linux

How to install software in Ubuntu (Linux)

Ubuntu

$apt-get install python

Redhat

$yum install python

SUSE

$yast --install python

Solaris

/opt/csw/bin/pkgutil --install python