Architecture pattern for real-time processing in Big Data



The flow is as follows:

1) Real-time data is ingested into the system via stream ingestion tools like Flume, Kafka or Samza. The choice among these tools depends on factors such as whether guaranteed event delivery is needed and whether the order of events matters. That is a topic for another post.

2) Apply the required processing to the incoming data via the Spark Streaming API and make it ready for consumption by third-party apps.

3) Using the REST gateway of the Spark job server, third-party apps trigger further Spark jobs to process the data and return the results to the applications.
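The three stages above can be pictured with a toy, in-memory model. This is only a sketch in plain Python, not Spark code: the queue stands in for the ingestion tool, `process` for one Spark Streaming micro-batch, and `query` for a job triggered through the REST gateway. All function names and the `user` field are made up for illustration.

```python
from collections import Counter, deque

def ingest(events, queue):
    """Stage 1: append raw events to a queue (stand-in for Flume/Kafka)."""
    for event in events:
        queue.append(event)

def process(queue, state):
    """Stage 2: drain the queue as one micro-batch and update per-user counts."""
    batch = []
    while queue:
        batch.append(queue.popleft())
    state.update(event["user"] for event in batch)
    return len(batch)

def query(state, top_n):
    """Stage 3: answer an on-demand request over the processed data."""
    return state.most_common(top_n)

queue = deque()
state = Counter()
ingest([{"user": "a"}, {"user": "b"}, {"user": "a"}], queue)
processed = process(queue, state)
print(processed, query(state, 1))
```

In a real deployment each stage runs as a separate system, so the hand-off between stages is a network boundary rather than a function call.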

Sync folder between Android Tablet or Phone and computer

 

Requirement

To synchronize the contents of a folder on my Android tablet or phone with a folder on my computer

Software used

Computer

Some FTP server

On Windows I used FileZilla Server

See guide here

http://www.addictivetips.com/windows-tips/how-to-setup-personal-ftp-server-using-filezilla-step-by-step-guide/

Read about different sync modes

https://sites.google.com/a/sergey-bogdanov.info/pcfilesync/tutorial-1/syncmodes

In Android

Download the application

https://play.google.com/store/apps/details?id=org.pcfilesync&hl=en

I was able to set up the FTP server and sync the folders in 10 minutes

Please follow the tutorial here

https://sites.google.com/a/sergey-bogdanov.info/pcfilesync/tutorial

Now the two folders synchronize with each other every 20 seconds.

How to see total folder size in Windows

 

Download the Folder Size software

http://sourceforge.net/projects/foldersize/

On the lower right side you will have an icon that turns on the display of folder sizes.

It will look something like this

 

 

 

 image

For each folder you will get a window like the one shown below

image

Details

http://www.intowindows.com/how-to-view-folder-size-in-windows-8-1-explorer/

Install protobuf 2.5 on Windows

 

Download protobuf from https://code.google.com/p/protobuf/downloads/detail?name=protoc-2.5.0-win32.zip&can=2&q=

Extract it to some location

Add it to your path

After that you can verify that it is installed

C:\Users\jj\Desktop>which protoc
/cygdrive/g/dev/tools/protoc-2.5.0-win32/protoc
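The `which` command above comes from Cygwin. If it is not available, the same check can be sketched with Python's standard library. The helper names `find_tool` and `tool_version` are made up for this example; `protoc --version` is the flag protoc itself provides.

```python
import shutil
import subprocess

def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)

def tool_version(name, flag="--version"):
    """Run `<name> <flag>` and return what it prints, or None if not on PATH."""
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()

print(find_tool("protoc"))     # path to protoc, or None if PATH is not set up
print(tool_version("protoc"))  # version string reported by protoc, or None
```

If both lines print None, the extracted folder is not on the PATH of the shell that started Python. Open a new terminal after changing the PATH.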

Tip: I use Rapid Environment Editor to manage all my environment variables. You will like it; read about it at http://www.rapidee.com/

Set windows environment variables via script command line

 

When using Windows, one feature I miss is the ability to script the many things we can script in Linux.

To set up a deterministic development environment quickly, I use Rapid Environment Editor.

You can use a bat script to manage all environment variables.

Here is an example

set_env.bat

rem This is a comment
rapidee -C JAVA_HOME
rem rapidee -S JAVA_HOME "C:\Program Files\Java\jdk1.8.0"
rapidee -S JAVA_HOME "C:\Program Files\Java\jdk1.7.0_51"
rapidee -S MVN_HOME G:\dev\tools\apache-maven-3.1.1
rapidee -S SCALA_HOME G:\dev\tools\scala\scala-2.10.4
rapidee -S Path
rem rapidee -A -E Path "C:\Program Files\Java\jdk1.8.0\bin"
rapidee -A -E Path "C:\Program Files\Java\jdk1.7.0_51\bin"
rapidee -A -E Path G:\dev\tools\apache-maven-3.1.1\bin
rapidee -A -E -C Path G:\dev\tools\protoc-2.5.0-win32
rapidee -A -E -C Path G:\dev\tools\0.5.1_windows_amd64_packer
rapidee -A -E -C Path C:\cygwin64\bin
rapidee -A -E -C Path "C:\Program Files (x86)\Git\bin"
rapidee -A -E -C Path G:\dev\tools\scala\scala-2.10.4\bin

 

Save the above file as set_env.bat

You can do lots of things via batch scripting. http://en.wikibooks.org/wiki/Windows_Batch_Scripting

Download the Rapid Environment editor from http://www.rapidee.com/

Install it and add it to your path.

Now whenever you want to set a Windows environment variable, you can just add it to the set_env.bat script above and execute it.

Here is what it looks like on my system

 

image

Does not contain a valid host:port authority: logicaljt

Error

Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: logicaljt

Solution

Check that your Hadoop version supports JobTracker HA
Check the Hadoop conf files: the JobTracker address must be either a real host:port or a logical name backed by HA settings
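A quick way to run the second check is to read the configured JobTracker address and see whether it has host:port form. The sketch below assumes the address is stored under `mapred.job.tracker` in mapred-site.xml (the MR1 property name; your distribution may use a different one), and it parses an inline sample instead of a real file. A logical name such as `logicaljt` is only usable when the client also has the matching HA settings. Without them Hadoop tries to parse the name as host:port and fails with the error above.

```python
import re
import xml.etree.ElementTree as ET

# host:port, e.g. jt.example.com:8021 (hypothetical host name)
HOST_PORT = re.compile(r"^[^\s:/]+:\d+$")

def read_property(xml_text, name):
    """Return the value of a Hadoop property from configuration XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

def looks_like_host_port(value):
    """True if the value has host:port form, False for a bare logical name."""
    return value is not None and HOST_PORT.match(value.strip()) is not None

# Inline sample standing in for the contents of mapred-site.xml
SAMPLE = """<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>logicaljt</value>
  </property>
</configuration>"""

value = read_property(SAMPLE, "mapred.job.tracker")
print(value, looks_like_host_port(value))
```

For a real cluster, read the file from your Hadoop conf directory and pass its text to `read_property`.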

Windows 8 icon location

%windir%\System32\imageres.dll

Can not remove eclipse.exe Backup failed

While updating Eclipse I got the error below

Backup of file G:\dev\tools\eclipse\eclipse.exe failed.
Can not remove : G:\dev\tools\eclipse\eclipse.exe




To resolve this:

Start eclipse
Rename eclipse.exe to eclipse_bkup
Start Update again
Restart eclipse
Done

Java 8 framework and Tools support

 

The following post is a draft; I am going to update it on an ongoing basis as I learn more.

Updated 26 March 2014

Development tools

Eclipse Kepler SR2 supports Java 8

 

Application servers

Tomcat

People have been running Java 8 with Tomcat

http://tomcat.apache.org/whichversion.html

JBoss WildFly

Comes with basic Java 8 support

http://www.wildfly.org/

Frameworks

Spring

Spring Framework 4.0 provides support for several Java 8 features. You can make use of lambda expressions and method references with Spring’s callback interfaces. There is first-class support for java.time (JSR-310), and several existing annotations have been retrofitted as @Repeatable. You can also use Java 8’s parameter name discovery (based on the -parameters compiler flag) as an alternative to compiling your code with debug information enabled.

http://docs.spring.io/spring/docs/current/spring-framework-reference/html/new-in-4.0.html

Links

http://spring.io/blog/2014/03/21/java-8-in-enterprise-projects

Graphing libraries

 

The following post is a draft; I am going to update it on an ongoing basis as I learn more.

Sigma.js is a JavaScript library that is great for working with graphs and social-network-type data structures.

http://sigmajs.org/

#draft

Change cygwin username

 

Go to

C:\cygwin64\etc

Open the passwd file (/etc/passwd inside Cygwin)

Change the username as required

Setup Ipython notebook over ssh tunnel

Problem:

IPython is running on a remote machine that has no graphical (GUI) access, and we want to use the IPython notebook from a local browser.

Solution :

Use an SSH tunnel to reach the notebook.
Start IPython on the remote machine with the following command

ipython notebook --no-browser --port=7500

Now configure the local system to use the SSH tunnel as a proxy.

If you are using Windows as the local machine, you can configure PuTTY to use an SSH tunnel:

Go to Connection > SSH > Tunnels
Enter a source port, say 9999
Select Dynamic as the destination type
Click Add

It will look as shown below



Open your PuTTY session and log in to the remote server with your username and password, using the newly created tunnel settings.

In the browser (use Firefox), set the

Proxy as
127.0.0.1:9999
SOCKS5
This should allow you to access IPython at

http://127.0.0.1:7500

Alternatively, the tunnel can be set up from the command line with ssh.
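As a sketch of the command-line route, the helper functions below build the ssh invocations for the two common tunnel styles. They use standard OpenSSH options: `-D` opens a dynamic (SOCKS) proxy like PuTTY's Dynamic setting, `-L` forwards a single port, and `-N` skips running a remote command. The user name `me` and host `remote.example.com` are placeholders, and the function names are made up for this example.

```python
def socks_tunnel_command(user, host, socks_port=9999):
    """ssh argv for a dynamic SOCKS proxy on a local port."""
    return ["ssh", "-N", "-D", str(socks_port), f"{user}@{host}"]

def local_forward_command(user, host, port=7500):
    """ssh argv that forwards local <port> to <port> on the remote machine."""
    return ["ssh", "-N", "-L", f"{port}:localhost:{port}", f"{user}@{host}"]

# Placeholder user and host; substitute your own.
print(" ".join(socks_tunnel_command("me", "remote.example.com")))
print(" ".join(local_forward_command("me", "remote.example.com")))
```

With the second form no browser proxy settings are needed: once the tunnel is up, http://127.0.0.1:7500 in any browser reaches the notebook. To start a tunnel from Python instead of pasting the printed command, pass the list to `subprocess.call`.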






How to install Anaconda python unattended in linux at custom location?

 

Use commands

$bash Anaconda-1.x.x-Linux-x86.sh -b -p /home/vagrant/simple_hadoop/anaconda

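By default the installer asks for

Location confirmation
Path env confirmation

I wanted to install without any confirmation, unattended, from a Linux bash script and at a custom location. The -b (batch mode) and -p (install prefix) options shown above do exactly that.

The installer does not accept a -y option; its -h output lists the options it supports: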

Anaconda-1.9.1-Linux-x86.sh: illegal option -- y
Error: did not recognize option, please try -h
vagrant@precise32:/vagrant/binaries/anaconda$ bash Anaconda-1.9.1-Linux-x86.sh -h
usage: Anaconda-1.9.1-Linux-x86.sh [options]

Installs Anaconda 1.9.1

    -b           run install in batch mode (without manual intervention),
                 it is expected the license terms are agreed upon
    -h           print this help message and exit
    -p PREFIX    install prefix, defaults to /home/vagrant/anaconda

Making oozie hbase work with Kerberos enabled cluster

At the top of the workflow, add

<credentials>
  <credential name='hbaseauth' type='hbase'>
  </credential>
</credentials>

Within every action that talks to HBase, reference the credentials by name.

<action name="process" cred="hbaseauth">


Also add details about hbase-site.xml

<job-xml>${hbaseSite}</job-xml>
<file>${hbaseSite}#hbase-site.xml</file>

Complete example


<action name="process" cred="hbaseauth">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${hbaseSite}</job-xml>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>

        <main-class>${process_classname}</main-class>
        <file>${hbaseSite}#hbase-site.xml</file>
        <capture-output/>
    </java>
    <ok to="success"/>
    <error to="failed"/>
</action>

 

Installing software from source in (Ubuntu) Linux


Download the tar.gz tarball for the software

Extract it

Read the Readme for build related instructions

Example

tar -xzf wget-1.11.4.tar.gz
cd wget-1.11.4
./configure
make
sudo make install

Find which , where is software or library installed in (Ubuntu)Linux

Example

$which gcc

$whereis gcc

$locate signal.h
/usr/include/signal.h

locate searches an index that is rebuilt periodically by a cron job (updatedb), so recently added files may not show up yet.

Find which version of an rpm package is installed on Red Hat or CentOS

$rpm -q python
python-2.4.3-21.el5

Find which version of a deb package is installed on Ubuntu

$dpkg -l python

How to install software in Linux

How to install software in Ubuntu (Linux)

Ubuntu

$apt-get install python

Redhat

$yum install python

SUSE

$yast --install python

Solaris

/opt/csw/bin/pkgutil --install python