hadoop fs -test example
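The title above promises an example, so here is a minimal sketch. `hadoop fs -test` reports its result through the exit code rather than printed output, which makes it handy in scripts; the path below is purely hypothetical, and the guard lets the snippet degrade gracefully on a machine without the Hadoop client:

```shell
# hadoop fs -test reports through its exit code instead of printing output:
#   -e  path exists        -d  path is a directory
#   -z  file is zero length
# /user/demo/input is a hypothetical path; substitute your own.
if command -v hadoop >/dev/null 2>&1 \
   && hadoop fs -test -e /user/demo/input 2>/dev/null; then
  msg="path exists"
else
  msg="path missing (or no hadoop client on this machine)"
fi
echo "$msg"
```

Because the command communicates through its exit code, it combines naturally with `&&`, `||`, and `if` in shell scripts.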
Install Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy File
This post is part of a series of posts explaining how to enable Kerberos on a Hadoop cluster.
To use AES-256 encryption in Kerberos, you must install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files on every host in your cluster.
The process is straightforward:
Download the JCE from the Oracle website (follow the link; the JCE is at the bottom of the page):
http://www.oracle.com/technetwork/java/javase/downloads/index.html
Extract the zip.
Copy the files
local_policy.jar
US_export_policy.jar
to the following location on all machines:
JAVA_HOME/jre/lib/security
(take the appropriate path as per your configuration)
Verify that aes256-cts:normal is present in the supported_enctypes field of the kdc.conf or krb5.conf file.
After changing the kdc.conf file, you'll need to restart both the KDC and the kadmin server for those changes to take effect.
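For reference, the relevant kdc.conf stanza looks roughly like this (the realm name and the rest of the enctype list are placeholders; keep whatever your site already uses and just make sure aes256-cts:normal appears):

```
[realms]
    EXAMPLE.COM = {
        supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal
    }
```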
Hadoop cluster benchmarks
Benchmarking a Hadoop cluster is an ongoing process as it is used inside the organization.
The main unknown when buying a new cluster is how this new powerhouse of machines will behave under various different sets of workloads.
Intel, which is also working on its own flavor of Hadoop, has a product (HiBench) for benchmarking cluster performance against different types of workloads:
https://github.com/intel-hadoop/HiBench/blob/master/README.md
Micro Benchmarks:
- Sort (sort): This workload sorts its text input data, which is generated using the Hadoop RandomTextWriter example.
- WordCount (wordcount): This workload counts the occurrence of each word in the input data, which is generated using the Hadoop RandomTextWriter example. It is representative of another typical class of real-world MapReduce jobs: extracting a small amount of interesting data from a large data set.
- TeraSort (terasort): TeraSort is a standard benchmark created by Jim Gray. Its input data is generated by the Hadoop TeraGen example program.
HDFS Benchmarks:
- Enhanced DFSIO (dfsioe): Enhanced DFSIO tests the HDFS throughput of the Hadoop cluster by generating a large number of tasks performing writes and reads simultaneously. It measures the average I/O rate of each map task, the average throughput of each map task, and the aggregated throughput of the HDFS cluster.
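If you just want a quick feel for similar workloads before setting up HiBench, the stock Hadoop distribution ships TeraGen/TeraSort and TestDFSIO. A sketch (jar file names vary between Hadoop versions, and the row counts and file sizes here are only illustrative):

```shell
# Run on a cluster node with the hadoop client installed.
if command -v hadoop >/dev/null 2>&1; then
  # TeraSort: generate 10 million rows, then sort them.
  hadoop jar hadoop-*examples*.jar teragen 10000000 /bench/tera-in
  hadoop jar hadoop-*examples*.jar terasort /bench/tera-in /bench/tera-out
  # TestDFSIO: measure raw HDFS write and read throughput.
  hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
  hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
  bench_msg="benchmarks submitted"
else
  bench_msg="no hadoop client found; run this on a cluster node"
fi
echo "$bench_msg"
```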
Web Search Benchmarks:
- Nutch indexing (nutchindexing): Large-scale search indexing is one of the most significant uses of MapReduce. This workload tests the indexing sub-system of Nutch, a popular open-source (Apache project) search engine. The workload uses automatically generated web data whose hyperlinks and words both follow the Zipfian distribution with corresponding parameters. The dictionary used to generate the web page texts is the default Linux dict file /usr/share/dict/linux.words.
- PageRank (pagerank): This workload contains an implementation of the PageRank algorithm on Hadoop (a search engine ranking benchmark included in Pegasus 2.0). The workload uses automatically generated web data whose hyperlinks follow the Zipfian distribution.
Machine Learning Benchmarks:
- Mahout Bayesian classification (bayes): Large-scale machine learning is another important use of MapReduce. This workload tests the Naive Bayes trainer (a popular classification algorithm for knowledge discovery and data mining) in Mahout 0.7, an open-source (Apache project) machine learning library. The workload uses automatically generated documents whose words follow the Zipfian distribution. The dictionary used for text generation is also the default Linux file /usr/share/dict/linux.words.
- Mahout K-means clustering (kmeans): This workload tests K-means clustering (a well-known clustering algorithm for knowledge discovery and data mining) in Mahout 0.7. The input data set is generated by GenKMeansDataset based on uniform and Gaussian distributions.
Data Analytics Benchmarks:
- Hive query benchmarks (hivebench): This workload is based on the SIGMOD '09 paper "A Comparison of Approaches to Large-Scale Data Analysis" and HIVE-396. It contains Hive queries (aggregation and join) performing the typical OLAP queries described in the paper. Its input is also automatically generated web data with hyperlinks following the Zipfian distribution.
Google Search only university domains
Export MySQL data to csv
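The post title above has no body yet; one common approach is mysql's batch mode, which prints tab-separated rows that can be converted to CSV. A sketch (the database and table names are placeholders, and the tr conversion is naive about embedded delimiters):

```shell
# --batch makes mysql print tab-separated values, one row per line.
# mydb and mytable are hypothetical names; the tr step is naive about
# embedded tabs, commas, and quoting.
if command -v mysql >/dev/null 2>&1; then
  mysql --batch -e 'SELECT * FROM mytable' mydb | tr '\t' ',' > mytable.csv
  csv_msg="exported to mytable.csv"
else
  csv_msg="no mysql client found"
fi
echo "$csv_msg"
```

For large tables, SELECT ... INTO OUTFILE with FIELDS TERMINATED BY ',' does the same job server-side, but the file lands on the database server's filesystem.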
How to learn Japji Sahib
Waheguru ji
I have been trying to learn Japji Sahib for some time now, but I get confused with the order of the pauris.
So this is what I have planned. I will also post the progress of this effort and how it goes.
Divide and Conquer
This rule has been used for ages to solve all the tasks of the world: sub-divide them. Guru Ji has already divided the composition into pauris for us, so I am going to make sure I remember each of them correctly. While I read one line, I will hide the next line from my view and try to recite it by heart. Then I will scroll down a little to see whether I got the next line right. I will scroll only one line at a time, comparing what was there with what I recited, correcting my mistakes and repeating if required. In this way I will make sure I learn each individual pauri with correct recitation.
You can do the above steps online at
http://www.sikhiwiki.org/index.php/Learn_all_of_Japji_Sahib#Salok
If you want to use your computer, I would suggest downloading the Sikhi2Max software (from the Sikhnet website; the other links are not working).
Click on the Compiled Banis section.
Click on Japji Sahib and then show yourself only one line at a time, as discussed above.
Note in your notebook where you make mistakes, so that next time you are extra cautious at those places.
Giving hints to myself
I have been using this method since my school studies for remembering long answers (this is relevant if you studied in India). Or if you have acted in theater plays, you must have learned this art: you remember your dialogue by the ending of the previous speaker's dialogue.
So I planned to arrange and learn the first lines of all the pauris, so that I have a hint of which pauri comes next. I have written down the first lines of all of them. If you want the sheet I made, you can see it online here.
It starts from the pauris:
https://plus.google.com/u/0/114107145152608094049/posts
Listen to audio of Japji Sahib all the time
For many years I have been listening to the audio by Tarlochan Singh. His voice is familiar in all Indian houses. (It has been embedded deep in my heart with memories from when I was as young as 8 years old; Rehraas Sahib was played at my neighbour's place every evening, and those memories are still fresh in my mind.) You can get this audio from YouTube or elsewhere on the Internet; I have copied it to my mobile, and if you want it, just write to me and I can share it. On my mobile I have simply put Japji Sahib on repeat, so it plays all the time. By listening we learn things fast; that's why, as you must have noticed, we remember all the lyrics of songs (although, to be honest, I cannot). Listening gives the mind one more input source besides reading with the eyes.
Last and most important
Learn Meanings of Japji Sahib
I have downloaded the English translation of Japji Sahib so that I know what is written. There is a common saying in Punjabi, "ratta jaldi bhul janda hai" (rote learning is quickly forgotten), so I want to learn by understanding rather than just mugging it up. Guru Nanak Dev Ji also wrote:
"Vidhya vichari ta paropkari" (contemplate knowledge, and you become a benefactor of others)
We should do everything after understanding and reflecting on it.
One more idea I have:
If you want to combine steps 2 and 3, I would recommend the YouTube video below. It shows the meanings, and you can always listen to the audio as well.
Please feel free to share this post with your friends.
Waheguru Ji, mehar karna, gurbani te gurbani de arth man andar vas jaan. (Waheguru Ji, bless me so that Gurbani and its meanings come to dwell in my heart.)
Do post your experiences and suggestions. I will keep you updated.
Oozie day light savings example
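The post title above has no body yet; as a starting point, daylight-saving handling in an Oozie coordinator hinges on the timezone attribute together with EL functions like coord:days(1), which keep the action aligned to local midnight across DST shifts. A sketch (the app name, dates, and paths are placeholders):

```
<coordinator-app name="daily-dst-demo"
                 frequency="${coord:days(1)}"
                 start="2014-03-01T08:00Z" end="2014-12-01T08:00Z"
                 timezone="America/Los_Angeles"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>hdfs://namenode/user/demo/workflow.xml</app-path>
        </workflow>
    </action>
</coordinator-app>
```

With a fixed frequency such as 1440 minutes the action would drift by an hour when the clocks change; coord:days(1) lets Oozie stretch or shrink the interval at the DST boundary.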
Null behaviour in Sqoop and Hive
Note
\N (a backslash followed by a capital N), not to be confused with \n, which is a single newline character.
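A quick way to see the difference on the command line (the byte counts in the comments follow from printf's escape handling):

```shell
# \N is two characters: a backslash followed by a capital N
# (Hive's default representation of NULL in text files).
printf '\\N' | wc -c    # 2 bytes
# \n is a single newline character.
printf '\n' | wc -c     # 1 byte
```

When importing with Sqoop, the --null-string and --null-non-string options control how NULLs are written; passing '\N' (escaped appropriately for your shell) makes Sqoop emit the two-character sequence that Hive expects.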