Spark 1.6 ships with pivot functionality for DataFrames.
Let's try it out.
Create a simple file with the following data:
cat /tmp/sample.csv
language,year,earning
net,2012,10000
java,2012,20000
net,2012,5000
net,2013,48000
java,2013,30000
Start the Spark shell with the spark-csv package:
bin/spark-shell --packages "com.databricks:spark-csv_2.10:1.2.0"
Load the sample file:
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file:///tmp/sample.csv")
df: org.apache.spark.sql.DataFrame = [language: string, year: int, earning: int]
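To sanity-check the load, you can print the schema. The output below is what I'd expect given the sample data, assuming inferSchema mapped year and earning to integers as the DataFrame type above suggests:
scala> df.printSchema
root
 |-- language: string (nullable = true)
 |-- year: integer (nullable = true)
 |-- earning: integer (nullable = true)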
Run a simple pivot, summing earnings per language and year. Note that the Spark 1.6 API takes the pivot values as a Seq (and since year was inferred as an int, we pass ints, not strings):
scala> df.groupBy("language").pivot("year", Seq(2012, 2013)).agg(sum("earning")).show
+--------+-----+-----+
|language| 2012| 2013|
+--------+-----+-----+
| java|20000|30000|
| net|15000|48000|
+--------+-----+-----+
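If you don't know the pivot values up front, Spark 1.6 also lets you omit them; it will then run an extra job to compute the distinct values of the pivot column (capped by spark.sql.pivotMaxValues, 10000 by default). A minimal sketch, producing the same result on this data:
scala> df.groupBy("language").pivot("year").agg(sum("earning")).show
Passing the values explicitly, as above, avoids that extra pass and fixes the order of the output columns.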
References
https://github.com/apache/spark/pull/7841
http://sqlhints.com/2014/03/10/pivot-and-unpivot-in-sql-server/