Spark 1.6 adds pivot functionality to DataFrames: pivoting rotates the distinct values of one column into columns of their own, turning long, narrow data into a wide cross-tab. Let's try it out.
Create a simple file with the following data:
cat /tmp/sample.csv
language,year,earning
net,2012,10000
java,2012,20000
net,2012,5000
net,2013,48000
java,2013,30000
Start the Spark shell with the spark-csv package:
bin/spark-shell --packages "com.databricks:spark-csv_2.10:1.2.0"
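The _2.10 suffix matches a Spark build compiled against Scala 2.10. If your Spark build uses Scala 2.11 instead, the corresponding artifact should be com.databricks:spark-csv_2.11:1.2.0.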
Load the sample file into a DataFrame:
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file:///tmp/sample.csv")
df: org.apache.spark.sql.DataFrame = [language: string, year: int, earning: int]
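As a quick sanity check, print the inferred schema. With inferSchema enabled, year and earning come back as integers:

scala> df.printSchema
root
 |-- language: string (nullable = true)
 |-- year: integer (nullable = true)
 |-- earning: integer (nullable = true)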
Run a simple pivot. In Spark 1.6 the pivot values are passed as a Seq, and since inferSchema read year as an integer, the values must be integers as well:

scala> df.groupBy("language").pivot("year", Seq(2012, 2013)).agg(sum("earning")).show
+--------+-----+-----+
|language| 2012| 2013|
+--------+-----+-----+
| java|20000|30000|
| net|15000|48000|
+--------+-----+-----+
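You can also let Spark work out the pivot values itself: calling pivot with just the column name runs an extra job to collect the distinct values of that column, sorts them, and uses them as the output columns. A minimal variant of the query above, on the same df:

scala> df.groupBy("language").pivot("year").agg(sum("earning")).show

This should produce the same table as before, at the cost of the extra job to find the distinct years. Specifying the values up front, as in the previous example, avoids that job.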
References
https://github.com/apache/spark/pull/7841
http://sqlhints.com/2014/03/10/pivot-and-unpivot-in-sql-server/