Reading JSON Data into a DataFrame
First, we have to read the JSON file. Use the following code to read a JSON document named employee.json; the data is presented as a table with the fields id, name, and age.
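For reference, a minimal employee.json that the example could read might look like the following (the records are illustrative). Note that Spark's JSON reader expects newline-delimited JSON by default, i.e. one JSON object per line:

```json
{"id": 1, "name": "Ann", "age": 28}
{"id": 2, "name": "Bob", "age": 21}
{"id": 3, "name": "Carol", "age": 35}
```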
As an example, the following code creates a DataFrame from the content of a JSON file and queries it through the DataFrame API:
import org.apache.spark.sql.SparkSession

object SparkSessionExample {
  def main(args: Array[String]): Unit = {
    val jsonFile = args(0)

    // Initialize the SparkSession
    val sparkSession = SparkSession
      .builder()
      .appName("spark-sql-basic")
      .master("local[*]")
      .getOrCreate()

    // Read the JSON file into a DataFrame
    val employeeDF = sparkSession.read.json(jsonFile)

    // Show the first 100 rows
    employeeDF.show(100)

    // Print the schema of the DataFrame
    employeeDF.printSchema()

    // Select only the "name" column
    employeeDF.select("name").show()

    // Select people older than 21
    employeeDF.where("age > 21").show()

    // Count people by age
    employeeDF.groupBy("age").count().show()
  }
}
As a result, the program prints the full table, the inferred schema, and the output of each query to the console.
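For instance, with the fields id, name, and age, printSchema() would produce output along these lines (Spark sorts inferred JSON columns alphabetically; the exact types depend on the actual data in the file):

```
root
 |-- age: long (nullable = true)
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
```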
Registering a DataFrame as a table in SparkSession
You can also register the DataFrame as a temporary view in the SparkSession and run SQL queries against it, as follows:
// Register the DataFrame as a temporary SQL view
employeeDF.createOrReplaceTempView("employee")
sparkSession.sql("select * from employee where age > 21").show()