Read JSON data into a DataFrame

First, we have to read the JSON file.

Use the following command to read the JSON document named employee.json. The data is shown as a table with the fields: id, name, and age.
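For reference, Spark's JSON reader expects one JSON object per line (JSON Lines format). An employee.json file matching the fields above might look like this (the values here are illustrative):

```json
{"id": 1, "name": "ankit", "age": 25}
{"id": 2, "name": "raman", "age": 16}
{"id": 3, "name": "divya", "age": 32}
```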

As an example, the following code creates a DataFrame from the content of a JSON file and queries the data through the DataFrame API:

import org.apache.spark.sql.SparkSession

object SparkSessionExample {

  def main(args: Array[String]): Unit = {

    val jsonFile = args(0)

    // Initialize SparkSession
    val sparkSession = SparkSession
      .builder()
      .appName("spark-sql-basic")
      .master("local[*]")
      .getOrCreate()

    // Read the JSON file into a DataFrame
    val employeeDF = sparkSession.read.json(jsonFile)

    // Show the first 100 rows
    employeeDF.show(100)

    // Show the schema of the DataFrame
    employeeDF.printSchema()

    // Select only the "name" column
    employeeDF.select("name").show()

    // Select people older than 21
    employeeDF.where("age > 21").show()

    // Count people by age
    employeeDF.groupBy("age").count().show()

    // Release the session's resources
    sparkSession.stop()
  }
}

Running this prints the full table, its schema, and the result of each query to the console.

Registering a DataFrame as a table in SparkSession

You can also register the DataFrame as a temporary view in the SparkSession and execute SQL queries against it:

    // Register the DataFrame as a temporary SQL view
    employeeDF.createOrReplaceTempView("employee")

    sparkSession.sql("select * from employee where age > 21").show()
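The earlier DataFrame aggregation can be written in SQL as well. A sketch, assuming the employee view registered above:

```scala
    // Count people by age, this time in SQL
    sparkSession.sql("select age, count(*) as count from employee group by age").show()
```

Both forms compile to the same logical plan, so choosing between the DataFrame API and SQL is largely a matter of style.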
