A Step-by-Step Guide to Setting Up a SparkSQL Development Environment in IDEA

1. Create a Maven project, install the Scala plugin in IDEA, and add the Scala SDK

https://www.cnblogs.com/bajiaotai/p/15381309.html

2. Import the required dependency jars by configuring pom.xml

2.1 Sample pom.xml (Spark version: 3.0.0, Scala version: 2.12)

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="//maven.apache.org/POM/4.0.0"
         xmlns:xsi="//www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="//maven.apache.org/POM/4.0.0 //maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.dxm.sparksql</groupId>
    <artifactId>sparksql</artifactId>
    <version>1.0-SNAPSHOT</version>

    <!-- Version properties: spark.version is the Spark version, scala.version is the Scala binary version used as the artifact suffix -->
    <properties>
        <spark.version>3.0.0</spark.version>
        <scala.version>2.12</scala.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.27</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>

    </dependencies>


</project>

2.2 Matching the Spark version to the Scala version

# The link below shows which Scala version each Spark version is built against, along with the matching dependency configuration
https://www.cnblogs.com/bajiaotai/p/16270971.html

2.3 Check the Scala version at runtime from Scala code

println(util.Properties.versionString)
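
To compare this against the Spark jars actually on the classpath, the small sketch below prints both versions so you can confirm they match the pom. This is only an illustrative check under the assumption that the dependencies above are resolved; org.apache.spark.SPARK_VERSION is the version constant exposed by spark-core.

import org.apache.spark.SPARK_VERSION

object PrintVersions extends App {
  // Scala version of the runtime on the classpath, e.g. "version 2.12.10"
  println(util.Properties.versionString)
  // Spark version of the spark-core jar on the classpath, e.g. "3.0.0"
  println(SPARK_VERSION)
}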

2.4 FAQ: errors caused by mismatched Spark and Scala versions

To be added

3. Code test

import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}

// Case class used to build the test DataFrame; defined at the top level of the file
// so that spark.implicits._ can derive an encoder for it
case class Person(name: String, action: String)

object TestSparkSQLEnv extends App {

  //1. Initialize the SparkSession
  val spark = SparkSession
    .builder
    .master("local")
    //.appName("SparkSql Entrance Class SparkSession")
    //.config("spark.some.config.option", "some-value")
    .getOrCreate()

  //2. Get the SparkContext from the SparkSession
  private val sc: SparkContext = spark.sparkContext

  //3. Set the log level
  // Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
  // This overrides any user-defined log settings, e.g. a log4j.properties file
  sc.setLogLevel("ERROR")

  import spark.implicits._

  //4. Create a DataFrame from an RDD of case class instances
  private val rdd2DfByCaseClass: DataFrame = spark.sparkContext
    .makeRDD(Array(Person("疫情", "何時"), Person("結束", "呢")))
    .toDF("名稱", "行動")
  rdd2DfByCaseClass.show()
  //  +----+----+
  //  |名稱|行動|
  //  +----+----+
  //  |疫情|何時|
  //  |結束|  呢|
  //  +----+----+

  //5. Release resources
  spark.stop()

}
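
Beyond building a DataFrame, you may also want to confirm that SQL queries run against the session. The sketch below is a minimal, illustrative extension of the test above; the view name t and the appName are arbitrary choices, not part of the original post.

import org.apache.spark.sql.SparkSession

object TestSparkSQLQuery extends App {

  val spark = SparkSession
    .builder
    .master("local")
    .appName("sparksql-env-check")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  // Build a DataFrame from a local collection, register it as a temporary view,
  // and query it through the SQL interface
  val df = Seq(("疫情", "何時"), ("結束", "呢")).toDF("名稱", "行動")
  df.createOrReplaceTempView("t")
  spark.sql("SELECT * FROM t").show()

  spark.stop()
}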

4. Closing remarks

If the program runs successfully, congratulations, your environment is set up correctly. If you run into problems, please leave a comment and we can work through them together.
