A Hands-On Guide to Setting Up a SparkSQL Development Environment in IDEA
1. Create a Maven Project, Add the Scala Plugin to IDEA, and Add the Scala SDK
https://www.cnblogs.com/bajiaotai/p/15381309.html
2. Importing the Required Dependency JARs: Configuring pom.xml
2.1 Sample pom.xml (Spark version: 3.0.0, Scala version: 2.12)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.dxm.sparksql</groupId>
    <artifactId>sparksql</artifactId>
    <version>1.0-SNAPSHOT</version>

    <!-- Properties holding the Spark and Scala version info -->
    <properties>
        <spark.version>3.0.0</spark.version>
        <scala.version>2.12</scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.27</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>
</project>
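Note that the pom above only declares dependencies. Running through IDEA's Scala plugin works with just that, but if you also want Maven itself (mvn package) to compile the Scala sources, you typically need a Scala compiler plugin in the build section as well. A minimal sketch, assuming the commonly used scala-maven-plugin (the version number here is illustrative, not from the original article):

    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>4.4.0</version>
                <executions>
                    <execution>
                        <goals>
                            <!-- compile Scala sources during the Maven build -->
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>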
2.2 Matching the Spark Version with the Scala Version
The link below lists the correspondence between Spark and Scala versions and the matching dependency configuration:
https://www.cnblogs.com/bajiaotai/p/16270971.html
2.3 Checking the Runtime Scala Version from Scala Code
println(util.Properties.versionString)
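To cross-check the Spark version on the classpath at the same time, here is a minimal standalone sketch (the object name VersionCheck is just an illustration; org.apache.spark.SPARK_VERSION is the version constant exposed by the Spark package object):

    object VersionCheck extends App {
      // Scala version the code is actually running on, e.g. "version 2.12.10"
      println(util.Properties.versionString)
      // Spark version resolved from the classpath, e.g. "3.0.0"; compare it with pom.xml
      println(org.apache.spark.SPARK_VERSION)
    }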
2.4 FAQ: Errors Caused by Mismatched Spark and Scala Versions
To be added.
3. Testing the Code
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}

// The case class must be defined at the top level (outside the object),
// otherwise toDF cannot derive an encoder for it
case class Person(name: String, action: String)

object TestSparkSQLEnv extends App {

  //1. Initialize the SparkSession
  val spark = SparkSession
    .builder
    .master("local")
    //.appName("SparkSql Entrance Class SparkSession")
    //.config("spark.some.config.option", "some-value")
    .getOrCreate()

  //2. Get the SparkContext from the SparkSession
  private val sc: SparkContext = spark.sparkContext

  //3. Set the log level
  //  Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
  //  This overrides any user-defined log settings, e.g. those in log4j.properties
  sc.setLogLevel("ERROR")

  import spark.implicits._

  //4. Create a DataFrame from an RDD of case-class instances
  private val rdd2DfByCaseClass: DataFrame = spark.sparkContext
    .makeRDD(Array(Person("疫情", "何時"), Person("結束", "呢")))
    .toDF("名稱", "行動")
  rdd2DfByCaseClass.show()
  // +----+----+
  // |名稱|行動|
  // +----+----+
  // |疫情|何時|
  // |結束| 呢|
  // +----+----+

  //5. Release resources
  spark.stop()
}
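If the table above prints, the DataFrame API works. As an extra, optional check of the SQL path (a sketch not in the original article; the view name t_person is arbitrary), you can register the DataFrame as a temporary view and query it with real SQL:

    // Add inside TestSparkSQLEnv, before spark.stop():
    rdd2DfByCaseClass.createOrReplaceTempView("t_person") // "t_person" is an arbitrary name
    spark.sql("SELECT * FROM t_person").show()            // should print the same two rows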
4. Closing Remarks
If the code runs successfully, congratulations: your environment is set up correctly. If you run into problems, please leave a comment and we can work through them together.