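Scala Exercises: Student Scores Case Study
I. Background
Tasks:
1. Count the number of students in each class
2. Compute each student's total score
3. List the per-subject scores of the ten students with the highest total score in the grade
4. Find the students whose total score is above the grade average
5. Find the students who passed every subject
6. Find the 100 students whose scores are the most uneven across subjects
Sample data (partial):
1. Student information: students.txt (columns: student ID, name, age, gender, class)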
1500100001,施笑槐,22,女,文科六班
1500100002,呂金鵬,24,男,文科七班
1500100003,單樂蕊,22,女,理科六班
1500100004,葛德曜,24,男,理科三班
1500100005,宣谷芹,22,女,理科五班
1500100006,邊昂雄,21,男,理科二班
1500100007,尚孤風,23,女,文科六班
1500100008,符半雙,22,女,理科六班
1500100009,沈德昌,21,男,理科一班
1500100010,羿彥昌,23,男,理科六班
1500100011,宰運華,21,男,理科三班
1500100012,梁易槐,21,女,理科一班
1500100013,逯君昊,24,男,文科二班
1500100014,羿旭炎,23,男,理科五班
1500100015,宦懷綠,21,女,理科一班
1500100016,潘訪煙,23,女,文科一班
2. Student scores (partial; columns: student ID, subject ID, score):
1500100001,1000001,98
1500100001,1000002,5
1500100001,1000003,0
1500100001,1000004,29
1500100001,1000005,85
1500100001,1000006,52
1500100002,1000001,139
1500100002,1000002,102
1500100002,1000003,44
1500100002,1000004,18
1500100002,1000005,46
1500100002,1000006,91
1500100003,1000001,48
3. Subject information (partial; columns: subject ID, subject name, full marks):
1000001,語文,150
1000002,數學,150
1000003,英語,150
1000004,政治,100
1000005,歷史,100
1000006,物理,100
1000007,化學,100
1000008,地理,100
1000009,生物,100
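The three record layouts above can be modeled as small case classes, which makes the later steps easier to read than positional array access. This is only a sketch: the names `Student`, `Score`, `Subject`, and the `parse*` helpers are illustrative and do not appear in the original exercises, and the meaning of the third subject column (full marks) is inferred from the sample values.

```scala
import scala.util.Try

object Records {
  // Field names are assumptions; the source only shows the raw comma-separated columns.
  final case class Student(id: String, name: String, age: Int, gender: String, clazz: String)
  final case class Score(studentId: String, subjectId: String, score: Int)
  final case class Subject(id: String, name: String, fullMarks: Int)

  // Each parser returns None for a malformed row instead of throwing.
  def parseStudent(line: String): Option[Student] = line.split(",") match {
    case Array(id, name, age, gender, clazz) =>
      Try(age.trim.toInt).toOption.map(a => Student(id, name, a, gender, clazz))
    case _ => None
  }

  def parseScore(line: String): Option[Score] = line.split(",") match {
    case Array(studentId, subjectId, sco) =>
      Try(sco.trim.toInt).toOption.map(n => Score(studentId, subjectId, n))
    case _ => None
  }

  def parseSubject(line: String): Option[Subject] = line.split(",") match {
    case Array(id, name, full) =>
      Try(full.trim.toInt).toOption.map(n => Subject(id, name, n))
    case _ => None
  }
}
```

With these helpers, `lines.flatMap(Records.parseScore)` would drop dirty rows and convert the rest in one step.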
II. Solutions
1. Count the number of students in each class
package shujia
import scala.io.Source
// 1. Count the number of students in each class
/**
 * All of the methods below return a new collection; none of them modifies the original.
 * They are also available on Set, except for the sorting methods.
 * foreach: iterate over the elements
 * map: transform the elements one at a time
 * filter: keep only the elements that satisfy a predicate
 * flatMap: turn one element into several (for example, one line into many)
 * sortBy: sort
 * groupBy: group
 */
object Test2 {
  def main(args: Array[String]): Unit = {
    // 1. Read the student information file (assumed to sit next to data/score.txt)
    val students: List[String] = Source.fromFile("data/students.txt").getLines().toList
    // 2. Split each line on commas
    val stringses: List[Array[String]] = students.map(line => line.split(","))
    // 3. Filter out dirty rows: a student row has five fields
    val listFilter: List[Array[String]] = stringses.filter(line => line.length == 5)
    // 4. Extract the class name (the last field)
    val classes: List[String] = listFilter.map {
      case Array(_, _, _, _, clazz: String) => clazz
    }
    // 5. Group by class
    val group: Map[String, List[String]] = classes.groupBy(clazz => clazz)
    // 6. Count the students in each class
    val classCount: Map[String, Int] = group.map {
      case (clazz: String, list: List[String]) =>
        (clazz, list.length)
    }
    classCount.foreach(println)
  }
}
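To try the class-count task without any files on disk, the same filter, map, and groupBy chain can be run on a few rows copied from the students.txt sample. This is a minimal sketch; the object name `ClassSizeSketch` and the deliberately malformed last row are illustrative additions.

```scala
object ClassSizeSketch {
  // A few rows copied from the students.txt sample, plus one malformed row.
  val sampleLines: List[String] = List(
    "1500100001,施笑槐,22,女,文科六班",
    "1500100003,單樂蕊,22,女,理科六班",
    "1500100007,尚孤風,23,女,文科六班",
    "1500100008,符半雙,22,女,理科六班",
    "1500100010,羿彥昌,23,男,理科六班",
    "broken,row"
  )

  // Returns (class name, head count), largest class first; ties are ordered by class name.
  def classSizes(lines: List[String]): List[(String, Int)] =
    lines
      .map(_.split(","))
      .filter(_.length == 5)           // drop rows that do not have five fields
      .map(cols => cols(4))            // the class name is the fifth field
      .groupBy(identity)
      .map { case (clazz, members) => (clazz, members.size) }
      .toList
      .sortBy { case (clazz, size) => (-size, clazz) }
}
```

Calling `ClassSizeSketch.classSizes(ClassSizeSketch.sampleLines)` yields three students for 理科六班 and two for 文科六班; the malformed row is ignored.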
2. Compute each student's total score
package com.shujia.scala
import scala.io.Source
object Demo22SumScore {
  def main(args: Array[String]): Unit = {
    /**
     * 2. Compute each student's total score
     */
    // 1. Read the score file
    val scoresList: List[String] = Source.fromFile("data/score.txt").getLines().toList
    // 2. Filter out dirty rows: a score row has three fields
    val filterList: List[String] = scoresList.filter((line: String) => {
      val length: Int = line.split(",").length
      length == 3
    })
    // 3. Extract the student ID and the score
    val idAndScore: List[(String, Int)] = filterList.map(line => {
      val split: Array[String] = line.split(",")
      // student ID
      val id: String = split.head
      // score
      val score: Int = split.last.toInt
      (id, score)
    })
    // 4. Group by student ID
    val groupByList: Map[String, List[(String, Int)]] = idAndScore.groupBy(kv => kv._1)
    // 5. Sum each student's scores
    val sumScoMap: Map[String, Int] = groupByList.map((kv: (String, List[(String, Int)])) => {
      val id: String = kv._1
      val scores: List[(String, Int)] = kv._2
      // all of this student's scores
      val scos: List[Int] = scores.map(sco => sco._2)
      // total score
      val sumSco: Int = scos.sum
      (id, sumSco)
    })
    sumScoMap.foreach(println)
  }
}
/*
 * Second approach: pattern matching with case
 */
object Demo22SumScoreCase {
  def main(args: Array[String]): Unit = {
    /**
     * 2. Compute each student's total score
     */
    // 1. Read the score file
    val scoresList: List[String] = Source.fromFile("data/score.txt").getLines().toList
    // 2. Split each line and filter out dirty rows
    val filterList: List[Array[String]] = scoresList
      .map(line => line.split(","))
      .filter(arr => arr.length == 3)
    // 3. Extract the student ID and the score
    val idAndScore: List[(String, Int)] = filterList.map {
      case Array(id: String, _, sco: String) =>
        (id, sco.toInt)
    }
    // 4. Group by student ID
    val groupByList: Map[String, List[(String, Int)]] = idAndScore.groupBy { case (id, _) => id }
    // 5. Sum each student's scores
    val sumScoMap: Map[String, Int] = groupByList.map {
      case (id: String, scores: List[(String, Int)]) =>
        val sumSco: Int = scores.map { case (_, sco: Int) => sco }.sum
        (id, sumSco)
    }
    sumScoMap.foreach(println)
  }
}
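As an alternative to grouping first and summing afterwards, the totals can be accumulated in a single pass with `foldLeft`. The sketch below is not from the original post: `TotalScoreSketch` is a hypothetical name, the rows are taken from the score sample, and the first row is assumed to belong to student 1500100001.

```scala
import scala.util.Try

object TotalScoreSketch {
  // Rows from the score sample (first row's ID assumed to be 1500100001), plus one dirty row.
  val sampleLines: List[String] = List(
    "1500100001,1000001,98", "1500100001,1000002,5", "1500100001,1000003,0",
    "1500100001,1000004,29", "1500100001,1000005,85", "1500100001,1000006,52",
    "1500100002,1000001,139", "1500100002,1000002,102", "1500100002,1000003,44",
    "1500100002,1000004,18", "1500100002,1000005,46", "1500100002,1000006,91",
    "1500100003,1000001,48",
    "not,a,number"
  )

  // One pass over the lines: add each valid score to the running total of its student.
  def totals(lines: List[String]): Map[String, Int] =
    lines.foldLeft(Map.empty[String, Int]) { (acc, line) =>
      line.split(",") match {
        case Array(id, _, sco) =>
          Try(sco.trim.toInt).toOption.fold(acc)(n => acc.updated(id, acc.getOrElse(id, 0) + n))
        case _ => acc // wrong number of fields: skip the row
      }
    }
}
```

On the sample rows this gives 269 for 1500100001, 440 for 1500100002, and 48 for 1500100003.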
3. List the per-subject scores of the ten students with the highest total score in the grade
package shujia
import scala.io.Source
// 3. List the per-subject scores of the ten students with the highest total score in the grade
object Test3Top10 {
  def main(args: Array[String]): Unit = {
    // 1. Read the score file
    val lines: List[String] = Source.fromFile("data/score.txt").getLines().toList
    // 2. Split each line on commas
    val scoreArr: List[Array[String]] = lines.map(line => line.split(","))
    // 3. Filter out dirty rows
    val scoreFilter: List[Array[String]] = scoreArr.filter(arr => arr.length == 3)
    // scoreFilter.foreach(println)
    // 4. Extract the student ID, subject ID, and score
    val scoFilter: List[(String, String, Int)] = scoreFilter.map {
      case Array(id: String, subject: String, sco: String) =>
        (id, subject, sco.toInt)
    }
    // 5. Group by student ID
    val scoGroupBy: Map[String, List[(String, String, Int)]] = scoFilter.groupBy(kv => kv._1)
    // 6. Compute each student's total, keeping the per-subject rows alongside it
    val sSos: List[(String, Int, List[(String, String, Int)])] = scoGroupBy.map {
      case (id: String, list: List[(String, String, Int)]) =>
        val scores: List[Int] = list.map { case (_, _, sco: Int) => sco }
        val scoSum: Int = scores.sum
        (id, scoSum, list)
    }.toList
    // 7. Sort by total score in descending order and take the first ten
    val lists: List[(String, Int, List[(String, String, Int)])] = sSos.sortBy(kv => -kv._2)
    val top10: List[(String, Int, List[(String, String, Int)])] = lists.take(10)
    top10.foreach(println)
  }
}
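The per-subject rows are easier to read if the subject ID is replaced by the subject name. The following sketch shows one way to do that with a lookup map built from the subject sample; `TopNSketch`, `subjectNames`, and `topN` are hypothetical names, and only the first three subjects are included.

```scala
object TopNSketch {
  // Subject ID -> subject name, copied from the subject sample (first three rows only).
  val subjectNames: Map[String, String] =
    Map("1000001" -> "語文", "1000002" -> "數學", "1000003" -> "英語")

  // (student ID, subject ID, score)
  type Row = (String, String, Int)

  // Returns (student ID, total, per-subject scores) for the n highest totals.
  // Ties on the total are broken by student ID so the result is deterministic.
  def topN(rows: List[Row], n: Int): List[(String, Int, List[(String, Int)])] =
    rows
      .groupBy(_._1)
      .toList
      .map { case (id, rs) =>
        // Fall back to the raw subject ID when the name is unknown.
        val named = rs.map { case (_, subId, sco) => (subjectNames.getOrElse(subId, subId), sco) }
        (id, rs.map(_._3).sum, named)
      }
      .sortBy { case (id, total, _) => (-total, id) }
      .take(n)
}
```

For the exercise itself, `topN(rows, 10)` would correspond to the top-ten list computed above.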
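4. Find the students whose total score is above the grade average
import com.shujia.spark.util.HdfsUtil
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
// Find the students whose total score is above the grade average
// (this solution uses Spark RDDs rather than plain Scala collections)
// Average total = sum of all students' totals / number of students
object Test2ScoreAvg {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("AVG")
    // conf.setMaster("local")
    val sc = new SparkContext(conf)
    // Read the file, split each line, and filter out dirty rows
    val scoFilter: RDD[Array[String]] = sc.textFile("/data/student/score.txt").map(_.split(",")).filter(_.length == 3)
    // Extract the student ID and the score
    val scoRDD: RDD[(String, Int)] = scoFilter.map {
      case Array(id: String, _, sco: String) =>
        (id, sco.toInt)
    }
    // Compute each student's total score
    val sumStusRDD: RDD[(String, Int)] = scoRDD.reduceByKey(_ + _)
    // Cache the totals, because several actions below reuse them
    sumStusRDD.cache()
    // Sum of all students' totals
    val sumNJ: Double = sumStusRDD.map(_._2).sum
    // Average total; sumStusRDD holds one entry per student, so its count is the number of students
    val avgSum: Double = sumNJ / sumStusRDD.count
    // Keep the students whose total is strictly greater than the average
    val avgtoSum: RDD[(String, Int)] = sumStusRDD.filter { case (_, sco: Int) => sco > avgSum }
    val l1: Long = avgtoSum.count()
    println(s"$l1 students scored above the average; the average total is $avgSum")
    // avgtoSum.foreach(println)
    // HdfsUtil comes from the project's own com.shujia.spark.util package; presumably it
    // deletes the old output directory, since saveAsTextFile fails if the path already exists
    HdfsUtil.delete("/data/sum_avgToSum")
    avgtoSum.saveAsTextFile("/data/sum_avgToSum")
  }
}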
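The same calculation can be checked without a Spark cluster by running it on an in-memory map of totals. This is a sketch under stated assumptions: `AboveAverageSketch` is a hypothetical name, and the input is taken to be a map from student ID to total score.

```scala
object AboveAverageSketch {
  // Returns the average total and the students whose total is strictly above it.
  def aboveAverage(totals: Map[String, Int]): (Double, Map[String, Int]) =
    if (totals.isEmpty) (0.0, Map.empty[String, Int])
    else {
      // Sum as Long to avoid Int overflow on large inputs, then divide as Double.
      val avg: Double = totals.values.map(_.toLong).sum.toDouble / totals.size
      (avg, totals.filter { case (_, total) => total > avg })
    }
}
```

Note that a student whose total equals the average exactly is not included, which matches the wording of the task ("above the average").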
5. Find the students who passed every subject
package shujia
import scala.io.Source
// 5. Find the students who passed every subject
object Test5_60fen {
  def main(args: Array[String]): Unit = {
    // Read the file
    val list: List[String] = Source.fromFile("data/score.txt").getLines().toList
    // list.foreach(println)
    // Split each line on commas
    val listSplit: List[Array[String]] = list.map(line => line.split(","))
    // listSplit.foreach(println)
    // Filter out dirty rows
    val listFilter: List[Array[String]] = listSplit.filter(line => line.length == 3)
    // listFilter.foreach(println)
    // Extract the student ID, subject ID, and score
    val lists: List[(String, String, Int)] = listFilter.map {
      case Array(id: String, sub: String, sco: String) =>
        (id, sub, sco.toInt)
    }
    // Group by student ID
    val listGroup: Map[String, List[(String, String, Int)]] = lists.groupBy(line => line._1)
    // Keep only the students who scored at least 60 in every subject
    // (a flat pass mark of 60, regardless of each subject's full marks)
    val list1: List[(String, List[(String, String, Int)])] = listGroup.filter((kv: (String, List[(String, String, Int)])) => {
      val scores: List[(String, String, Int)] = kv._2
      scores.forall(sco => sco._3 >= 60)
    }).toList
    list1.foreach(println)
  }
}
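Because the subject table lists different full marks (150 for some subjects, 100 for others), a flat pass mark of 60 may not be what is wanted. The sketch below assumes, purely for illustration, that passing means reaching 60% of a subject's full marks; that rule, the name `PassAllSketch`, and the partial `fullMarks` map are assumptions and are not stated in the original.

```scala
object PassAllSketch {
  // Subject ID -> full marks, copied from the subject sample (first four rows only).
  val fullMarks: Map[String, Int] =
    Map("1000001" -> 150, "1000002" -> 150, "1000003" -> 150, "1000004" -> 100)

  // Assumption: a subject is passed at 60% of its full marks.
  val passPercent: Int = 60

  // Integer arithmetic avoids floating-point rounding at the boundary.
  // A subject with unknown full marks is treated as not passed.
  def passed(subjectId: String, score: Int): Boolean =
    fullMarks.get(subjectId).exists(full => score * 100 >= full * passPercent)

  // rows: (student ID, subject ID, score); returns the sorted IDs of students who passed everything.
  def passedAll(rows: List[(String, String, Int)]): List[String] =
    rows
      .groupBy(_._1)
      .filter { case (_, rs) => rs.forall { case (_, subId, sco) => passed(subId, sco) } }
      .keys
      .toList
      .sorted
}
```

Under this rule a score of 90 passes a 150-point subject, while 89 does not.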
6. Find the 100 students whose scores are the most uneven across subjects
package com.shujia.scala
import scala.io.Source
object Demo31Student {
  def main(args: Array[String]): Unit = {
    /**
     * 6. Find the 100 students whose scores are the most uneven across subjects
     *
     * Measure of unevenness: the variance of a student's scores
     */
    // 1. Read the score file
    val lines: List[String] = Source.fromFile("data/score.txt").getLines().toList
    // 2. Split each line on commas
    val scoreArr: List[Array[String]] = lines.map(line => line.split(","))
    // 3. Filter out dirty rows
    val scoreFilter: List[Array[String]] = scoreArr.filter(arr => arr.length == 3)
    // 4. Extract the student ID and the score
    val idAndScore: List[(String, Int)] = scoreFilter.map {
      case Array(id: String, _, sco: String) =>
        (id, sco.toInt)
    }
    // 5. Group by student ID
    val groupBy: Map[String, List[(String, Int)]] = idAndScore.groupBy(kv => kv._1)
    // 6. Compute the variance of each student's scores
    val variances: List[(String, Double, List[(String, Int)])] = groupBy.map {
      case (id: String, list: List[(String, Int)]) =>
        // all of one student's scores
        val scores: List[Int] = list.map { case (_, sco: Int) => sco }
        /**
         * Computing the variance:
         * 1. count the scores
         * 2. compute the mean
         * 3. average the squared deviations from the mean
         */
        // number of subjects
        val N: Double = scores.length.toDouble
        // mean
        val avg: Double = scores.sum / N
        // variance
        val variance: Double = scores.map((sco: Int) => (sco - avg) * (sco - avg)).sum / N
        (id, variance, list)
    }.toList
    // 7. Sort by variance in descending order
    val sortByVariance: List[(String, Double, List[(String, Int)])] = variances.sortBy(kv => -kv._2)
    // 8. Take the first 100
    val top100: List[(String, Double, List[(String, Int)])] = sortByVariance.take(100)
    top100.foreach(println)
  }
}
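Raw scores from 150-point and 100-point subjects are not directly comparable, so the variance of raw scores can overstate how uneven a student is. One possible refinement, shown here only as a sketch, is to convert each score to a percentage of the subject's full marks before taking the variance. The names `VarianceSketch`, `variance`, and `normalized` are hypothetical, and the normalization step is an addition that the original solution does not perform.

```scala
object VarianceSketch {
  // Population variance: the mean of the squared deviations from the mean.
  def variance(xs: List[Double]): Double =
    if (xs.isEmpty) 0.0
    else {
      val mean = xs.sum / xs.size
      xs.map(x => (x - mean) * (x - mean)).sum / xs.size
    }

  // Converts (subject ID, score) pairs to percentages of full marks.
  // Rows whose subject has no known full marks are dropped.
  def normalized(rows: List[(String, Int)], fullMarks: Map[String, Int]): List[Double] =
    rows.collect {
      case (subId, sco) if fullMarks.contains(subId) => sco * 100.0 / fullMarks(subId)
    }
}
```

For example, 75 out of 150 and 50 out of 100 both normalize to 50 percent, so a student with exactly those two scores would have a variance of zero instead of appearing uneven.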