Advanced Data Analysis Tutorial (Part 3)
- October 6, 2019
- Notes
Workflow Unit Testing
1. Upload the workflow definitions and configuration
    [hadoop@hdp-node-01 wf-oozie]$ hadoop fs -put hive2-etl /user/hadoop/oozie/myapps/
    [hadoop@hdp-node-01 wf-oozie]$ hadoop fs -put hive2-dw /user/hadoop/oozie/myapps/
    [hadoop@hdp-node-01 wf-oozie]$ ll
    total 12
    drwxrwxr-x. 2 hadoop hadoop 4096 Nov 23 16:32 hive2-dw
    drwxrwxr-x. 2 hadoop hadoop 4096 Nov 23 16:32 hive2-etl
    drwxrwxr-x. 3 hadoop hadoop 4096 Nov 23 11:24 weblog
    [hadoop@hdp-node-01 wf-oozie]$ export OOZIE_URL=http://localhost:11000/oozie
2. Submit and start each workflow
    oozie job -D inpath=/weblog/input -D outpath=/weblog/outpre -config weblog/job.properties -run
Start the Hive workflow for ETL:
    oozie job -config hive2-etl/job.properties -run
Start the Hive workflow for the pvs statistics:
    oozie job -config hive2-dw/job.properties -run
3. Workflow coordinator configuration (excerpt)
Multiple workflow jobs are organized and scheduled with a coordinator:
    [hadoop@hdp-node-01 hive2-etl]$ ll
    total 28
    -rw-rw-r--. 1 hadoop hadoop  265 Nov 13 16:39 config-default.xml
    -rw-rw-r--. 1 hadoop hadoop  512 Nov 26 16:43 coordinator.xml
    -rw-rw-r--. 1 hadoop hadoop  382 Nov 26 16:49 job.properties
    drwxrwxr-x. 2 hadoop hadoop 4096 Nov 27 11:26 lib
    -rw-rw-r--. 1 hadoop hadoop 1910 Nov 23 17:49 script.q
    -rw-rw-r--. 1 hadoop hadoop  687 Nov 23 16:32 workflow.xml
- config-default.xml
    <configuration>
        <property>
            <name>jobTracker</name>
            <value>hdp-node-01:8032</value>
        </property>
        <property>
            <name>nameNode</name>
            <value>hdfs://hdp-node-01:9000</value>
        </property>
        <property>
            <name>queueName</name>
            <value>default</value>
        </property>
    </configuration>
- job.properties
    user.name=hadoop
    oozie.use.system.libpath=true
    oozie.libpath=hdfs://hdp-node-01:9000/user/hadoop/share/lib
    oozie.wf.application.path=hdfs://hdp-node-01:9000/user/hadoop/oozie/myapps/hive2-etl/
- workflow.xml
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
        <start to="hive2-node"/>
        <action name="hive2-node">
            <hive2 xmlns="uri:oozie:hive2-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <jdbc-url>jdbc:hive2://hdp-node-01:10000</jdbc-url>
                <script>script.q</script>
                <param>input=/weblog/outpre2</param>
            </hive2>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
- coordinator.xml
    <coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}" timezone="Asia/Shanghai" xmlns="uri:oozie:coordinator:0.2">
        <action>
            <workflow>
                <app-path>${workflowAppUri}</app-path>
                <configuration>
                    <property>
                        <name>jobTracker</name>
                        <value>${jobTracker}</value>
                    </property>
                    <property>
                        <name>nameNode</name>
                        <value>${nameNode}</value>
                    </property>
                    <property>
                        <name>queueName</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
            </workflow>
        </action>
    </coordinator-app>
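The four files above are tied together by `${...}` placeholders: workflow.xml and coordinator.xml refer to `jobTracker`, `nameNode` and `queueName`, whose values come from config-default.xml. The following is only a rough illustration of that substitution idea, not Oozie's actual implementation. It assumes that job.properties values take precedence over config-default.xml, and the class and method names (`PlaceholderSketch`, `merge`, `resolve`) are made up for this sketch. Note that `${start}`, `${end}` and `${workflowAppUri}` are not defined in the files shown here, so they would have to be supplied elsewhere; the sketch leaves such unknown names untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {
    // Matches simple ${name} references only; EL function calls such as
    // ${coord:minutes(5)} contain ':' and '(' and are deliberately not matched.
    private static final Pattern VAR =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_.]*)\\}");

    /** Merges layers so that later maps override earlier ones. */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... layers) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    /** Replaces every known ${name}; unknown names stay as-is so gaps remain visible. */
    static String resolve(String text, Map<String, String> values) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Values copied from config-default.xml above.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("jobTracker", "hdp-node-01:8032");
        defaults.put("nameNode", "hdfs://hdp-node-01:9000");
        defaults.put("queueName", "default");

        // Values copied from job.properties above (assumed to override the defaults).
        Map<String, String> jobProps = new LinkedHashMap<>();
        jobProps.put("user.name", "hadoop");

        Map<String, String> values = merge(defaults, jobProps);
        System.out.println(resolve("<name-node>${nameNode}</name-node>", values));
        // workflowAppUri is not defined in the files shown, so it is left unresolved.
        System.out.println(resolve("<app-path>${workflowAppUri}</app-path>", values));
    }
}
```

Running `main` prints the `<name-node>` element with the HDFS URI filled in, while the `<app-path>` line keeps its placeholder, which is a quick way to spot parameters that still need a value before the coordinator is submitted.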
Module Development: Data Presentation
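Enterprise data analysis systems can choose from many front-end presentation tools. They fall into two broad approaches:
- Standalone dedicated systems: represented by foreign products such as BusinessObjects (BO, Crystal Reports), Hyperion (Brio), and Cognos. Their servers are deployed separately and exchange information with the application over some protocol.
- Web application presentation: a standalone or embedded Java web system reads the report statistics and presents the results as web pages. One example is Runqian Report (润乾报表), which is 100% pure Java.
This log analysis project uses the second approach, with a web application developed in-house.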
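- Technology stack used by the web presentation application:
jQuery + ECharts + Spring MVC + Spring + MyBatis + MySQL
- Presentation flow:
1. Use the Spring MVC + Spring + MyBatis stack to read the data to be displayed from MySQL
2. Return the data to the page in JSON format
3. On the page, parse the JSON with ECharts and render the charts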
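Step 2 of the flow above can be sketched without any library. The project itself serializes its chart object with Gson; the hand-rolled version below is only meant to show what kind of payload the page receives. The class name `ChartJsonSketch`, the field names `xAxis` and `series`, and the sample rows are all assumptions for illustration, and the real ECharts option object is richer than this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ChartJsonSketch {
    /** Quotes a string as a JSON literal (only backslash and double quote are escaped here). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds {"xAxis":[...],"series":[...]} from parallel lists of labels and values. */
    static String toJson(List<String> labels, List<Integer> values) {
        if (labels.size() != values.size()) {
            throw new IllegalArgumentException("labels and values must have the same length");
        }
        String x = labels.stream().map(ChartJsonSketch::quote).collect(Collectors.joining(","));
        String y = values.stream().map(String::valueOf).collect(Collectors.joining(","));
        return "{\"xAxis\":[" + x + "],\"series\":[" + y + "]}";
    }

    public static void main(String[] args) {
        // Hypothetical rows standing in for the result of the MySQL query in step 1.
        List<String> days = Arrays.asList("1101", "1102", "1103");
        List<Integer> pvs = Arrays.asList(120, 95, 143);
        System.out.println(toJson(days, pvs));
        // prints {"xAxis":["1101","1102","1103"],"series":[120,95,143]}
    }
}
```

On the page, the x-axis labels and the series values would then be handed to ECharts in step 3. In a real application a JSON library is the safer choice, since this sketch does not escape control characters.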
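Web Application Project Structure
The project is managed with Maven, which pulls in the framework dependencies (Spring MVC, Spring, MyBatis) along with the jQuery and ECharts JS libraries.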

Web Application Implementation
The implementation follows a typical MVC architecture:
Layer | Technology |
---|---|
Page | HTML + jQuery + ECharts |
Controller | Spring MVC |
Service | Service |
DAO | MyBatis |
Database | MySQL |
See the project source for the full code.
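The layering in the table can be illustrated with plain Java, leaving Spring and MyBatis out entirely. This is a minimal sketch under that simplification: `PvDao`, `PvService`, `PvController` and the sample numbers are hypothetical names and data, not classes from the project.

```java
import java.util.Arrays;
import java.util.List;

public class LayeringSketch {
    /** DAO layer: the only place that knows where the data lives. */
    interface PvDao {
        List<Integer> findDailyPvs();
    }

    /** Service layer: business logic built on top of the DAO. */
    static class PvService {
        private final PvDao dao;

        PvService(PvDao dao) {
            this.dao = dao;
        }

        int totalPvs() {
            int sum = 0;
            for (int pv : dao.findDailyPvs()) {
                sum += pv;
            }
            return sum;
        }
    }

    /** Controller layer: turns a request into a response body for the page. */
    static class PvController {
        private final PvService service;

        PvController(PvService service) {
            this.service = service;
        }

        String handleTotal() {
            return "{\"total\":" + service.totalPvs() + "}";
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for a MyBatis mapper backed by MySQL.
        PvDao inMemoryDao = () -> Arrays.asList(120, 95, 143);
        PvController controller = new PvController(new PvService(inMemoryDao));
        System.out.println(controller.handleTotal()); // prints {"total":358}
    }
}
```

Because each layer only depends on the one below it through a constructor argument, the DAO can be swapped for an in-memory stub as done here, which is also what makes the service easy to test in isolation.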
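Code example: ChartServiceImpl
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.List;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    import com.google.gson.Gson;

    // Project classes (IChartService, IEchartsDao, EchartsData, EchartsOptionUtil,
    // ToolBox, Serie, XAxi, YAxi) live in the project source; their imports are omitted here.

    @Service("chartService")
    public class ChartServiceImpl implements IChartService {

        @Autowired
        IEchartsDao iEchartsDao;

        /** Assembles the ECharts option data (axes, series, title, legend) for the pvs chart. */
        public EchartsData getChartsData() {
            List<Integer> xAxiesList = iEchartsDao.getXAxiesList("");
            List<Integer> pointsDataList = iEchartsDao.getPointsDataList("");

            EchartsData data = new EchartsData();
            ToolBox toolBox = EchartsOptionUtil.getToolBox();
            Serie serie = EchartsOptionUtil.getSerie(pointsDataList);
            ArrayList<Serie> series = new ArrayList<Serie>();
            series.add(serie);

            List<XAxi> xAxis = EchartsOptionUtil.getXAxis(xAxiesList);
            List<YAxi> yAxis = EchartsOptionUtil.getYAxis();

            HashMap<String, String> title = new HashMap<String, String>();
            title.put("text", "pvs");
            title.put("subtext", "超级pvs"); // subtitle shown on the chart ("super pvs")

            HashMap<String, String> tooltip = new HashMap<String, String>();
            tooltip.put("trigger", "axis");

            HashMap<String, String[]> legend = new HashMap<String, String[]>();
            legend.put("data", new String[]{"pv统计"}); // legend entry ("pv statistics")

            data.setTitle(title);
            data.setTooltip(tooltip);
            data.setLegend(legend);
            data.setToolbox(toolBox);
            data.setCalculable(true);
            data.setxAxis(xAxis);
            data.setyAxis(yAxis);
            data.setSeries(series);
            return data;
        }

        /**
         * Returns the overview ("gai kuang") figures for the given day and for the day before it.
         * The date is expected in "MMdd" form; since the pattern carries no year,
         * SimpleDateFormat falls back to its default year when parsing.
         */
        public List<HashMap<String, Integer>> getGaiKuangList(String date) throws ParseException {
            HashMap<String, Integer> gaiKuangToday = iEchartsDao.getGaiKuang(date);

            SimpleDateFormat sf = new SimpleDateFormat("MMdd");
            Date parse = sf.parse(date);
            Calendar calendar = Calendar.getInstance();
            calendar.setTime(parse);
            calendar.add(Calendar.DAY_OF_MONTH, -1); // step back one day
            Date before = calendar.getTime();
            String beforeString = sf.format(before);
            System.out.println(beforeString); // debug output

            HashMap<String, Integer> gaiKuangBefore = iEchartsDao.getGaiKuang(beforeString);

            ArrayList<HashMap<String, Integer>> gaiKuangList = new ArrayList<HashMap<String, Integer>>();
            gaiKuangList.add(gaiKuangToday);
            gaiKuangList.add(gaiKuangBefore);
            return gaiKuangList;
        }

        /**
         * Quick manual check that prints the chart data as JSON.
         * Note: an instance created with `new` is not managed by Spring,
         * so iEchartsDao is not injected on this path.
         */
        public static void main(String[] args) {
            ChartServiceImpl chartServiceImpl = new ChartServiceImpl();
            EchartsData chartsData = chartServiceImpl.getChartsData();
            Gson gson = new Gson();
            String json = gson.toJson(chartsData);
            System.out.println(json);
        }
    }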
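The previous-day calculation in `getGaiKuangList` parses an "MMdd" string that carries no year, so `SimpleDateFormat` falls back to its default year (1970), and the day before "0301" always comes out as "0228", even in a leap year. One possible alternative, sketched below with `java.time`, passes the year in explicitly. The class `PreviousDaySketch` and the `year` parameter are assumptions for illustration and are not part of the project code.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;

public class PreviousDaySketch {
    private static final DateTimeFormatter MMDD = DateTimeFormatter.ofPattern("MMdd");

    /** Returns the day before the given "MMdd" date, interpreted in the given year. */
    static String previousDay(String mmdd, int year) {
        LocalDate day = MonthDay.parse(mmdd, MMDD).atYear(year);
        return day.minusDays(1).format(MMDD);
    }

    public static void main(String[] args) {
        System.out.println(previousDay("0301", 2019)); // prints 0228
        System.out.println(previousDay("0301", 2020)); // prints 0229 (leap year)
        System.out.println(previousDay("0101", 2020)); // prints 1231
    }
}
```

`DateTimeFormatter` is also immutable and thread-safe, which suits a singleton Spring service better than creating a new `SimpleDateFormat` on every call.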
Web Application Results
Site overview



Traffic analysis


Referrer analysis


Visitor analysis

That concludes the hands-on data project.