Advanced Data Analysis Tutorial (Part 3)

  • October 6, 2019
  • Notes

Workflow Unit Testing

1. Upload the workflow definitions and configuration

[hadoop@hdp-node-01 wf-oozie]$ hadoop fs -put hive2-etl /user/hadoop/oozie/myapps/
[hadoop@hdp-node-01 wf-oozie]$ hadoop fs -put hive2-dw /user/hadoop/oozie/myapps/
[hadoop@hdp-node-01 wf-oozie]$ ll
total 12
drwxrwxr-x. 2 hadoop hadoop 4096 Nov 23 16:32 hive2-dw
drwxrwxr-x. 2 hadoop hadoop 4096 Nov 23 16:32 hive2-etl
drwxrwxr-x. 3 hadoop hadoop 4096 Nov 23 11:24 weblog
[hadoop@hdp-node-01 wf-oozie]$ export OOZIE_URL=http://localhost:11000/oozie

2. Submit and start the workflow units

oozie job -D inpath=/weblog/input -D outpath=/weblog/outpre -config weblog/job.properties -run

Start the Hive workflow for the ETL step:

oozie job -config hive2-etl/job.properties -run

Start the Hive workflow for the pvs statistics:

oozie job -config hive2-dw/job.properties -run

3. Workflow coordinator configuration (excerpt)

Multiple workflow jobs are organized and scheduled by a coordinator:

[hadoop@hdp-node-01 hive2-etl]$ ll
total 28
-rw-rw-r--. 1 hadoop hadoop  265 Nov 13 16:39 config-default.xml
-rw-rw-r--. 1 hadoop hadoop  512 Nov 26 16:43 coordinator.xml
-rw-rw-r--. 1 hadoop hadoop  382 Nov 26 16:49 job.properties
drwxrwxr-x. 2 hadoop hadoop 4096 Nov 27 11:26 lib
-rw-rw-r--. 1 hadoop hadoop 1910 Nov 23 17:49 script.q
-rw-rw-r--. 1 hadoop hadoop  687 Nov 23 16:32 workflow.xml

• config-default.xml

<configuration>
    <property>
        <name>jobTracker</name>
        <value>hdp-node-01:8032</value>
    </property>
    <property>
        <name>nameNode</name>
        <value>hdfs://hdp-node-01:9000</value>
    </property>
    <property>
        <name>queueName</name>
        <value>default</value>
    </property>
</configuration>

• job.properties

user.name=hadoop
oozie.use.system.libpath=true
oozie.libpath=hdfs://hdp-node-01:9000/user/hadoop/share/lib
oozie.wf.application.path=hdfs://hdp-node-01:9000/user/hadoop/oozie/myapps/hive2-etl/

• workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
    <start to="hive2-node"/>

    <action name="hive2-node">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <jdbc-url>jdbc:hive2://hdp-node-01:10000</jdbc-url>
            <script>script.q</script>
            <param>input=/weblog/outpre2</param>
        </hive2>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

• coordinator.xml

<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}" start="${start}" end="${end}"
                 timezone="Asia/Shanghai" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
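The coordinator excerpt refers to ${start}, ${end} and ${workflowAppUri}, none of which appear in the job.properties shown earlier. The following is a minimal sketch, not part of the project, of how one might check for such gaps before submitting. The class name CoordinatorPlaceholderCheck and its methods are hypothetical, and the sketch assumes that jobTracker, nameNode and queueName are supplied with the values from config-default.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists ${name} variables that a properties set does not define.
class CoordinatorPlaceholderCheck {

    // Matches plain ${name} variables only. EL function calls such as
    // ${coord:minutes(5)} contain ':' and '(' and are therefore skipped.
    private static final Pattern VARIABLE =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_.]*)\\}");

    /** Parses key=value text in java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the variables used in xml that props does not define, in order of first use. */
    static Set<String> undefinedVariables(String xml, Properties props) {
        Set<String> missing = new LinkedHashSet<>();
        Matcher m = VARIABLE.matcher(xml);
        while (m.find()) {
            String name = m.group(1);
            if (props.getProperty(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Condensed from the coordinator.xml excerpt; only the variable uses matter here.
        String coordinatorXml =
                "<coordinator-app frequency=\"${coord:minutes(5)}\" start=\"${start}\" end=\"${end}\">"
              + "<app-path>${workflowAppUri}</app-path>"
              + "<value>${jobTracker}</value><value>${nameNode}</value><value>${queueName}</value>"
              + "</coordinator-app>";

        // Assumption: the three cluster settings are passed in with the
        // values that config-default.xml shows.
        String properties =
                "user.name=hadoop\n"
              + "oozie.use.system.libpath=true\n"
              + "jobTracker=hdp-node-01:8032\n"
              + "nameNode=hdfs://hdp-node-01:9000\n"
              + "queueName=default\n";

        System.out.println(undefinedVariables(coordinatorXml, parse(properties)));
    }
}
```

Run as written, this prints [start, end, workflowAppUri], which suggests the properties file used for the coordinator run needs those three entries. Note also that Oozie locates a coordinator application through oozie.coord.application.path rather than oozie.wf.application.path, so the job.properties shown above presumably corresponds to the plain workflow run.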

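Module Development: Data Presentation

Enterprise data analysis systems can choose from many front-end presentation tools, which fall into two broad approaches:

• A separately deployed, dedicated system: typified by foreign products such as BusinessObjects (BO, Crystal Report), Hyperion (Brio), and Cognos. Their servers are deployed on their own and exchange information with the application over some protocol.

• A web application: a standalone or embedded Java web system reads the report statistics and presents them as web pages, for example the 100% pure-Java Runqian Report (润乾报表).

This log analysis project uses a custom-built web application for presentation.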

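• Technology stack of the web presentation application:

jQuery + ECharts + Spring MVC + Spring + MyBatis + MySQL

• Presentation flow:

1. Read the data to display from MySQL through the Spring MVC + Spring + MyBatis stack

2. Return the data to the page in JSON format

3. On the page, ECharts parses the JSON and renders the chart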
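The middle of that flow, turning query results into JSON that ECharts can consume, can be sketched without any framework. This is an illustrative sketch only: the class PvChartJsonSketch, its method names, and the sample numbers are hypothetical, the two lists stand in for what the DAO would read from MySQL, and the line series type is an assumption. The project itself serializes with Gson rather than building strings by hand.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: builds an ECharts option as a JSON string from two lists.
class PvChartJsonSketch {

    /** Joins integers into a JSON array literal such as [1,2,3]. */
    static String jsonArray(List<Integer> values) {
        return values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(",", "[", "]"));
    }

    /**
     * Builds the chart option. xAxisValues are the category labels and
     * pvCounts holds one pv count per label, so both lists must be the same size.
     */
    static String chartOptionJson(List<Integer> xAxisValues, List<Integer> pvCounts) {
        if (xAxisValues.size() != pvCounts.size()) {
            throw new IllegalArgumentException("one pv count is needed per x-axis value");
        }
        return "{"
             + "\"title\":{\"text\":\"pvs\"},"
             + "\"tooltip\":{\"trigger\":\"axis\"},"
             + "\"xAxis\":[{\"type\":\"category\",\"data\":" + jsonArray(xAxisValues) + "}],"
             + "\"yAxis\":[{\"type\":\"value\"}],"
             // Assumption: a line chart; the project's real series type is not shown.
             + "\"series\":[{\"name\":\"pv\",\"type\":\"line\",\"data\":" + jsonArray(pvCounts) + "}]"
             + "}";
    }

    public static void main(String[] args) {
        // Made-up sample data, standing in for rows read from MySQL.
        List<Integer> hours = Arrays.asList(0, 1, 2, 3);
        List<Integer> pvs = Arrays.asList(120, 95, 80, 143);
        System.out.println(chartOptionJson(hours, pvs));
    }
}
```

In the real application a Spring MVC controller would return this JSON to the browser, where the page hands the parsed object to the ECharts setOption call.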

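Web Application Project Structure

The project is managed with Maven, which brings in the framework dependencies (Spring MVC, Spring, MyBatis) along with the jQuery and ECharts JS libraries.

Web Application Implementation

The implementation follows a typical MVC architecture: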

Page: HTML + jQuery + ECharts
Controller: Spring MVC
Service: Service
DAO: MyBatis
Database: MySQL

See the project source for the full code.

Code example: ChartServiceImpl

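import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import com.google.gson.Gson;

// IChartService, IEchartsDao, EchartsData, EchartsOptionUtil, ToolBox, Serie,
// XAxi and YAxi are project classes; see the project source for their packages.

@Service("chartService")
public class ChartServiceImpl implements IChartService {

    @Autowired
    IEchartsDao iEchartsDao;

    // Assembles the complete ECharts option object for the pvs chart.
    public EchartsData getChartsData() {
        List<Integer> xAxiesList = iEchartsDao.getXAxiesList("");
        List<Integer> pointsDataList = iEchartsDao.getPointsDataList("");

        EchartsData data = new EchartsData();
        ToolBox toolBox = EchartsOptionUtil.getToolBox();
        Serie serie = EchartsOptionUtil.getSerie(pointsDataList);
        ArrayList<Serie> series = new ArrayList<Serie>();
        series.add(serie);

        List<XAxi> xAxis = EchartsOptionUtil.getXAxis(xAxiesList);
        List<YAxi> yAxis = EchartsOptionUtil.getYAxis();

        HashMap<String, String> title = new HashMap<String, String>();
        title.put("text", "pvs");
        title.put("subtext", "超级pvs");   // subtitle, "super pvs"

        HashMap<String, String> tooltip = new HashMap<String, String>();
        tooltip.put("trigger", "axis");

        HashMap<String, String[]> legend = new HashMap<String, String[]>();
        legend.put("data", new String[]{"pv统计"});   // legend entry, "pv statistics"

        data.setTitle(title);
        data.setTooltip(tooltip);
        data.setLegend(legend);
        data.setToolbox(toolBox);
        data.setCalculable(true);
        data.setxAxis(xAxis);
        data.setyAxis(yAxis);
        data.setSeries(series);
        return data;
    }

    // Returns the overview figures for the given day ("MMdd") followed by those of the previous day.
    public List<HashMap<String, Integer>> getGaiKuangList(String date) throws ParseException {
        HashMap<String, Integer> gaiKuangToday = iEchartsDao.getGaiKuang(date);

        // "MMdd" carries no year, so SimpleDateFormat falls back to 1970 (not a
        // leap year): the day before "0301" is always computed as "0228".
        SimpleDateFormat sf = new SimpleDateFormat("MMdd");
        Date parse = sf.parse(date);
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(parse);
        calendar.add(Calendar.DAY_OF_MONTH, -1);
        Date before = calendar.getTime();
        String beforeString = sf.format(before);
        System.out.println(beforeString);   // debug output

        HashMap<String, Integer> gaiKuangBefore = iEchartsDao.getGaiKuang(beforeString);

        ArrayList<HashMap<String, Integer>> gaiKuangList = new ArrayList<HashMap<String, Integer>>();
        gaiKuangList.add(gaiKuangToday);
        gaiKuangList.add(gaiKuangBefore);
        return gaiKuangList;
    }

    // Quick manual check. Created with "new", the service is not managed by Spring,
    // so iEchartsDao is null and getChartsData() throws a NullPointerException
    // unless the DAO is set by hand or the bean is taken from a Spring context.
    public static void main(String[] args) {
        ChartServiceImpl chartServiceImpl = new ChartServiceImpl();
        EchartsData chartsData = chartServiceImpl.getChartsData();
        Gson gson = new Gson();
        String json = gson.toJson(chartsData);
        System.out.println(json);
    }
}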
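The previous-day calculation in getGaiKuangList works on "MMdd" strings with no year, so SimpleDateFormat assumes 1970 and February 29 can never be produced. A possible alternative with java.time is sketched below. It is not the project's code: the class PreviousDaySketch and the explicit year parameter are assumptions introduced for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: previous-day lookup for "MMdd" keys with an explicit year.
class PreviousDaySketch {

    private static final DateTimeFormatter MMDD = DateTimeFormatter.ofPattern("MMdd");

    /**
     * Returns the day before the given "MMdd" date, again as "MMdd".
     * The year is passed in so that leap years and year boundaries are handled.
     */
    static String previousDay(String mmdd, int year) {
        int month = Integer.parseInt(mmdd.substring(0, 2));
        int day = Integer.parseInt(mmdd.substring(2, 4));
        LocalDate date = LocalDate.of(year, month, day);   // rejects invalid dates
        return date.minusDays(1).format(MMDD);
    }

    public static void main(String[] args) {
        System.out.println(previousDay("0301", 2020));   // leap year
        System.out.println(previousDay("0301", 2019));   // common year
        System.out.println(previousDay("0101", 2019));   // crosses the year boundary
    }
}
```

The three calls print 0229, 0228 and 1231. Whether the extra year argument is worth it depends on how the date keys are stored in MySQL, which the excerpt does not show.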

Web Application Results

Site overview

Traffic analysis

Source analysis

Visitor analysis

That's it. This concludes the hands-on data project!