![PiFlow logo](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-logo2.png)

[![GitHub releases](https://img.shields.io/github/release/cas-bigdatalab/piflow.svg)](https://github.com/cas-bigdatalab/piflow/releases)
[![GitHub stars](https://img.shields.io/github/stars/cas-bigdatalab/piflow.svg)](https://github.com/cas-bigdatalab/piflow/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/cas-bigdatalab/piflow.svg)](https://github.com/cas-bigdatalab/piflow/network)
[![GitHub downloads](https://img.shields.io/github/downloads/cas-bigdatalab/piflow/total.svg)](https://github.com/cas-bigdatalab/piflow/releases)
[![GitHub issues](https://img.shields.io/github/issues/cas-bigdatalab/piflow.svg)](https://github.com/cas-bigdatalab/piflow/issues)
[![GitHub license](https://img.shields.io/github/license/cas-bigdatalab/piflow.svg)](https://github.com/cas-bigdatalab/piflow/blob/master/LICENSE)

πFlow is an easy-to-use, powerful big data pipeline system.

Try PiFlow v0.6 at: http://piflow.cstcloud.cn/piflow-web/

## Table of Contents

- [Features](#features)
- [Architecture](#architecture)
- [Requirements](#requirements)
- [Getting Started](#getting-started)
- [PiFlow Docker](#docker-started)
- [User Interface](#use-interface)

## Features

- Easy to use
  - provides a WYSIWYG web interface to configure data flows
  - monitors data flow status
  - lets you check the logs of data flows
  - provides checkpoints
- Strong scalability
  - supports customized development of data processing components
- Superior performance
  - based on the distributed computing engine Spark
- Powerful
  - 100+ data processing components available
  - includes Spark, MLlib, Hadoop, Hive, HBase, Solr, Redis, Memcached, Elasticsearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, etc.

## Architecture

![PiFlow architecture](https://github.com/cas-bigdatalab/piflow/blob/master/doc/architecture.png)

## Requirements

* JDK 1.8
* Scala 2.11.8
* Apache Maven 3.1.0 or newer
* Git client (used during the build process by the 'bower' plugin)
* Spark 2.1.0, Spark 2.2.0 or Spark 2.3.0
* Hadoop 2.6.0

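Before building, the local tool versions can be compared against the list above. The sketch below is hypothetical (neither function is part of PiFlow) and only covers the build-side tools, JDK 1.8 and Maven 3.1.0 or newer. It parses the usual first line of `java -version` and `mvn -v` output; that wording varies between vendors, so treat a failure as a prompt to look, not a verdict.

```shell
# Hypothetical helper: succeed when dotted version $1 >= $2, compared numerically field by field.
version_at_least() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      # Missing fields count as 0; "+ 0" keeps only the leading number ("0_292" -> 0).
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Hypothetical helper: check JDK 1.8 and Maven >= 3.1.0 from the Requirements list.
check_build_requirements() {
  rc=0
  # java -version prints to stderr; the first line usually looks like
  #   openjdk version "1.8.0_292"   (exact wording depends on the vendor)
  java_version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$java_version" in
    1.8*) echo "JDK: $java_version" ;;
    *) echo "need JDK 1.8, found: ${java_version:-none}" >&2; rc=1 ;;
  esac
  # mvn -v usually starts with: Apache Maven 3.6.3 (...)
  mvn_version=$(mvn -v 2>/dev/null | sed -n 's/^Apache Maven \([0-9.]*\).*/\1/p' | head -n 1)
  if [ -n "$mvn_version" ] && version_at_least "$mvn_version" 3.1.0; then
    echo "Maven: $mvn_version"
  else
    echo "need Maven 3.1.0 or newer, found: ${mvn_version:-none}" >&2
    rc=1
  fi
  return "$rc"
}
```

Run `check_build_requirements` in the shell you will build from. Spark and Hadoop versions are not checked here because they live on the cluster rather than on the build machine.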
## Getting Started

### To Build:

- `install external package`: install the jars shipped in piflow-bundle/lib into your local Maven repository (replace `/.../piflow` with the path of your checkout):

```shell
mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/spark-xml_2.11-0.4.2.jar -DgroupId=com.databricks -DartifactId=spark-xml_2.11 -Dversion=0.4.2 -Dpackaging=jar
mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/java_memcached-release_2.6.6.jar -DgroupId=com.memcached -DartifactId=java_memcached-release -Dversion=2.6.6 -Dpackaging=jar
mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/ojdbc6-11.2.0.3.jar -DgroupId=oracle -DartifactId=ojdbc6 -Dversion=11.2.0.3 -Dpackaging=jar
mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/edtftpj.jar -DgroupId=ftpClient -DartifactId=edtftp -Dversion=1.0.0 -Dpackaging=jar
```

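The four `install:install-file` commands differ only in their coordinates, so they can be driven from one table. This is a sketch: `install_piflow_jars` is a hypothetical helper, its argument stands for the checkout path that the commands above abbreviate as `/.../piflow`, and the file names and coordinates are copied from those commands.

```shell
# Hypothetical helper: install the bundled jars into the local Maven repository.
# $1 = root of your piflow checkout (the "/.../piflow" directory above)
install_piflow_jars() {
  lib="$1/piflow-bundle/lib"
  # One line per jar: file name, groupId, artifactId, version
  while read -r file group artifact version; do
    # stdin is detached so mvn cannot consume the remaining table lines
    mvn install:install-file \
      -Dfile="$lib/$file" \
      -DgroupId="$group" \
      -DartifactId="$artifact" \
      -Dversion="$version" \
      -Dpackaging=jar </dev/null || return 1
  done <<EOF
spark-xml_2.11-0.4.2.jar com.databricks spark-xml_2.11 0.4.2
java_memcached-release_2.6.6.jar com.memcached java_memcached-release 2.6.6
ojdbc6-11.2.0.3.jar oracle ojdbc6 11.2.0.3
edtftpj.jar ftpClient edtftp 1.0.0
EOF
}
```

For example, `install_piflow_jars "$HOME/piflow"` if the repository was cloned into your home directory. The helper stops at the first failing install so a missing jar is not silently skipped.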
- `mvn clean package -Dmaven.test.skip=true`

A successful build ends with output like:

```
[INFO] Replacing original artifact with shaded artifact.
[INFO] Reactor Summary:
[INFO]
[INFO] piflow-project ..................................... SUCCESS [ 4.369 s]
[INFO] piflow-core ........................................ SUCCESS [01:23 min]
[INFO] piflow-configure ................................... SUCCESS [ 12.418 s]
[INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
[INFO] piflow-server ...................................... SUCCESS [02:05 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:01 min
[INFO] Finished at: 2020-05-21T15:22:58+08:00
[INFO] Final Memory: 118M/691M
[INFO] ------------------------------------------------------------------------
```

### Run Piflow Server:

- `run piflow server on intellij`:
  - edit config.properties
  - build piflow to generate piflow-server-0.9.jar
  - the main class is cn.piflow.api.Main (remember to set SPARK_HOME)
- `run piflow server by release version`:
  - download a release package:
    - https://github.com/cas-bigdatalab/piflow/releases/download/v0.5/piflow.tar.gz
    - https://github.com/cas-bigdatalab/piflow/releases/download/v0.6/piflow-server-v0.6.tar.gz
    - https://github.com/cas-bigdatalab/piflow/releases/download/v0.7/piflow-server-v0.7.tar.gz
  - extract the package (use the name of the file you downloaded): `tar -zxvf piflow.tar.gz`
  - edit config.properties
  - run start.sh, stop.sh, restart.sh or status.sh as needed
- `how to configure config.properties`:

```properties
#spark and yarn config
spark.master=yarn
spark.deploy.mode=cluster

#hdfs default file system
fs.defaultFS=hdfs://10.0.86.191:9000

#yarn resourcemanager.hostname
yarn.resourcemanager.hostname=10.0.86.191

#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.88.71:9083

#show data in log, set 0 if you do not want to show data in logs
data.show=10

#server port
server.port=8002

#h2db port
h2.port=50002
```

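A typo in config.properties usually surfaces only after the server starts, so a quick check before running start.sh can save a round trip. The sketch below is hypothetical: `check_piflow_config` is not part of PiFlow, and treating exactly the uncommented keys from the example as required is an assumption; the optional `hive.metastore.uris` and `data.show` are not checked.

```shell
# Hypothetical helper: report keys from the example config.properties that are
# missing, commented out, or empty.
# $1 = path to config.properties (defaults to ./config.properties)
check_piflow_config() {
  conf="${1:-config.properties}"
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 1
  fi
  rc=0
  for key in spark.master spark.deploy.mode fs.defaultFS \
             yarn.resourcemanager.hostname server.port h2.port; do
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    # A key counts as set when an uncommented line reads "key=<non-empty value>".
    if ! grep -q "^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*[^[:space:]]" "$conf"; then
      echo "missing or empty: $key" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For a release package, something like `check_piflow_config config.properties && ./start.sh` from the extracted directory would only start the server once every listed key has a value.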
### Run Piflow Web:

- see https://github.com/cas-bigdatalab/piflow-web

### Use with command line:

- flow config example: the output of XmlParser passes through SelectField; SelectField and CsvParser feed the `data1` and `data2` inports of Merge, and Fork sends the merged data through its `out1`, `out2` and `out3` outports to PutHiveStreaming, JsonSave and CsvSave. Each entry in `paths` connects two stops by `name`.

```json
{
  "flow":{
    "name":"test",
    "uuid":"1234",
    "checkpoint":"Merge",
    "stops":[
      {
        "uuid":"1111",
        "name":"XmlParser",
        "bundle":"cn.piflow.bundle.xml.XmlParser",
        "properties":{
          "xmlpath":"hdfs://10.0.86.89:9000/xjzhu/dblp.mini.xml",
          "rowTag":"phdthesis"
        }
      },
      {
        "uuid":"2222",
        "name":"SelectField",
        "bundle":"cn.piflow.bundle.common.SelectField",
        "properties":{
          "schema":"title,author,pages"
        }
      },
      {
        "uuid":"3333",
        "name":"PutHiveStreaming",
        "bundle":"cn.piflow.bundle.hive.PutHiveStreaming",
        "properties":{
          "database":"sparktest",
          "table":"dblp_phdthesis"
        }
      },
      {
        "uuid":"4444",
        "name":"CsvParser",
        "bundle":"cn.piflow.bundle.csv.CsvParser",
        "properties":{
          "csvPath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.csv",
          "header":"false",
          "delimiter":",",
          "schema":"title,author,pages"
        }
      },
      {
        "uuid":"555",
        "name":"Merge",
        "bundle":"cn.piflow.bundle.common.Merge",
        "properties":{
          "inports":"data1,data2"
        }
      },
      {
        "uuid":"666",
        "name":"Fork",
        "bundle":"cn.piflow.bundle.common.Fork",
        "properties":{
          "outports":"out1,out2,out3"
        }
      },
      {
        "uuid":"777",
        "name":"JsonSave",
        "bundle":"cn.piflow.bundle.json.JsonSave",
        "properties":{
          "jsonSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.json"
        }
      },
      {
        "uuid":"888",
        "name":"CsvSave",
        "bundle":"cn.piflow.bundle.csv.CsvSave",
        "properties":{
          "csvSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis_result.csv",
          "header":"true",
          "delimiter":","
        }
      }
    ],
    "paths":[
      {
        "from":"XmlParser",
        "outport":"",
        "inport":"",
        "to":"SelectField"
      },
      {
        "from":"SelectField",
        "outport":"",
        "inport":"data1",
        "to":"Merge"
      },
      {
        "from":"CsvParser",
        "outport":"",
        "inport":"data2",
        "to":"Merge"
      },
      {
        "from":"Merge",
        "outport":"",
        "inport":"",
        "to":"Fork"
      },
      {
        "from":"Fork",
        "outport":"out1",
        "inport":"",
        "to":"PutHiveStreaming"
      },
      {
        "from":"Fork",
        "outport":"out2",
        "inport":"",
        "to":"JsonSave"
      },
      {
        "from":"Fork",
        "outport":"out3",
        "inport":"",
        "to":"CsvSave"
      }
    ]
  }
}
```

- start the flow by POSTing its JSON to the server's `/flow/start` endpoint (replace `this is your flow json` with your flow definition, such as the example above):

```shell
curl -0 -X POST http://10.0.86.191:8002/flow/start -H "Content-type: application/json" -d 'this is your flow json'
```

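Pasting a long flow definition into `-d '...'` is error-prone; keeping it in a file and letting curl read it avoids quoting problems. A sketch, assuming the flow JSON is saved to a file: `submit_flow` and `flow.json` are hypothetical names, while the `/flow/start` endpoint, the header and the `-0` flag are taken from the command above. `--data-binary @file` is standard curl syntax for sending a file's contents unchanged.

```shell
# Hypothetical helper: POST a flow definition file to a PiFlow server.
# $1 = server address as host:port (the port is server.port from config.properties)
# $2 = path to a file holding the flow JSON
submit_flow() {
  server="$1"
  flow_file="$2"
  if [ ! -s "$flow_file" ]; then
    echo "submit_flow: missing or empty flow file: $flow_file" >&2
    return 1
  fi
  # Same request as the inline example, but the body is read from the file;
  # --data-binary sends it byte for byte (plain -d @file would strip newlines).
  curl -0 -X POST "http://$server/flow/start" \
    -H "Content-type: application/json" \
    --data-binary @"$flow_file"
}
```

Usage: `submit_flow 10.0.86.191:8002 flow.json`.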
## docker-started

- pull the PiFlow image:

```shell
docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v0.6.1
```

- show the docker images (to find the PiFlow image ID):

```shell
docker images
```

- run a container from the PiFlow image ID; all services start automatically:

```shell
docker run --name piflow-v0.6 -it [imageID]
```

- visit "containerip:6001/piflow-web"; it may take a while for the services to come up
- if something goes wrong, all the applications are in the container's /opt folder

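The `containerip` in the address above can be looked up with `docker inspect`. A sketch, assuming the container was started with `--name piflow-v0.6` as shown and is attached to a single network (the default bridge); port 6001 and the `/piflow-web` path come from the address above, and `piflow_web_url` is a hypothetical helper.

```shell
# Hypothetical helper: print the PiFlow web address of a running container.
# $1 = container name or ID, e.g. piflow-v0.6
piflow_web_url() {
  # Go template from the docker inspect documentation: the IP on each attached
  # network, concatenated (so this assumes exactly one network).
  ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1") || return 1
  if [ -z "$ip" ]; then
    echo "no IP address found for container $1" >&2
    return 1
  fi
  echo "http://$ip:6001/piflow-web"
}
```

Usage: `piflow_web_url piflow-v0.6`. Alternatively, publishing the port when the container is created (`docker run -p 6001:6001 ...`) should make the UI reachable at `localhost:6001/piflow-web`, again assuming the web UI listens on port 6001 inside the container.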
## use-interface

- `Login`:

![PiFlow login](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-login.png)

- `Flow list`:

![PiFlow flow list](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-flowlist.png)

- `Create flow`:

![PiFlow create flow](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-createflow.png)

- `Configure flow`:

![PiFlow configure flow](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-flowconfig.png)

- `Load flow`:

![PiFlow load flow](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-loadflow.png)

- `Monitor flow`:

![PiFlow monitor flow](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-monitor.png)

- `Flow logs`:

![PiFlow flow logs](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-log.png)

- `Group list`:
- `Configure group`:
- `Monitor group`:
- `Process List`:

![PiFlow process list](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-processlist.png)

- `Template List`:

![PiFlow template list](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-templatelist.png)

- `Save Template`:

![PiFlow save template](https://github.com/cas-bigdatalab/piflow/blob/master/doc/piflow-savetemplate.png)

Welcome to join the PiFlow User Group! Contact us:

- Name: Teacher Wu
- Mobile phone: 18910263390
- WeChat: 18910263390
- Email: wzs@cnic.cn
- QQ group: 1003489545

![PiFlow User Group QQ](https://github.com/cas-bigdatalab/piflow/blob/master/doc/PiFlowUserGroup_QQ.jpeg)