
πFlow is an easy-to-use, powerful big data pipeline system.
Try PiFlow v0.6 with: http://piflow.cstcloud.cn/piflow-web/
## Table of Contents
- [Features](#features)
- [Architecture](#architecture)
- [Requirements](#requirements)
- [Getting Started](#getting-started)
- [PiFlow Docker](#docker-started)
- [Use Interface](#use-interface)
## Features
- Easy to use
  - provides a WYSIWYG web interface to configure data flows
  - monitors data flow status
  - shows data flow logs
  - provides checkpoints
- Strong scalability
  - supports customized development of data processing components
- Superior performance
  - based on the distributed computing engine Spark
- Powerful
  - 100+ data processing components available
  - includes Spark, MLlib, Hadoop, Hive, HBase, Solr, Redis, Memcache, Elasticsearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, etc.
## Architecture

## Requirements
* JDK 1.8
* Scala-2.11.8
* Apache Maven 3.1.0 or newer
* Spark-2.1.0, Spark-2.2.0, Spark-2.3.0
* Hadoop-2.6.0
## Getting Started
### To Build:
- `install external package`
  - `mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/spark-xml_2.11-0.4.2.jar -DgroupId=com.databricks -DartifactId=spark-xml_2.11 -Dversion=0.4.2 -Dpackaging=jar`
  - `mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/java_memcached-release_2.6.6.jar -DgroupId=com.memcached -DartifactId=java_memcached-release -Dversion=2.6.6 -Dpackaging=jar`
  - `mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/ojdbc6- -DgroupId=oracle -DartifactId=ojdbc6 -Dversion= -Dpackaging=jar`
  - `mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/edtftpj.jar -DgroupId=ftpClient -DartifactId=edtftp -Dversion=1.0.0 -Dpackaging=jar`
- `mvn clean package -Dmaven.test.skip=true`

      [INFO] Replacing original artifact with shaded artifact.
      [INFO] Reactor Summary:
      [INFO] piflow-project ..................................... SUCCESS [  4.369 s]
      [INFO] piflow-core ........................................ SUCCESS [01:23 min]
      [INFO] piflow-configure ................................... SUCCESS [ 12.418 s]
      [INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
      [INFO] piflow-server ...................................... SUCCESS [02:05 min]
      [INFO] ------------------------------------------------------------------------
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time: 06:01 min
      [INFO] Finished at: 2020-05-21T15:22:58+08:00
      [INFO] Final Memory: 118M/691M
      [INFO] ------------------------------------------------------------------------
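
The `install:install-file` commands above all follow one pattern, so they can be generated from a small table instead of being typed by hand. The following is a minimal sketch, not part of PiFlow: the helper names and the `lib_dir` placeholder are assumptions made for illustration, and the ojdbc6 entry is left out because its file name and version are not given above.

```python
import shlex

# (jar file name, groupId, artifactId, version), copied from the commands above.
# The ojdbc6 jar is omitted: its file name and version are not stated in this README.
EXTERNAL_JARS = [
    ("spark-xml_2.11-0.4.2.jar", "com.databricks", "spark-xml_2.11", "0.4.2"),
    ("java_memcached-release_2.6.6.jar", "com.memcached", "java_memcached-release", "2.6.6"),
    ("edtftpj.jar", "ftpClient", "edtftp", "1.0.0"),
]


def install_file_command(lib_dir, jar, group_id, artifact_id, version):
    """Build the argument list for one `mvn install:install-file` call."""
    return [
        "mvn",
        "install:install-file",
        f"-Dfile={lib_dir.rstrip('/')}/{jar}",
        f"-DgroupId={group_id}",
        f"-DartifactId={artifact_id}",
        f"-Dversion={version}",
        "-Dpackaging=jar",
    ]


def install_commands(lib_dir):
    """Return one command (as an argument list) per jar in EXTERNAL_JARS."""
    return [install_file_command(lib_dir, *entry) for entry in EXTERNAL_JARS]


def print_commands(lib_dir):
    """Print the commands as shell lines so they can be reviewed before running."""
    for cmd in install_commands(lib_dir):
        print(shlex.join(cmd))
```

Calling `print_commands("/path/to/piflow/piflow-bundle/lib")` (with the path replaced by your own checkout) prints one shell line per jar. Printing rather than executing keeps the sketch side-effect free.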
### Run Piflow Server:
- `run piflow server on IntelliJ`:
  - download piflow: `git clone https://github.com/cas-bigdatalab/piflow.git`
  - import piflow into IntelliJ
  - edit the config.properties file
  - build piflow to generate the piflow jar:
    - Edit Configurations --> Add New Configuration --> Maven
    - Name: package
    - Command line: clean package -Dmaven.test.skip=true -X
    - run 'package' (the piflow jar file will be built in ../piflow/piflow-server/target/piflow-server-0.9.jar)
  - run HttpService:
    - Edit Configurations --> Add New Configuration --> Application
    - Name: HttpService
    - Main class: cn.piflow.api.Main
    - Environment Variable: SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.6 (change the path to your Spark home)
    - run 'HttpService'
  - test HttpService:
    - run /../piflow/piflow-server/src/main/scala/cn/piflow/api/HTTPClientStartMockDataFlow.scala
    - change the piflow server IP and port to match your configuration
- `run piflow server by release version`:
  - download piflow.tar.gz:
  - unzip piflow.tar.gz: `tar -zxvf piflow.tar.gz`
  - edit config.properties
  - run start.sh, stop.sh, restart.sh, status.sh
  - test piflow server
    - vim /etc/profile
      - `export PIFLOW_HOME=/yourPiflowPath/bin`
    - command
      - `piflow flow start example/mockDataFlow.json`
      - `piflow flow stop appID`
      - `piflow flow info appID`
      - `piflow flow log appID`
      - `piflow flowGroup start example/mockDataGroup.json`
      - `piflow flowGroup stop groupId`
      - `piflow flowGroup info groupId`
- `how to configure config.properties`

      #spark and yarn config
      #hdfs default file system
      #yarn resourcemanager.hostname
      #if you want to use hive, set hive metastore uris
      #show data in log, set 0 if you do not want to show data in logs
      #server port
      #h2db port
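
Before starting the server it can help to check that config.properties actually sets every value you rely on. The sketch below assumes the file uses plain `key=value` lines with `#` comments, as the comment lines above suggest. The key names and values in `SAMPLE` (`server.port`, `h2.port`, `8001`) are hypothetical placeholders, not confirmed PiFlow settings, so substitute the keys from your own file.

```python
def parse_properties(text):
    """Parse simple `key=value` lines, ignoring blank lines and `#` comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; skip it
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, required):
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not props.get(key)]


# Hypothetical sample: these key names and values are placeholders for illustration.
SAMPLE = """\
#server port
server.port=8001
#h2db port
h2.port=
"""
```

For example, `missing_keys(parse_properties(SAMPLE), ["server.port", "h2.port"])` reports `h2.port`, because it is present but empty.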
### Run Piflow Web:
- https://github.com/cas-bigdatalab/piflow-web
### Restful API:
- flow json
  - flow example
- command:
  - `curl -0 -X POST -H "Content-type: application/json" -d 'this is your flow json'`
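
The same request can be built from a script. This is a minimal sketch using only the Python standard library. It mirrors the `curl` call above (a POST with a JSON body and a `Content-type: application/json` header). The target URL is not given in this README, so `url` is a parameter you must supply, and the function names are assumptions made for illustration.

```python
import json
import urllib.request


def build_flow_request(url, flow):
    """Return a POST request that carries the flow definition as a JSON body."""
    body = json.dumps(flow).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (HTTP status, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Usage would look like `send(build_flow_request(url, flow))`, where `url` points at your piflow server and `flow` is your flow JSON loaded as a Python dict. Building and sending are kept separate so the request can be inspected without touching the network.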
## docker-started
- pull piflow images
  - `docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v0.6.1`
- show docker images
  - `docker images`
- run a container with the piflow imageID; all services start automatically
  - `docker run --name piflow-v0.6 -it [imageID]`
- visit "containerip:6001/piflow-web"; startup may take a while
- if something goes wrong, all the applications are in the /opt folder
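
Because the web interface may take a while to come up after the container starts, a small polling loop can save repeated manual refreshes. The following is a sketch under assumptions: the helper names, retry count, and delay are illustrative choices, and the URL must be built from your own container IP.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_up(url, attempts=30, delay=10.0, probe=is_up, sleep=time.sleep):
    """Probe the URL up to `attempts` times, pausing `delay` seconds between tries.

    Returns the 1-based attempt number that succeeded, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

For example, `wait_until_up("http://<containerip>:6001/piflow-web")` returns once the page responds, or `None` after roughly five minutes with the default settings. The `probe` and `sleep` parameters exist so the loop can be exercised without a network or real waiting.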
## use-interface
- `Login`:

- `Flow list`:

- `Create flow`:

- `Configure flow`:

- `Load flow`:

- `Monitor flow`:

- `Flow logs`:

- `Group list`:

- `Configure group`:

- `Monitor group`:

- `Process List`:

- `Template List`:

- `Save Template`:

Welcome to join the PiFlow User Group! Contact us:
- Mobile Phone: 18910263390
- Email: wzs@cnic.cn
- QQ Group: 1003489545
