PiFlow

πFlow is an easy-to-use, powerful big data pipeline system. Try PiFlow v0.6 at: http://piflow.cstcloud.cn/piflow-web/

Table of Contents

  • Features
  • Architecture
  • Requirements
  • Getting Started
  • Run Piflow Server
  • Run Piflow Web
  • Use with command line
  • docker-started
  • use-interface
  • Contact Us

Features

  • Easy to use
    • provides a WYSIWYG web interface to configure data flows
    • monitors data flow status
    • checks the logs of data flows
    • provides checkpoints
  • Strong scalability
    • supports customized development of data processing components
  • Superior performance
    • based on the distributed computing engine Spark
  • Powerful
    • 100+ data processing components available
    • includes Spark, MLlib, Hadoop, Hive, HBase, Solr, Redis, Memcache, Elasticsearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, etc.

Architecture

Requirements

  • JDK 1.8
  • Scala-2.11.8
  • Apache Maven 3.1.0 or newer
  • Git Client (used during build process by 'bower' plugin)
  • Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0
  • Hadoop-2.6.0
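
A quick way to check that the local toolchain matches these requirements (all standard version commands; exact output varies by installation):

        java -version           # expect 1.8.x
        scala -version          # expect Scala 2.11.8
        mvn -v                  # expect Apache Maven 3.1.0 or newer
        spark-submit --version  # expect Spark 2.1.x - 2.3.x
        hadoop version          # expect Hadoop 2.6.0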

Getting Started

To Build:

  • install external packages

        mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/spark-xml_2.11-0.4.2.jar -DgroupId=com.databricks -DartifactId=spark-xml_2.11 -Dversion=0.4.2 -Dpackaging=jar
        mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/java_memcached-release_2.6.6.jar -DgroupId=com.memcached -DartifactId=java_memcached-release -Dversion=2.6.6 -Dpackaging=jar
        mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/ojdbc6-11.2.0.3.jar -DgroupId=oracle -DartifactId=ojdbc6 -Dversion=11.2.0.3 -Dpackaging=jar
        mvn install:install-file -Dfile=/.../piflow/piflow-bundle/lib/edtftpj.jar -DgroupId=ftpClient -DartifactId=edtftp -Dversion=1.0.0 -Dpackaging=jar
    
  • mvn clean package -Dmaven.test.skip=true

        [INFO] Replacing original artifact with shaded artifact.
        [INFO] Reactor Summary:
        [INFO]
        [INFO] piflow-project ..................................... SUCCESS [  4.369 s]
        [INFO] piflow-core ........................................ SUCCESS [01:23 min]
        [INFO] piflow-configure ................................... SUCCESS [ 12.418 s]
        [INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
        [INFO] piflow-server ...................................... SUCCESS [02:05 min]
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 06:01 min
        [INFO] Finished at: 2020-05-21T15:22:58+08:00
        [INFO] Final Memory: 118M/691M
        [INFO] ------------------------------------------------------------------------
    
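After a successful build, each module's jar is written to its own target/ directory (standard Maven output layout); the exact artifact file names depend on the versions declared in pom.xml:

        # list the built artifacts of the main modules
        ls piflow-core/target/ piflow-configure/target/ piflow-bundle/target/ piflow-server/target/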

Run Piflow Server
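
A minimal sketch of starting the server from a packaged release, assuming the release tarball ships a start script and reads config.properties (the tarball name, script name, and configuration values below are assumptions; check the release you downloaded and config.properties in this repository for the actual ones):

        # unpack the server release (file name is an assumption)
        tar -zxvf piflow-server-v0.6.tar.gz
        cd piflow-server

        # point config.properties at your Spark/Hadoop cluster before starting
        vi config.properties

        # start the server (script name is an assumption; check the unpacked directory)
        ./start.sh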

Run Piflow Web

Use with command line

  • flow config example (a sketch of how to submit it follows the JSON below):
        
        {
          "flow":{
            "name":"test",
            "uuid":"1234",
            "checkpoint":"Merge",
            "stops":[
              {
                "uuid":"1111",
                "name":"XmlParser",
                "bundle":"cn.piflow.bundle.xml.XmlParser",
                "properties":{
                  "xmlpath":"hdfs://10.0.86.89:9000/xjzhu/dblp.mini.xml",
                  "rowTag":"phdthesis"
                }
              },
              {
                "uuid":"2222",
                "name":"SelectField",
                "bundle":"cn.piflow.bundle.common.SelectField",
                "properties":{
                  "schema":"title,author,pages"
                }
              },
              {
                "uuid":"3333",
                "name":"PutHiveStreaming",
                "bundle":"cn.piflow.bundle.hive.PutHiveStreaming",
                "properties":{
                  "database":"sparktest",
                  "table":"dblp_phdthesis"
                }
              },
              {
                "uuid":"4444",
                "name":"CsvParser",
                "bundle":"cn.piflow.bundle.csv.CsvParser",
                "properties":{
                  "csvPath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.csv",
                  "header":"false",
                  "delimiter":",",
                  "schema":"title,author,pages"
                }
              },
              {
                "uuid":"555",
                "name":"Merge",
                "bundle":"cn.piflow.bundle.common.Merge",
                "properties":{
                  "inports":"data1,data2"
                }
              },
              {
                "uuid":"666",
                "name":"Fork",
                "bundle":"cn.piflow.bundle.common.Fork",
                "properties":{
                  "outports":"out1,out2,out3"
                }
              },
              {
                "uuid":"777",
                "name":"JsonSave",
                "bundle":"cn.piflow.bundle.json.JsonSave",
                "properties":{
                  "jsonSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.json"
                }
              },
              {
                "uuid":"888",
                "name":"CsvSave",
                "bundle":"cn.piflow.bundle.csv.CsvSave",
                "properties":{
                  "csvSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis_result.csv",
                  "header":"true",
                  "delimiter":","
                }
              }
            ],
            "paths":[
              {
                "from":"XmlParser",
                "outport":"",
                "inport":"",
                "to":"SelectField"
              },
              {
                "from":"SelectField",
                "outport":"",
                "inport":"data1",
                "to":"Merge"
              },
              {
                "from":"CsvParser",
                "outport":"",
                "inport":"data2",
                "to":"Merge"
              },
              {
                "from":"Merge",
                "outport":"",
                "inport":"",
                "to":"Fork"
              },
              {
                "from":"Fork",
                "outport":"out1",
                "inport":"",
                "to":"PutHiveStreaming"
              },
              {
                "from":"Fork",
                "outport":"out2",
                "inport":"",
                "to":"JsonSave"
              },
              {
                "from":"Fork",
                "outport":"out3",
                "inport":"",
                "to":"CsvSave"
              }
            ]
          }
        }

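  • run flow: a minimal sketch of submitting the flow config above to a running piflow server over HTTP, assuming the JSON is saved as flow.json; the host, port, and endpoint path are assumptions, so check config.properties and the server's API documentation for the actual values:

        # host, port (8001) and the /flow/start path are assumptions; adjust to your deployment
        curl -X POST http://127.0.0.1:8001/flow/start \
             -H "Content-Type: application/json" \
             -d @flow.json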

docker-started

  • pull the piflow image
    docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v0.6.1

  • show docker images
    docker images

  • run a container with the piflow imageID; all services start automatically
    docker run --name piflow-v0.6 -it [imageID]

  • visit "[container IP]:6001/piflow-web" in a browser; it may take a while for the services to come up

  • if something goes wrong, all the applications are located in the /opt folder
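
The container IP referenced above can be looked up with docker inspect (the container name matches the --name used in the run command):

        # print the container's IP address, then open http://<that IP>:6001/piflow-web
        docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' piflow-v0.6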

use-interface

The web interface provides the following pages:

  • Login
  • Flow list
  • Create flow
  • Configure flow
  • Load flow
  • Monitor flow
  • Flow logs
  • Group list
  • Configure group
  • Monitor group
  • Process List
  • Template List
  • Save Template

Welcome to join the PiFlow User Group! Contact us:
Name: Mr. Wu
Mobile Phone: 18910263390
WeChat: 18910263390
Email: wzs@cnic.cn
QQ Group: 1003489545