πFlow is an easy-to-use, powerful big data pipeline system. Try PiFlow v0.6 at http://piflow.cstcloud.cn/piflow-web/

Features

  • Easy to use
    • Provides a WYSIWYG web interface for configuring data flows
    • Monitors data flow status
    • Shows data flow logs
    • Provides checkpoints
  • Strong scalability
    • Supports custom development of data processing components
  • Superior performance
    • Built on the distributed computing engine Spark
  • Powerful
    • 100+ data processing components available
    • Includes Spark, MLlib, Hadoop, Hive, HBase, Solr, Redis, Memcache, Elasticsearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, etc.

Architecture

Requirements

  • JDK 1.8
  • Scala 2.11.8
  • Apache Maven 3.1.0 or newer
  • Spark 2.1.0, 2.2.0, or 2.3.0
  • Hadoop 2.6.0

Getting Started

To Build:

  • Install the external packages into your local Maven repository (adjust the -Dfile paths to your checkout):

        mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/spark-xml_2.11-0.4.2.jar -DgroupId=com.databricks -DartifactId=spark-xml_2.11 -Dversion=0.4.2 -Dpackaging=jar
        mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/java_memcached-release_2.6.6.jar -DgroupId=com.memcached -DartifactId=java_memcached-release -Dversion=2.6.6 -Dpackaging=jar
        mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/ojdbc6-11.2.0.3.jar -DgroupId=oracle -DartifactId=ojdbc6 -Dversion=11.2.0.3 -Dpackaging=jar
        mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/edtftpj.jar -DgroupId=ftpClient -DartifactId=edtftp -Dversion=1.0.0 -Dpackaging=jar
    
  • mvn clean package -Dmaven.test.skip=true

        [INFO] Replacing original artifact with shaded artifact.
        [INFO] Reactor Summary:
        [INFO]
        [INFO] piflow-project ..................................... SUCCESS [  4.369 s]
        [INFO] piflow-core ........................................ SUCCESS [01:23 min]
        [INFO] piflow-configure ................................... SUCCESS [ 12.418 s]
        [INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
        [INFO] piflow-server ...................................... SUCCESS [02:05 min]
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 06:01 min
        [INFO] Finished at: 2020-05-21T15:22:58+08:00
        [INFO] Final Memory: 118M/691M
        [INFO] ------------------------------------------------------------------------
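
The four install:install-file commands above differ only in the jar name and its Maven coordinates, so they can be generated from a small table. The following is a minimal sketch, not part of PiFlow: the function name `install_commands` and the `piflow_root` argument are hypothetical, and it assumes the jars sit under piflow-bundle/lib of your checkout, as the -Dfile paths above suggest.

```python
from pathlib import PurePosixPath

# (jar file name, groupId, artifactId, version), copied from the commands above.
EXTERNAL_JARS = [
    ("spark-xml_2.11-0.4.2.jar", "com.databricks", "spark-xml_2.11", "0.4.2"),
    ("java_memcached-release_2.6.6.jar", "com.memcached", "java_memcached-release", "2.6.6"),
    ("ojdbc6-11.2.0.3.jar", "oracle", "ojdbc6", "11.2.0.3"),
    ("edtftpj.jar", "ftpClient", "edtftp", "1.0.0"),
]


def install_commands(piflow_root):
    """Build one `mvn install:install-file` argument list per external jar.

    `piflow_root` is a hypothetical parameter: the directory of your PiFlow checkout.
    """
    lib_dir = PurePosixPath(piflow_root) / "piflow-bundle" / "lib"
    return [
        [
            "mvn",
            "install:install-file",
            f"-Dfile={lib_dir / jar}",
            f"-DgroupId={group}",
            f"-DartifactId={artifact}",
            f"-Dversion={version}",
            "-Dpackaging=jar",
        ]
        for jar, group, artifact, version in EXTERNAL_JARS
    ]


if __name__ == "__main__":
    # Print the commands for review; "/path/to/piflow" is a placeholder.
    for argv in install_commands("/path/to/piflow"):
        print(" ".join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try; each list could be passed to `subprocess.run` once the paths look right.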
    

Run PiFlow Server

Run PiFlow Web

RESTful API

  • flow json

    flow example
        
        {
          "flow": {
            "name": "MockData",
            "executorMemory": "1g",
            "executorNumber": "1",
            "uuid": "8a80d63f720cdd2301723b7461d92600",
            "paths": [
              {
                "inport": "",
                "from": "MockData",
                "to": "ShowData",
                "outport": ""
              }
            ],
            "executorCores": "1",
            "driverMemory": "1g",
            "stops": [
              {
                "name": "MockData",
                "bundle": "cn.piflow.bundle.common.MockData",
                "uuid": "8a80d63f720cdd2301723b7461d92604",
                "properties": {
                  "schema": "title:String, author:String, age:Int",
                  "count": "10"
                },
                "customizedProperties": {}
              },
              {
                "name": "ShowData",
                "bundle": "cn.piflow.bundle.external.ShowData",
                "uuid": "8a80d63f720cdd2301723b7461d92602",
                "properties": {
                  "showNumber": "5"
                },
                "customizedProperties": {}
              }
            ]
          }
        }

  • CURL POST

  • Command line

    • set PIFLOW_HOME
      vim /etc/profile
      export PIFLOW_HOME=/yourPiflowPath/piflow-bin
      export PATH=$PATH:$PIFLOW_HOME/bin

    • command example
      piflow flow start yourFlow.json
      piflow flow stop appID
      piflow flow info appID
      piflow flow log appID

      piflow flowGroup start yourFlowGroup.json
      piflow flowGroup stop groupId
      piflow flowGroup info groupId
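
Before passing a flow file to piflow flow start, it can help to check that its paths are wired to stops that exist. The sketch below is not part of PiFlow. It assumes, based only on the MockData example under "flow json", that each path's "from" and "to" values are stop names; the function name `check_paths` is hypothetical.

```python
import json


def check_paths(flow_doc):
    """Return a message for every path endpoint that names no declared stop.

    Assumption (from the README example only): "from" and "to" hold stop names.
    """
    flow = flow_doc["flow"]
    stop_names = {stop["name"] for stop in flow.get("stops", [])}
    problems = []
    for index, path in enumerate(flow.get("paths", [])):
        for endpoint in ("from", "to"):
            name = path.get(endpoint)
            if name not in stop_names:
                problems.append(f"paths[{index}].{endpoint}: unknown stop {name!r}")
    return problems


# Trimmed copy of the MockData example (only the fields the check reads).
EXAMPLE = json.loads("""
{
  "flow": {
    "name": "MockData",
    "paths": [
      {"inport": "", "from": "MockData", "to": "ShowData", "outport": ""}
    ],
    "stops": [
      {"name": "MockData", "bundle": "cn.piflow.bundle.common.MockData"},
      {"name": "ShowData", "bundle": "cn.piflow.bundle.external.ShowData"}
    ]
  }
}
""")

if __name__ == "__main__":
    # The example is wired correctly, so this loop prints nothing.
    for problem in check_paths(EXAMPLE):
        print(problem)
```

To check your own file, load it with `json.load` and pass the result to `check_paths`; an empty list means every path endpoint matched a stop name.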

Getting Started with Docker

  • Pull a PiFlow image (pick the version you need)
    docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v0.9
    docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v0.7.1
    docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v0.6.1

  • List the Docker images to find the image ID
    docker images

  • Run a container from the PiFlow image ID; all services start automatically
    docker run --name piflow -it [imageID]

  • Visit "containerip:6001"; the services may take a while to come up

  • If something goes wrong, all the applications are in the /opt folder

  • To access PiFlow from the host, map the ports when starting the container
    docker run --name piflow -it -p 6001:6001 -p 6002:6002 [imageID]
    Then visit "host:6001"

Using the Interface

  • Login:

  • Dashboard:

  • Flow list:

  • Create flow:

  • Configure flow:

  • Load flow:

  • Monitor flow:

  • Flow logs:

  • Group list:

  • Configure group:

  • Monitor group:

  • Process List:

  • Template List:

  • DataSource List:

  • Schedule List:

  • StopHub List:

Contact Us

  • Name: Teacher Wu (吴老师)
  • Mobile Phone: 18910263390
  • WeChat: 18910263390
  • Email: wzs@cnic.cn
  • QQ Group: 1003489545
  • WeChat group is valid for 7 days