README.md

Piflow is an easy-to-use, powerful big data pipeline system.

Features

  • Easy to use
    • provides a WYSIWYG web interface for configuring data flows
    • monitors the status of big data flows
    • lets you check big data flow logs
    • provides checkpoints
  • Strong scalability
    • supports custom-developed data processing components
  • Superior performance
    • based on the distributed computing engine Spark
  • Powerful
    • 100+ data processing components available
    • includes spark, mllib, hadoop, hive, hbase, solr, redis, memcache, elasticSearch, jdbc, mongodb, http, ftp, xml, csv, json, etc.

Architecture

Requirements

  • JDK 1.8 or newer
  • Apache Maven 3.1.0 or newer
  • Git Client (used during build process by 'bower' plugin)
  • spark-2.1.0
  • hadoop-2.6.0

Getting Started

To Build: mvn clean package -Dmaven.test.skip=true

      [INFO] Replacing original artifact with shaded artifact.
      [INFO] Replacing /opt/project/piflow/piflow-server/target/piflow-server-0.9.jar with /opt/project/piflow/piflow-server/target/piflow-server-0.9-shaded.jar
      [INFO] ------------------------------------------------------------------------
      [INFO] Reactor Summary:
      [INFO] 
      [INFO] piflow-project ..................................... SUCCESS [  4.602 s]
      [INFO] piflow-core ........................................ SUCCESS [ 56.533 s]
      [INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
      [INFO] piflow-server ...................................... SUCCESS [03:01 min]
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time: 06:18 min
      [INFO] Finished at: 2018-12-24T16:54:16+08:00
      [INFO] Final Memory: 41M/812M
      [INFO] ------------------------------------------------------------------------

To Run Piflow Server

  • configure config.properties

    #server ip and port
    server.ip=10.0.86.191
    server.port=8002
    h2.port=50002
    
    #spark and yarn config
    spark.master=yarn
    spark.deploy.mode=cluster
    yarn.resourcemanager.hostname=10.0.86.191
    yarn.resourcemanager.address=10.0.86.191:8032
    yarn.access.namenode=hdfs://10.0.86.191:9000
    yarn.stagingDir=hdfs://10.0.86.191:9000/tmp/
    yarn.jars=hdfs://10.0.86.191:9000/user/spark/share/lib/*.jar
    yarn.url=http://10.0.86.191:8088/ws/v1/cluster/apps/
    
    #hive config
    hive.metastore.uris=thrift://10.0.86.191:9083
    
    #piflow jar path
    piflow.bundle=/opt/piflowServer/piflow-server-0.9.jar
    
    #checkpoint hdfs path
    checkpoint.path=hdfs://10.0.86.89:9000/piflow/checkpoints/
    
  • you can run the Piflow server from IntelliJ

    • the main class is cn.piflow.api.Main
    • remember to set SPARK_HOME
  • you can also run the Piflow server as follows:

    • download piflowServer:***
    • edit config.properties
    • run start.sh
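
Before starting the server it can help to confirm that config.properties parses and that the settings you rely on are present. The following is a minimal sketch, not part of Piflow: `load_properties`, `missing_keys`, and the `REQUIRED_KEYS` list are hypothetical names, and the list of required keys is an assumption drawn from the sample file above rather than an official specification.

```python
# Sketch: parse a config.properties-style file and report missing settings.
# REQUIRED_KEYS is an assumption based on the sample in this README,
# not an official list of mandatory Piflow settings.
REQUIRED_KEYS = ["server.ip", "server.port", "spark.master", "piflow.bundle"]


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [k for k in required if not props.get(k)]


# A shortened copy of the sample configuration shown above.
SAMPLE = """\
#server ip and port
server.ip=10.0.86.191
server.port=8002
h2.port=50002

#spark and yarn config
spark.master=yarn
spark.deploy.mode=cluster

#piflow jar path
piflow.bundle=/opt/piflowServer/piflow-server-0.9.jar
"""

if __name__ == "__main__":
    props = load_properties(SAMPLE)
    print("missing:", missing_keys(props))
```

To check a real file, read it with `open("config.properties").read()` and pass the text to `load_properties`.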

To Run Piflow Web

  • todo

To Use

  • command line
    • flow config example

      {
        "flow":{
        "name":"test",
        "uuid":"1234",
        "checkpoint":"Merge",
        "stops":[
        {
          "uuid":"1111",
          "name":"XmlParser",
          "bundle":"cn.piflow.bundle.xml.XmlParser",
          "properties":{
              "xmlpath":"hdfs://10.0.86.89:9000/xjzhu/dblp.mini.xml",
              "rowTag":"phdthesis"
          }
        },
        {
          "uuid":"2222",
          "name":"SelectField",
          "bundle":"cn.piflow.bundle.common.SelectField",
          "properties":{
              "schema":"title,author,pages"
          }
        },
        {
          "uuid":"3333",
          "name":"PutHiveStreaming",
          "bundle":"cn.piflow.bundle.hive.PutHiveStreaming",
          "properties":{
              "database":"sparktest",
              "table":"dblp_phdthesis"
          }
        },
        {
          "uuid":"4444",
          "name":"CsvParser",
          "bundle":"cn.piflow.bundle.csv.CsvParser",
          "properties":{
              "csvPath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.csv",
              "header":"false",
              "delimiter":",",
              "schema":"title,author,pages"
          }
        },
        {
          "uuid":"555",
          "name":"Merge",
          "bundle":"cn.piflow.bundle.common.Merge",
          "properties":{
            "inports":"data1,data2"
          }
        },
        {
          "uuid":"666",
          "name":"Fork",
          "bundle":"cn.piflow.bundle.common.Fork",
          "properties":{
            "outports":"out1,out2,out3"
          }
        },
        {
          "uuid":"777",
          "name":"JsonSave",
          "bundle":"cn.piflow.bundle.json.JsonSave",
          "properties":{
            "jsonSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis.json"
          }
        },
        {
          "uuid":"888",
          "name":"CsvSave",
          "bundle":"cn.piflow.bundle.csv.CsvSave",
          "properties":{
            "csvSavePath":"hdfs://10.0.86.89:9000/xjzhu/phdthesis_result.csv",
            "header":"true",
            "delimiter":","
          }
        }
      ],
      "paths":[
        {
          "from":"XmlParser",
          "outport":"",
          "inport":"",
          "to":"SelectField"
        },
        {
          "from":"SelectField",
          "outport":"",
          "inport":"data1",
          "to":"Merge"
        },
        {
          "from":"CsvParser",
          "outport":"",
          "inport":"data2",
          "to":"Merge"
        },
        {
          "from":"Merge",
          "outport":"",
          "inport":"",
          "to":"Fork"
        },
        {
          "from":"Fork",
          "outport":"out1",
          "inport":"",
          "to":"PutHiveStreaming"
        },
        {
          "from":"Fork",
          "outport":"out2",
          "inport":"",
          "to":"JsonSave"
        },
        {
          "from":"Fork",
          "outport":"out3",
          "inport":"",
          "to":"CsvSave"
        }
      ]
        }
      }

    • start the flow, replacing the -d argument with your flow JSON: curl -0 -X POST http://10.0.86.191:8002/flow/start -H "Content-type: application/json" -d 'this is your flow json'

  • piflow web
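
The command-line steps above can also be scripted. The sketch below is an illustration under stated assumptions, not Piflow's own client: `check_flow` and `build_start_request` are hypothetical helpers, the URL is copied from the curl example, and the rule that every path's "from" and "to" must name a stop is inferred from the flow config example rather than taken from Piflow documentation. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumed endpoint, copied from the curl example; adjust host and port.
START_URL = "http://10.0.86.191:8002/flow/start"


def check_flow(flow_json):
    """Return a list of problems found in a flow config (empty if none)."""
    flow = flow_json["flow"]
    stop_names = {stop["name"] for stop in flow["stops"]}
    problems = []
    for path in flow["paths"]:
        # Assumption: paths refer to stops by their "name" field,
        # as they do in the example config.
        for end in ("from", "to"):
            if path[end] not in stop_names:
                problems.append("path references unknown stop: " + path[end])
    return problems


def build_start_request(flow_json, url=START_URL):
    """Build (without sending) a POST request similar to the curl command."""
    body = json.dumps(flow_json).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# A two-stop excerpt of the flow config example above.
EXAMPLE = {
    "flow": {
        "name": "test",
        "uuid": "1234",
        "stops": [
            {
                "uuid": "1111",
                "name": "XmlParser",
                "bundle": "cn.piflow.bundle.xml.XmlParser",
                "properties": {
                    "xmlpath": "hdfs://10.0.86.89:9000/xjzhu/dblp.mini.xml",
                    "rowTag": "phdthesis",
                },
            },
            {
                "uuid": "2222",
                "name": "SelectField",
                "bundle": "cn.piflow.bundle.common.SelectField",
                "properties": {"schema": "title,author,pages"},
            },
        ],
        "paths": [
            {"from": "XmlParser", "outport": "", "inport": "", "to": "SelectField"}
        ],
    }
}

if __name__ == "__main__":
    problems = check_flow(EXAMPLE)
    if problems:
        print("\n".join(problems))
    else:
        request = build_start_request(EXAMPLE)
        # urllib.request.urlopen(request) would send it to the server.
        print(request.get_method(), request.full_url)
```

Checking the paths locally catches misspelled stop names before the flow is submitted to the cluster.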