CCAI provides many FCGI APIs, named `fcgi_xxxx`. Each FCGI API is an FCGI server running in the background. Client apps communicate with the FCGI server over the HTTP POST protocol.
![image7.png](/temp/image7.png)
These FCGI APIs perform AI inference for different use cases, such as classification, face detection, OCR, TTS, or ASR. Please refer to the following API list for the details of each API.
Some FCGI APIs have two working modes. One mode does inference locally in the `fcgi_xxxx` server; the other is proxy mode. In proxy mode, the `fcgi_xxxx` server forwards requests from client apps to a remote server (such as a QQ or Tencent server), and the remote server does the inference. Which mode the `fcgi_xxxx` server works in is decided by the configuration file (*policy_setting.cfg*) or by the result of the policy calculation.
The following picture shows the two working modes.
![image5.png](/temp/image5.png)
Some FCGI APIs are implemented in two languages, C++ and Python, so some APIs come in two variants: a Python API and a C++ API. Both provide the same functionality and parameters; the only difference is that they have different HTTP addresses. Client apps can therefore get the same inference result from either the FCGI C++ API or the Python API by using the corresponding address.
## TTS API usage
`fcgi_tts` API is used for text-to-speech. This is an end-to-end TTS API: the client app inputs one text sentence, and `fcgi_tts` outputs the wave (sound) data for that sentence. The generated wave data can take two paths. In the first, the wave data is written to a WAV file. In the second, the wave data is sent directly to the speakers, so you can hear the sentence from the speaker devices.
There are two working modes for the `fcgi_tts` server: local mode and proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_tts` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_py_tts'
- post parameter: this parameter should include these fields:
| Field name | Type | Range | Example | comments |
| ------- | ------ |-----|----|-----|
|'aht'| Int|[-24, 24]|0|increase (+) / decrease (-) the amount of semitone for the generated speech|
In local mode (doing inference locally), only the “text” field needs to be set; the other fields are ignored.
In proxy mode (doing inference on a remote server), all fields need to be set.
In proxy mode, ‘appid’ and ‘appkey’ are necessary parameters in order to get the right results from the remote server (`www.ai.qq.com`). You should register on `www.ai.qq.com` to get ‘appid’ and ‘appkey’. Please refer to https://ai.qq.com/doc/aaitts.shtml to find out how to apply for these fields and how to write a post request for the remote server.
- b) Response
The response of the POST request is in JSON format, for example:
```json
{
"ret": 0, //return value: 0 (success) or 1(failure)
"msg": "ok", // request result: “ok” or “inference failed”
"data": { //inference result
"format": 2, // the format of voice : 1(pcm) 2(wav) 3(mp3)
"speech": "UklGRjL4Aw...", // Base64-encoded wave data of the synthesized speech
"md5sum": "3bae7bf99ad32bc2880ef1938ba19590" // MD5 checksum of the synthesized speech
},
"time": 7.283 //fcgi_tts processing time
}
```
If the speaker devices are configured correctly, you can also hear the sentence directly from the speakers.
One example of a client app for fcgi_tts API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_tts_py.py*”.
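Putting the pieces above together, a minimal sketch of a TTS client might look like the following. The URL and field names follow the examples above; the helper name and the output path are hypothetical, and `requests` is the third-party HTTP library already used in this document.

```python
import base64

def decode_speech(response_json):
    """Extract and decode the base64-encoded "speech" field of a TTS response."""
    if response_json.get("ret") != 0:
        raise RuntimeError(response_json.get("msg", "inference failed"))
    return base64.b64decode(response_json["data"]["speech"])

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    # Local mode: only the "text" field is required.
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_py_tts",
                         data={"text": "hello world"})
    with open("output.wav", "wb") as f:      # hypothetical output path
        f.write(decode_speech(resp.json()))
```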
- c) Notice
Currently, this model only supports English text, not Chinese text.
It provides only a Python API.
To configure the speaker devices, you need to enable the pulseaudio and health-monitor services by following these steps:
(1) On the host PC, install the pulseaudio package if this package hasn't been installed.
For example:
```shell
$> sudo apt-get install pulseaudio
```
(2) Enable the TCP protocol of the pulseaudio.
Edit the configuration file. for example:
```shell
$> sudo vim /etc/pulse/default.pa
```
Find out the following tcp configuration:
```
#load-module module-native-protocol-tcp
```
Uncomment the tcp configuration(remove "#"):
```
load-module module-native-protocol-tcp
```
Save and quit the configuration file.
(3) Restart the pulseaudio service. For example:
```shell
$> sudo systemctl restart pulseaudio
```
(4) Run the health-monitor service on the host PC if it is not already running.
This service is used to monitor the CCAI container.
## ASR API usage (offline ASR case)
`fcgi_asr` API provides Automatic Speech Recognition (ASR). This is end-to-end speech recognition. It includes several libraries released by the OpenVINO™ toolkit. These libraries perform feature extraction, OpenVINO™-based neural-network speech recognition, and decoding to produce text from scores. Together, these libraries provide an end-to-end pipeline converting speech to text. The client app inputs an utterance (speech), and `fcgi_asr` outputs the text expressed by this utterance.
Same as `fcgi_tts`, `fcgi_asr` also has two working modes: local mode and proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_asr` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_asr'
- post parameter: this parameter should include these fields:
In local mode (doing inference locally), only the “speech” field needs to be set.
In proxy mode (doing inference on a remote server), all fields need to be set.
In proxy mode, ‘appid’ and ‘appkey’ are necessary parameters in order to get the right results from the remote server (`www.ai.qq.com`). You should register on `www.ai.qq.com` to get ‘appid’ and ‘appkey’. Please refer to https://ai.qq.com/doc/aaiasr.shtml to find out how to apply for these fields and how to write a post request for the remote server.
- b) Response
The response of the POST request is in JSON format, for example:
```json
{
"ret":0, //return value: 0 (success) or 1(failure)
"msg":"ok", // request result: “ok” or “inference error”
"data":{ //inference result
"text":"HOW ARE YOU DOING" //recognized text
},
"time":0.695 //fcgi_asr processing time
}
```
One example of a client app for fcgi_asr API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_asr_c.py*”.
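As a sketch, a local-mode ASR client could build its request like this. Base64-encoding the "speech" field is an assumption based on the other audio APIs in this document; the helper name and input file are hypothetical.

```python
import base64

def build_asr_request(wav_bytes):
    """Build local-mode post parameters. Base64-encoding the "speech"
    field is an assumption, not confirmed by this document."""
    return {"speech": base64.b64encode(wav_bytes).decode("ascii")}

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    with open("utterance.wav", "rb") as f:   # hypothetical input file
        params = build_asr_request(f.read())
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_asr", data=params)
    print(resp.json()["data"]["text"])
```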
- c) Notice
Currently, this model only supports English utterances, not Mandarin.
It provides both C++ and Python APIs.
## API in Speech sample
`fcgi_speech` API is used for speech inference. The acoustic model is trained with Kaldi* neural networks. The input speech data must be speech feature vectors in ARK format (an ARK file is the result of feature extraction). The inference result is score data, which is also in ARK format.
The client app uses the HTTP protocol to communicate with the `fcgi_speech` server.
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_speech'
- post parameter: this parameter should include these fields:
|Field name|Type|Value|comments|
|----|---|------|-----|
|'stage'|string|{'RAW_FORMAT_INIT', 'IR_FORMAT_INIT_NETWORK', 'IR_FORMAT_INIT_EXENETWORK', 'INFERENCE'}|Only these 4 values are allowed|
|'model'|string|Example: './models/wsj_dnn5b.xml'|IR-format or raw (non-IR) model file|
|'batch'|int|Positive integer. Example: 1 or 8|Set based on the real case|
|'device'|string|Example: 'GNA_AUTO' or ‘CPU’|Select the inference device|
|'scale_factor'|int|Positive integer Example: 2048|Used for GNA HW|
|'speech'|string|Speech input vector data|Must be encoded by base64 method|
|'time_stamp'|int|Positive integer|Time stamp for this request.|
The `fcgi_speech` server uses a finite state machine to track its state. Client apps should use different ‘stage’ requests to trigger state transitions in the `fcgi_speech` server.
For an IR-format model, the sample post request sequence is:
The first post request is an init request:
```
['stage'] = 'IR_FORMAT_INIT_NETWORK'
['model'] = './models/wsj_dnn5b.xml'
['batch'] = 8
```
The second post request is also an init request:
```
['stage'] = 'IR_FORMAT_INIT_EXENETWORK'
['model'] = './models/wsj_dnn5b.xml'
['device'] = 'GNA_AUTO'
```
The last post request is for inference:
```
['stage'] = 'INFERENCE'
['model'] = './models/wsj_dnn5b.xml'
['speech'] = base64_data
```
For a raw (non-IR) format model, the sample post request sequence is (two requests only):
The first post request is an init request:
```
['stage'] = 'RAW_FORMAT_INIT'
['model'] = './models/ELE_M.raw'
['batch'] = 1
['device'] = 'GNA_AUTO'
['scale_factor'] = 2048
```
The second post request, which is also the last one, is for inference:
```
['stage'] = 'INFERENCE'
['model'] = './models/ELE_M.raw'
['speech'] = base64_data
```
- b) Response
The response of the POST request is in JSON format, for example:
```
{
"ret":0, //return value: 0 (success) or 1(failure)
"msg":"ok", // request result: “ok” or “inference error”
"data":{ // inference result
... // response data
},
"time":0.344222 //fcgi_speech processing time
}
```
One example of a client app for fcgi_speech API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_speech_c.py*”.
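The staged request sequence above can be sketched as follows. The stage names, model path, and URL come from the examples above; the helper name and the ARK input file are hypothetical, and the post-parameter layout is an assumption.

```python
import base64

URL = "http://localhost:8080/cgi-bin/fcgi_speech"

def ir_request_sequence(model, speech_b64, batch=8, device="GNA_AUTO"):
    """Return the three staged requests for an IR-format model, in order."""
    return [
        {"stage": "IR_FORMAT_INIT_NETWORK", "model": model, "batch": batch},
        {"stage": "IR_FORMAT_INIT_EXENETWORK", "model": model, "device": device},
        {"stage": "INFERENCE", "model": model, "speech": speech_b64},
    ]

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    with open("feature.ark", "rb") as f:     # hypothetical ARK feature file
        speech_b64 = base64.b64encode(f.read()).decode("ascii")
    for params in ir_request_sequence("./models/wsj_dnn5b.xml", speech_b64):
        resp = requests.post(URL, data=params)   # each request advances the state machine
    print(resp.json())
```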
- c) Notice
The `fcgi_speech` API doesn’t have a proxy mode; that means this API doesn’t support doing inference on remote servers.
This API can use GNA_HW as an inference device.
It provides only a C++ API.
## Policy API usage
`fcgi_policy` API is used to select the inference device or working mode (local mode or proxy mode) for the FCGI APIs.
The client app uses the HTTP protocol to communicate with the `fcgi_policy` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_policy'
- post parameter: this parameter should include these fields.
|Field name|Type|Value|comments|
|----|---|-----|-----|
|'device'|string|CPU, GPU, GNA_AUTO, GNA_HW, GNA_SW|This field is used to set the inference device, such as “GPU”, “CPU”, etc.|
|'local'|string|“1” - do inference locally; “0” - do inference on a remote server|Select the working mode of the fcgi server: local mode or proxy mode|
- b) Response
The response of the POST request is a string, which indicates whether the request was processed correctly.
```
“successfully set the policy daemon" // OK
"failed to set policy daemon" // Fail
```
- c) Notice
The policy daemon must be running; otherwise calling this API will fail.
Run this policy API before running any other case if you want to select an inference device or change the working mode of the FCGI APIs.
This setting is global; it will affect all subsequent cases.
It provides both C++ and Python APIs.
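The policy request above can be sketched as follows. The URL and field values come from the table in this section; the helper name is hypothetical.

```python
def build_policy_request(device="CPU", local=True):
    """Map to the fields of the table above: 'local' is the string "1"
    (local mode) or "0" (proxy mode)."""
    return {"device": device, "local": "1" if local else "0"}

def run_demo():
    """Requires a running CCAI container and policy daemon; not invoked here."""
    import requests
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_policy",
                         data=build_policy_request("GNA_AUTO", local=True))
    print(resp.text)   # "successfully set the policy daemon" on success
```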
## Classification API usage
`fcgi_classification` API is used to run inference on an image and produce classification information for the objects in the image. The client app inputs one picture (image), and `fcgi_classification` outputs the object information, such as what the object is and the coordinates of the object in the picture.
Same as `fcgi_tts`, `fcgi_classification` also has two working modes: local mode and proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_classification` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_classfication'
- post parameter: this parameter should include these fields:
In local mode (doing inference locally), only the “image” field needs to be set.
In proxy mode (doing inference on a remote server), all fields need to be set.
In proxy mode, ‘appid’ and ‘appkey’ are necessary parameters in order to get the right results from the remote server (`www.ai.qq.com`). You should register on `www.ai.qq.com` to get ‘appid’ and ‘appkey’. Please refer to https://ai.qq.com/doc/imagetag.shtml to find out how to apply for these fields and how to write a post request for the remote server.
- b) Response
The response of the POST request is in JSON format, for example:
```json
{
"ret":0,
"msg":"ok",
"data":{
"tag_list":[
{"tag_name":"sandal","tag_confidence":0.786503}
]
},
"time":0.380
}
```
One example of a client app for fcgi_classification API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_classification_c.py* ”.
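A sketch of parsing the classification response above (the helper name is hypothetical; the exact encoding of the "image" field is not specified in this section, so the request below only shows a placeholder):

```python
def top_tag(response_json):
    """Return (tag_name, tag_confidence) of the most confident tag."""
    best = max(response_json["data"]["tag_list"],
               key=lambda t: t["tag_confidence"])
    return best["tag_name"], best["tag_confidence"]

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_classfication",
                         data={"image": "..."})   # placeholder image data
    print(top_tag(resp.json()))
```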
- c) Notice
It provides both C++ and Python APIs.
## Face Detection API usage
`fcgi_face_detection` API is used to run inference on an image and find human faces in the image. The client app inputs one picture (image), and `fcgi_face_detection` outputs the face information, such as how many human faces there are and the bounding box for each face in the picture.
Same as `fcgi_tts`, `fcgi_face_detection` also has two working modes: local mode and proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_face_detection` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_face_detection'
- post parameter: this parameter should include these fields:
In local mode (doing inference locally), only the “image” field needs to be set.
In proxy mode (doing inference on a remote server), all fields need to be set.
In proxy mode, ‘appid’ and ‘appkey’ are necessary parameters in order to get the right results from the remote server (`www.ai.qq.com`). You should register on `www.ai.qq.com` to get ‘appid’ and ‘appkey’. Please refer to https://ai.qq.com/doc/detectface.shtml to find out how to apply for these fields and how to write a post request for the remote server.
- b) Response
The response of the POST request is in JSON format, for example:
```json
{
"ret":0,
"msg":"ok",
"data":{
"face_list":[
{
"x1":655,
"y1":124,
"x2":783,
"y2":304
},
{
"x1":68,
"y1":149,
"x2":267,
"y2":367
} ]
},
"time":0.305
}
```
One example of a client app for fcgi_face_detection API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_face_detection_c.py* ”.
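The bounding boxes in the response above can be parsed like this (the helper name is hypothetical; the "image" placeholder stands in for the actual image data):

```python
def parse_faces(response_json):
    """Convert face_list entries into (x1, y1, x2, y2) bounding-box tuples."""
    return [(f["x1"], f["y1"], f["x2"], f["y2"])
            for f in response_json["data"]["face_list"]]

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_face_detection",
                         data={"image": "..."})   # placeholder image data
    for box in parse_faces(resp.json()):
        print("face at", box)
```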
- c) Notice
It provides both C++ and Python APIs.
## Facial Landmark API usage
`fcgi_facial_landmark` API is used to run inference on an image and report human facial landmarks in the image. The client app inputs one picture (image), and `fcgi_facial_landmark` outputs the coordinates of the facial landmark points.
Same as `fcgi_tts`, `fcgi_facial_landmark` also has two working modes: local mode and proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_facial_landmark` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_facial_landmark'
- post parameter: this parameter should include these fields:
In local mode (doing inference locally), only the “image” field needs to be set.
In proxy mode (doing inference on a remote server), all fields need to be set.
In proxy mode, ‘appid’ and ‘appkey’ are necessary parameters in order to get the right results from the remote server (`www.ai.qq.com`). You should register on `www.ai.qq.com` to get ‘appid’ and ‘appkey’. Please refer to https://ai.qq.com/doc/detectface.shtml to find out how to apply for these fields and how to write a post request for the remote server.
- b) Response
The response of the POST request is in JSON format, for example:
```json
{
"ret":0,
"msg":"ok",
"data":{
"image_width":916.000000,
"image_height":502.000000,
"face_shape_list":[
{"x":684.691284,
"y":198.765793},
{"x":664.316528,
"y":195.681824},
...
{"x":241.314194,
"y":211.847031} ]
},
"time":0.623
}
```
One example of a client app for fcgi_facial_landmark API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_facial_landmark_c.py* ”.
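A sketch of extracting the landmark points from the response above (the helper name is hypothetical; the "image" placeholder stands in for the actual image data):

```python
def parse_landmarks(response_json):
    """Convert face_shape_list entries into (x, y) points."""
    return [(p["x"], p["y"])
            for p in response_json["data"]["face_shape_list"]]

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_facial_landmark",
                         data={"image": "..."})   # placeholder image data
    print(len(parse_landmarks(resp.json())), "landmark points")
```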
- c) Notice
It provides both C++ and Python APIs.
## OCR API usage
`fcgi_ocr` API is used to run inference on an image and recognize handwritten or printed text in the image. The client app inputs one picture (image), and `fcgi_ocr` outputs the text information in the picture. The information includes text coordinates and text confidence.
Same as `fcgi_tts`, `fcgi_ocr` also has two working modes: local mode and proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_ocr` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_ocr'
- post parameter: this parameter should include these fields:
In local mode (doing inference locally), only the “image” field needs to be set.
In proxy mode (doing inference on a remote server), all fields need to be set.
In proxy mode, ‘appid’ and ‘appkey’ are necessary parameters in order to get the right results from the remote server (`www.ai.qq.com`). You should register on `www.ai.qq.com` to get ‘appid’ and ‘appkey’. Please refer to https://ai.qq.com/doc/imgtotext.shtml to find out how to apply for these fields and how to write a post request for the remote server.
- b) Response
The response of the POST request is in JSON format, for example:
```json
{
"ret":0,
"msg":"ok",
"data":{
"item_list":[
{
"itemcoord":[
{"x":161.903748,"y":91.755684,"width":141.737503,"height":81.645004}
],
"words":[
{"character":"i","confidence":0.999999},
{"character":"n","confidence":0.999998},
{"character":"t","confidence":0.621934},
{"character":"e","confidence":0.999999},
{"character":"l","confidence":0.999995}
],
"itemstring":"intel"
},
{
"itemcoord":[
{"x":205.378326,"y":153.429291,"width":175.314835,"height":77.421722}
],
"words":[
{"character":"i","confidence":1.000000},
{"character":"n","confidence":1.000000},
{"character":"s","confidence":1.000000},
{"character":"i","confidence":0.776524},
{"character":"d","confidence":1.000000},
{"character":"e","confidence":1.000000}
],
"itemstring":"inside"
}
]
},
"time":1.986
}
```
One example of a client app for fcgi_ocr API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_ocr_c.py* ”.
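The per-item strings and confidences in the response above can be consumed like this. The helper name and the confidence threshold are hypothetical; the "image" placeholder stands in for the actual image data.

```python
def extract_strings(response_json, min_confidence=0.5):
    """Collect itemstring values, skipping items whose mean per-character
    confidence falls below min_confidence (an illustrative filter)."""
    lines = []
    for item in response_json["data"]["item_list"]:
        confs = [w["confidence"] for w in item["words"]]
        if confs and sum(confs) / len(confs) >= min_confidence:
            lines.append(item["itemstring"])
    return lines

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_ocr",
                         data={"image": "..."})   # placeholder image data
    print(" ".join(extract_strings(resp.json())))
```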
- c) Notice
It provides both C++ and Python APIs.
## formula API usage
`fcgi_formula` API is used to run inference on an image. It recognizes formulas and outputs them in LaTeX format. The client app inputs one picture (image), and `fcgi_formula` outputs the formula in LaTeX format.
`fcgi_formula` has only one working mode: local mode.
The client app uses the HTTP protocol to communicate with the `fcgi_formula` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_py_formula'
- post parameter: this parameter should include these fields:
In local mode (doing inference locally), only the “image” field needs to be set.
- b) Response
The response of the POST request is in JSON format, for example:
```
{'ret': 0, 'msg': 'ok', 'data': '1 1 1 v v ^ { 1 } + 7 . 7 9 o ^ { 1 } - o - 0 . 9 0 f ^ { 7 } s ^ { 7 }', 'time': 0.518}
```
One example of a client app for fcgi_formula API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_formula_py.py*”.
- c) Notice
It provides only a Python API.
## handwritten API usage
`fcgi_handwritten` API is used to run inference on an image and recognize handwritten Chinese text in the image. The client app inputs one picture (image), and `fcgi_handwritten` outputs the text information in the picture.
`fcgi_handwritten` has one working mode: local mode.
The client app uses the HTTP protocol to communicate with the `fcgi_handwritten` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_py_handwritten'
- post parameter: this parameter should include these fields:
One example of a client app for fcgi_handwritten API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_handwritten_py.py* ”.
- c) Notice
It provides only a Python API.
## ppocr API usage
`fcgi_ppocr` API is used to run inference on an image and recognize printed text in the image. The client app inputs one picture (image), and `fcgi_ppocr` outputs the text information in the picture.
`fcgi_ppocr` has one working mode: local mode.
The client app uses the HTTP protocol to communicate with the `fcgi_ppocr` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_py_ppocr'
- post parameter: this parameter should include these fields:
One example of a client app for fcgi_ppocr API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_ppocr_py.py* ”.
- c) Notice
It provides only a Python API.
## segmentation API usage
`fcgi_segmentation` API is used to run inference on an image and perform semantic segmentation on it. The client app inputs one picture (image), and `fcgi_segmentation` outputs a semantic segmentation picture.
`fcgi_segmentation` has one working mode: local mode.
The client app uses the HTTP protocol to communicate with the `fcgi_segmentation` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_segmentation'
- post parameter: this parameter should include these fields:
One example of a client app for fcgi_segmentation API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_segmentation_c.py* ”.
- c) Notice
It provides both C++ and Python APIs.
## super resolution API usage
`fcgi_super_resolution` API is used to run inference on an image and upscale a small picture to a larger one. The client app inputs one picture (image), and `fcgi_super_resolution` outputs a larger picture.
`fcgi_super_resolution` has one working mode: local mode.
The client app uses the HTTP protocol to communicate with the `fcgi_super_resolution` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_super_resolution'
- post parameter: this parameter should include these fields:
One example of a client app for fcgi_super_resolution API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_super_resolution_c.py*”.
- c) Notice
It provides both C++ and Python APIs.
## digitalnote API usage
`fcgi_digitalnote` API is used to run inference on an image and recognize and output the handwriting, machine-printed text, and formulas in the picture. The client app inputs one picture (image), and `fcgi_digitalnote` outputs the handwriting, machine-printed text, and formulas in the picture.
`fcgi_digitalnote` has only local mode; it does not have a proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_digitalnote` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_digitalnote'
- post parameter: this parameter should include these fields:
|Field name|Type|Range|Example|comments|
|----|---|----|----|-----|
|‘latex’|string|string|"365 234 "|Pixel coordinates of the formula (LaTeX) regions|
|‘handwritten’|string|string|"354 39 431 123 "|Pixel coordinates of the handwritten regions|
|‘html’|int|{0,1}|0|0 for a terminal client, 1 for an HTML client|
In local mode (doing inference locally), the “image”, “latex”, “handwritten”, and “html” fields need to be set. Find the coordinates of a pixel inside a formula in the picture and fill them into the latex field. Find the coordinates of a pixel inside the handwritten text in the picture and fill them into the handwritten field. Use spaces to join the coordinates. If you use a terminal client, set the html field to 0.
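The coordinate fields described above can be built like this. The helper name is hypothetical, the sample coordinates are taken from the table above, and the "image" placeholder stands in for the actual image data.

```python
def coords_to_field(points):
    """Join (x, y) pixel coordinates with spaces, as the latex/handwritten
    fields expect, e.g. [(354, 39), (431, 123)] -> "354 39 431 123 "."""
    return "".join("%d %d " % (x, y) for x, y in points)

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    params = {
        "image": "...",                                   # placeholder image data
        "latex": coords_to_field([(365, 234)]),           # a pixel inside each formula
        "handwritten": coords_to_field([(354, 39), (431, 123)]),
        "html": 0,                                        # terminal client
    }
    resp = requests.post("http://localhost:8080/cgi-bin/fcgi_digitalnote",
                         data=params)
    print(resp.json())
```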
- b) Response
The response of the POST request is in JSON format.
One example of a client app for fcgi_digitalnote API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_digitalnote_c.py*”.
- c) Notice
It provides only a Python API.
It can speed up inference: the picture does not need to be sent three times to get three different results. Handwriting, machine-printed text, and formula recognition can be invoked in one request.
## Video pipeline management (control) API usage
Video pipeline API is used to start or stop a video pipeline.
The following is detailed information about the request parameters and response.
- a) Request
- http url: such as: url= 'http://localhost:8080/cgi-bin/streaming'
- post parameter: this parameter should include this field:
|Field name|Type|Value|Example|comments|
|----|---|----|----|-----|
|"parameter"|string|JSON string|"{"source":"device=/dev/video0","sink":"v4l2sink device=/dev/video2","resolution":"width=800,height=600" }"|optional, the example shows the default value|
- example:
```shell
$ curl -H "Content-Type:application/json" -X POST \
```
- b) Response
The response is a string: “0” means success, “1” means failure.
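Since the curl example above is truncated, an equivalent request can be sketched in Python. The field defaults come from the table above; the helper name is hypothetical, and posting the body as JSON matches the `Content-Type:application/json` header in the curl example.

```python
import json

def build_streaming_request(source="device=/dev/video0",
                            sink="v4l2sink device=/dev/video2",
                            resolution="width=800,height=600"):
    """The "parameter" field carries a JSON string; defaults come from
    the table above."""
    return {"parameter": json.dumps(
        {"source": source, "sink": sink, "resolution": resolution})}

def run_demo():
    """Requires a running CCAI container; not invoked here."""
    import requests
    resp = requests.post("http://localhost:8080/cgi-bin/streaming",
                         json=build_streaming_request())
    print(resp.text)   # "0" means success, "1" means failure
```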
## Live ASR API usage (online ASR case)
`fcgi_live_asr` API is also an Automatic Speech Recognition usage. It uses the same models as the `fcgi_asr` API (the ASR API usage in 10.1.2). The difference is that this API is an online ASR case while 10.1.2 is an offline ASR case. That means the live asr API continuously captures voice from the MIC devices, does inference, and sends out the sentences that the voice expressed.
The `fcgi_live_asr` case has only one working mode, local mode. It doesn’t support proxy mode.
The client app uses the HTTP protocol to communicate with the `fcgi_live_asr` server.
The sample code for sending a POST request from the client app is:
```python
response = requests.post(url, post_parameter)
```
The following is detailed information about the request parameters and response.
- a) Input parameters
- http url: such as: url= 'http://localhost:8080/cgi-bin/fcgi_live_asr'
- post parameter: this parameter should include these fields:
|Field name|Type|Range|Example|comments|
|----|---|----|----|-----|
|'mode'|Int|0,1,2|0|Controls the running mode of the fcgi_live_asr service: 0: start the live asr service; 1: do inference and get the resulting sentences; 2: stop the live asr service|
- b) Response
The response of the POST request is a text string, for example:
```
Starting live asr ok!
HOW ARE YOU DOING
HELLO
HA
...
Stop live asr ok!
```
One example of a client app for fcgi_live_asr API is “*api-gateway/cgi-bin/test-script/test-demo/post_local_live_asr.py*”.
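The start/poll/stop cycle described by the 'mode' field can be sketched as follows. The URL and mode values come from the table above; the helper name and the number of polling iterations are hypothetical.

```python
URL = "http://localhost:8080/cgi-bin/fcgi_live_asr"
START, INFER, STOP = 0, 1, 2    # 'mode' values from the table above

def mode_request(mode):
    """Build the post parameters for one fcgi_live_asr request."""
    return {"mode": mode}

def run_demo():
    """Requires a running CCAI container and a MIC device; not invoked here."""
    import requests
    requests.post(URL, data=mode_request(START))            # start the service
    for _ in range(5):                                      # poll a few results
        print(requests.post(URL, data=mode_request(INFER)).text)
    requests.post(URL, data=mode_request(STOP))             # stop the service
```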
- c) Notice
Currently, this model only supports English utterances, not Mandarin.
It provides only a C++ API.
In order to use this API, you need to enable the pulseaudio and health-monitor services:
(1) On the host PC, install the pulseaudio package if this package hasn't been installed.
For example:
```shell
$> sudo apt-get install pulseaudio
```
(2) Enable the TCP protocol of the pulseaudio.
Edit the configuration file. for example:
```shell
$> sudo vim /etc/pulse/default.pa
```
Find out the following tcp configuration:
```
#load-module module-native-protocol-tcp
```
Uncomment the tcp configuration(remove "#"):
```
load-module module-native-protocol-tcp
```
Save and quit the configuration file.
(3) Restart the pulseaudio service. For example:
```
$> sudo systemctl restart pulseaudio
```
(4) Run the health-monitor service on the host PC if it is not already running.
This service is used to monitor the CCAI container.
# gRPC APIs Manual
The CCAI framework not only provides FCGI APIs, but also provides many gRPC APIs. Client apps can do inference by calling the gRPC APIs.
In the .proto file, the service interface ‘Inference’ is defined, and the rpc methods ‘OCR’, ‘Classification’, ‘FaceDetection’, ‘FacialLandmark’, ‘SimulationLib’ and ‘ASR’ are defined inside the service.
## OCR method
Request:
message Input
|Field name|Type|Value|comments|
|----|----|----|-----|
|buffer|bytes||.jpg or .png image file buffer|
Response:
message Result
|Field name|Type|Value|comments|
|----|----|----|-----|
|json|string|example:<br>[<br> {<br> "itemcoord":{<br> "x":162,<br> "y":91,<br> "width":141,<br> "height":81<br> },<br> "itemstring":"intel"<br> },<br> {<br> "itemcoord":{<br> "x":205,<br> "y":153,<br> "width":175,<br> "height":77<br> }<br> "itemstring":"inside"<br> }<br>]|the field is json format string|
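As a sketch, calling the OCR method from Python might look like the following. The generated module names (`inference_pb2`, `inference_pb2_grpc`), the stub class name, and the server address are assumptions based on the service and message names above, not confirmed by this document.

```python
import json

def parse_ocr_result(json_field):
    """Parse the json-format string of the Result message into the list of
    recognized itemstring values."""
    return [item["itemstring"] for item in json.loads(json_field)]

def run_demo():
    """Sketch of the gRPC call; requires the grpc package and the generated
    stubs, so it is not invoked here."""
    import grpc
    import inference_pb2       # hypothetical codegen module names
    import inference_pb2_grpc
    with open("sample.jpg", "rb") as f:
        request = inference_pb2.Input(buffer=f.read())      # .jpg/.png bytes
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = inference_pb2_grpc.InferenceStub(channel)
        result = stub.OCR(request)                          # message Result
    print(parse_ocr_result(result.json))
```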
## ASR method
Request:
message Input
|Field name|Type|Value|comments|
|----|----|----|-----|
|buffer|bytes||.wav file buffer|
Response:
message Result
|Field name|Type|Value|comments|
|----|----|----|-----|
|json|string|example:<br>{<br> "text":"HOW ARE YOU DOING"<br>}|the field is json format string|
## Classification method
Request:
message Input
|Field name|Type|Value|comments|
|----|----|----|-----|
|buffer|bytes||.jpg or .png image file buffer|
Response:
message Result
|Field name|Type|Value|comments|
|----|----|----|-----|
|json|string|example:<br>[<br> {<br>"tag_name":"sandal","tag_confidence":0.743236<br> }<br>]|the field is json format string|
## FaceDetection method
Request:
message Input
|Field name|Type|Value|comments|
|----|----|----|-----|
|buffer|bytes||.jpg or .png image file buffer|
Response:
message Result
|Field name|Type|Value|comments|
|----|----|----|-----|
|json|string|example:<br>[<br>{"x1":611,"y1":106,"x2":827,"y2":322},<br>{"x1":37,"y1":128,"x2":298,"y2":389}<br>]|the field is json format string|
## FacialLandmark method
Request:
message Input
|Field name|Type|Value|comments|
|----|----|----|-----|
|buffer|bytes||.jpg or .png image file buffer|
Response:
message Result
|Field name|Type|Value|comments|
|----|----|----|-----|
|json|string|example:<br>[<br>{"x":684,"y":198},<br>{"x":664,"y":195},<br> …<br>]|the field is json format string|
## SimulationLib method
Request:
message Input
|Field name|Type|Value|comments|
|----|----|----|-----|
|stage|uint32|Range of {0, 1, 2, 3}|Specify working phases: initialization or inference.|
|model |string |Example: /home/xxx/models/wsj_dnn5b.xml |Model file, including path|
|batch |uint32| Example: 8 |batch size|
|device|string|CPU or GNA_HW|Inference device: CPU or GNA|
|scale_factor| uint32| Example: 2048 |Used for GNA HW, provided by client app.|
|speech |bytes| |Speech data buffer|
Response:
message Result
|Field name|Type|Value|comments|
|----|----|----|-----|
|json|string|Example:<br>Initialization<br>{<br>"ret":0,<br>"msg":"ok",<br>"data":{<br>"input information(name:dimension)":{ "Parameter":[8,440]<br>},<br>"output information(name:dimension)":{ "affinetransform14/Fused_Add_":[8,3425] }<br> },<br>"time":0.223394<br> }<br>Inference:<br>{<br> "ret":0,<br> "msg":"ok",<br> "data":{<br> "inference result":"-11"<br> },<br> "time":0.014026<br>}|The field is a json format string. It includes the status of initialization or inference results. For initialization, the “data” field includes input/output information, such as name, shape etcs. For inference, the “data” field includes the status code of inference.|
|rawdata| bytes| |The output data of inference. It is often the score ark data.|
# Simulation lib
## What is simulation lib
To support existing applications that use OpenVINO C++ APIs via CCAI, we introduced a simulation library named simlib. This library provides the same APIs as OpenVINO, but internally converts those API calls into CCAI REST/gRPC calls. In most cases, an application only needs to replace one or two header file includes (from the OpenVINO header file(s) to the simulation lib header file) and be recompiled with linkage to the simulation lib. All existing logic within the application does not need to change.
Note: So far, only a limited set of OpenVINO APIs is available from the simulation lib; support for the remaining APIs is ongoing.
## How to make those applications work with simulation lib
Comment out the original OpenVINO header include in your source and include the simulation library header instead.
Note: please make sure the simulation lib is installed (via the ```service-runtime-simlib_xxx.deb``` package); you can find it under ```/opt/intel/service_runtime/simlib/``` on the host system. Also make sure the service container is running so that the simulation lib can communicate with the backend services.
# Low level APIs Manual
The runtime service library provides APIs for upper layers, such as the fcgi or grpc layer. The runtime library supports different inference engines, such as Openvino or Pytorch, but it exposes only one set of APIs to the upper layer. Upper layers select the inference engine by passing parameters to the runtime APIs.
Runtime APIs are *“simple”* APIs. *“Simple”* means the number of APIs is limited: although there are only a few APIs, you can use them to run inference for many cases, such as image, speech, or video processing. *“Simple”* also means they are friendly and easy to use. For example, if you want to run inference on an image, you can do so by calling only one API, *vino_ie_pipeline_infer_image()*. You need not care about how inference pipelines are built up; they are opaque to the end user. All construction work is done inside the runtime library.
The runtime service library APIs are implemented in two languages, C++ and python, so it provides two types of APIs. One type is C++ APIs, which can be called by C++ programs directly. The other is python APIs, prepared for python programs.
![image13.png](/temp/image13.png)
**Notice:**
There are two versions of the C++ API. Version 0 is described in section 10.3.1 (C++ APIs for Openvino Backend Engine). It only supports Openvino as an inference engine and doesn't support the Pytorch engine.
Version 1 is described in section 10.3.3 (C++ APIs for Different Backend Engines). It supports both the Openvino and Pytorch engines. Some APIs in version 0 can be replaced by APIs in version 1.
Some C++ APIs in version 0 will be deprecated in the future. We encourage you to use the C++ APIs in version 1 when the APIs in version 0 are marked “deprecated”.
## C++ APIs for Openvino Backend Engine(Version 0)
### Return value (deprecated)
```
/**
*@brief Status code of inference
*/
#define RT_INFER_ERROR -1 //inference error
#define RT_LOCAL_INFER_OK 0 //inference successfully on local
#define RT_REMOTE_INFER_OK 1 //inference successfully on remote server
```
Some APIs have two work modes. One mode is local mode, which means doing inference on local XPU. Another is proxy mode. In proxy mode, API forwards requests to the remote server (such as QQ server or Tencent server). The remote server does inference.
In local mode, the return value is ```RT_LOCAL_INFER_OK (success)``` or ```RT_INFER_ERROR (failure)```.
In proxy mode, the return value is ```RT_REMOTE_INFER_OK (success)``` or ```RT_INFER_ERROR (failure)```.
```
/**
*@brief This is the parameters to do inference on remote server
*/
struct serverParams {
std::string url; //the address of server
std::string urlParam; //the post parameter of request
std::string response; //the response data of server
};
```
This parameter is used by the API in proxy mode. Set the server address (serverParams.url) and the request parameters (serverParams.urlParam); the server's response is returned in serverParams.response.
This API is used to change API behavior. Users can set the API working mode (local mode or proxy mode), or assign the inference device (XPU) used in local mode.
- 1) API
```
/**
*@brief Set parameters to configure vino ie pipeline
*@param configuration Parameters set from the end user.
*/
int vino_ie_pipeline_set_parameters(struct userCfgParams& configuration);
```
- 2) parameter
```
/**
*@brief This is the parameters setting from end user
*/
struct userCfgParams {
bool isLocalInference; //do inference in local or remote
std::string inferDevice; //inference device: CPU, GPU or other device
};
```
```isLocalInference```: true – local mode, do inference on the local XPU; false – proxy mode, do inference on the remote server.
```inferDevice```: the inference device used in local mode, e.g. CPU, GPU, GNA_AUTO.
- 3) example
```struct userCfgParams cfg{true, "CPU"};```
```int res = vino_ie_pipeline_set_parameters(cfg);```
- 4) Notice
This API setting is global: it affects the behavior of all subsequent API calls.
### image API (deprecated)
This API is used to do inference on images. It is related to image processing.
- 1) API
```
/**
*@brief Do inference for image
*@param image Images input for network
*@param additionalInput Other inputs of network(except image input)
*@param xmls Path of IE model file(xml)
*@param rawDetectionResults Outputs of network, they are raw data.
*@param remoteServerInfo Parameters to do inference on remote server
*@return Status code of inference
*/
int vino_ie_pipeline_infer_image(std::vector<std::shared_ptr<cv::Mat>>& image,
                                 std::vector<std::vector<float>>& additionalInput,
                                 std::string xmls,
                                 std::vector<std::vector<float>*>& rawDetectionResults,
                                 struct serverParams& remoteServerInfo);
```
- 2) parameters

|Parameters|Type |Comments|
|----|---|-----|
|image |std::vector<std::shared_ptr\<cv::Mat>> |The input data of the image. The data format of the image is cv::Mat. The input is a batch of images. The batch is expressed by std::vector<>. The vector size is batch size. Each item in the vector is a shared pointer, std::shared_ptr\<cv::Mat>, which points to one image data in the batch.|
|additionalInput| std::vector<std::vector\<float>> |Some networks have more than one input. This parameter is used for the other inputs except the image input. The type is also std::vector<>. The vector size is the number of inputs in the network except the image input. For each input, the data type is std::vector\<float>. |
|xmls|std::string |The IE model file, which includes the file path. The file must be xml format.|
|rawDetectionResults |std::vector\<std::vector\<float>*> | The inference results. Some networks have more than one output port. This parameter is defined as std::vector<>. The vector size is the number of output ports. Each item in the vector is a pointer to a vector (std::vector\<float>), which is the inference result of one output port.|
|remoteServerInfo| struct serverParams |Server parameter. This is used in proxy mode. Please refer to 1.2 for detailed information.|
### common API (deprecated)
“Common” means this API is used for cases other than image and ASR, for example the TTS case. If the input/output data of the model meets the requirements of the API, then this API can be used for that case.
- 1) API
```
/**
*@brief Do inference for common models
*@param inputData Data input for network. The data type is float.
*@param additionalInput Other inputs of network(except image input)
*@param xmls Path of IE model file(xml)
*@param rawDetectionResults Outputs of network, they are raw data.
*@param remoteServerInfo Parameters to do inference on remote server
*@return Status code of inference
*/
int vino_ie_pipeline_infer_common(std::vector<std::shared_ptr<std::vector<float>>>& inputData,
                                  std::vector<std::vector<float>>& additionalInput,
                                  std::string xmls,
                                  std::vector<std::vector<float>*>& rawDetectionResults,
                                  struct serverParams& remoteServerInfo);
```
- 2) parameters

|Parameters|Type |Comments|
|----|---|-----|
|inputData |std::vector<std::shared_ptr\<std::vector\<float>>>| Input data for the network, similar to the image parameter of the image API. The input is a batch of vectors, expressed by std::vector<>. The vector size is the batch size. Each item of the vector is a shared pointer, std::shared_ptr\<std::vector\<float>>, which points to one float vector.|
|additionalInput |std::vector<std::vector\<float>>| Some networks have more than one input. This parameter is used for the other inputs except the inputData port. The type is also std::vector<>. The vector size is the number of inputs in the network except the inputData input port. For each input, the data type is std::vector\<float>. |
|xmls| std::string |The IE model file, which includes the file path. The file must be xml format.|
|rawDetectionResults |std::vector\<std::vector\<float>*> | The inference results. Some networks have more than one output port. This parameter is defined as std::vector<>. The vector size is the number of output ports. Each item in the vector is a pointer to one output result (std::vector\<float>), which is the inference result of one output port.|
|remoteServerInfo| struct serverParams |Server parameter. This is used in proxy mode. Please refer to 1.2 for detailed information.|
- 3) example
```
int res = vino_ie_pipeline_infer_common(encoder_vectors, additionalInput, model_file, rawDetectionResults, urlInfo);
```
### video API
This API is used to run inference on video streaming. The video API includes two functions: one initializes the model, the other runs inference on frames.
- 1) APIs
```
/**
*@brief Initialization before video inference
*@param modelXmlFile Path of IE model file(xml)
*@param deviceName Inference on which device: CPU, GPU or others
*@return Status code of inference
*/
int vino_ie_video_infer_init(const std::string& modelXmlFile,
const std::string& deviceName);
/**
*@brief Do inference for video frame
*@param frame Image frame input for network
*@param additionalInput Other inputs of network(except image input)
*@param modelXmlFile Path of IE model file(xml)
*@param rawDetectionResults Outputs of network, they are raw data.
*@return Status code of inference
*/
int vino_ie_video_infer_frame(const cv::Mat& frame,
                              std::vector<std::vector<float>>& additionalInput,
                              const std::string& modelXmlFile,
                              std::vector<std::vector<float>*>& rawDetectionResults);
```
- 2) parameters

|Parameters|Type |Comments|
|----|---|-----|
|modelXmlFile| std::string |The IE model file, which includes the file path. The file must be xml format.|
|deviceName| std::string| Inference device. This parameter selects inference devices, XPU(CPU, GPU, or others). |
|frame| cv::Mat |Input video frame. Only one frame data, not support batch.|
|rawDetectionResults| std::vector\<std::vector\<float>*> |The inference results. Some networks have more than one output port. This parameter is defined as std::vector<>. The vector size is the number of output ports. Each item in the vector is a pointer to the output data (std::vector\<float>), which is the inference result of one output port.|
|additionalInput |std::vector<std::vector\<float>>| Some networks have more than one input. This parameter is used for the other inputs except the image input. The type is also std::vector<>. The vector size is the number of inputs in the network except the image input port. For each input, the data type is std::vector\<float>. |
- (1) No policy logic in this API. The setting of policy API has no impact on this API.
- (2) It has only one working mode, local mode. Doesn’t have proxy mode.
### simulative OV lib API
This API is used to communicate with simulative OV lib. Simulative OV lib is the front-end application, while the runtime service library works as a back-end service. The front-end(the simulative OV lib) calls this API to run inference on the back-end.
This case includes two APIs: one initializes the model, the other runs inference.
- 1) APIs
```
/**
*@brief Initialization before doing simulate IE inference
*@param modelXmlFile Path of IE model file(xml)
*@param batch Batch size used in this model
*@param deviceName Inference on which device: CPU, GPU or others
*@param gnaScaler Scale factor used in GNA devices
*@param IOInformation IO information of the model. Returned to the caller
*@return Status code of inference
*/
int vino_simulate_ie_init(const std::string& modelXmlFile,
uint32_t& batch, //default=1
const std::string& deviceName,
uint32_t gnaScaler,
struct mock_data& IOInformation);
/**
*@brief Do inference for simulate IE
*@param audioData The speech data ready to do inference
*@param modelXmlFile Path of IE model file(xml)
*@param rawScoreResults Outputs of network, they are raw data.
*@return Status code of inference. It’s the StatusCode of OpenVino, such as
* OK, GENERAL_ERROR, INFER_NOT_STARTED or other values.
*/
int vino_simulate_ie_infer(std::vector<float>& audioData,
const std::string& modelXmlFile,
std::vector<float>& rawScoreResults);
```
- 2) parameters
```
/**
*@brief Interface data structure between simulation IE lib and real IE container
* It includes input/output information about the model.
*/
using mock_InputsDataMap = std::map<std::string, std::vector<unsigned long>>; // <name, getDims()>
using mock_OutputsDataMap = std::map<std::string, std::vector<unsigned long>>; // <name, getDims()>
struct mock_data {
mock_InputsDataMap inputBlobsInfo;
mock_OutputsDataMap outputBlobsInfo;
uint8_t layout; // NCHW = 1, NHWC = 2 ... must do mapping between uint8_t and enum type
};
```
```struct mock_data``` includes the IO information of the network. The IO information is required by the simulative OV lib; it is filled in by the runtime inference library and returned to the simulative OV lib.
```mock_InputsDataMap``` is defined as a map, ```std::map<std::string, std::vector<unsigned long>>```. It is the input information of the network. In this map, the KEY is the input name, and the VALUE is the input dimensions.
```mock_OutputsDataMap```, the same as ```mock_InputsDataMap```, is also defined as a map: ```std::map<std::string, std::vector<unsigned long>>```. It is the output information of the network.
|Parameters|Type |Comments|
|----|---|-----|
|modelXmlFile |std::string| The IE model file, which includes the file path. The file must be xml format.|
- 3) example
```
StatusCode status = vino_simulate_ie_infer(input_float, model_file, output_float);
```
- 4) notice
- (1) No policy logic in this API. The setting of policy API has no impact on this API.
- (2) It has only one working mode, local mode. Doesn’t have proxy mode.
### Load Openvino Model from Buffer API
This API is used to load an Openvino model from a buffer. In some cases the Openvino model is not a file on disk; it resides in a memory buffer. In these cases, call this API to initialize the Openvino model.
- 1) API
```
/**
*@brief Initial, load model from buffer
*@param xmls a unique string to handle the inference entity
*@param model model buffer
*@param weights weights buffer
*@param batch batch size
*@param isImgInput whether input of model is image
*@return Status code
*/
int vino_ie_pipeline_init_from_buffer(std::string xmls,
const std::string &model,
const std::string &weights,
int batch,
bool isImgInput);
```
- 2) Parameter
|Parameters|Type |Comments|
|---|----|-----|
|xmls| std::string |a unique string to represent IE model|
|model |std::string| The memory buffer which includes the IE model. |
|weights| std::string |The memory buffer which includes the weight data.|
|batch |int| The batch size.|
|isImgInput |bool |Whether the input of the model is image data.|
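A client typically obtains the two buffers by reading the model files into memory. A minimal Python sketch of preparing such buffers (the file names and helper name are illustrative; passing the buffers to the C++ API above is outside this sketch):

```python
from pathlib import Path

def load_model_buffers(xml_path, bin_path):
    """Read an IE model (.xml) and its weights (.bin) into memory buffers."""
    model = Path(xml_path).read_bytes()    # network topology buffer
    weights = Path(bin_path).read_bytes()  # weight data buffer
    return model, weights
```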
### Video pipeline management (construct) API
This set of APIs helps developers construct their own video pipelines and manage them over their life cycle.
The function below initializes the video pipeline environment. It should be called before any other video pipeline API. The return value is 0 on success, non-zero on failure.
- 1) API
```int ccai_stream_init();```
The function below creates a video pipeline. `pipeline_name` is a string that must be supported by a video pipeline plugin, such as “`launcher.object_detection`”. `user_data` is plugin defined and should be supported by the plugin.
It returns a pointer to a `ccai_stream_pipeline`, or `NULL` if the pipeline cannot be created.
|Parameters|Type |Comments|
|----|----|-----|
|user_data |void * |plugin defined, supported by the plugin|

The function below starts a video pipeline. `pipe` should be the pointer returned by `ccai_stream_pipeline_create`. `user_data` is plugin defined and should be supported by the plugin. The return value is 0 on success, non-zero on failure.

|Parameters|Type |Comments|
|----|----|-----|
|user_data| void * |plugin defined, supported by the plugin|

The function below stops a video pipeline. `pipe` should be the pointer returned by `ccai_stream_pipeline_create`. `user_data` is plugin defined and should be supported by the plugin. The return value is 0 on success, non-zero on failure.

|Parameters|Type |Comments|
|----|----|-----|
|user_data| void * |plugin defined, supported by the plugin|

The function below removes a video pipeline. `pipe` should be the pointer returned by `ccai_stream_pipeline_create`. `user_data` is plugin defined and should be supported by the plugin. The return value is 0 on success, non-zero on failure.

|Parameters|Type |Comments|
|----|----|-----|
|user_data |void *| plugin defined, supported by the plugin|
### Live ASR API
ASR means Automatic Speech Recognition (speech-to-text). This API is implemented on top of some Intel speech libraries. It is similar to the ASR API (10.4.1.5); the difference is that this API does continuous inference and outputs text continuously, while the ASR API only does one-time inference.
- 1) API
```
/**
*@brief Continuously do inference for speech (ASR). Using intel speech libraries.
*@param mode Working status of ASR. Start/inference/stop
*@param samples Speech data buffer.
*@param sampleLength Buffer size of speech data
*@param bytesPerSample Size for each speech sample data (how many bytes for each sample)
*@param rh_utterance_transcription Text result of speech. (ASR result)
*@param config_path The file path for the configuration file.
*@param device Inference device: CPU or GNA
*@return Status code of inference
*/

// starting live asr mode (mode==1)
int res = vino_ie_pipeline_live_asr(1, samples, sampleLength, bytesPerSample, config_filename, "CPU", rh_utterance_transcription);
// do inference (mode==2)
res = vino_ie_pipeline_live_asr(2, samples, sampleLength, bytesPerSample, config_filename, "CPU", rh_utterance_transcription);
// stopping live asr mode(mode==0)
res = vino_ie_pipeline_live_asr(0, samples, sampleLength, bytesPerSample, config_filename, "CPU", rh_utterance_transcription);
```
- 4) Notice
This ASR model only supports English, not Chinese.
## Python API
The runtime service library also provides some python APIs to upper layers. These python APIs can be called by python APPs directly.
The python APIs provide the same functions as the C++ APIs, so they map one-to-one to the C++ APIs.
Python is a different language from C++, so the data structures used in python APIs are also different from C++ data structures. The following table lists the mapping of data structures between two languages.
|Parameters|Type |Comments|
|----|---|-----|
|image| List[List[int]] |The image data. The data format of the image is list[int], and each image is expressed in one list[]. The outer list[] means batch images. The outer list length is batch size. The inner list[] is the image data.|
|Image_channel |int| This parameter defines the channels of the input image. For example: 3 – rgb, 1 – h |
|AdditionalInput| vectorVecFloat| Other inputs except image input. The meaning is the same as C++ API|
|Xmls| str |IE model file. The meaning is the same as C++ API.|
|rawDetectionResults| vectorVecFloat |The inference results. The meaning is the same as C++ API.|
|remoteServerInfo| serverParams| Server parameter. This is used in proxy mode. The meaning is the same as C++ API. |
|Parameters|Type |Comments|
|----|---|-----|
|samples| List[int]| speech data, whose format is PCM. Each PCM sample is one short int. |
|bytesPerSample |int | the bytes number for each speech sample data. For PCM data, the value should be 2, which means each sample data includes two bytes.|
|config_path |str |The configuration file for the ASR model. This configuration file is used by intel speech libraries|
|rh_utterance_transcription| vectorChar |the inference result for speech data. The data format is char.|
|remoteServerInfo| serverParams |Server parameter. This is used in proxy mode. The meaning is the same as C++ API.|
|Parameters|Type |Comments|
|----|---|-----|
|mode|int |The working mode of the ASR process. <br>0 - stop inference<br>1 - start inference<br>2 - do inference|
|samples |List[int]| speech data, which format is PCM data. Each PCM sample is one short int data. |
|bytesPerSample| int | the bytes number for each speech sample data. For PCM data, the value should be 2, which means each sample data includes two bytes.|
|config_path |str |The configuration file for the ASR model. This configuration file is used by intel speech libraries|
|rh_utterance_transcription |vectorChar| the inference result for speech data. The data format is char.|
|device |str| The inference device: CPU or GNA|
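The `samples` parameter holds PCM data, two bytes per sample when `bytesPerSample` is 2. A Python sketch of unpacking a raw PCM16 byte buffer into that list (the helper name and the byte string below are illustrative):

```python
import struct

def pcm_bytes_to_samples(raw, bytes_per_sample=2):
    """Unpack little-endian 16-bit PCM bytes into a list of ints."""
    assert bytes_per_sample == 2, "PCM16 expected: two bytes per sample"
    count = len(raw) // bytes_per_sample
    return list(struct.unpack("<%dh" % count, raw))

samples = pcm_bytes_to_samples(b"\x01\x00\xff\xff")  # two samples: 1 and -1
```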
- 3) example
```
mode = 1 ## starting inference
res = rt_api.live_asr(mode, speech, sampwidth, model_xml, device, utt_res)
mode = 2 ## doing inference
res = rt_api.live_asr(mode, speech, sampwidth, model_xml, device, utt_res)
mode = 0 ## stopping inference
res = rt_api.live_asr(mode, speech, sampwidth, model_xml, device, utt_res)
```
- 4) Notice
The usage of this API is the same as C++ Live ASR API.
## C++ APIs for Different backend Engines (Version 1)
This set of C++ APIs (version 1) is a superset of the C++ APIs for the Openvino backend engine (version 0). The difference between the two versions is that version 1 supports different inference engines, such as Openvino, Pytorch, Onnx, and Tensorflow. You can use the APIs in version 1 to do the same things as the APIs in version 0.
The C++ APIs of version 1 are “standard” C++ APIs. In the future, some of the APIs in version 0 will become obsolete. We encourage you to use the C++ APIs in version 1.
### Return Value
```
/**
*@enum irtStatusCode
*@brief Status code of running inference
*/
enum irtStatusCode : int {
RT_REMOTE_INFER_OK = 1, //inference successfully on remote server
RT_LOCAL_INFER_OK = 0, //inference successfully on local HW
RT_INFER_ERROR = -1 //inference error
};
```
Some APIs have two working modes. One mode is local mode, which means doing inference on the local XPU. The other is proxy mode. In proxy mode, the API forwards requests to the remote server (such as QQ server or Tencent server). The remote server runs inference and returns the result.
In local mode, the return value is
```RT_LOCAL_INFER_OK (success) or RT_INFER_ERROR (failure).```
In proxy mode, the return value is
```RT_REMOTE_INFER_OK (success) or RT_INFER_ERROR(failure)```
### Inference Engines
Currently, the runtime library supports four inference engines: Openvino, Pytorch, Onnx and Tensorflow.
```
/**
*@enum irtInferenceEngine
*@brief Inference engines supported in inference runtime library.
* Currently the supported inference engines are Openvino, Pytorch, Onnx and Tensorflow.
*/
enum irtInferenceEngine : uint8_t {
OPENVINO = 0,
PYTORCH,
ONNXRT,
TENSORFLOW
};
```
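On the Python side the backend engine is passed as a plain integer matching this enum. A sketch that mirrors the C++ values (the Python class name is an assumption, not part of the documented API):

```python
from enum import IntEnum

class IrtInferenceEngine(IntEnum):
    """Mirrors the C++ irtInferenceEngine enum above."""
    OPENVINO = 0
    PYTORCH = 1
    ONNXRT = 2
    TENSORFLOW = 3
```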
### Image API
The usage of this API is the same as image API in version 0.
- 1) API
```
/**
*@brief Run inference from image
*@param tensorData Buffers for input/output tensors
*@param modelFile The model file, include path
*@param backendEngine Specify the inference engine, OpenVINO or pytorch.
*@param remoteServerInfo Parameters to do inference on remote server
*@return Status code of inference
*/
```

```
    if ((desc = g_try_new0(struct ccai_stream_pipeline_desc, 1)) == NULL)
        return -1;

    desc->name = pipe_name;
    desc->create = create_pipe;
    desc->start = start_pipe;
    desc->stop = stop_pipe;
    desc->remove = remove_pipe;
    desc->get_gst_pipeline = NULL;
    desc->private_data = NULL;

    ccai_stream_add_pipeline(desc);
    return 0;
}
static void sample_plugin_exit()
{
}
/* 1. define a plugin */
CCAI_STREAM_PLUGIN_DEFINE(sample, "1.0",
CCAI_STREAM_PLUGIN_PRIORITY_DEFAULT,
sample_plugin_init, sample_plugin_exit)
```
In the source code, you must call or implement the following functions:
1. CCAI_STREAM_PLUGIN_DEFINE
Call this macro to define a plugin; the video pipeline manager will load the plugin according to the information provided by this definition.
2. Implement init/exit function
The video pipeline manager will call init when the plugin is loaded, and call exit when the plugin is unloaded.
3. Call ccai_stream_add_pipeline in the init function, ccai_stream_add_pipeline will register the pipeline supported by the plugin to the video pipeline manager.
4. Implement the create/start/stop/remove functions. When a client requests to start or stop a pipeline, the video pipeline manager will call those functions.