fix api and build docs

bookug 2016-03-04 17:20:19 +08:00
parent 56ec81216a
commit e8d79c6ced
16 changed files with 912 additions and 3 deletions

3
.gitignore vendored

@ -41,9 +41,6 @@
!/Database/
!/Database/**/
!/Join/
!/Join/**/
!/KVstore/
!/KVstore/**/

204
docs/API.md Normal file

@ -0,0 +1,204 @@
**This Chapter guides you to use our API for accessing gStore.**
## Easy Examples
We now provide Java and C++ APIs for gStore. Please refer to the example code in `api/cpp/example` and `api/java/example`. Before trying the two examples, make sure the executables have already been generated; if not, just type `make` in the root directory of gStore to compile the code, including the API.
Next, **start up a gStore server with the `./gserver` command.** You may instead connect to an existing, running gStore server, but note that **the server IP and port used by the server and the client must match** (if you use the examples, you don't need to change anything; just keep the defaults). Then compile the example code in the gStore/api/ directory. We provide a utility for this: just type `make api_example` in the root directory of gStore. Alternatively, you can compile the code yourself in gStore/api/cpp/example/ and gStore/api/java/example/, respectively.
Finally, go to the example directory and run the corresponding executable. For C++, use the `./example` command. For Java, use `make run` or `java -cp ../lib/GstoreJavaAPI.jar:. JavaAPIExample`. Both executables connect to a specified gStore server and perform some load and query operations. Check that the query results appear in the terminal where you run the examples; if they do not, please see [Frequently Asked Questions](FAQ.md) for help or report the problem to us (the reporting approach is described in [README](../README.md)).
You are advised to read the example code carefully, as well as the corresponding Makefile. This will help you understand the API, especially if you want to write your own programs based on it.
- - -
## API structure
The gStore API is placed in the api/ directory under the gStore root directory. Its contents are listed below:
- gStore/api/
  - cpp/ (the C++ API)
    - src/ (source code of the C++ API, used to build lib/libgstoreconnector.a)
      - GstoreConnector.cpp (interfaces to interact with the gStore server)
      - GstoreConnector.h
      - Makefile (compiles and builds the lib)
    - lib/ (where the static lib is placed)
      - readme.txt
      - libgstoreconnector.a (exists only after compilation; you need to link this lib when using the C++ API)
    - example/ (a small example program showing the basic idea of using the C++ API)
      - CppAPIExample.cpp
      - Makefile
  - java/ (the Java API)
    - src/ (source code of the Java API, used to build lib/GstoreJavaAPI.jar)
      - jgsc/ (the package you need to import when using the Java API)
        - GstoreConnector.java (interfaces to interact with the gStore server)
      - Makefile (compiles and builds the lib)
    - lib/
      - readme.txt
      - GstoreJavaAPI.jar (exists only after compilation; you need to include this JAR in your class path)
    - example/ (a small example program showing the basic idea of using the Java API)
      - JavaAPIExample.java
      - Makefile
- - -
## C++ API
#### Interface
To use the C++ API, put `#include "GstoreConnector.h"` in your C++ code. The functions in GstoreConnector.h should be called as shown below:
```cpp
#include <iostream>
#include <string>
#include "GstoreConnector.h"

int main()
{
    // Initialize the gStore server's IP address and port.
    GstoreConnector gc("127.0.0.1", 3305);

    // Build a new database from an RDF file.
    // Note that a relative path is resolved relative to where gserver runs.
    gc.build("LUBM10.db", "example/LUBM_10.n3");

    // Then you can execute SPARQL queries on this database.
    // Adjacent string literals are concatenated by the compiler.
    std::string sparql = "select ?x where "
                         "{ "
                         "?x <rdf:type> <ub:UndergraduateStudent>. "
                         "?y <ub:name> <Course1>. "
                         "?x <ub:takesCourse> ?y. "
                         "?z <ub:teacherOf> ?y. "
                         "?z <ub:name> <FullProfessor1>. "
                         "?z <ub:worksFor> ?w. "
                         "?w <ub:name> <Department0>. "
                         "}";
    std::string answer = gc.query(sparql);
    std::cout << answer << std::endl;

    // Unload this database.
    gc.unload("LUBM10.db");

    // You can also load an existing database directly and then query it.
    gc.load("LUBM10.db");
    answer = gc.query(sparql);
    std::cout << answer << std::endl;

    // Unload the database again before exiting.
    gc.unload("LUBM10.db");
    return 0;
}
```
The declarations of these functions are as follows:
```cpp
GstoreConnector();
GstoreConnector(string _ip, unsigned short _port);
GstoreConnector(unsigned short _port);
bool load(string _db_name);
bool unload(string _db_name);
bool build(string _db_name, string _rdf_file_path);
string query(string _sparql);
```
Notice:
1. When using GstoreConnector(), the default IP and port are 127.0.0.1 and 3305, respectively.
2. When using build(), a relative rdf_file_path (the second parameter) should be relative to the directory where gserver is located.
3. Please remember to unload every database you have loaded; otherwise things may go wrong (and the errors may not be reported!).
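Notice 3 is easy to forget when a program has several exit paths. One way to guard against it in C++ is an RAII wrapper that calls `load()` in its constructor and `unload()` in its destructor. The `ScopedDatabase` class below is a hypothetical helper, not part of the gStore API; it is a minimal sketch that assumes only the `load()`/`unload()` member functions declared above, so it should work with `GstoreConnector` or any type with the same two members.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of the gStore API): loads a database when
// constructed and unloads it when destroyed, so the database is unloaded even
// if the calling code returns early or throws.
// Connector is any type with bool load(std::string) and bool unload(std::string).
template <typename Connector>
class ScopedDatabase {
public:
    ScopedDatabase(Connector& conn, std::string db_name)
        : conn_(conn), db_name_(std::move(db_name)), loaded_(conn_.load(db_name_)) {}

    ~ScopedDatabase() {
        // Only unload what was actually loaded.
        if (loaded_) {
            conn_.unload(db_name_);
        }
    }

    // Copying would cause a double unload, so forbid it.
    ScopedDatabase(const ScopedDatabase&) = delete;
    ScopedDatabase& operator=(const ScopedDatabase&) = delete;

    // True if load() reported success.
    bool loaded() const { return loaded_; }

private:
    Connector& conn_;       // declared first, so it is initialized before loaded_
    std::string db_name_;
    bool loaded_;
};
```

For example, `{ ScopedDatabase<GstoreConnector> db(gc, "LUBM10.db"); if (db.loaded()) answer = gc.query(sparql); }` unloads LUBM10.db when the block ends. If `load()` reports failure, `unload()` is skipped.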
#### Compile
See gStore/api/cpp/example/Makefile for instructions on how to compile your code with the C++ API. Generally, you need to compile your own code into an object file using the header of the C++ API, and then link that object file with the static lib of the C++ API.
Let us assume that your source code is in test.cpp, located in ${TEST}, and that the gStore project is located at ${GSTORE}/gStore (if the project directory is named devGstore instead of gStore, the path is ${GSTORE}/devGstore). First go to the ${TEST} directory:
> Use `g++ -c -I${GSTORE}/gStore/api/cpp/src/ test.cpp -o test.o` to compile test.cpp into test.o; the API header is located in api/cpp/src/.
> Use `g++ -o test test.o -L${GSTORE}/gStore/api/cpp/lib/ -lgstoreconnector` to link test.o with libgstoreconnector.a (a static lib) in api/cpp/lib/.
Then type `./test` to run your own program, which uses our C++ API. You are also advised to put these compile commands, and any other commands you like, in a Makefile.
- - -
## Java API
#### Interface
To use the Java API, put `import jgsc.GstoreConnector;` in your Java code. The functions in GstoreConnector.java should be called as shown below:
```java
// Initialize the gStore server's IP address and port.
GstoreConnector gc = new GstoreConnector("127.0.0.1", 3305);
// Build a new database from an RDF file.
// Note that a relative path is resolved relative to where gserver runs.
gc.build("LUBM10.db", "example/LUBM_10.n3");
// Then you can execute SPARQL queries on this database.
String sparql = "select ?x where "
        + "{"
        + "?x <rdf:type> <ub:UndergraduateStudent>. "
        + "?y <ub:name> <Course1>. "
        + "?x <ub:takesCourse> ?y. "
        + "?z <ub:teacherOf> ?y. "
        + "?z <ub:name> <FullProfessor1>. "
        + "?z <ub:worksFor> ?w. "
        + "?w <ub:name> <Department0>. "
        + "}";
String answer = gc.query(sparql);
System.out.println(answer);
// Unload this database.
gc.unload("LUBM10.db");
// You can also load an existing database directly and then query it.
gc.load("LUBM10.db");
answer = gc.query(sparql);
System.out.println(answer);
// Unload the database again when you are done.
gc.unload("LUBM10.db");
```
The declarations of these functions are as follows:
```java
GstoreConnector();
GstoreConnector(String _ip, int _port);
GstoreConnector(int _port);
boolean load(String _db_name);
boolean unload(String _db_name);
boolean build(String _db_name, String _rdf_file_path);
String query(String _sparql);
```
Notice:
1. When using GstoreConnector(), the default IP and port are 127.0.0.1 and 3305, respectively.
2. When using build(), a relative rdf_file_path (the second parameter) should be relative to the directory where gserver is located.
3. Please remember to unload every database you have loaded; otherwise things may go wrong (and the errors may not be reported!).
#### Compile
See gStore/api/java/example/Makefile for instructions on how to compile your code with the Java API. Generally, you need to compile your own code with the JAR file of the Java API on the class path.
Let us assume that your source code is in test.java, located in ${TEST}, and that the gStore project is located at ${GSTORE}/gStore (if the project directory is named devGstore instead of gStore, the path is ${GSTORE}/devGstore). First go to the ${TEST} directory:
> Use `javac -cp ${GSTORE}/gStore/api/java/lib/GstoreJavaAPI.jar test.java` to compile test.java into test.class with GstoreJavaAPI.jar (the JAR package of the Java API) in api/java/lib/.
Then type `java -cp ${GSTORE}/gStore/api/java/lib/GstoreJavaAPI.jar:. test` to run your own program, which uses our Java API (note that the ":." in the command cannot be omitted). You are also advised to put these compile commands, and any other commands you like, in a Makefile.

57
docs/CHANGELOG.md Normal file

@ -0,0 +1,57 @@
## Feb 28, 2016
We have finished all documents for gStore, which makes it easy for you to use and contribute to it. There were some problems with the API and server/client before, but we have fixed these bugs now.
## Nov 06, 2015
We merge several classes (such as Bstr) and adjust the project structure, as well as the debug system.
In addition, most warnings are removed, except for those in the Parser module, which are due to the use of ANTLR.
What is more, we replace the RangeValue module with Stream, and add Stream for ResultSet. We also improve the gquery console, so you can now redirect query results to a specified file in the gsql console.
We were unable to add Stream for IDList due to its complex operations, but this is not necessary. The `realpath` function is used to support soft links in the gquery console, but it does not work within gStore (though it works outside gStore).
- - -
## Oct 20, 2015
We add a gtest utility, which you can use to query several datasets, each with its own queries.
In addition, the gquery console is improved. The readline lib is now used for input instead of fgets, and the gquery console supports command history, command editing, and command completion.
What is more, we found and fixed a bug in Database/: a pointer to the debugging log was not set to NULL after fclose, so if you closed one database and opened another, the system failed entirely because it believed the debugging log was still open.
- - -
## Sep 25, 2015
We implement a new version of the B+Tree and replace the old one.
After testing on the DBpedia, LUBM, and WatDiv benchmarks, we conclude that the new B+Tree is more efficient than the old version: for the same triple file, the new version spends less time executing the gload command.
Besides, the new version handles long literal objects efficiently, whereas on the old B+Tree, triples whose object length exceeds 4096 bytes cause frequent, inefficient split operations.
- - -
## Feb 2, 2015
We modify the RDF parser and SPARQL parser.
With the new RDF parser, we also redesign the encoding strategy, which reduces the number of times the RDF file is scanned.
Now we can correctly parse the standard SPARQL 1.1 grammar, and support basic graph pattern (BGP) SPARQL queries written in this grammar.
- - -
## Dec 11, 2014
We add APIs for C/C++ and Java.
- - -
## Nov 20, 2014
We share our gStore 2.0 code as an open-source project under the BSD license on GitHub.

17
docs/DEMAND.md Normal file

@ -0,0 +1,17 @@
*We have tested gStore on Linux servers running CentOS 6.2 x86_64 and CentOS 6.6 x86_64. The GCC version should be 4.4.7 or later.*
Item | Requirement
:-- | :--
operating system | Linux, such as CentOS, Ubuntu, and so on
architecture | x86_64
disk size | according to size of dataset
memory size | according to size of dataset
glibc | version >= 2.14
gcc | version >= 4.4.7
g++ | version >= 4.4.7
make | need to be installed
readline | need to be installed
readline-devel | need to be installed
openjdk | needed if using the Java API
openjdk-devel | needed if using the Java API

14
docs/ESSAY.md Normal file

@ -0,0 +1,14 @@
#### Papers and publications related to gStore are listed here:
- Lei Zou, M. Tamer Özsu, Lei Chen, Xuchuan Shen, Ruizhe Huang, Dongyan Zhao, [gStore: A Graph-based SPARQL Query Engine](http://www.icst.pku.edu.cn/intro/leizou/projects/papers/gStoreVLDBJ.pdf), VLDB Journal, 23(4): 565-590, 2014.
- Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao, [gStore: Answering SPARQL Queries Via Subgraph Matching](http://www.icst.pku.edu.cn/intro/leizou/projects/papers/p482-zou.pdf), Proc. VLDB 4(8): 482-493, 2011.
- Xuchuan Shen, Lei Zou, M. Tamer Özsu, Lei Chen, Youhuan Li, Shuo Han, Dongyan Zhao, [A Graph-based RDF Triple Store](http://www.icst.pku.edu.cn/intro/leizou/projects/papers/demo.pdf), in Proc. 31st International Conference on Data Engineering (ICDE), 2015; To appear (demo).
- Dong Wang, Lei Zou, Yansong Feng, Xuchuan Shen, Jilei Tian, and Dongyan Zhao, [S-store: An Engine for Large RDF Graph Integrating Spatial Information](http://www.icst.pku.edu.cn/intro/leizou/projects/papers/Store.pdf), in Proc. 18th International Conference on Database Systems for Advanced Applications (DASFAA), pages 31-47, 2013.
- Dong Wang, Lei Zou and Dongyan Zhao, [gst-Store: An Engine for Large RDF Graph Integrating Spatiotemporal Information](http://www.icst.pku.edu.cn/intro/leizou/projects/papers/edbtdemo2014.pdf), in Proc. 17th International Conference on Extending Database Technology (EDBT), pages 652-655, 2014 (demo).
- Lei Zou, Yueguo Chen, [A Survey of Large-Scale RDF Data Management](http://www.icst.pku.edu.cn/intro/leizou/documentation/pdf/2012CCCF.pdf), Communications of CCCF Vol.8(11): 32-43, 2012 (Invited Paper, in Chinese).

58
docs/FAQ.md Normal file

@ -0,0 +1,58 @@
#### Why does gStore report that the format of some RDF datasets is not supported?
gStore currently does not support all RDF formats; please see [formats](../test/format_question.txt) for details.
- - -
#### Why can't some documents be opened when I read them on GitHub?
Code, Markdown and other text files, as well as pictures, can be read directly on GitHub. However, if you are using a lightweight browser such as Midori, please download PDF files and read them on your computer or another device.
- - -
#### Why do strange characters sometimes appear when I use gStore?
Some documents have Chinese names; you don't need to worry about it.
- - -
#### What is the .gitattributes file in this project?
We use [git-lfs](https://github.com/github/git-lfs) in our system, and the .gitattributes file records the file types tracked by git-lfs. If you want to join us, you are advised to use git-lfs as well (here git-lfs tracks datasets, PDF files, and pictures).
- - -
#### In CentOS 7, if watdiv.db (a database generated by gload) is copied or compressed/uncompressed, why does its size as reported by `du -h` change (generally increasing)?
The change in the whole database's size is caused by the B+-trees in watdiv/kv_store/ changing size. The reason is that in storage/Storage.cpp, many operations use fseek to move the file pointer. Files are organized in blocks, and when we request a new block, the file pointer may be moved beyond the end of the file (all file operations in gStore use C library calls, which report no error in this case); the contents are then written at that new position!
In **Advanced Programming In The Unix Environment**, this phenomenon is called a "file hole". A file hole reads as zeros and is part of the file. `ls -l` shows the size of the file (including holes), while `du -h` shows the size of the blocks that a directory or file occupies on disk. Generally the output of `du -h` is larger than that of `ls -l`, but if a file hole exists, the opposite is the case, because holes occupy no disk blocks.
The logical size of a file containing holes is fixed, but on some operating systems holes are turned into real contents (also zeros) when the file is copied. `mv` does not affect the size unless it moves the file across devices (it only needs to adjust the file tree index). However, `cp` and all kinds of compression tools need to scan the file and transfer its data. (`cp` can be implemented in two ways, either skipping holes or not, while the size shown by `ls -l` does not vary.)
Using file holes in C is valid and not an error, so you can go on using gStore. We wrote a small [program](../test/hole.c) that demonstrates file holes; you can download and try it yourself.
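The effect can be reproduced in a few lines of C++ using the same C file functions. The sketch below is our own illustration, not the contents of test/hole.c: `make_sparse_file` (a hypothetical name) seeks past the end of a new file, writes three bytes, and then uses POSIX `stat()` to report the logical size (`st_size`, what `ls -l` shows) and the allocated size (`st_blocks`, which on Linux counts 512-byte units and roughly corresponds to what `du` shows).

```cpp
#include <cstdio>
#include <sys/stat.h>

// Hypothetical demo: creates a file containing a hole of hole_bytes bytes,
// followed by three real bytes. On success, stores the logical size and the
// allocated size (assuming 512-byte st_blocks units, as on Linux).
bool make_sparse_file(const char* path, long hole_bytes,
                      long long& logical_size, long long& allocated_size) {
    std::FILE* fp = std::fopen(path, "wb");
    if (fp == nullptr) return false;

    // Moving the file pointer beyond end-of-file is legal and reports no error.
    if (std::fseek(fp, hole_bytes, SEEK_SET) != 0) {
        std::fclose(fp);
        return false;
    }

    // Writing here leaves the skipped range as a hole that reads back as zeros.
    const char data[] = "end";
    const std::size_t n = sizeof(data) - 1;
    bool ok = std::fwrite(data, 1, n, fp) == n;
    ok = (std::fclose(fp) == 0) && ok;
    if (!ok) return false;

    struct stat st;
    if (stat(path, &st) != 0) return false;
    logical_size = static_cast<long long>(st.st_size);            // like `ls -l`
    allocated_size = static_cast<long long>(st.st_blocks) * 512;  // like `du`
    return true;
}
```

Calling it with `hole_bytes = 1 << 20` should give a logical size of 1048579 bytes; on filesystems that support holes, the allocated size is typically far smaller, but the exact figure depends on the filesystem.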
- - -
#### In the gclient console, I built a database, queried it, and then quit the console. The next time I enter the console and load the originally built database, why is there no output for any query (originally the output was not empty)?
You need to unload the database in use before quitting the gclient console; otherwise errors occur.
- - -
#### If query results contain null values, how can I use the [full_test](../test/full_test.sh) utility? Tab-separated output causes problems here because null values cannot be checked!
You may use another programming language (for example, Python) to handle null values. For example, you can replace null values in the output with a special character such as ',', and then use the [full_test](../test/full_test.sh) utility.
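As a concrete example of that pre-processing step, here is a minimal C++ sketch that rewrites one tab-separated output line, replacing each empty field with a placeholder. It assumes that a null value shows up as an empty field between tabs; the function name `fill_null_fields` is hypothetical, and the default placeholder "," simply follows the suggestion above.

```cpp
#include <string>

// Hypothetical helper: replaces every empty field in one tab-separated line
// with a placeholder, so that a field-by-field comparison can see null values.
// Assumes nulls appear as empty fields (two adjacent tabs, or a leading or
// trailing tab).
std::string fill_null_fields(const std::string& line,
                             const std::string& placeholder = ",") {
    std::string out;
    std::string field;
    for (char c : line) {
        if (c == '\t') {
            // End of a field: emit it, or the placeholder if it was empty.
            out += field.empty() ? placeholder : field;
            out += '\t';
            field.clear();
        } else {
            field += c;
        }
    }
    out += field.empty() ? placeholder : field;  // the last field
    return out;
}
```

For instance, `fill_null_fields("a\t\tc")` returns `"a\t,\tc"`.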
- - -
#### When I compile and run the API examples, why does it report an "unable to connect to server" error?
Please use the `./gserver` command to start a gStore server first, and make sure that the server IP and port match those used by the client.
- - -
#### When I use the Java API to write my own program, why does it report a "not found main class" error?
Please make sure that the directory of your own program is included in the Java class path. The whole command should look like `java -cp /home/bookug/project/devGstore/api/java/lib/GstoreJavaAPI.jar:. JavaAPIExample`, and the ":." in this command cannot be omitted.

14
docs/INSTALL.md Normal file

@ -0,0 +1,14 @@
gStore is green software (it needs no installation), and you just need to compile it with one command. Please run
`# make`
in the gStore root directory to compile the gStore code, link the ANTLR lib, and build the executables "gload", "gquery", "gserver", and "gclient". The gStore API is also built at this point.
If you want to use the API examples of gStore, please run `make api_example` to compile the example code for both the C++ API and the Java API. For details of the API, please see the [API](API.md) chapter.
Use `make clean` to remove all objects and executables, and use `make dist` to remove all objects, executables, libs, datasets, databases, debug logs, and temp/text files in the gStore root directory.
You are free to modify the source code of gStore and create your own project while respecting our work; type `make tarball` to compress all useful files into a .tar.gz file, which is easy to carry.
Type `make gtest` to compile the gtest program if you want to use this test utility. See [HOW TO USE](USAGE.md) for details of the gtest program.

38
docs/INTRO.md Normal file

@ -0,0 +1,38 @@
**The first paper introducing the gStore system is [gStore_VLDBJ](pdf/gStoreVLDBJ.pdf), and you can find related papers and publications in [Related Essays](ESSAY.md).**
## What Is gStore
gStore is a graph-based RDF data management system (or what is commonly called a "triple store") that maintains the graph structure of the original [RDF](http://www.w3.org/TR/rdf11-concepts/) data. Its data model is a labeled, directed multi-edge graph, where each vertex corresponds to a subject or an object.
We represent a given [SPARQL](http://www.w3.org/TR/sparql11-overview/) query as a query graph Q. Query processing involves finding subgraph matches of Q over the RDF graph G, instead of joining tables as in a relational data management system. gStore incorporates an index over the RDF graph (called VS-tree) to speed up query processing. VS-tree is a height-balanced tree with a number of associated pruning techniques that speed up subgraph matching.
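To make the idea of answering a query by subgraph matching concrete, the following C++ sketch matches each triple pattern of a basic graph pattern (an edge of Q) against the triples of the data (edges of G), binding variables by backtracking. It is a toy illustration only: the names `Triple`, `Binding`, and `answer_bgp` are hypothetical, it scans every triple for every pattern, and it does none of the VS-tree pruning that gStore uses.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One RDF triple, or one triple pattern when terms start with '?'.
struct Triple {
    std::string s, p, o;
};

// Maps each query variable (e.g. "?x") to a data term.
using Binding = std::map<std::string, std::string>;

static bool is_var(const std::string& term) { return !term.empty() && term[0] == '?'; }

// Unifies a query term with a data term, extending the binding if needed.
static bool unify(const std::string& q, const std::string& d, Binding& b) {
    if (!is_var(q)) return q == d;
    auto it = b.find(q);
    if (it != b.end()) return it->second == d;  // already bound: must agree
    b[q] = d;
    return true;
}

// Backtracking: match query[i], then recurse on the remaining patterns.
static void match(const std::vector<Triple>& query, std::size_t i,
                  const std::vector<Triple>& data, const Binding& b,
                  std::vector<Binding>& results) {
    if (i == query.size()) {
        results.push_back(b);  // every pattern matched
        return;
    }
    const Triple& q = query[i];
    for (const Triple& d : data) {
        Binding next = b;  // copy, so a failed match leaves b unchanged
        if (unify(q.s, d.s, next) && unify(q.p, d.p, next) && unify(q.o, d.o, next)) {
            match(query, i + 1, data, next, results);
        }
    }
}

// Returns every variable binding under which all patterns occur in the data.
std::vector<Binding> answer_bgp(const std::vector<Triple>& query,
                                const std::vector<Triple>& data) {
    std::vector<Binding> results;
    match(query, 0, data, Binding{}, results);
    return results;
}
```

For example, with the patterns `?x <ub:takesCourse> ?y` and `?y <ub:name> <Course1>`, `answer_bgp` returns one binding for each pair of `?x` and `?y` that satisfies both patterns.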
**The gStore project is supported by the National Science Foundation of China (NSFC), Natural Sciences and Engineering Research Council (NSERC) of Canada, and Hong Kong RGC.**
- - -
## What Is New In gStore
There are three important features in gStore:
- gStore manages RDF repository from a graph database perspective.
- gStore supports both query and update efficiently.
- gStore can handle, in a uniform manner, different data types (strings and numerical data) and SPARQL queries with wildcards, aggregates, and range operators (in theory only; not implemented so far).
- - -
## Why gStore
After a series of tests, we analysed and recorded the results in [Test Results](TEST.md). gStore answers complicated queries (for example, those containing cycles) faster than other database systems. For simple queries, both gStore and other database systems work well.
In addition, in the big data era more and more structured data is appearing, and traditional relational database systems (or database systems based on relational tables) cannot handle it efficiently. In contrast, gStore can exploit the structural features of the data to improve performance.
What is more, gStore is a highly extensible project. Many new ideas in graph databases have been proposed, and most of them can be applied in gStore. For example, some members of our group are designing a distributed gStore system.
- - -
## Open Source
The gStore source code is available as open-source code under the BSD license. You are welcome to use gStore, report bugs or suggestions, or join us to make gStore better. It is also ok for you to build all kinds of applications based on gStore, while respecting our work.

10
docs/LIMIT.md Normal file

@ -0,0 +1,10 @@
1. gserver is not robust enough and crashes easily (it needs to handle all kinds of exceptions).
2. Queries involving uncertain predicates are not supported.
3. Only `select` queries can be used now. Aggregate queries are not supported, and neither are insert/modify/remove operations.
4. Only RDF datasets in N-Triples format are supported.
5. The disk and memory cost is still very large.

29
docs/MAIL.md Normal file

@ -0,0 +1,29 @@
# People
**Li Zeng and Jiaqi Chen are currently responsible for gStore.**
## Faculty
- Lei Zou (Peking University) email:zoulei@pku.edu.cn
- M. Tamer Özsu (University of Waterloo)
- Lei Chen (Hong Kong University of Science and Technology)
- Dongyan Zhao (Peking University) email:zhaodongyan@pku.edu.cn
- - -
## Students
- Youhuan Li (Peking University) (PhD student) email:liyouhuan@pku.edu.cn
- Shuo Han (Peking University) (PhD student) email:hanshuo@pku.edu.cn
- Xuchuan Shen (Peking University) (Master's student, graduated) email:shenxuchuan@pku.edu.cn
- Dong Wang (Peking University) (PhD student, graduated) email:wangdong@pku.edu.cn
- - -
## Alumni
- Ruizhe Huang (Peking University) (Undergraduate intern, graduated)
- Jinhui Mo (Peking University) (Master's student, graduated)
- Li Zeng (Peking University) (Undergraduate intern) email:zengli-syzz@pku.edu.cn
- Jiaqi Chen (Peking University) (Undergraduate intern) email:chenjiaqi93@163.com

34
docs/PLAN.md Normal file

@ -0,0 +1,34 @@
## Improve The Core
- optimize the join of node candidates: implement multiple methods, and design a scoring module to select the best one
- add a numeric value query function: numeric range queries must be answered efficiently, and the space consumption cannot be too large
- add a control module that heuristically selects a kind of index for filtering a SPARQL query (not always the VS-tree)
- typedef all frequently used types, to avoid inconsistency and high modification cost
- - -
## Improve The Interface
- build a console named gconsole, which provides all operations supported by gStore (a parser and auto-completion are required)
- write a web interface for gStore, and a web page to operate on it, just like Virtuoso
- - -
## Idea Collection Box
- to support soft links in the console: realpath does not work... (redefined in ANTLR?)
- store command history for consoles
- warnings remain when using Parser/ (ANTLR)! (modify sparql.g 1.1 and regenerate). Rename to avoid the redefinition problem, or use an executable to parse
- build a compression module (for example, for the key-value module and the stream module, though the latter needs only one-pass read/write), which may allow the compression method to be used both on disk and in memory. All string operations in memory could be changed to operate on compressed data: provide a compress/archive interface and a compare function. There are many compression algorithms to choose from, so how should we choose? What about the UTF-8 encoding problem? This method can lower memory and disk consumption, but costs more CPU. However, the total time is dominated by isomorphism (subgraph matching). Simple compression is not good, but a too complicated method will consume too much time; how to balance? (merge consecutive identical characters, Huffman tree)
- mmap to speed up KVstore?
- the strategy for Stream: is 85% valid? Consider sampling: analyse the size of the result set and then decide the strategy? How to support order by: sort in memory if the results are not put in a file; otherwise, partially sort in memory, write to a file, then perform external sorting
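The last idea in the compression item, merging consecutive identical characters, is run-length encoding. A minimal C++ sketch of it follows; `rle_encode` and `rle_decode` are hypothetical names, gStore contains no such module, and the text format (each character, then its run length, then `;`) is our own assumption. It works byte by byte, so UTF-8 text round-trips unchanged, although multi-byte characters are never merged as units.

```cpp
#include <cstddef>
#include <string>

// Run-length encoding: each run of identical bytes becomes the byte, its run
// length in decimal, and a ';' terminator, e.g. "aaab" -> "a3;b1;".
std::string rle_encode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t j = i;
        while (j < in.size() && in[j] == in[i]) ++j;  // find the end of the run
        out += in[i];
        out += std::to_string(j - i);
        out += ';';
        i = j;
    }
    return out;
}

// Inverse of rle_encode; expects well-formed input produced by rle_encode.
// The first byte of each group is always the character itself, so digits and
// ';' in the original text are handled correctly.
std::string rle_decode(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        char c = in[i++];
        std::size_t end = in.find(';', i);
        if (end == std::string::npos) break;  // malformed tail: stop
        std::size_t count = std::stoul(in.substr(i, end - i));
        out.append(count, c);
        i = end + 1;
    }
    return out;
}
```

For example, `rle_encode("aaab")` returns `"a3;b1;"`. Such simple schemes only pay off when long runs are common; on text with few repeats the encoded output is larger than the input.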

244
docs/STRUCT.md Normal file

@ -0,0 +1,244 @@
**This chapter introduces the overall structure of the gStore project.**
#### The core source code is listed below:
- Bstr/ (represents strings of arbitrary length)
  - Bstr.cpp (function implementations)
  - Bstr.h (class, member and function definitions)
- Database/ (calls the other core parts to handle requests from the interface part)
  - Database.cpp (function implementations)
  - Database.h (class, member and function definitions)
- Join/ (joins the node candidates to get results)
  - Join.cpp (function implementations)
  - Join.h (class, member and function definitions)
- KVstore/ (a key-value store that swaps data between memory and disk)
  - KVstore.cpp (interacts with upper layers)
  - KVstore.h
  - heap/ (a heap of nodes whose contents are in memory)
    - Heap.cpp
    - Heap.h
  - node/ (all kinds of nodes in the B+-tree)
    - Node.cpp (the base class of IntlNode and LeafNode)
    - Node.h
    - IntlNode.cpp (internal nodes in the B+-tree)
    - IntlNode.h
    - LeafNode.cpp (leaf nodes in the B+-tree)
    - LeafNode.h
  - storage/ (swaps contents between memory and disk)
    - file.h
    - Storage.cpp
    - Storage.h
  - tree/ (implements all tree operations and interfaces)
    - Tree.cpp
    - Tree.h
- Query/ (needed to answer SPARQL queries)
  - BasicQuery.cpp (the basic type of queries, without aggregate operations)
  - BasicQuery.h
  - IDList.cpp (the candidate list of a node/variable in a query)
  - IDList.h
  - ResultSet.cpp (keeps the result set of a query)
  - ResultSet.h
  - SPARQLquery.cpp (handles an entire SPARQL query)
  - SPARQLquery.h
- Triple/ (handles triples; a triple consists of a subject (entity), a predicate (entity), and an object (entity or literal))
  - Triple.cpp
  - Triple.h
- Signature/ (assigns signatures to nodes and edges, but not to literals)
  - SigEntry.cpp
  - SigEntry.h
  - Signature.cpp
  - Signature.h
- VSTree/ (a tree index for more efficient pruning)
  - EntryBuffer.cpp
  - EntryBuffer.h
  - LRUCache.cpp
  - LRUCache.h
  - VNode.cpp
  - VNode.h
  - VSTree.cpp
  - VSTree.h
- - -
#### The parser part is listed below:
- Parser/
  - DBParser.cpp
  - DBParser.h
  - RDFParser.cpp
  - RDFParser.h
  - SparqlParser.c
  - SparqlParser.h
  - SparqlLexer.c
  - SparqlLexer.h
  - TurtleParser.cpp
  - TurtleParser.h
  - Type.h
- - -
#### The utilities are listed below:
- Util/
  - Util.cpp (headers, macros, typedefs, functions...)
  - Util.h
  - Stream.cpp (stores and uses temporary results, which may be very large)
  - Stream.h
- - -
#### The interface part is listed below:
- Server/ (client and server mode to use gStore)
  - Client.cpp
  - Client.h
  - Operation.cpp
  - Operation.h
  - Server.cpp
  - Server.h
  - Socket.cpp
  - Socket.h
- Main/ (a series of applications/main programs to operate on gStore)
  - gload.cpp (imports an RDF dataset)
  - gquery.cpp (queries a database)
  - gserver.cpp (starts up the gStore server)
  - gclient.cpp (connects to a gStore server and interacts with it)
- - -
#### More details
To acquire a deep understanding of the gStore code, please see [Code Detail](pdf/代码目录及概览.pdf). See [use case](pdf/Gstore2.0_useCaseDoc.pdf) to understand the design of use cases, and see [OOA](pdf/OOA_class.pdf) and [OOD](pdf/OOD_class.pdf) for the OOA and OOD designs, respectively.
If you want to know the sequence of operations in a running gStore, please view the list below:
- [connect to server](jpg/A01-连接Server.jpg)
- [disconnect server](jpg/A02-断开与Server的连接.jpg)
- [load database](jpg/A03-加载数据库实例.jpg)
- [unload database](jpg/A04-卸载数据库实例.jpg)
- [create database](jpg/A05-创建数据库实例.jpg)
- [delete database](jpg/A06-删除数据库实例.jpg)
- [connect to database](jpg/A07-连接数据库实例.jpg)
- [disconnect database](jpg/A08-断开与数据库实例的连接.jpg)
- [show databases](jpg/A09-查看数据库实例列表.jpg)
- [SPARQL query](jpg/A10-查询SPARQL.jpg)
- [import RDF dataset](jpg/A11-导入RDF数据集.jpg)
- [insert a triple](jpg/A12-插入一条RDF三元组数据.jpg)
- [delete a triple](jpg/A13-删除一条RDF三元组数据.jpg)
- [create account](jpg/B01-创建账户.jpg)
- [delete account](jpg/B02-删除账户.jpg)
- [modify account authority](jpg/B03-修改账户权限.jpg)
- [forcibly unload database](jpg/B04-强制卸载数据库实例.jpg)
- [view account authority](jpg/B05-查看账户权限信息.jpg)
Do not be surprised to find that the source code differs from the original design, and some designed functions may not have been implemented yet.
- - -
#### Others
The api/ folder in gStore stores the API programs, libs, and examples; please see [API](API.md) for details. The test/ folder stores a series of test programs and utilities, such as gtest and full_test. Chapters related to test/ are [How To Use](USAGE.md) and [Test Result](TEST.md). This project needs an ANTLR lib to parse SPARQL queries; its code is placed in tools/ (also archived here), and the compiled libantlr.a is placed in the lib/ directory.
We place some datasets and queries in the data/ directory as examples, and you can try them to see how gStore works. Related instructions are in [How To Use](USAGE.md). The docs/ directory contains all kinds of gStore documents, including a series of Markdown files and two folders, pdf/ and jpg/. PDF files are placed in the pdf/ folder, and JPG files in the jpg/ folder.
You are advised to start from the [README](../README.md) in the gStore root directory, and visit other chapters only when needed. If you are really interested in gStore, you can eventually read all documents by following the links.

20
docs/TEST.md Normal file

@ -0,0 +1,20 @@
## Preparation
We have compared the performance of gStore with several other database systems, such as [Jena](http://jena.apache.org/), [Sesame](http://www.rdf4j.org/), [Virtuoso](http://virtuoso.openlinksw.com/) and so on. We compare the time to build a database, the size of the built database, the time to answer each SPARQL query, and whether the results of each query match. In addition, if the memory cost is very large (>20G), we record the memory cost when running these database systems (not accurate, just for your reference).
To ensure that all database systems run correctly on all datasets and queries, the datasets must be in a format supported by all of them, and the queries must not contain update operations, aggregate operations, or operations related to uncertain predicates. Note that when measuring the time to answer queries, the time to load the database index is excluded. To ensure this, we load the database index first for some database systems, and warm up several times for others.
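The warm-up principle can be sketched as a small C++ timing helper. This is a hypothetical illustration, not part of full_test (which is a shell script): `time_after_warmup` runs a query callback several times without timing it, so that index loading and cold caches are not counted, and then times one further run with `std::chrono::steady_clock`.

```cpp
#include <chrono>
#include <functional>

// Hypothetical benchmarking helper: runs `query` `warmups` times untimed,
// then times one more run and returns its duration in milliseconds.
double time_after_warmup(const std::function<void()>& query, int warmups) {
    for (int i = 0; i < warmups; ++i) {
        query();  // warm-up runs: load indexes, fill caches
    }
    const auto start = std::chrono::steady_clock::now();  // monotonic clock
    query();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In practice one would wrap a call such as a connector's `query()` in the lambda and repeat the timed run several times, reporting the median rather than a single measurement.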
The datasets used here are WatDiv, LUBM, BSBM, and DBpedia. Some of them are provided by websites, and others are generated by algorithms. Queries are generated by algorithms or written by us.
The experiment environment is a CentOS server with 82G of memory and 7T of disk. We use [full_test](../test/full_test.sh) to run this test.
## Result
This program produces many logs, placed in result.log/, load.log/, and time.log/. Viewing the files in result.log/, you can see that the results of all queries match. Viewing the files in load.log/, you can see that gStore's time and space costs to build a database are larger than those of the others. More precisely, there is an order of magnitude difference between gStore and the others in the time/space cost of building a database.
By analysing time.log/, we find that gStore performs better than the others on very complicated queries (many variables, cycles, etc.). For other, simpler queries, there is not much difference in time between these database systems.
Generally speaking, the memory cost of gStore when answering queries is higher than that of the others. The more complicated the query and the larger the dataset, the more apparent this is.
You can find more detailed information in the [test report](pdf/gstore测试报告.pdf). Note that some issues in the test report have already been solved.

docs/THANK.md Normal file

@ -0,0 +1,4 @@
**This chapter lists people who inspire us or contribute to this project.**
*nobody now*

docs/TIPS.md Normal file

@ -0,0 +1,4 @@
**This chapter introduces some useful tips for implementing applications with gStore.**
*no tips available now*

docs/USAGE.md Normal file

@ -0,0 +1,165 @@
## gStore currently includes four executables and some test utilities.
#### 1. gload
gload is used to build a new database from an RDF triple file.
`# ./gload db_name rdf_triple_file_path`
For example, we build a database from LUBM_10.n3, which can be found in the data/ folder.
[bookug@localhost gStore]$ ./gload LUBM10.db ./data/LUBM_10.n3
2015年05月21日 星期四 20时58分21秒 -0.484698 seconds
gload...
argc: 3 DB_store:db_LUBM10 RDF_data: ./data/LUBM_10.n3
begin encode RDF from : ./data/LUBM_10.n3 ...
- - -
#### 2. gquery
gquery is used to query an existing database with files containing SPARQL queries (each file contains exactly one SPARQL query).
Type `./gquery db_name query_file` to execute the SPARQL query read from query_file on the database named db_name.
Use `./gquery --help` for detailed information on gquery usage.
To enter the gquery console, type `./gquery db_name`. The program shows a command prompt ("gsql>"), where you can type in commands. Use `help` to see basic information about all commands, and `help command_t` for details of a specific command.
Type `quit` to leave the gquery console.
For the `sparql` command, give the path of a file that contains a single SPARQL query. (*Redirecting the answer to a file is supported.*)
When the program finishes answering the query, it shows the command prompt again.
*gStore 2.0 currently supports only simple "select" queries (not for predicates).*
We also take LUBM_10.n3 as an example.
[bookug@localhost gStore]$ ./gquery LUBM10.db
gquery...
argc: 2 DB_store:db_LUBM10/
loadTree...
LRUCache initial...
LRUCache initial finish
finish loadCache
finish loadEntityID2FileLineMap
open KVstore
finish load
finish loading
Type `help` for information of all commands
Type `help command_t` for detail of command_t
gsql>sparql ./data/LUBM_q0.sql
... ...
Total time used: 4ms.
final result is :
<http://www.Department0.University0.edu/FullProfessor0>
<http://www.Department1.University0.edu/FullProfessor0>
<http://www.Department2.University0.edu/FullProfessor0>
<http://www.Department3.University0.edu/FullProfessor0>
<http://www.Department4.University0.edu/FullProfessor0>
<http://www.Department5.University0.edu/FullProfessor0>
<http://www.Department6.University0.edu/FullProfessor0>
<http://www.Department7.University0.edu/FullProfessor0>
<http://www.Department8.University0.edu/FullProfessor0>
<http://www.Department9.University0.edu/FullProfessor0>
<http://www.Department10.University0.edu/FullProfessor0>
<http://www.Department11.University0.edu/FullProfessor0>
<http://www.Department12.University0.edu/FullProfessor0>
<http://www.Department13.University0.edu/FullProfessor0>
<http://www.Department14.University0.edu/FullProfessor0>
Notice:
- "[empty result]" is printed if there is no answer, and an empty line follows all results.
- the readline library is used, so you can press the Up arrow key to browse the command history, and the Left and Right arrow keys to move within and edit the current command.
- path completion is supported (built-in command completion is not).
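If you drive the gquery console from a script, the answers can be picked out of the output shown in the session above. The sketch below assumes the marker line `final result is :` and the notices just listed (an empty line after the results, "[empty result]" when there are none); `parse_gquery_results` is a hypothetical helper, and other gStore versions may format their output differently.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collects the answers printed after the "final result is" marker, stopping at
// the first empty line. "[empty result]" yields an empty vector.
// Assumes the console output format shown in this chapter.
std::vector<std::string> parse_gquery_results(const std::string& output) {
    std::istringstream in(output);
    std::string line;
    bool in_results = false;
    std::vector<std::string> results;
    while (std::getline(in, line)) {
        if (!in_results) {
            // Skip log lines until the marker that precedes the answers.
            if (line.rfind("final result is", 0) == 0) in_results = true;
            continue;
        }
        if (line.empty()) break;              // end of the result list
        if (line == "[empty result]") break;  // no answers at all
        results.push_back(line);
    }
    return results;
}
```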
- - -
#### 3. gserver
gserver is a daemon. It must be launched before accessing gStore through gclient or the API, and it communicates with clients through sockets.
[bookug@localhost gStore]$ ./gserver
port=3305
Wait for input...
You can also assign a custom port for listening.
[bookug@localhost gStore]$ ./gserver 3307
port=3307
Wait for input...
Notice: gserver does not support multiple threads. If you start gclient in more than one terminal at the same time, gserver will crash.
- - -
#### 4. gclient
gclient is a client that sends commands to gserver and receives its responses.
[bookug@localhost gStore]$ ./gclient
ip=127.0.0.1 port=3305
gsql>
You can also specify gserver's IP and port.
[bookug@localhost gStore]$ ./gclient 172.31.19.15 3307
ip=172.31.19.15 port=3307
gsql>
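The two gclient invocations above follow a simple convention: with no arguments the client uses `127.0.0.1` and port 3305, otherwise an IP and a port are given. The following sketch mirrors that convention for your own tools. It is not gclient's source; `Endpoint` and `parse_endpoint` are hypothetical names, and rejecting a lone IP is an assumption, since this chapter shows only these two forms.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

struct Endpoint {
    std::string ip;
    int port;
};

// `args` excludes the program name. No arguments -> 127.0.0.1:3305 (the defaults
// printed by gclient above); otherwise exactly "ip port" is expected (assumption).
Endpoint parse_endpoint(const std::vector<std::string>& args) {
    Endpoint ep{"127.0.0.1", 3305};
    if (args.empty()) return ep;
    if (args.size() != 2) throw std::invalid_argument("expected: [ip port]");
    ep.ip = args[0];
    ep.port = std::stoi(args[1]);  // throws std::invalid_argument if not numeric
    if (ep.port < 1 || ep.port > 65535) throw std::out_of_range("port must be in 1..65535");
    return ep;
}
```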
The following commands are currently available:
- `help` shows information about all commands
- `import db_name rdf_triple_file_name` builds a database from an RDF triple file
- `load db_name` loads an existing database
- `unload db_name` unloads the database without deleting it from disk, so you can load it again later
- `sparql "query_string"` queries the currently loaded database with a SPARQL query string (enclosed in double quotes)
- `show` displays the name of the currently loaded database
Notice:
- at most one database can be loaded in the gclient console at a time
- separate the parts of a command with spaces or tabs ('\t'), not with characters such as ';'
- do not put any spaces or tabs before the start of a command
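A program that generates gclient commands can enforce the rules above before sending them. In this sketch `make_sparql_command` and `is_valid_command_layout` are hypothetical helpers. It assumes the query contains no double quotes, because this chapter does not say how gclient treats them, and it only checks the leading whitespace and the command name against the list above.

```cpp
#include <set>
#include <string>

// Wraps a query in double quotes, as the `sparql "query_string"` form requires.
// Assumption: `query` contains no double quotes (no escaping is attempted).
std::string make_sparql_command(const std::string& query) {
    return "sparql \"" + query + "\"";
}

// True if the command does not start with a space or tab and its first token,
// which ends at the first space or tab, is one of the documented commands.
// "load;db" fails because ';' is not a separator, so the token is "load;db".
bool is_valid_command_layout(const std::string& cmd) {
    static const std::set<std::string> kCommands = {
        "help", "import", "load", "unload", "sparql", "show"};
    if (cmd.empty() || cmd[0] == ' ' || cmd[0] == '\t') return false;
    const std::string name = cmd.substr(0, cmd.find_first_of(" \t"));
    return kCommands.count(name) == 1;
}
```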
- - -
#### 5. test utilities
A series of test programs is placed in the test/ folder; here we introduce the two most useful ones: gtest.cpp and full_test.sh.
**gtest is used to test gStore with multiple datasets and queries.**
To use the gtest utility, first type `make gtest` to compile the gtest program. gtest is a test tool that generates structured logs for datasets. Type `./gtest --help` in the working directory for details.
**Please change paths in the test/gtest.cpp if needed.**
You should place the datasets and queries as follows:
`DIR/WatDiv/database/*.nt`
`DIR/WatDiv/query/*.sql`
Here DIR is the root directory holding all datasets to be used by gtest, and WatDiv is a class of datasets, like LUBM. Inside WatDiv (or LUBM, etc.), place all datasets (files named with .nt) in a database/ folder, and all queries for those datasets (files named with .sql) in a query/ folder.
Then run the gtest program with the required parameters. The output is sorted into three logs in the gStore root directory: load.log/ (database loading time and size), time.log/ (query time) and result.log/ (not the full query output, but a record of whether the results of the two selected database systems match).
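How gtest decides that two result sets "match" is not specified here. One reasonable rule, assumed in this sketch, is that two systems match when they return the same answers as a multiset, since a SPARQL SELECT without ORDER BY has no defined row order. `results_match` is hypothetical and assumes each answer has already been normalized to one string with the same term syntax for both systems.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Order-insensitive comparison that still respects duplicates:
// {<a>, <b>} matches {<b>, <a>}, but {<a>, <a>} does not match {<a>}.
bool results_match(std::vector<std::string> a, std::vector<std::string> b) {
    std::sort(a.begin(), a.end());  // parameters are copies, so sorting is safe
    std::sort(b.begin(), b.end());
    return a == b;
}
```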
All logs produced by this program are in TSV format (tab-separated), so you can load them directly into Calc, Excel or Gnumeric. Times are in ms and sizes in KB.
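Because the logs are plain TSV, they are also easy to read programmatically. The sketch below splits each line on tabs, keeps empty fields, and strips a trailing carriage return. `split_tsv` and `read_tsv` are hypothetical names, and the meaning of each column in gtest's logs is not documented here, so interpreting the fields is left to the caller.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits one TSV line into fields; "a\t\tb" gives {"a", "", "b"}.
std::vector<std::string> split_tsv(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const auto tab = line.find('\t', start);
        fields.push_back(line.substr(start, tab - start));  // npos: rest of line
        if (tab == std::string::npos) break;
        start = tab + 1;
    }
    return fields;
}

// Reads every line of a TSV stream, tolerating Windows line endings.
std::vector<std::vector<std::string>> read_tsv(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        rows.push_back(split_tsv(line));
    }
    return rows;
}
```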
**full_test.sh is used to compare the performance of gStore and other database systems on multiple datasets and queries.**
To use the full_test.sh utility, download the database systems you want to test and compare, and set the exact paths of the database systems and datasets in the script. Datasets and queries must be named and placed as required by gtest, and the logs follow the same layout.
Only gStore and Jena are tested and compared by this script at present, but other database systems are easy to add if you spend some time reading the script. See the [test report](pdf/gstore测试报告.pdf) or [Frequently Asked Questions](FAQ.md) if you run into problems.