groonga - An open-source fulltext search engine and column store.

4.1. Fundamental operation

You can use groonga either as a C library or as an executable file. This tutorial explains how to use groonga as an executable file. With the executable, you can create and operate databases, start a server, connect to a server, and so on.

4.1.1. Create database

You can create a new database with the following command.

Form

groonga -n DB_PATH_NAME

The '-n' option tells groonga to create a new database. DB_PATH_NAME specifies the full path of the new database.

After creating the database, this command starts groonga in interactive mode, in which groonga accepts commands from standard input. Press Ctrl-d to leave this mode.

Execution example:

% groonga -n /tmp/tutorial.db
> ctrl-d
%
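
When groonga is driven from a script instead of a terminal, the command line above has to be assembled as an argument list. The following is a minimal Python sketch of that step. The helper name groonga_argv is hypothetical, and passing the command words as separate arguments is an assumption based on the form "groonga DB_PATH_NAME [COMMAND]" described in the next section.

```python
import shlex


def groonga_argv(db_path, command=None, create=False):
    """Build the argument list for the groonga executable.

    create=True adds the '-n' option (create a new database).
    command, if given, is a list of words such as ["status"].
    """
    argv = ["groonga"]
    if create:
        argv.append("-n")
    argv.append(db_path)
    if command:
        argv.extend(command)
    return argv


# Form "groonga -n DB_PATH_NAME": create a database.
create_argv = groonga_argv("/tmp/tutorial.db", create=True)
# Form "groonga DB_PATH_NAME [COMMAND]": run a single command.
status_argv = groonga_argv("/tmp/tutorial.db", command=["status"])

print(shlex.join(create_argv))
print(shlex.join(status_argv))
```

The resulting lists can be handed to subprocess.run. Because groonga enters interactive mode when no command is given, a script would normally also supply the commands on standard input.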

4.1.2. Operate database

Form

groonga DB_PATH_NAME [COMMAND]

DB_PATH_NAME specifies the full path of an existing database. If COMMAND is specified, groonga executes it and returns its result.

With no COMMAND, this command starts groonga in interactive mode. In this mode, groonga repeatedly reads a command from standard input and evaluates it. This tutorial mainly uses interactive mode.

For example, the following session creates a table named 'Type', adds four columns to it, loads one record, and then displays the record with the select command.

Execution example:

> table_create --name Type --flags TABLE_HASH_KEY --key_type ShortText
[[0,1317212791.02322,0.03942904],true]
> column_create --table Type --name number --type Int32
[[0,1317212791.26314,0.124383285],true]
> column_create --table Type --name float --type Float
[[0,1317212791.58803,0.027924039],true]
> column_create --table Type --name string --type ShortText
[[0,1317212791.81654,0.040399047],true]
> column_create --table Type --name time --type Time
[[0,1317212792.05751,0.027354067],true]
> load --table Type
> [{"_key":"sample","number":12345,"float":42.195,"string":"GROONGA","time":1234567890.12}]
[[0,1317212792.28516,0.200775839],1]
> select --table Type
[[0,1317212792.68655,0.000199477],[[[1],[["_id","UInt32"],["_key","ShortText"],["time","Time"],["string","ShortText"],["number","Int32"],["float","Float"]],[1,"sample",1234567890.12,"GROONGA",12345,42.195]]]]

As shown above, the results of executed commands are generally in JSON format. The first element of the JSON array holds information such as the error code and the execution time. The second element holds the result of the executed command.
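
Because each response is a JSON array, it can be taken apart with any JSON parser. The following Python sketch splits one of the responses shown above into its two elements. Reading the three header numbers as return code, start time (UNIX timestamp), and elapsed seconds is an interpretation of the example output, not something this section states explicitly.

```python
import json
from datetime import datetime, timezone

# Response copied from the table_create call in the example above.
raw = '[[0,1317212791.02322,0.03942904],true]'

# First element: header. Second element: result of the command.
header, body = json.loads(raw)

# Assumed meaning of the header fields (see the note above the code).
return_code, start_time, elapsed = header

print("return code:", return_code)
print("started at:", datetime.fromtimestamp(start_time, tz=timezone.utc).isoformat())
print("elapsed seconds:", elapsed)
print("result:", body)
```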

4.1.3. Commands

You can operate a database with various commands, either through the groonga executable file or through a groonga server. Commands can be written in the following two forms:

Form1: COMMAND ARGUMENT1 ARGUMENT2 ..

Form2: COMMAND --ARGUMENT1 VALUE1 --ARGUMENT2 VALUE2 ..

You can mix these two forms in a single command.

In Form2, if a value contains spaces or any of the symbols " ' ( ) /, enclose the value in single quotes or double quotes.

For details, see the "command" paragraph in the documentation of the groonga executable file.
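
The quoting rule for Form2 can be sketched in Python as follows. The helpers quote_value and build_command are hypothetical names introduced for illustration. Escaping an embedded double quote as \" follows the '_key' search example later in this tutorial; escaping a backslash as \\ is an assumption.

```python
# Characters that, per the rule above, require the value to be quoted.
QUOTE_TRIGGERS = set(" \"'()/")


def quote_value(value):
    """Wrap value in double quotes when it is empty or contains a trigger character."""
    text = str(value)
    if text and not (set(text) & QUOTE_TRIGGERS):
        return text
    # Escape backslashes first (assumption), then embedded double quotes.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_command(name, **arguments):
    """Render a command in Form2: COMMAND --ARGUMENT1 VALUE1 --ARGUMENT2 VALUE2 .."""
    parts = [name]
    for key, value in arguments.items():
        parts.append("--" + key)
        parts.append(quote_value(value))
    return " ".join(parts)


print(build_command("select", table="Site", query="_id:1"))
print(build_command("select", table="Site", query='_key:"http://example.org/"'))
```

The second call produces the same command line as the '_key' search shown in the "Search data" section.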

4.1.4. Basic commands

status
Show the status of the groonga process.
table_list
Show the list of tables defined in a database.
column_list
Show the list of columns defined in a table.
table_create
Add a table to a database.
column_create
Add a column to a table.
select
Search for and show records in a table.
load
Insert records into a table.

4.1.5. Create table

The table_create command creates a table.

In groonga, a table generally needs a primary key when it is created. For the primary key, you specify its type and the way it is stored.

Types are explained later in this tutorial. For now, think of a type as the kind of data a value holds. The way the primary key is stored determines how fast a search by primary key is and whether begins-with-match (prefix) search is available. This is also explained later in this tutorial.

For example, we create a 'Site' table. This table has a primary key of type ShortText, and its keys are stored in a hash table (HASH).

Execution example:

> column_create --table Site --name link --type Site
[[0,1317212792.88872,0.060705006],true]
> load --table Site
> [{"_key":"http://example.org/","link":"http://example.net/"}]
[[0,1317212793.14984,0.200481934],1]
> select --table Site --output_columns _key,title,link._key,link.title --query title:@this
[[0,1317212793.55084,0.000485897],[[[1],[["_key","ShortText"],["title","ShortText"],["link._key","ShortText"],["link.title","ShortText"]],["http://example.org/","This is test record 1!","http://example.net/","test record 2."]]]]

4.1.7. Create columns

The column_create command creates a column.

We add a column named 'title' that stores values of type ShortText.

Execution example:

> column_create --table Site --name title --flags COLUMN_SCALAR --type ShortText
[[0,1317212712.91734,0.077833747],true]
> select --table Site
[[0,1317212713.19572,0.000121119],[[[0],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]]]]]

COLUMN_SCALAR means that this is a normal column, which stores a single value per record.

4.1.8. Create terminology table for fulltext-searching

This tutorial explains fulltext searching on data stored in a groonga table.

Fulltext searching needs a terminology table. A terminology table is a table whose primary key values are the words that appear in the text. We create a 'Terms' table whose primary key is of type ShortText.

Execution example:

> table_create --name Terms --flags TABLE_PAT_KEY|KEY_NORMALIZE --key_type ShortText --default_tokenizer TokenBigram
[[0,1317212713.39679,0.092312046],true]

Many parameters are specified in this execution example. You don't have to understand all of them. A brief explanation follows, but you can skip it.

In this example, 'TABLE_PAT_KEY|KEY_NORMALIZE' stores the primary key in a patricia trie and registers each term after normalizing it. The 'default_tokenizer' parameter specifies how the target text is tokenized. In this example, we specify 'TokenBigram', which selects the method generally called 'N-gram'.
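
To give a feel for what N-gram tokenization with N = 2 means, the following Python sketch splits a string into overlapping two-character tokens. It only illustrates the general idea. It is not groonga's TokenBigram implementation, whose behavior may differ in detail, and lowercasing is merely a stand-in for the normalization requested by KEY_NORMALIZE.

```python
def bigrams(text):
    """Return the overlapping two-character tokens of text, lowercased."""
    # Lowercasing is a simplified stand-in for normalization (assumption).
    normalized = text.lower()
    return [normalized[i:i + 2] for i in range(len(normalized) - 1)]


print(bigrams("Test"))
```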

4.1.10. Load data

The load command loads data into a groonga database. This command stores JSON-formatted data in a table.

Execution example:

> load --table Site
> [
> {"_key":"http://example.org/","title":"This is test record 1!"},
> {"_key":"http://example.net/","title":"test record 2."},
> {"_key":"http://example.com/","title":"test test record three."},
> {"_key":"http://example.net/afr","title":"test record four."},
> {"_key":"http://example.org/aba","title":"test test test record five."},
> {"_key":"http://example.com/rab","title":"test test test test record six."},
> {"_key":"http://example.net/atv","title":"test test test record seven."},
> {"_key":"http://example.org/gat","title":"test test record eight."},
> {"_key":"http://example.com/vdw","title":"test test record nine."},
> ]
[[0,1317212714.08816,2.203527402],9]
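
The JSON array given to load does not have to be typed by hand. As a minimal sketch, the following Python code builds the input for load from a list of (key, title) pairs using the standard json module. The variable names are illustrative, and only the first three records of the example are included.

```python
import json

# (key, title) pairs: the first three records from the example above.
sites = [
    ("http://example.org/", "This is test record 1!"),
    ("http://example.net/", "test record 2."),
    ("http://example.com/", "test test record three."),
]

# One JSON object per record, with the same field names as in the example.
records = [{"_key": key, "title": title} for key, title in sites]

# The command line, followed by the JSON array that load reads next.
load_input = "load --table Site\n" + json.dumps(records) + "\n"
print(load_input)
```

Letting a JSON library produce the array takes care of escaping quotation marks and other special characters in the values.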

Let's use the 'select' command to confirm that the table contains the data.

Execution example:

> select --table Site
[[0,1317212716.49285,0.000270908],[[[9],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[1,"http://example.org/","This is test record 1!"],[2,"http://example.net/","test record 2."],[3,"http://example.com/","test test record three."],[4,"http://example.net/afr","test record four."],[5,"http://example.org/aba","test test test record five."],[6,"http://example.com/rab","test test test test record six."],[7,"http://example.net/atv","test test test record seven."],[8,"http://example.org/gat","test test record eight."],[9,"http://example.com/vdw","test test record nine."]]]]

4.1.11. Search data

The '_id' and '_key' columns are unique within a groonga table, so let's search the table using these columns.

You can search data using the 'select' command with the 'query' parameter.

Execution example:

> select --table Site --query _id:1
[[0,1317212716.69871,0.000308514],[[[1],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[1,"http://example.org/","This is test record 1!"]]]]

Specifying '_id:1' for the 'query' parameter searches for records whose '_id' column is '1'.

Let's search for records by the '_key' column.
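
The result of select can be turned into one dictionary per record. The following Python sketch does this for the response shown above. The layout it assumes (a hit count, then a list of column name and type pairs, then one array per record) is read off the example output rather than taken from a formal specification.

```python
import json

# Response copied from "select --table Site --query _id:1" above.
raw = (
    '[[0,1317212716.69871,0.000308514],[[[1],'
    '[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],'
    '[1,"http://example.org/","This is test record 1!"]]]]'
)

header, body = json.loads(raw)

# Assumed layout: [[hit_count], [[name, type], ...], record, record, ...]
result_set = body[0]
hit_count = result_set[0][0]
column_names = [name for name, _type in result_set[1]]
records = [dict(zip(column_names, row)) for row in result_set[2:]]

print(hit_count)
print(records)
```

In the examples of the "Ranges to display" section, the first number stays at 9 while only three records are returned, so it appears to be the total number of hits rather than the number of records in the response.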

Execution example:

> select --table Site --query "_key:\"http://example.org/\""
[[0,1317212716.9005,0.000478343],[[[1],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[1,"http://example.org/","This is test record 1!"]]]]

Specifying '_key:"http://example.org/"' for the 'query' parameter searches for records whose '_key' column is "http://example.org/".

4.1.12. Fulltext searching

Using the 'query' parameter, you can run a fulltext search with an index.

Execution example:

> select --table Site --query title:@this
[[0,1317212717.10303,0.000581287],[[[1],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[1,"http://example.org/","This is test record 1!"]]]]

This command shows the result of a fulltext search for the string 'this' in the 'title' column.

Specifying "title:@this" for the 'query' parameter searches for records whose 'title' column contains the string 'this'.

The 'select' command has a 'match_columns' parameter.

If this parameter is specified, the search is performed on the columns given in 'match_columns' whenever the 'query' parameter does not name a column.[1]

If you specify 'title' for 'match_columns' and 'this' for 'query', you get the same result as the query above.

Execution example:

> select --table Site --match_columns title --query this
[[0,1317212717.30596,0.000716439],[[[1],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[1,"http://example.org/","This is test record 1!"]]]]

4.1.13. Specify output column

The 'output_columns' parameter of the 'select' command specifies the columns shown in the search result.

To specify several columns, separate the column names with commas (,).

Execution example:

> select --table Site --output_columns _key,title,_score --query title:@test
[[0,1317212717.50916,0.00060758],[[[9],[["_key","ShortText"],["title","ShortText"],["_score","Int32"]],["http://example.org/","This is test record 1!",1],["http://example.net/","test record 2.",1],["http://example.com/","test test record three.",2],["http://example.net/afr","test record four.",1],["http://example.org/aba","test test test record five.",3],["http://example.com/rab","test test test test record six.",4],["http://example.net/atv","test test test record seven.",3],["http://example.org/gat","test test record eight.",2],["http://example.com/vdw","test test record nine.",2]]]]

groonga adds a "_score" column to the search result. The better a record matches the fulltext search condition, the higher its value in this column.

4.1.14. Ranges to display

The 'select' command can display only a specified range of the result using the 'offset' and 'limit' parameters. These parameters are useful when you want to show one page of a large search result.

The 'offset' parameter specifies the starting point of the result. To have 'select' return records from the first one, specify '0' for this parameter.

The 'limit' parameter specifies the maximum number of records to return.

Execution example:

> select --table Site --offset 0 --limit 3
[[0,1317212717.71574,0.000238544],[[[9],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[1,"http://example.org/","This is test record 1!"],[2,"http://example.net/","test record 2."],[3,"http://example.com/","test test record three."]]]]
> select --table Site --offset 3 --limit 3
[[0,1317212717.91925,0.00023617],[[[9],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[4,"http://example.net/afr","test record four."],[5,"http://example.org/aba","test test test record five."],[6,"http://example.com/rab","test test test test record six."]]]]
> select --table Site --offset 7 --limit 3
[[0,1317212718.12219,0.00019999],[[[9],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[8,"http://example.org/gat","test test record eight."],[9,"http://example.com/vdw","test test record nine."]]]]
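
The combined effect of 'offset' and 'limit' corresponds to taking a slice of the full result. The following Python sketch reproduces the three ranges above on a list of record IDs. The function page is a hypothetical illustration, not part of groonga.

```python
def page(records, offset, limit):
    """Return at most limit records, starting at position offset."""
    return records[offset:offset + limit]


# Stands in for the _id values of the nine records in the Site table.
ids = list(range(1, 10))

print(page(ids, offset=0, limit=3))
print(page(ids, offset=3, limit=3))
print(page(ids, offset=7, limit=3))
```

As in the last execution example, the final page holds only two records because fewer than 'limit' records remain after the offset.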

4.1.15. Sort

If you use the 'sortby' parameter of the 'select' command, the search result is sorted.

When 'sortby' is given a column name, the result is sorted in ascending order of that column's value. To sort in descending order, put a hyphen (-) before the column name.

Execution example:

> select --table Site --sortby -_id
[[0,1317212718.32565,0.000385755],[[[9],[["_id","UInt32"],["_key","ShortText"],["title","ShortText"]],[9,"http://example.com/vdw","test test record nine."],[8,"http://example.org/gat","test test record eight."],[7,"http://example.net/atv","test test test record seven."],[6,"http://example.com/rab","test test test test record six."],[5,"http://example.org/aba","test test test record five."],[4,"http://example.net/afr","test record four."],[3,"http://example.com/","test test record three."],[2,"http://example.net/","test record 2."],[1,"http://example.org/","This is test record 1!"]]]]

You can also use the '_score' column, introduced in the "Specify output column" section, as a sort condition.

Execution example:

> select --table Site --query title:@test --output_columns _id,_score,title --sortby _score
[[0,1317212718.5331,0.000667311],[[[9],[["_id","UInt32"],["_score","Int32"],["title","ShortText"]],[1,1,"This is test record 1!"],[2,1,"test record 2."],[4,1,"test record four."],[3,2,"test test record three."],[9,2,"test test record nine."],[8,2,"test test record eight."],[7,3,"test test test record seven."],[5,3,"test test test record five."],[6,4,"test test test test record six."]]]]

To specify several column names, separate them with commas (,). In this case, when records have the same value in the first column, they are sorted by the value of the second column.

Execution example:

> select --table Site --query title:@test --output_columns _id,_score,title --sortby _score,_id
[[0,1317212718.73819,0.00069225],[[[9],[["_id","UInt32"],["_score","Int32"],["title","ShortText"]],[1,1,"This is test record 1!"],[2,1,"test record 2."],[4,1,"test record four."],[3,2,"test test record three."],[8,2,"test test record eight."],[9,2,"test test record nine."],[5,3,"test test test record five."],[7,3,"test test test record seven."],[6,4,"test test test test record six."]]]]
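
The ordering rules of 'sortby' (ascending by default, a leading hyphen for descending, later columns breaking ties in earlier ones) can be mimicked in a few lines of Python. This is only a sketch of the semantics described above, applied to the '_id' and '_score' values from the examples. The function sort_rows is hypothetical and says nothing about how groonga sorts internally.

```python
# (_id, _score) values taken from the select results above.
rows = [
    {"_id": 1, "_score": 1},
    {"_id": 2, "_score": 1},
    {"_id": 3, "_score": 2},
    {"_id": 4, "_score": 1},
    {"_id": 5, "_score": 3},
    {"_id": 6, "_score": 4},
    {"_id": 7, "_score": 3},
    {"_id": 8, "_score": 2},
    {"_id": 9, "_score": 2},
]


def sort_rows(rows, sortby):
    """Sort rows by a comma-separated key list; a '-' prefix means descending."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last makes later keys act as tie-breakers for earlier ones.
    for key in reversed(sortby.split(",")):
        descending = key.startswith("-")
        name = key.lstrip("-")
        result.sort(key=lambda row: row[name], reverse=descending)
    return result


print([row["_id"] for row in sort_rows(rows, "_score,_id")])
print([row["_id"] for row in sort_rows(rows, "-_id")])
```

With "_score,_id" the sketch yields the same order of '_id' values as the last execution example.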

footnote

[1] In the current version of groonga, the 'match_columns' parameter can be used only when a fulltext search index exists. It cannot be used to search ordinary columns.