From 54df2afaa61c6a03cbb4a33c9b90fa572b6d07b8 Mon Sep 17 00:00:00 2001
From: Jesse Morgan
Date: Sat, 17 Dec 2016 21:28:53 -0800
Subject: Berkeley DB 4.8 with rust build script for linux.

---
 db-4.8.30/examples_c/csv/README | 408 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 408 insertions(+)
 create mode 100644 db-4.8.30/examples_c/csv/README

diff --git a/db-4.8.30/examples_c/csv/README b/db-4.8.30/examples_c/csv/README
new file mode 100644
index 0000000..6a5fd13
--- /dev/null
+++ b/db-4.8.30/examples_c/csv/README
@@ -0,0 +1,408 @@

/*-
 * See the file LICENSE for redistribution information.
 *
 * Copyright (c) 2005-2009 Oracle.  All rights reserved.
 *
 * $Id$
 */

The "comma-separated value" (csv) directory is a suite of three programs:

    csv_code:  write "helper" code on which to build applications,
    csv_load:  import csv files into a Berkeley DB database,
    csv_query: query databases created by csv_load.

The goal is to allow programmers to easily build applications for using
csv databases.

You can build the three programs, and run a sample application, in this
directory.

First, there's the sample.csv file:

    Adams,Bob,01/02/03,green,apple,37
    Carter,Denise Ann,04/05/06,blue,banana,38
    Eidel,Frank,07/08/09,red,cherry,38
    Grabel,Harriet,10/11/12,purple,date,40
    Indals,Jason,01/03/05,pink,orange,32
    Kilt,Laura,07/09/11,yellow,grape,38
    Moreno,Nancy,02/04/06,black,strawberry,38
    Octon,Patrick,08/10/12,magenta,kiwi,15

The fields are:
    Last name,
    First name,
    Birthdate,
    Favorite color,
    Favorite fruit,
    Age

Second, there's a "description" of that csv file in sample.desc:

    version 1 {
        LastName string
        FirstName string
        BirthDate
        Color string index
        Fruit string index
        Age unsigned_long index
    }

The DESCRIPTION file maps one-to-one to the fields in the csv file, and
provides a data type for any field the application wants to use.  (If
the application doesn't care about a field, don't specify a data type
and the csv code will ignore it.)  The string "index" specifies there
should be a secondary index based on the field.

The "field" names in the DESCRIPTION file don't have to be the same as
the ones in the csv file (and, as they may not have embedded spaces,
probably won't be).

To build in the sample directory, on POSIX-like systems, type "make".
This first builds the program csv_code, which it then runs with the file
DESCRIPTION as input.  Running csv_code creates two additional files:
csv_local.c and csv_local.h.  Those two files are then used as part of
the build process for two more programs: csv_load and csv_query.

You can now load the csv file into a Berkeley DB database with the
following command:

    % ./csv_load -h TESTDIR < sample.csv

The csv_load command will create a directory and four databases:

    primary     primary database
    Age         secondary index on the Age field
    Color       secondary index on the Color field
    Fruit       secondary index on the Fruit field

You can then query the database:

    % ./csv_query -h TESTDIR
    Query: id=2
    Record: 2:
        LastName: Carter
        FirstName: Denise
        Color: blue
        Fruit: banana
        Age: 38
    Query: color==green
    Record: 1:
        LastName: Adams
        FirstName: Bob
        Color: green
        Fruit: apple
        Age: 37

and so on.
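
The databases csv_load creates are ordinary Berkeley DB databases, so they
can also be opened with the plain DB C API.  The short program below is a
sketch, not part of the example suite; it assumes the primary database was
stored as a file named "primary" under TESTDIR, as in the command above
(adjust the open call if the layout differs on your system).  It simply
counts the key/data pairs in the primary database:

    #include <stdio.h>
    #include <string.h>
    #include <db.h>

    int
    main(void)
    {
        DB *dbp;
        DBC *dbc;
        DBT key, data;
        unsigned long count;
        int ret;

        if ((ret = db_create(&dbp, NULL, 0)) != 0)
            return (1);
        if ((ret = dbp->open(dbp,
            NULL, "TESTDIR/primary", NULL, DB_UNKNOWN, DB_RDONLY, 0)) != 0) {
            dbp->err(dbp, ret, "open: TESTDIR/primary");
            goto err;
        }
        if ((ret = dbp->cursor(dbp, NULL, &dbc, 0)) != 0)
            goto err;

        /* Walk the database, counting key/data pairs. */
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        for (count = 0;
            (ret = dbc->get(dbc, &key, &data, DB_NEXT)) == 0; ++count)
            ;
        printf("%lu key/data pairs (including the column-map record 0)\n",
            count);

        (void)dbc->close(dbc);
    err:    (void)dbp->close(dbp, 0);
        return (ret == DB_NOTFOUND ? 0 : 1);
    }

Compile and link it against the Berkeley DB library (for example
"cc verify.c -o verify -ldb"; the library name may be "-ldb-4.8"
depending on how Berkeley DB was installed).
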
The csv_code process also creates source code modules that support
building your own applications based on this database.

First, there is the csv_local.h include file:

    /*
     * DO NOT EDIT: automatically built by csv_code.
     *
     * Record structure.
     */
    typedef struct __DbRecord {
        u_int32_t  recno;          /* Record number */

        /*
         * Management fields
         */
        void      *raw;            /* Memory returned by DB */
        char      *record;         /* Raw record */
        size_t     record_len;     /* Raw record length */

        u_int32_t  field_count;    /* Field count */
        u_int32_t  version;        /* Record version */

        u_int32_t *offset;         /* Offset table */

        /*
         * Indexed fields
         */
    #define CSV_INDX_LASTNAME   1
        char      *LastName;

    #define CSV_INDX_FIRSTNAME  2
        char      *FirstName;

    #define CSV_INDX_COLOR      4
        char      *Color;

    #define CSV_INDX_FRUIT      5
        char      *Fruit;

    #define CSV_INDX_AGE        6
        u_long     Age;
    } DbRecord;

This defines the DbRecord structure that is the primary object for this
csv file.  As you can see, the interesting fields in the csv file have
mappings in this structure.

Also, there are routines in the DbRecord.c file your application can use
to handle DbRecord structures.  When you retrieve a record from the
database, the DbRecord structure will be filled in based on that record.

Here are the helper routines:

    int
    DbRecord_print(DbRecord *recordp, FILE *fp)
        Display the contents of a DbRecord structure to the specified
        output stream.

    int
    DbRecord_init(const DBT *key, DBT *data, DbRecord *recordp)
        Fill in a DbRecord from a returned database key/data pair.

    int
    DbRecord_read(u_long key, DbRecord *recordp)
        Read the specified record (DbRecord_init will be called
        to fill in the DbRecord).

    int
    DbRecord_discard(DbRecord *recordp)
        Discard the DbRecord structure (must be called after the
        DbRecord_read function), when the application no longer
        needs the returned DbRecord.

    int
    DbRecord_search_field_name(char *field, char *value, OPERATOR op)
        Display the DbRecords where the field (named by field) has
        the specified relationship to the value.  For example:

            DbRecord_search_field_name("Age", "35", GT)

        would search for records with an "Age" field greater than
        35.

    int
    DbRecord_search_field_number(
        u_int32_t fieldno, char *value, OPERATOR op)
        Display the DbRecords where the field (specified by field
        number) has the specified relationship to the value.  The
        field number used as an argument comes from the csv_local.h
        file; for example, CSV_INDX_AGE is the field index for the
        "Age" field in this csv file.  For example:

            DbRecord_search_field_number(CSV_INDX_AGE, "35", GT)

        would search for records with an "Age" field greater than
        35.

    Currently, the csv code only supports three types of data:
    strings, unsigned longs and doubles.  Others can easily be
    added.
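
The fragment below is a sketch, not code from the example, of how an
application might call these helpers.  It assumes the program has already
opened the database environment and databases the same way csv_query does,
and it declares the prototypes by hand exactly as they are documented
above; in a real build they would come from the suite's header files.

    #include <sys/types.h>
    #include <stdio.h>

    #include "csv_local.h"      /* Generated DbRecord structure. */

    /*
     * Prototypes as documented above; in the real build these come from
     * the example's shared headers rather than being declared by hand.
     */
    int DbRecord_read(u_long, DbRecord *);
    int DbRecord_print(DbRecord *, FILE *);
    int DbRecord_discard(DbRecord *);

    /*
     * print_record --
     *  Read the record with the given record number, display it, and
     *  release the memory Berkeley DB returned for it.
     */
    int
    print_record(u_long recno)
    {
        DbRecord record;
        int ret;

        if ((ret = DbRecord_read(recno, &record)) != 0)
            return (ret);
        (void)DbRecord_print(&record, stdout);
        return (DbRecord_discard(&record));
    }

DbRecord_search_field_name and DbRecord_search_field_number follow the
same pattern, taking the comparison value as a string.
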
The usage of the csv_code program is as follows:

    usage: csv_code [-v] [-c source-file] [-f input] [-h header-file]
        -c output C source code file
        -h output C header file
        -f input file
        -v verbose (defaults to off)

    -c  A file to which to write the C language code.  By default,
        the file "csv_local.c" is used.

    -f  A file to read for a description of the fields in the
        csv file.  By default, csv_code reads from stdin.

    -h  A file to which to write the C language header structures.
        By default, the file "csv_local.h" is used.

    -v  The -v verbose flag outputs potentially useful debugging
        information.

There are two applications built on top of the code produced by
csv_code: csv_load and csv_query.

The usage of the csv_load program is as follows:

    usage: csv_load [-v] [-F format] [-f csv-file] [-h home] [-V version]
        -F format (currently supports "excel")
        -f input file
        -h database environment home directory
        -v verbose (defaults to off)

    -F  See "Input format" below.

    -f  If an input file is specified using the -f flag, the file
        is read and the records in the file are stored into the
        database.  By default, csv_load reads from stdin.

    -h  If a database environment home directory is specified
        using the -h flag, that directory is used as the
        Berkeley DB directory.  The default for -h is the
        current working directory or the value of the DB_HOME
        environment variable.

    -V  Specify a version number for the input (the default is 1).

    -v  The -v verbose flag outputs potentially useful debugging
        information.  It can be specified twice for additional
        information.

The usage of the csv_query program is as follows:

    usage: csv_query [-v] [-c cmd] [-h home]

    -c  A command to run; otherwise, csv_query enters interactive
        mode and prompts for user input.

    -h  If a database environment home directory is specified
        using the -h flag, that directory is used as the
        Berkeley DB directory.  The default for -h is the
        current working directory or the value of the DB_HOME
        environment variable.

    -v  The -v verbose flag outputs potentially useful debugging
        information.  It can be specified twice for additional
        information.

The query program currently supports the following commands:

    ?               Display help screen
    exit            Exit program
    fields          Display list of field names
    help            Display help screen
    quit            Exit program
    version         Display database format version
    field[op]value  Display fields by value (=, !=, <, <=, >, >=, ~, !~)

The "field[op]value" command allows you to specify a field and a
relationship to a value.  For example, you could run the query:

    csv_query -c "price < 5"

to list all of the records with a "price" field less than "5".

Field names and all string comparisons are case-insensitive.

The operators ~ and !~ do match/no-match based on the IEEE Std 1003.2
(POSIX.2) Basic Regular Expression standard.

As a special case, every database has the field "Id", which matches the
record number of the primary key.

Input format:
    The input to the csv_load utility is a text file, containing
    lines of comma-separated fields.

    Blank lines are ignored.  All non-blank lines must be
    comma-separated lists of fields.

    By default:
        nul (\000) bytes and unprintable characters are stripped,
        input lines are newline (\012) separated,
        commas cannot be escaped.

    If "-F excel" is specified:
        nul (\000) bytes and unprintable characters are stripped,
        input lines are carriage-return (\015) separated,
        newline (\012) characters are stripped from the input,
        commas surrounded by double-quote characters (") are not
        treated as field separators.

Storage format:
    Records in the primary database are stored with a 32-bit unsigned
    record number as the key.

    Key/Data pair 0 is of the format:
        [version]               32-bit unsigned int
        [field count]           32-bit unsigned int
        [raw record]            byte array

        For example:
            [1]
            [5]
            [field1,field2,field3,field4,field5]

    All other Key/Data pairs are of the format:
        [version]               32-bit unsigned int
        [offset to field 1]     32-bit unsigned int
        [offset to field 2]     32-bit unsigned int
        [offset to field 3]     32-bit unsigned int
        ...                     32-bit unsigned int
        [offset to field N]     32-bit unsigned int
        [offset past field N]   32-bit unsigned int
        [raw record]            byte array

        For example:
            [1]
            [0]
            [2]
            [5]
            [9]
            [14]
            [19]
            [a,ab,abc,abcd,abcde]
             012345678901234567890  << byte offsets
             0         1         2

    So, field 3 of the data can be directly accessed by using the
    "offset to field 3", and the length of the field is
    "((offset to field 4) - (offset to field 3)) - 1".
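
    Given that layout, extracting a single field from a raw record is
    just arithmetic on the offset table.  The fragment below is a
    sketch, not code from the example; it assumes the generated offset
    table is indexed by 1-based field number, with one extra entry past
    the last field, which matches the formula quoted above.

        #include <sys/types.h>
        #include <string.h>

        #include "csv_local.h"      /* Generated DbRecord structure. */

        /*
         * copy_field --
         *  Copy 1-based field "i" of a filled-in DbRecord into buf,
         *  using the offset table described under "Storage format".
         *  The field length is the distance to the next offset, minus
         *  the separating comma.
         */
        int
        copy_field(const DbRecord *recordp,
            u_int32_t i, char *buf, size_t buflen)
        {
            size_t len;

            if (i < 1 || i > recordp->field_count)
                return (-1);

            len = (recordp->offset[i + 1] - recordp->offset[i]) - 1;
            if (len + 1 > buflen)
                return (-1);

            memcpy(buf, recordp->record + recordp->offset[i], len);
            buf[len] = '\0';
            return (0);
        }
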
Limits:
    The csv program stores the primary key in a 32-bit unsigned
    value, limiting the number of records in the database.  New
    records are inserted after the last existing record, that is,
    new records are not inserted into gaps left by any deleted
    records.  This limits the total number of records that can be
    stored in any database.

Versioning:
    Versioning is when a database supports multiple versions of the
    records.  This is likely to be necessary when dealing with large
    applications and databases, as record fields change over time.

    The csv application suite does not currently support versions,
    although all of the necessary hooks are there.

    The way versioning will work is as follows:

    The XXX.desc file needs to support multiple version layouts.

    The generated C language structure should be defined as a superset
    of all of the interesting fields from all of the version layouts,
    regardless of which versions of the csv records those fields exist
    in.

    When the csv layer is asked for a record, the record's version
    will provide a lookup into a separate database of field lists.
    That is, there will be another database which has key/data pairs
    where the key is a version number, and the data is the field
    list.  At that point, it's relatively easy to map the fields
    to the structure as is currently done, except that some of the
    fields may not be filled in.

    To determine whether a field is filled in, the application has to
    have an out-of-band value to put in that field during DbRecord
    initialization.  If that's a problem, the alternative would be to
    add an additional field for each listed field: if the additional
    field is set to 1, the listed field has been filled in, otherwise
    it hasn't.  The csv code will support the notion of required
    fields, so in most cases the application won't need to check
    before simply using the field; it's only when a field isn't
    required, and so may not be filled in, that the check will be
    necessary.

TODO:
    Csv databases are not portable between machines of different
    byte orders.  To make them portable, all of the 32-bit unsigned
    int fields currently written into the database should be
    converted to a standard byte order.  This would include the
    version number and field count in the column-map record, and the
    version and field offsets in the other records.

    Add Extended RE string matches.

    Add APIs to replace the reading of a schema file, allowing users
    to fill in a DbRecord structure and do a put on it.  (Hard
    problem: how to flag fields that aren't filled in.)

    Add a second sample file, and write the actual versioning code.
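
    The byte-order item above amounts to normalizing every 32-bit
    unsigned int before it is written and converting it back after it
    is read.  A sketch of the idea (not code from this example), using
    the standard htonl/ntohl routines:

        #include <sys/types.h>
        #include <arpa/inet.h>      /* htonl(), ntohl() */
        #include <string.h>

        /*
         * Store every 32-bit unsigned int in network byte order, and
         * convert it back to host order when it is read.  The real
         * change would have to be applied everywhere the load and
         * query code reads or writes the version, field-count and
         * offset values.
         */
        static void
        put_u32(void *dest, u_int32_t value)
        {
            u_int32_t v;

            v = htonl(value);       /* host order -> standard order */
            memcpy(dest, &v, sizeof(v));
        }

        static u_int32_t
        get_u32(const void *src)
        {
            u_int32_t v;

            memcpy(&v, src, sizeof(v));
            return (ntohl(v));      /* standard order -> host order */
        }
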