author    Jesse Morgan <jesse@jesterpm.net>   2016-12-17 21:28:53 -0800
committer Jesse Morgan <jesse@jesterpm.net>   2016-12-17 21:28:53 -0800
commit    54df2afaa61c6a03cbb4a33c9b90fa572b6d07b8 (patch)
tree      18147b92b969d25ffbe61935fb63035cac820dd0 /db-4.8.30/examples_c/csv/README

Berkeley DB 4.8 with rust build script for linux.

Diffstat (limited to 'db-4.8.30/examples_c/csv/README')
 -rw-r--r--  db-4.8.30/examples_c/csv/README  408
 1 files changed, 408 insertions, 0 deletions
diff --git a/db-4.8.30/examples_c/csv/README b/db-4.8.30/examples_c/csv/README
new file mode 100644
index 0000000..6a5fd13
--- /dev/null
+++ b/db-4.8.30/examples_c/csv/README
@@ -0,0 +1,408 @@
+/*-
+ * See the file LICENSE for redistribution information.
+ *
+ * Copyright (c) 2005-2009 Oracle. All rights reserved.
+ *
+ * $Id$
+ */
+
+The "comma-separated value" (csv) directory is a suite of three programs:
+
+ csv_code: write "helper" code on which to build applications,
+ csv_load: import csv files into a Berkeley DB database,
+ csv_query: query databases created by csv_load.
+
+The goal is to allow programmers to easily build applications that use
+csv databases.
+
+You can build the three programs, and run a sample application in this
+directory.
+
+First, there's the sample.csv file:
+
+ Adams,Bob,01/02/03,green,apple,37
+ Carter,Denise Ann,04/05/06,blue,banana,38
+ Eidel,Frank,07/08/09,red,cherry,38
+ Grabel,Harriet,10/11/12,purple,date,40
+ Indals,Jason,01/03/05,pink,orange,32
+ Kilt,Laura,07/09/11,yellow,grape,38
+ Moreno,Nancy,02/04/06,black,strawberry,38
+ Octon,Patrick,08/10/12,magenta,kiwi,15
+
+The fields are:
+ Last name,
+ First name,
+ Birthdate,
+ Favorite color,
+ Favorite fruit,
+ Age
+
+Second, there's a "description" of that csv file in sample.desc:
+
+ version 1 {
+ LastName string
+ FirstName string
+ BirthDate
+ Color string index
+ Fruit string index
+ Age unsigned_long index
+ }
+
+The DESCRIPTION file maps one-to-one to the fields in the csv file, and
+provides a data type for any field the application wants to use. (If
+the application doesn't care about a field, don't specify a data type
+and the csv code will ignore it.) The string "index" specifies there
+should be a secondary index based on the field.
+
+The "field" names in the DESCRIPTION file don't have to be the same as
+the ones in the csv file (and, as they may not have embedded spaces,
+probably won't be).
+
+To build in the sample directory, on POSIX-like systems, type "make".
+This first builds the program csv_code, which it then runs with the
+description file (sample.desc) as input. Running csv_code creates two
+additional files: csv_local.c and csv_local.h. Those two files are then
+used as part of the build process for two more programs: csv_load and
+csv_query.
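+
+If you prefer to run the code-generation step by hand, the make rule is
+roughly equivalent to the following command (the flags are described in
+the csv_code usage section below):
+
+ % ./csv_code -f sample.desc -c csv_local.c -h csv_local.h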
+
+You can now load the csv file into a Berkeley DB database with the
+following command:
+
+ % ./csv_load -h TESTDIR < sample.csv
+
+The csv_load command will create a directory and four databases:
+
+ primary primary database
+ Age secondary index on Age field
+ Color secondary index on Color field
+ Fruit secondary index on Fruit field
+
+You can then query the database:
+
+ % ./csv_query -h TESTDIR
+ Query: id=2
+ Record: 2:
+ LastName: Carter
+ FirstName: Denise
+ Color: blue
+ Fruit: banana
+ Age: 38
+ Query: color==green
+ Record: 1:
+ LastName: Adams
+ FirstName: Bob
+ Color: green
+ Fruit: apple
+ Age: 37
+
+and so on.
+
+The csv_code process also creates source code modules that support
+building your own applications based on this database. First, there
+is the local csv_local.h include file:
+
+ /*
+ * DO NOT EDIT: automatically built by csv_code.
+ *
+ * Record structure.
+ */
+ typedef struct __DbRecord {
+ u_int32_t recno; /* Record number */
+
+ /*
+ * Management fields
+ */
+ void *raw; /* Memory returned by DB */
+ char *record; /* Raw record */
+ size_t record_len; /* Raw record length */
+
+ u_int32_t field_count; /* Field count */
+ u_int32_t version; /* Record version */
+
+ u_int32_t *offset; /* Offset table */
+
+ /*
+ * Indexed fields
+ */
+ #define CSV_INDX_LASTNAME 1
+ char *LastName;
+
+ #define CSV_INDX_FIRSTNAME 2
+ char *FirstName;
+
+ #define CSV_INDX_COLOR 4
+ char *Color;
+
+ #define CSV_INDX_FRUIT 5
+ char *Fruit;
+
+ #define CSV_INDX_AGE 6
+ u_long Age;
+ } DbRecord;
+
+This defines the DbRecord structure that is the primary object for this
+csv file. As you can see, the interesting fields in the csv file have
+mappings in this structure. (The CSV_INDX_XXX values are the fields'
+column numbers in the csv file; BirthDate, column 3, was given no data
+type in sample.desc and so does not appear.)
+
+Also, there are routines in the DbRecord.c file that your application
+can use to handle DbRecord structures. When you retrieve a record from
+the database, the DbRecord structure will be filled in based on that
+record.
+
+Here are the helper routines:
+
+ int
+ DbRecord_print(DbRecord *recordp, FILE *fp)
+ Display the contents of a DbRecord structure to the specified
+ output stream.
+
+ int
+ DbRecord_init(const DBT *key, DBT *data, DbRecord *recordp)
+ Fill in a DbRecord from a returned database key/data pair.
+
+ int
+ DbRecord_read(u_long key, DbRecord *recordp)
+ Read the specified record (DbRecord_init will be called
+ to fill in the DbRecord).
+
+ int
+ DbRecord_discard(DbRecord *recordp)
+ Discard the DbRecord structure (must be called after the
+ DbRecord_read function), when the application no longer
+ needs the returned DbRecord.
+
+ int
+ DbRecord_search_field_name(char *field, char *value, OPERATOR op)
+ Display the DbRecords where the field (named by field) has
+ the specified relationship to the value. For example:
+
+ DbRecord_search_field_name("Age", "35", GT)
+
+ would search for records with an "Age" field greater than
+ 35.
+
+ int
+ DbRecord_search_field_number(
+ u_int32_t fieldno, char *value, OPERATOR op)
+ Display the DbRecords where the field (specified by field
+ number) has the specified relationship to the value. The
+ field number used as an argument comes from the csv_local.h
+ file; for example, CSV_INDX_AGE is the field index for
+ the "Age" field in this csv file. For example:
+
+ DbRecord_search_field_number(CSV_INDX_AGE, "35", GT)
+
+ would search for records with an "Age" field greater than
+ 35.
+
+ Currently, the csv code only supports three types of data:
+ strings, unsigned longs and doubles. Others can easily be
+ added.
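+
+As an illustration of how these routines fit together, here is a small
+sketch of an application built on the generated code. It is not part
+of the suite: the csv_env_open()/csv_env_close() calls stand in for
+whatever environment setup your application uses (check csv_extern.h
+for the helpers the example code actually provides), and TESTDIR is
+assumed to hold a database already created by csv_load as shown above.
+
+ /* sketch.c -- read one record, then run a field query. */
+ #include <stdio.h>
+
+ #include "csv.h"
+ #include "csv_local.h"
+ #include "csv_extern.h"
+
+ int
+ main(void)
+ {
+     DbRecord record;
+
+     /*
+      * Open the environment created by csv_load (helper name
+      * assumed for this sketch; see csv_extern.h).
+      */
+     if (csv_env_open("TESTDIR", 1) != 0)
+         return (1);
+
+     /* Fetch record number 2 and display it. */
+     if (DbRecord_read(2, &record) == 0) {
+         DbRecord_print(&record, stdout);
+         DbRecord_discard(&record);
+     }
+
+     /* Display every record whose Age field is greater than 35. */
+     DbRecord_search_field_name("Age", "35", GT);
+
+     return (csv_env_close());
+ }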
+
+The usage of the csv_code program is as follows:
+
+ usage: csv_code [-v] [-c source-file] [-f input] [-h header-file]
+ -c output C source code file
+ -h output C header file
+ -f input file
+ -v verbose (defaults to off)
+
+ -c A file to which to write the C language code. By default,
+ the file "csv_local.c" is used.
+
+ -f A file to read for a description of the fields in the
+ csv file. By default, csv_code reads from stdin.
+
+ -h A file to which to write the C language header structures.
+ By default, the file "csv_local.h" is used.
+
+ -v The -v verbose flag outputs potentially useful debugging
+ information.
+
+There are two applications built on top of the code produced by
+csv_code, csv_load and csv_query.
+
+The usage of the csv_load program is as follows:
+
+ usage: csv_load [-v] [-F format] [-f csv-file] [-h home] [-V version]
+ -F format (currently supports "excel")
+ -f input file
+ -h database environment home directory
+ -v verbose (defaults to off)
+
+ -F See "Input format" below.
+
+ -f If an input file is specified using the -f flag, the file
+ is read and the records in the file are stored into the
+ database. By default, csv_load reads from stdin.
+
+ -h If a database environment home directory is specified
+ using the -h flag, that directory is used as the
+ Berkeley DB directory. The default for -h is the
+ current working directory or the value of the DB_HOME
+ environment variable.
+
+ -V Specify a version number for the input (the default is 1).
+
+ -v The -v verbose flag outputs potentially useful debugging
+ information. It can be specified twice for additional
+ information.
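+
+For example (the file name export.csv is just a placeholder), a
+spreadsheet export could be loaded into the TESTDIR environment with:
+
+ % ./csv_load -h TESTDIR -F excel -f export.csv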
+
+The usage of the csv_query program is as follows:
+
+ usage: csv_query [-v] [-c cmd] [-h home]
+
+ -c A command to run, otherwise csv_query will enter
+ interactive mode and prompt for user input.
+
+ -h If a database environment home directory is specified
+ using the -h flag, that directory is used as the
+ Berkeley DB directory. The default for -h is the
+ current working directory or the value of the DB_HOME
+ environment variable.
+
+ -v The -v verbose flag outputs potentially useful debugging
+ information. It can be specified twice for additional
+ information.
+
+The query program currently supports the following commands:
+
+ ? Display help screen
+ exit Exit program
+ fields Display list of field names
+ help Display help screen
+ quit Exit program
+ version Display database format version
+ field[op]value Display fields by value (=, !=, <, <=, >, >=, ~, !~)
+
+The "field[op]value" command allows you to specify a field and a
+relationship to a value. For example, you could run the query:
+
+ csv_query -c "price < 5"
+
+to list all of the records with a "price" field less than "5".
+
+Field names and all string comparisons are case-insensitive.
+
+The operators ~ and !~ do match/no-match based on the IEEE Std 1003.2
+(POSIX.2) Basic Regular Expression standard.
+
+As a special case, every database has the field "Id", which matches the
+record number of the primary key.
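+
+Against the sample database loaded above, for example, queries such as
+the following should work (note the case-insensitive field name and the
+Basic Regular Expression match):
+
+ % ./csv_query -h TESTDIR -c "fruit = banana"
+ % ./csv_query -h TESTDIR -c "LastName ~ ^[AC]"
+ % ./csv_query -h TESTDIR -c "Id = 1"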
+
+Input format:
+ The input to the csv_load utility is a text file, containing
+ lines of comma-separated fields.
+
+ Blank lines are ignored. All non-blank lines must be comma-separated
+ lists of fields.
+
+ By default:
+ <nul> (\000) bytes and unprintable characters are stripped,
+ input lines are <nl> (\012) separated,
+ commas cannot be escaped.
+
+ If "-F excel" is specified:
+ <nul> (\000) bytes and unprintable characters are stripped,
+ input lines are <cr> (\015) separated,
+ <nl> (\012) characters are stripped from the input,
+ commas surrounded by double-quote character (") are not
+ treated as field separators.
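+
+ For example, with "-F excel" an illustrative input line such as
+
+ "Smith, Jr.",John,01/02/03,green,apple,44
+
+ still yields six fields, because the comma inside the double
+ quotes is not treated as a field separator.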
+
+Storage format:
+ Records in the primary database are stored with a 32-bit unsigned
+ record number as the key.
+
+ Key/Data pair 0 is of the format:
+ [version] 32-bit unsigned int
+ [field count] 32-bit unsigned int
+ [raw record] byte array
+
+ For example:
+ [1]
+ [5]
+ [field1,field2,field3,field4,field5]
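+
+ As a sketch (assuming the layout above, with the 32-bit values
+ in native byte order as noted in the TODO section, and that
+ "data" points to the data item returned for key 0), the version
+ and field count can be pulled out with memcpy:
+
+     u_int32_t field_count, version;
+     const u_int8_t *p = data;   /* data item for key 0 (assumed) */
+     const char *raw;
+
+     memcpy(&version, p, sizeof(u_int32_t));
+     memcpy(&field_count, p + sizeof(u_int32_t), sizeof(u_int32_t));
+     /* The raw column-map record follows the two counters. */
+     raw = (const char *)p + 2 * sizeof(u_int32_t);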
+
+ All other Key/Data pairs are of the format:
+ [version] 32-bit unsigned int
+ [offset to field 1] 32-bit unsigned int
+ [offset to field 2] 32-bit unsigned int
+ [offset to field 3] 32-bit unsigned int
+ ... 32-bit unsigned int
+ [offset to field N] 32-bit unsigned int
+ [offset past field N] 32-bit unsigned int
+ [raw record] byte array
+
+ For example:
+ [1]
+ [0]
+ [2]
+ [5]
+ [9]
+ [14]
+ [19]
+ [a,ab,abc,abcd,abcde]
+ 012345678901234567890 << byte offsets
+ 0 1 2
+
+ So, field 3 of the data can be directly accessed by using
+ the "offset to field 3", and the length of the field is
+ the "((offset to field 4) - (offset to field 3)) - 1".
+
+Limits:
+ The csv program stores the primary key in a 32-bit unsigned
+ value, limiting the number of records in the database. New
+ records are inserted after the last existing record, that is,
+ new records are not inserted into gaps left by any deleted
+ records. This will limit the total number of records stored in
+ any database.
+
+Versioning:
+ Versioning is when a database supports multiple versions of the
+ records. This is likely to be necessary when dealing with large
+ applications and databases, as record fields change over time.
+
+ The csv application suite does not currently support versions,
+ although all of the necessary hooks are there.
+
+ The way versioning will work is as follows:
+
+ The XXX.desc file needs to support multiple version layouts.
+
+ The generated C language structure should be a superset
+ of all of the interesting fields from all of the version
+ layouts, regardless of which versions of the csv records those
+ fields exist in.
+
+ When the csv layer is asked for a record, the record's version
+ will provide a lookup into a separate database of field lists.
+ That is, there will be another database which has key/data pairs
+ where the key is a version number, and the data is the field
+ list. At that point, it's relatively easy to map the fields
+ to the structure as is currently done, except that some of the
+ fields may not be filled in.
+
+ To determine if a field is filled in, in the structure, the
+ application has to have an out-of-band value to put in that
+ field during DbRecord initialization. If that's a problem, the
+ alternative would be to add an additional field for each listed
+ field -- if the additional field is set to 1, the listed field
+ has been filled in, otherwise it hasn't. The csv code will
+ support the notion of required fields, so in most cases the
+ application won't need to check before simply using the field,
+ it's only if a field isn't required and may be filled in that
+ the check will be necessary.
+
+TODO:
+ Csv databases are not portable between machines of different
+ byte orders. To make them portable, all of the 32-bit unsigned
+ int fields currently written into the database should be
+ converted to a standard byte order. This would include the
+ version number and field count in the column-map record, and the
+ version and field offsets in the other records.
+
+ Add Extended RE string matches.
+
+ Add APIs to replace the reading of a schema file, allow users to
+ fill in a DbRecord structure and do a put on it. (Hard problem:
+ how to flag fields that aren't filled in.)
+
+ Add a second sample file, and write the actual versioning code.