Added technical FAQ.

author: Brian S. O'Neill <bronee@gmail.com> 2006-10-24 05:06:11 +0000
committer: Brian S. O'Neill <bronee@gmail.com> 2006-10-24 05:06:11 +0000
commit: c52e4c055a9fa0f59a1ca19538d88377b51de44b (patch)
tree: d29293b8a1e6cd491ae0c5d74eedba1885386705
parent: 79347bcbe76acba5e4b9d94ce25d2d445fe7670f (diff)
2 files changed, 327 insertions, 0 deletions
diff --git a/src/site/fml/technical-faq.fml b/src/site/fml/technical-faq.fml
new file mode 100644
index 0000000..74aee55
--- /dev/null
+++ b/src/site/fml/technical-faq.fml
@@ -0,0 +1,326 @@
+<?xml version="1.0"?>
+<faqs id="FAQ" title="Frequently Asked Technical Questions">
+ <part id="General">
+
+   <faq id="process-killed">
+     <question>What happens when a Carbonado process is killed while in the middle of a transaction?</question>
+     <answer>
+       <p>
+Carbonado uses shutdown hooks to make sure that all in-progress transactions
+are properly rolled back. If you hard-kill a process (kill -9), then the
+shutdown hooks won't run. This can cause a problem when using BDB. In either
+case, db_recover must be run to prevent future data corruption. BDB-JE is not
+affected, however, as it automatically runs recovery upon restart.
+       </p>
+     </answer>
+   </faq>
+
+   <faq id="replicated-bootstrap">
+     <question>How do I bootstrap a replicated repository?</question>
+     <answer>
+<p>
+By running a resync operation programmatically. The ReplicatedRepository has a
+ResyncCapability which has a single method named "resync". It accepts a
+Storable class type, a throttle parameter, and an optional filter. Consult the
+<a href="http://carbonado.sourceforge.net/apidocs/com/amazon/carbonado/capability/ResyncCapability.html">Javadocs</a> for more info.
+</p>
+<p>
+In your application you might find it convenient to add an administrative
+command to invoke the resync operation. This makes it easy to repair any
+inconsistencies that might arise over time.
+</p>
+     </answer>
+   </faq>
+
+   <faq id="deadlock">
+     <question>I sometimes see lock timeout errors and deadlocks. What is going on?</question>
+     <answer>
+<p>
+A lock timeout may be caused by an operation that for whatever reason took too
+long, or it may also indicate a deadlock. By default, Carbonado uses a lock
+timeout of 0.5 seconds for BDB based repositories. It can be changed by calling
+setLockTimeout on the repository builder.
+</p>
+<p>
+Deadlocks may be caused by:
+</p>
+<p>
+<ol>
+<li>Application lock acquisition order</li>
+<li>BDB page split</li>
+<li>Index update</li>
+</ol>
+</p>
+<p>
+In the first case, applications usually cause deadlock situations by updating a
+record within the same transaction that previously read it. This causes the
+read lock to be upgraded to a write lock, which is inherently deadlock
+prone. To prevent this problem, switch the transaction to update mode. This
+causes all acquired read locks to be upgradable, usually by acquiring a write
+lock from the start.
+</p>
+<p>
+Another cause of this deadlock is when you iterate over a cursor, updating
+entries as you go. To resolve this, either copy the cursor entries to a list
+first, or operate within a transaction which is in update mode.
+</p>
+<p>
+The second case, BDB page split, is a problem in the regular BDB product.
+It is not a problem with BDB-JE. When inserting records into a BDB,
+it may need to rebalance the b-tree structure. It does this by splitting a leaf
+node and updating the parent node. To update the parent node, a write lock must
+be acquired but another thread might have a read lock on it while trying to
+lock the leaf node being split.
+</p>
+<p>
+There is no good solution to the BDB page split deadlock. Instead, your
+application must be coded to catch deadlocks and retry transactions. These
+deadlocks are more common when filling up a new BDB.
+</p>
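+<p>
+The catch-and-retry approach might look roughly like the following sketch. It
+is self-contained and does not use Carbonado itself: SketchDeadlockException,
+SketchWork and runWithRetry are hypothetical names invented for illustration,
+so substitute whatever deadlock exception your repository actually throws.
+</p>
+<p><pre>
```java
class DeadlockRetrySketch {
    // Runs the work, retrying when a deadlock is reported. Returns the number
    // of attempts used, or rethrows the last deadlock once maxAttempts is hit.
    static int runWithRetry(SketchWork work, int maxAttempts) {
        if (1 > maxAttempts) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                work.run();
                return attempt;
            } catch (SketchDeadlockException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Otherwise loop around and try the whole transaction again.
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated work that deadlocks on its first two attempts.
        SketchWork flaky = () -> {
            calls[0]++;
            if (3 > calls[0]) {
                throw new SketchDeadlockException("simulated deadlock");
            }
        };
        System.out.println("attempts used: " + runWithRetry(flaky, 5));
    }
}

// Sketch only: a hypothetical stand-in, not a Carbonado exception type.
class SketchDeadlockException extends RuntimeException {
    SketchDeadlockException(String message) {
        super(message);
    }
}

// Sketch only: one complete transaction attempt (enter, work, commit, exit).
interface SketchWork {
    void run();
}
```
+</pre></p>
+<p>
+The important design point is that each retry repeats the entire transaction,
+not just the statement that failed. Bounding the number of attempts keeps a
+persistent problem from looping forever.
+</p>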
+<p>
+The third case, index updates, is caused by updating a record while another
+thread is using the index to find the record. Carbonado's indexing strategy
+could be changed to defer index updates when this happens, but it currently
+does not. In the meantime, there is no general solution.
+</p>
+<p>
+Lock timeouts (or locks not granted) may be caused by:
+</p>
+<p>
+<ol>
+<li>Failing to exit all transactions</li>
+<li>Open cursors with REPEATABLE_READ isolation</li>
+<li>Heavy concurrency</li>
+</ol>
+</p>
+<p>
+If a transaction is left open, then the locks it acquired are never
+released. Over time the database lock table fills up. When using BDB, the
+"db_stat -c" command can show information on the lock table. Running
+"db_recover" can clear any stuck locks. To avoid this problem, always run
+transactions within a try-finally statement and exit the transaction in the
+finally section.
+</p>
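+<p>
+The try-finally pattern described above can be sketched as follows. To stay
+self-contained, the example uses SketchTxn, a hypothetical stand-in that only
+counts open transactions; it is not the Carbonado Transaction interface, and
+the runInTxn helper is likewise an assumption made for illustration.
+</p>
+<p><pre>
```java
class TxnExitSketch {
    // Runs the work inside a transaction and always exits it, even when the
    // work throws, so no transaction (and none of its locks) is left behind.
    static void runInTxn(Runnable work) {
        SketchTxn txn = SketchTxn.enter();
        try {
            work.run();
            txn.commit();
        } finally {
            txn.exit();
        }
    }

    public static void main(String[] args) {
        runInTxn(() -> System.out.println("doing work"));
        try {
            runInTxn(() -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException e) {
            System.out.println("work failed: " + e.getMessage());
        }
        System.out.println("open transactions: " + SketchTxn.openCount);
    }
}

// Sketch only: a hypothetical stand-in that just counts open transactions.
class SketchTxn {
    static int openCount = 0;
    static int commitCount = 0;

    static SketchTxn enter() {
        openCount++;
        return new SketchTxn();
    }

    void commit() {
        commitCount++;
    }

    // Exiting without a prior commit corresponds to a rollback.
    void exit() {
        openCount--;
    }
}
```
+</pre></p>
+<p>
+Because the commit call is the last statement in the try block, a failure in
+the work skips the commit but still reaches the exit call in the finally
+section.
+</p>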
+<p>
+By default, BDB transactions have REPEATABLE_READ isolation level. This means
+that all locks acquired when iterating cursors within the transaction are not
+released until the transaction exits. This can cause the lock table to fill
+up. To work around this, enter the transaction with an explicit isolation level
+of READ_COMMITTED, which releases read locks sooner.
+</p>
+<p>
+Applications that have a high number of active threads can cause general lock
+contention. BDB-JE uses row-level locks, and so lock timeouts caused by
+contention are infrequent. The regular BDB product uses page-level locks, thus
+increasing the likelihood of lock contention.
+</p>
+     </answer>
+   </faq>
+
+   <faq id="subselect">
+     <question>How do I perform a subselect?</question>
+     <answer>
+<p>
+Carbonado query filters do not support subselects, although they can be
+emulated. Suppose the query you wish to execute looks something like this in
+SQL:
+</p>
+<p><pre>
+select * from foo where foo.name in (select name from bar where ...)
+</pre></p>
+<p>
+This can be emulated by querying bar, and for each result, fetching foo.
+</p>
+<p><pre>
+// Note that the query is ordered by name.
+Cursor&lt;Bar&gt; barCursor = barStorage.query(...).orderBy("name").fetch();
+String lastNameSeen = null;
+while (barCursor.hasNext()) {
+    Bar bar = barCursor.next();
+    if (lastNameSeen != null &amp;&amp; lastNameSeen.equals(bar.getName())) {
+        continue;
+    }
+    lastNameSeen = bar.getName();
+    Foo foo = fooStorage.query("name = ?").with(lastNameSeen).tryLoadOne();
+    if (foo != null) {
+        // Got a result, do something with it.
+        ...
+    }
+}
+</pre></p>
+<p>
+For best performance, you might want to make sure that Foo has an index on its name property.
+</p>
+<p>
+You may track the feature request <a href="http://sourceforge.net/tracker/index.php?func=detail&amp;aid=1578197&amp;group_id=171277&amp;atid=857357">here</a>.
+</p>
+    </answer>
+   </faq>
+
+   <faq id="table-generation">
+     <question>Does Carbonado support generating SQL tables from Storables?</question>
+     <answer>
+<p>
+No, it does not. Carbonado instead requires that your Storable definition
+matches a table, if using the JDBC repository. When using a repository that has
+no concept of tables, like the BDB repositories, the Storable is the canonical
+definition. In that case, changes to the Storable effectively change the
+"table". In addition, properties can be added and removed, and older records
+can still be read.
+</p>
+<p>
+Although it is technically feasible for Carbonado to support generating SQL
+tables, Storable definitions are not expressive enough to cover all the
+features that can go into a table. For example, you cannot currently define a
+foreign key constraint in Carbonado.
+</p>
+     </answer>
+   </faq>
+
+   <faq id="isnull">
+     <question>How do I query for "IS NULL"?</question>
+     <answer>
+<p>
+Carbonado treats nulls as ordinary values wherever possible, so nothing special
+needs to be done. That is, just search for null like any other value. The query
+call might look like:
+</p>
+<p><pre>
+Query&lt;MyType&gt; query = storage.query("value = ?").with(null);
+Cursor&lt;MyType&gt; cursor = query.fetch();
+...
+</pre></p>
+<p>
+When using the JDBC repository, the generated SQL will contain the "IS NULL"
+phrase in the WHERE clause.
+</p>
+     </answer>
+   </faq>
+
+   <faq id="sql-debugging">
+     <question>How do I see generated SQL?</question>
+     <answer>
+       <p>
+To see the SQL statements generated by the JDBC repository, you can install a
+JDBC DataSource that logs all activity. Provided in the JDBC repository package
+is the LoggingDataSource class, which does this. As a convenience, it can be
+installed simply by calling setDataSourceLogging(true) on the
+JDBCRepositoryBuilder.
+       </p>
+     </answer>
+   </faq>
+
+   <faq id="jdbc-indexes">
+     <question>What happens if the JDBC repository cannot get index info?</question>
+     <answer>
+       <p>
+The JDBC repository checks if the Storable alternate keys match those defined
+in the database. To do this, it tries to get the index info. If the user
+account does not have permissions, a message is logged and this check is
+skipped. This should not cause any harm unless the alternate keys don't
+match. A mismatch can cause unexpected errors when using the replicated repository.
+       </p>
+     </answer>
+   </faq>
+
+   <faq id="mysql-increment">
+     <question>How do I use MySQL auto-increment columns?</question>
+     <answer>
+       <p>
+As of 2006-10-23, Carbonado MySQL support is very thin. The @Sequence
+annotation is intended to be used for mapping to auto-increment columns, if the
+database does not support proper sequences. Until support is added,
+auto-increment columns will not work.
+       </p>
+     </answer>
+   </faq>
+
+   <faq id="unique">
+     <question>Can I do the equivalent of a "unique" constraint?</question>
+     <answer>
+       <p>
+The @AlternateKeys annotation is provided specifically for this purpose. Both
+@PrimaryKey and @AlternateKeys define unique indexes. The only real difference
+is that there can be only one primary, but many alternate keys are allowed.
+       </p>
+     </answer>
+   </faq>
+
+   <faq id="join-cache">
+     <question>How does one manually flush the Carbonado join cache?</question>
+     <answer>
+
+       <p>
+The Carbonado join cache is a lazy-read cache, local to a Storable instance. It
+is not a global write-through cache, and so no flushing is necessary.
+</p>
+<p>
+The first time a join property is accessed, a reference is saved in the
+master Storable instance. This optimization makes the most sense when filtering
+based on a join property. The query loads the join property, and you'll likely
+want it too. This avoids a double load.
+       </p>
+
+     </answer>
+   </faq>
+
+   <faq id="evolution">
+     <question>How can schemas evolve?</question>
+     <answer>
+       <p>
+Independent repositories, like BDB, support automatic schema evolution. You may
+freely add or remove non-primary key properties and still load older
+storables. Changes to primary key properties are not supported, since they
+define a clustered index. Also, property data types cannot be changed, except
+that a boxed property may be changed to a non-boxed primitive and vice versa.
+</p>
+<p>
+Every storable persisted by Carbonado in BDB starts with a layout version,
+which defines the set of properties encoded. Carbonado separately persists the
+mapping from layout version to property set, such that when it decodes a
+storable it knows what properties to expect.
+</p>
+<p>
+When adding or removing properties, existing persisted storables are not
+immediately modified. If you remove a property and add it back, you can recover
+the property value still encoded in the existing storables. Property values are
+not fully removed from an existing storable instance until it is explicitly
+updated. At that point, the layout version used is the current one, and the
+removed property values are lost.
+</p>
+<p>
+When loading a storable which does not have a newly added property, the
+property value is either null, 0, or false, depending on the data type. You can
+call the isPropertyUninitialized method on the storable to determine if this
+default property value is real or not.
+</p>
+<p>
+In order to change a property type to something that cannot be automatically
+converted, the change must be performed in phases. First, define a new
+property, with a different name. Then load all the existing storables and
+update them, setting the new property value. Next, remove the old property. To
+potentially free up storage you can update all the storables again. If you wish
+the newly added property to retain the original name, follow these steps again
+in a similar fashion to change it.
+</p>
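+<p>
+The middle phase of such a change (loading the existing storables and setting
+the new property) might be sketched as below. SketchRecord is a hypothetical
+in-memory stand-in rather than a real Storable, and the property names and the
+int-to-String conversion are assumptions chosen only to make the example
+concrete.
+</p>
+<p><pre>
```java
class PhasedTypeChangeSketch {
    // Phase 2: visit every existing record and set the new property from the
    // old one. Returns how many records were updated.
    static int backfill(SketchRecord[] records) {
        int updated = 0;
        for (SketchRecord record : records) {
            if (record.quantityText == null) {
                record.quantityText = String.valueOf(record.quantity);
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        SketchRecord[] records = { new SketchRecord(5), new SketchRecord(12) };
        System.out.println("records updated: " + backfill(records));
    }
}

// Sketch only: a hypothetical stand-in for a storable in mid-migration.
// "quantity" is the old int property; "quantityText" is the new String
// property that was added in phase 1 under a different name.
class SketchRecord {
    int quantity;
    String quantityText;

    SketchRecord(int quantity) {
        this.quantity = quantity;
    }
}
```
+</pre></p>
+<p>
+Skipping records whose new property is already set makes the pass safe to run
+again if it is interrupted partway through.
+</p>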
+     </answer>
+   </faq>
+
+   <faq id="iterate-all">
+     <question>How do I iterate over all storable types in a repository?</question>
+     <answer>
+       <p>
+Given a repository and an appropriately set classpath, can we iterate through
+all the various storables held in the repository without actually knowing what
+the repository might hold in advance?
+</p>
+<p>
+Repositories that implement StorableInfoCapability provide this
+functionality. The reason it's a capability is that some repositories (JDBC)
+don't have a registry of storables. BDB-based ones do, and so this capability works.
+</p>
+     </answer>
+   </faq>
+
+ </part>
+</faqs>
diff --git a/src/site/site.xml b/src/site/site.xml
index 21008db..d924834 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -20,6 +20,7 @@
       <item name="Subversion" href="http://svn.sourceforge.net/viewvc/carbonado/trunk/Carbonado/"/>
       <item name="Sourceforge" href="http://sourceforge.net/projects/carbonado/"/>
       <item name="User Guide (pdf)" href="docs/CarbonadoGuide.pdf"/>
+      <item name="FAQ" href="technical-faq.html"/>
       <item name="License" href="license.html"/>
       <item name="Trademark Policy" href="trademark.html"/>
     </menu>