From c52e4c055a9fa0f59a1ca19538d88377b51de44b Mon Sep 17 00:00:00 2001 From: "Brian S. O'Neill" Date: Tue, 24 Oct 2006 05:06:11 +0000 Subject: Added technical FAQ. --- src/site/fml/technical-faq.fml | 326 +++++++++++++++++++++++++++++++++++++++++ src/site/site.xml | 1 + 2 files changed, 327 insertions(+) create mode 100644 src/site/fml/technical-faq.fml diff --git a/src/site/fml/technical-faq.fml b/src/site/fml/technical-faq.fml new file mode 100644 index 0000000..74aee55 --- /dev/null +++ b/src/site/fml/technical-faq.fml @@ -0,0 +1,326 @@ + + + + + + What happens when a Carbonado process is killed while in the middle of a transaction? + +

+Carbonado uses shutdown hooks to make sure that all in progress transactions +are properly rolled back. If you hard-kill a process (kill -9), then the +shutdown won't get run. This can cause a problem when using BDB. In either +case, db_recover must be run to prevent future data corruption. BDB-JE is not +affected, however, as it automatically runs recovery upon restart. +

+
+
+ + + How do I bootstrap a replicated repository? + +

+By running a resync operation programmatically. The ReplicatedRepository has a +ResyncCapability which has a single method named "resync". It accepts a +Storable class type, a throttle parameter, and an optional filter. Consult the +Javadocs for more info. +

+

+In your application you might find it convenient to add an administrative +command to invoke the resync operation. This makes it easy to repair any +inconsistencies that might arise over time. +

+
+
+ + + I sometimes see lock timeout errors and deadlocks, what is going on? + +

+A lock timeout may be caused by an operation that for whatever reason took too +long, or it may also indicate a deadlock. By default, Carbonado uses a lock +timeout of 0.5 seconds for BDB based repositories. It can be changed by calling +setLockTimeout on the repository builder. +

+

+Deadlocks may be caused by: +

+

+

    +
  1. Application lock acquisition order
  2. +
  3. BDB page split
  4. +
  5. Index update
  6. +
+

+

+In the first case, applications usually cause deadlock situations by updating a +record within the same transaction that previously read it. This causes the +read lock to be upgraded to a write lock, which is inherently deadlock +prone. To prevent this problem, switch the transaction to update mode. This +causes all acquired read locks to be upgradable, usually by acquiring a write +lock from the start. +

+

+Another cause of this deadlock is when you iterate over a cursor, updating +entries as you go. To resolve this, either copy the cursor entries to a list +first, or operate within a transaction which is in update mode. +

+

+The second case, BDB page split, is a problem in the regular BDB product. +It is not a problem with BDB-JE. When inserting records into a BDB, +it may need to rebalance the b-tree structure. It does this by splitting a leaf +node and updating the parent node. To update the parent node, a write lock must +be acquired but another thread might have a read lock on it while trying to +lock the leaf node being split. +

+

+There is no good solution to the BDB page split deadlock. Instead, your +application must be coded to catch deadlocks are retry transactions. They are +more common when filling up a new BDB. +

+

+The third case, index updates, is caused by updating a record while another +thread is using the index for finding the record. Carbonado's indexing strategy +can be coded to defer index updates when this happens, but it currently does +not. In the meantime, there is no general solution. +

+

+Lock timeouts (or locks not granted) may be caused by: +

+

+

    +
  1. Failing to exit all transactions
  2. +
  3. Open cursors with REPEATABLE_READ isolation
  4. +
  5. Heavy concurrency
  6. +
+

+

+If any transactions are left open, then any locks it acquired don't get +released. Over time the database lock table fills up. When using BDB, the +"db_stat -c" command can show information on the lock table. Running +"db_recover" can clear any stuck locks. To avoid this problem, always run +transactions within a try-finally statement and exit the transaction in the +finally section. +

+

+By default, BDB transactions have REPEATABLE_READ isolation level. This means +that all locks acquired when iterating cursors within the transaction are not +released until the transaction exits. This can cause the lock table to fill +up. To work around this, enter the transaction with an explicit isolation level +of READ_COMMITTED which releases read locks sooner. +

+

+Applications that have a high number of active threads can cause general lock +contention. BDB-JE uses row-level locks, and so lock timeouts caused by +contention are infrequent. The regular BDB product uses page-level locks, thus +increasing the likelyhood of lock contention. +

+
+
+ + + How do I perform a subselect? + +

+Carbonado query filters do not support subselects, athough it can be +emulated. Suppose the query you wish to execute looks something like this in +SQL: +

+

+select * from foo where foo.name in (select name from bar where ...)
+

+

+This can be emulated by querying bar, and for each result, fetching foo. +

+

+// Note that the query is ordered by name.
+Cursor<Bar> barCursor = barStorage.query(...).orderBy("name").fetch();
+String lastNameSeen = null;
+while (barCursor.hasNext()) {
+    Bar bar = barCursor.next();
+    if (lastNameSeen != null && lastNameSeen.equals(bar.getName()) {
+        continue;
+    }
+    lastNameSeen = bar.getName();
+    Foo foo = fooStorage.query("name = ?").with(lastNameSeen).tryLoadOne();
+    if (foo != null) {
+        // Got a result, do something with it.
+        ...
+    }
+}
+

+

+For best performance, you might want to make sure that Foo has an index on its name property. +

+

+You may track the feature request here. +

+
+
+ + + Does Carbonado support generating SQL tables from Storables? + +

+No, it does not. Carbonado instead requires that your Storable definition +matches a table, if using the JDBC repository. When using a repository that has +no concept of tables, like the BDB repositories, the Storable is the canonical +definition. In that case, changes to the Storable effectively change the +"table". In addition, properties can be added and removed, and older records +can still be read. +

+

+Although it is technically feasible for Carbonado to support generating SQL +tables, Storable definitions are not expressive enough to cover all the +features that can go into a table. For example, you cannot currently define a +foreign key constraint in Carbonado. +

+
+
+ + + How do I query for "IS NULL"? + +

+Carbonado treats nulls as ordinary values wherever possible, so nothing special +needs to be done. That is, just search for null like any other value. The query +call might look like: +

+

+Query<MyType> query = storage.query("value = ?").with(null);
+Cursor<MyType> = query.fetch();
+...
+

+

+When using the JDBC repository, the generated SQL will contain the "IS NULL" +phrase in the WHERE clause. +

+
+
+ + + How do I see generated SQL? + +

+To see the SQL statements generated by the JDBC repository, you can install a +JDBC DataSource that logs all activity. Provided in the JDBC repository package +is the LoggingDataSource class, which does this. As a convenience, it can be +installed simply by calling setDataSourceLogging(true) on the +JDBCRepositoryBuilder. +

+
+
+ + + What happens if JDBC repository cannot get index info? + +

+The JDBC repository checks if the Storable alternate keys match those defined +in the database. To do this, it tries to get the index info. If the user +account does not have permissions, a message is logged and this check is +skipped. This should not cause any harm, unless the alternate keys don't +match. This can cause unexpected errors when using the replicated repository. +

+
+
+ + + How do I use MySQL auto-increment columns? + +

+As of 2006-10-23, Carbonado MySQL support is very thin. The @Sequence +annotation is intended to be used for mapping to auto-increment columns, if the +database does not support proper sequences. Until support is added, +auto-increment columns will not work. +

+
+
+ + + Can I do the equivalent of a "unique" constraint? + +

+The @AlternateKeys annotation is provided specifically for this purpose. Both +@PrimaryKey and @AlternateKeys define unique indexes. The only real difference +is that there can be only one primary, but many alternate keys are allowed. +

+
+
+ + + How does one manually flush the Carbonado join cache? + + +

+The Carbonado join cache is a lazy-read cache, local to a Storable instance. It +is not a global write-through cache, and so no flushing is necessary. +

+

+The first time a join property has been accessed, a reference is saved in the +master Storable instance. This optimization makes the most sense when filtering +based on a join property. The query loads the join property, and you'll likely +want it too. This avoids a double load. +

+ +
+
+ + + How can schemas evolve? + +

+Independent repositories, like BDB support automatic schema evolution. You may +freely add or remove non-primary key properties and still load older +storables. Changes to primary key properties is not supported, since they +define a clustered index. Also, property data types cannot be changed except if +a boxed property is changed to a non-boxed primitive and vice versa. +

+

+Every storable persisted by Carbonado in BDB starts with a layout version, +which defines the set of properties encoded. Carbonado separately persists the +mapping from layout version to property set, such that when it decodes a +storable it knows what properties to expect. +

+

+When adding or removing properties, existing persisted storables are not +immediately modified. If you remove a property and add it back, you can recover +the property value still encoded in the existing storables. Property values are +not fully removed from an existing storable instance until it is explicitly +updated. At this time, the layout version used is the current one, and the +removed property values are lost. +

+

+When loading a storable which does not have a newly added property, the +property value is either null, 0, or false, depending on the data type. You can +call the isPropertyUninitialized method on the storable to determine if this +default property value is real or not. +

+

+In order to change a property type to something that cannot be automatically +converted, the change must be performed in phases. First, define a new +property, with a different name. Then load all the existing storables and +update them, setting the new property value. Next, remove the old property. To +potentially free up storage you can update all the storables again. If you wish +the newly added property to retain the original name, follow these steps again +in a similar fashion to change it. +

+
+
+ + + How do I iterate over all storable types in a repository? + +

+Given a repository and an appropriately set classpath, can we iterate through +all the various storables held in the repository without actually knowing what +the repository might hold in advance? +

+

+Repositories that implement StorableInfoCapability provide this +functionality. The reason its a capability is that some repos (JDBC) don't have +a registry of storables. BDB based ones do, and so this capability works. +

+
+
+ +
+
diff --git a/src/site/site.xml b/src/site/site.xml index 21008db..d924834 100644 --- a/src/site/site.xml +++ b/src/site/site.xml @@ -20,6 +20,7 @@ + -- cgit v1.2.3