What happens when a Carbonado process is killed while in the middle of a transaction?

Carbonado uses shutdown hooks to make sure that all in progress transactions are properly rolled back. If you hard-kill a process (kill -9), then the shutdown won't get run. This can cause a problem when using BDB, and db_recover must be run to prevent future data corruption. BDB-JE is not affected, however, as it automatically runs recovery upon restart.

How do I bootstrap a replicated repository?

By running a resync operation programmatically. The ReplicatedRepository has a ResyncCapability which has a single method named "resync". It accepts a Storable class type, a throttle parameter, and an optional filter. Consult the Javadocs for more info.

In your application you might find it convenient to add an administrative command to invoke the resync operation. This makes it easy to repair any inconsistencies that might arise over time.

I sometimes see lock timeout errors and deadlocks, what is going on?

A lock timeout may be caused by an operation that for whatever reason took too long, or it may also indicate a deadlock. By default, Carbonado uses a lock timeout of 0.5 seconds for BDB based repositories. It can be changed by calling setLockTimeout on the repository builder.

Deadlocks may be caused by:

  1. Application lock acquisition order
  2. BDB page split
  3. Index update

In the first case, applications usually cause deadlock situations by updating a record within the same transaction that previously read it. This causes the read lock to be upgraded to a write lock, which is inherently deadlock prone. To prevent this problem, switch the transaction to update mode. This causes all acquired read locks to be upgradable, usually by acquiring a write lock from the start.

Another cause of this deadlock is when you iterate over a cursor, updating entries as you go. To resolve this, either copy the cursor entries to a list first, or operate within a transaction which is in update mode.

The second case, BDB page split, is a problem in the regular BDB product. It is not a problem with BDB-JE. When inserting records into a BDB, it may need to rebalance the b-tree structure. It does this by splitting a leaf node and updating the parent node. To update the parent node, a write lock must be acquired but another thread might have a read lock on it while trying to lock the leaf node being split.

There is no good solution to the BDB page split deadlock. Instead, your application must be coded to catch deadlocks are retry transactions. They are more common when filling up a new BDB.

The third case, index updates, is caused by updating a record while another thread is using the index for finding the record. Carbonado's indexing strategy can be coded to defer index updates when this happens, but it currently does not. In the meantime, there is no general solution.

Lock timeouts (or locks not granted) may be caused by:

  1. Failing to exit all transactions
  2. Open cursors with REPEATABLE_READ isolation
  3. Heavy concurrency

If any transactions are left open, then any locks it acquired don't get released. Over time the database lock table fills up. When using BDB, the "db_stat -c" command can show information on the lock table. Running "db_recover" can clear any stuck locks. To avoid this problem, always run transactions within a try-finally statement and exit the transaction in the finally section.

By default, BDB transactions have REPEATABLE_READ isolation level. This means that all locks acquired when iterating cursors within the transaction are not released until the transaction exits. This can cause the lock table to fill up. To work around this, enter the transaction with an explicit isolation level of READ_COMMITTED which releases read locks sooner.

Applications that have a high number of active threads can cause general lock contention. BDB-JE uses row-level locks, and so lock timeouts caused by contention are infrequent. The regular BDB product uses page-level locks, thus increasing the likelyhood of lock contention.

How do I perform a subselect?

Carbonado query filters do not support subselects, athough it can be emulated. Suppose the query you wish to execute looks something like this in SQL:

select * from foo where foo.name in (select name from bar where ...)

This can be emulated by querying bar, and for each result, fetching foo.

// Note that the query is ordered by name.
Cursor<Bar> barCursor = barStorage.query(...).orderBy("name").fetch();
String lastNameSeen = null;
while (barCursor.hasNext()) {
    Bar bar = barCursor.next();
    if (lastNameSeen != null && lastNameSeen.equals(bar.getName()) {
        continue;
    }
    lastNameSeen = bar.getName();
    Foo foo = fooStorage.query("name = ?").with(lastNameSeen).tryLoadOne();
    if (foo != null) {
        // Got a result, do something with it.
        ...
    }
}

For best performance, you might want to make sure that Foo has an index on its name property.

You may track the feature request here.

Does Carbonado support generating SQL tables from Storables?

No, it does not. Carbonado instead requires that your Storable definition matches a table, if using the JDBC repository. When using a repository that has no concept of tables, like the BDB repositories, the Storable is the canonical definition. In that case, changes to the Storable effectively change the "table". In addition, properties can be added and removed, and older records can still be read.

Although it is technically feasible for Carbonado to support generating SQL tables, Storable definitions are not expressive enough to cover all the features that can go into a table. For example, you cannot currently define a foreign key constraint in Carbonado.

How do I query for "IS NULL"?

Carbonado treats nulls as ordinary values wherever possible, so nothing special needs to be done. That is, just search for null like any other value. The query call might look like:

Query<MyType> query = storage.query("value = ?").with(null);
Cursor<MyType> = query.fetch();
...

When using the JDBC repository, the generated SQL will contain the "IS NULL" phrase in the WHERE clause.

How do I see generated SQL?

To see the SQL statements generated by the JDBC repository, you can install a JDBC DataSource that logs all activity. Provided in the JDBC repository package is the LoggingDataSource class, which does this. As a convenience, it can be installed simply by calling setDataSourceLogging(true) on the JDBCRepositoryBuilder.

What happens if JDBC repository cannot get index info?

The JDBC repository checks if the Storable alternate keys match those defined in the database. To do this, it tries to get the index info. If the user account does not have permissions, a message is logged and this check is skipped. This should not cause any harm, unless the alternate keys don't match. This can cause unexpected errors when using the replicated repository.

How do I use MySQL auto-increment columns?

As of 2006-10-23, Carbonado MySQL support is very thin. The @Sequence annotation is intended to be used for mapping to auto-increment columns, if the database does not support proper sequences. Until support is added, auto-increment columns will not work.

Can I do the equivalent of a "unique" constraint?

The @AlternateKeys annotation is provided specifically for this purpose. Both @PrimaryKey and @AlternateKeys define unique indexes. The only real difference is that there can be only one primary, but many alternate keys are allowed.

How does one manually flush the Carbonado join cache?

The Carbonado join cache is a lazy-read cache, local to a Storable instance. It is not a global write-through cache, and so no flushing is necessary.

The first time a join property has been accessed, a reference is saved in the master Storable instance. This optimization makes the most sense when filtering based on a join property. The query loads the join property, and you'll likely want it too. This avoids a double load.

How can schemas evolve?

Independent repositories like BDB support automatic schema evolution. You may freely add or remove non-primary key properties and still load older storables. Changes to primary key properties is not supported, since they define a clustered index. Also, property data types cannot be changed except if a boxed property is changed to a non-boxed primitive and vice versa.

Every storable persisted by Carbonado in BDB starts with a layout version, which defines the set of properties encoded. Carbonado separately persists the mapping from layout version to property set, such that when it decodes a storable it knows what properties to expect.

When adding or removing properties, existing persisted storables are not immediately modified. If you remove a property and add it back, you can recover the property value still encoded in the existing storables. Property values are not fully removed from an existing storable instance until it is explicitly updated. At this time, the layout version used is the current one, and the removed property values are lost.

When loading a storable which does not have a newly added property, the property value is either null, 0, or false, depending on the data type. You can call the isPropertyUninitialized method on the storable to determine if this default property value is real or not.

In order to change a property type to something that cannot be automatically converted, the change must be performed in phases. First, define a new property, with a different name. Then load all the existing storables and update them, setting the new property value. Next, remove the old property. To potentially free up storage you can update all the storables again. If you wish the newly added property to retain the original name, follow these steps again in a similar fashion to change it.

How do I iterate over all storable types in a repository?

Given a repository and an appropriately set classpath, can we iterate through all the various storables held in the repository without actually knowing what the repository might hold in advance?

Repositories that implement StorableInfoCapability provide this functionality. The reason its a capability is that some repos (JDBC) don't have a registry of storables. BDB based ones do, and so this capability works.