What happens when a Carbonado process is killed while in the middle of a transaction?

Carbonado uses shutdown hooks to make sure that all in progress transactions are properly rolled back. If you hard-kill a process (kill -9), then the shutdown won't get run. This can cause a problem when using BDB, and db_recover must be run to prevent future data corruption. BDB-JE is not affected, however, as it automatically runs recovery upon restart.

How do I bootstrap a replicated repository?

By running a resync operation programmatically. The ReplicatedRepository has a ResyncCapability which has a single method named "resync". It accepts a Storable class type, a throttle parameter, and an optional filter. Consult the Javadocs for more info.

In your application you might find it convenient to add an administrative command to invoke the resync operation. This makes it easy to repair any inconsistencies that might arise over time.

I sometimes see lock timeout errors and deadlocks, what is going on?

A lock timeout may be caused by an operation that for whatever reason took too long, or it may also indicate a deadlock. By default, Carbonado uses a lock timeout of 0.5 seconds for BDB based repositories. It can be changed by calling setLockTimeout on the repository builder.

Deadlocks may be caused by:

  1. Application lock acquisition order
  2. BDB page split
  3. Index update

In the first case, applications usually cause deadlock situations by updating a record within the same transaction that previously read it. This causes the read lock to be upgraded to a write lock, which is inherently deadlock prone. To prevent this problem, switch the transaction to update mode. This causes all acquired read locks to be upgradable, usually by acquiring a write lock from the start.

Another cause of this deadlock is when you iterate over a cursor, updating entries as you go. To resolve this, either copy the cursor entries to a list first, or operate within a transaction which is in update mode.

The second case, BDB page split, is a problem in the regular BDB product. It is not a problem with BDB-JE. When inserting records into a BDB, it may need to rebalance the b-tree structure. It does this by splitting a leaf node and updating the parent node. To update the parent node, a write lock must be acquired but another thread might have a read lock on it while trying to lock the leaf node being split.

There is no good solution to the BDB page split deadlock. Instead, your application must be coded to catch deadlocks are retry transactions. They are more common when filling up a new BDB.

The third case, index updates, is caused by updating a record while another thread is using the index for finding the record. Carbonado's indexing strategy can be coded to defer index updates when this happens, but it currently does not. In the meantime, there is no general solution.

Lock timeouts (or locks not granted) may be caused by:

  1. Failing to exit all transactions
  2. Open cursors with REPEATABLE_READ isolation
  3. Heavy concurrency

If any transactions are left open, then any locks it acquired don't get released. Over time the database lock table fills up. When using BDB, the "db_stat -c" command can show information on the lock table. Running "db_recover" can clear any stuck locks. To avoid this problem, always run transactions within a try-finally statement and exit the transaction in the finally section.

By default, BDB transactions have REPEATABLE_READ isolation level. This means that all locks acquired when iterating cursors within the transaction are not released until the transaction exits. This can cause the lock table to fill up. To work around this, enter the transaction with an explicit isolation level of READ_COMMITTED which releases read locks sooner.

Applications that have a high number of active threads can cause general lock contention. BDB-JE uses row-level locks, and so lock timeouts caused by contention are infrequent. The regular BDB product uses page-level locks, thus increasing the likelyhood of lock contention.

How do I perform a subselect?

Carbonado query filters do not support subselects, athough it can be emulated. Suppose the query you wish to execute looks something like this in SQL:

select * from foo where foo.name in (select name from bar where ...)

This can be emulated by querying bar, and for each result, fetching foo.

// Note that the query is ordered by name.
Cursor<Bar> barCursor = barStorage.query(...).orderBy("name").fetch();
String lastNameSeen = null;
while (barCursor.hasNext()) {
    Bar bar = barCursor.next();
    if (lastNameSeen != null && lastNameSeen.equals(bar.getName()) {
        continue;
    }
    lastNameSeen = bar.getName();
    Foo foo = fooStorage.query("name = ?").with(lastNameSeen).tryLoadOne();
    if (foo != null) {
        // Got a result, do something with it.
        ...
    }
}

For best performance, you might want to make sure that Foo has an index on its name property.

You may track the feature request here.

Does Carbonado support generating SQL tables from Storables?

No, it does not. Carbonado instead requires that your Storable definition matches a table, if using the JDBC repository. When using a repository that has no concept of tables, like the BDB repositories, the Storable is the canonical definition. In that case, changes to the Storable effectively change the "table". In addition, properties can be added and removed, and older records can still be read.

Although it is technically feasible for Carbonado to support generating SQL tables, Storable definitions are not expressive enough to cover all the features that can go into a table. For example, you cannot currently define a foreign key constraint in Carbonado.

How do I query for "IS NULL"?

Carbonado treats nulls as ordinary values wherever possible, so nothing special needs to be done. That is, just search for null like any other value. The query call might look like:

Query<MyType> query = storage.query("value = ?").with(null);
Cursor<MyType> = query.fetch();
...

When using the JDBC repository, the generated SQL will contain the "IS NULL" phrase in the WHERE clause.

How do I see generated SQL?

To see the SQL statements generated by the JDBC repository, you can install a JDBC DataSource that logs all activity. Provided in the JDBC repository package is the LoggingDataSource class, which does this. As a convenience, it can be installed simply by calling setDataSourceLogging(true) on the JDBCRepositoryBuilder.

Alternatively, you can call Query.printNative(), which by default prints the native query to standard out. When using the JDBC repository, this will print the SQL statement.

How do I use MySQL auto-increment columns?

Carbonado version 1.1 has thin support for MySQL. Version 1.2 (in the 1.2-dev branch) supports an @Automatic annotation for supporting MySQL auto-increment columns.

Can I do the equivalent of a "unique" constraint?

The @AlternateKeys annotation is provided specifically for this purpose. Both @PrimaryKey and @AlternateKeys define unique indexes. The only real difference is that there can be only one primary, but many alternate keys are allowed.

What kind of caching does Carbonado provide?

Carbonado does not require repository implementations to perform any caching. If you're using just the JDBC repository, there's no cache. A general purpose caching repository was in development, but it was shelved because there was no immediate need for it. The replicated repository however, can be considered to be a complete cache.

The only built in caching is for join properties on Storable instances. It just lazily sets the join result to an internal field of the Storable instance. The join property value is not shared with other Storable instances.

How does one manually flush the Carbonado join cache?

The Carbonado join cache is a lazy-read cache, local to a Storable instance. It is not a global write-through cache, and so no flushing is necessary.

The first time a join property has been accessed, a reference is saved in the master Storable instance. This optimization makes the most sense when filtering based on a join property. The query loads the join property, and you'll likely want it too. This avoids a double load.

How can schemas evolve?

Independent repositories like BDB support automatic schema evolution. You may freely add or remove non-primary key properties and still load older storables. Changes to primary key properties is not supported, since they define a clustered index. Also, property data types cannot be changed except if a boxed property is changed to a non-boxed primitive and vice versa.

Every storable persisted by Carbonado in BDB starts with a layout version, which defines the set of properties encoded. Carbonado separately persists the mapping from layout version to property set, such that when it decodes a storable it knows what properties to expect.

When adding or removing properties, existing persisted storables are not immediately modified. If you remove a property and add it back, you can recover the property value still encoded in the existing storables. Property values are not fully removed from an existing storable instance until it is explicitly updated. At this time, the layout version used is the current one, and the removed property values are lost.

When loading a storable which does not have a newly added property, the property value is either null, 0, or false, depending on the data type. You can call the isPropertyUninitialized method on the storable to determine if this default property value is real or not.

In order to change a property type to something that cannot be automatically converted, the change must be performed in phases. First, define a new property, with a different name. Then load all the existing storables and update them, setting the new property value. Next, remove the old property. To potentially free up storage you can update all the storables again. If you wish the newly added property to retain the original name, follow these steps again in a similar fashion to change it.

How do I iterate over all storable types in a repository?

Given a repository and an appropriately set classpath, can we iterate through all the various storables held in the repository without actually knowing what the repository might hold in advance?

Repositories that implement StorableInfoCapability provide this functionality. The reason it's a capability is that some repos (JDBC) don't have a registry of storables. BDB based ones do, and so this capability works for that.

Are explicit transactions required to ensure index integrity?

The short answer is no -- index integrity is ensured automatically. More details follow:

When using the JDBC repository, it is up to the database vendor to ensure that insert/update/delete operations include index updates within an implicit auto-commit transaction. All the major database vendors do this properly already, so nothing special needs to be done here.

When using a BDB backed repository, it is up to Carbonado to ensure implicit transactions are used. Carbonado sets up BDB to be in transaction mode, and there's no Carbonado level config to disable this. So you're always using BDB with transactions, and that is good. When you do a lone Carbonado insert/update/delete operation, it will pass null to BDB for the transaction object, which implies auto-commit. BDB will automatically enter a tiny transaction to protect that little change.

If the Storable you're updating has any indexes on it, a Carbonado trigger is installed that updates the affected indexes when you do an insert/update/delete. The presence of the trigger changes how the auto-generated Storable behaves. The insert/update/delete operation enters a transaction automatically, and it doesn't commit until all triggers have run. Index updates are therefore guarded by transactions, even if you don't explicitly specify one. In addition, all changes made by your own triggers are always guarded by a transaction.

How do I delete Storables returned by a Cursor without deadlocks?

The cursor iteration and delete operations must be enclosed in the same transaction. Auto-commit delete while iterating over a cursor fails for some databases, BDB and BDB-JE in particular. Although BDB supports a delete operation on the cursor itself, the transaction requirement remains.

A workaround exists when using BDB-JE, which works only due to its use of record-level locks. Calling Cursor.hasNext() forces the cursor to move past the current record, releasing the lock on the record to be deleted. BDB native uses page locks, so this trick will only work in the occasional case that the next record is on another page.

The BDB-JE cursor implementation could be changed to automatically move to the next record, but this reduces portability. Also, the cursor should not move past the current record automatically if in a transaction. It would allow another thread to sneak in and modify the record. An isolation level of repeatable read would be required to keep the lock.