<?xml version="1.0"?>
<faqs id="FAQ" title="Frequently Asked Technical Questions">
 <part id="General">

   <faq id="process-killed">
     <question>What happens when a Carbonado process is killed while in the middle of a transaction?</question>
     <answer>
       <p>
Carbonado uses shutdown hooks to make sure that all in-progress transactions
are properly rolled back. If you hard-kill a process (kill -9), the shutdown
hooks do not run. This can cause a problem when using BDB, and db_recover must
then be run to prevent future data corruption. BDB-JE is not affected, since it
automatically runs recovery upon restart.
       </p>
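<p>
The idea behind the shutdown hooks can be sketched in plain Java. This is a
minimal illustration, not Carbonado's implementation: OpenTransactions and
FakeTxn are hypothetical stand-ins, and only Runtime.addShutdownHook is a real
JDK call. The JVM runs such hooks on a normal exit or a plain kill, but never
after kill -9, which is why recovery is needed in that case.
</p>
<p><pre><![CDATA[
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a transaction handle; not a Carbonado type.
class FakeTxn {
    private volatile boolean rolledBack;
    void rollback() { rolledBack = true; }
    boolean isRolledBack() { return rolledBack; }
}

// Hypothetical registry of in-progress transactions, used only to illustrate the idea.
class OpenTransactions {
    private final Set<FakeTxn> open = ConcurrentHashMap.newKeySet();

    OpenTransactions() {
        // The JVM runs this thread on normal exit or SIGTERM, but never after kill -9.
        Runtime.getRuntime().addShutdownHook(new Thread(this::rollbackAll));
    }

    FakeTxn begin() {
        FakeTxn txn = new FakeTxn();
        open.add(txn);
        return txn;
    }

    // Called when a transaction commits or exits normally.
    void finish(FakeTxn txn) {
        open.remove(txn);
    }

    // Rolls back every transaction that was never finished; returns how many.
    int rollbackAll() {
        int count = 0;
        for (FakeTxn txn : open) {
            txn.rollback();
            count++;
        }
        open.clear();
        return count;
    }
}
]]></pre></p>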
     </answer>
   </faq>

   <faq id="replicated-bootstrap">
     <question>How do I bootstrap a replicated repository?</question>
     <answer>
<p>
Run a resync operation programmatically. ReplicatedRepository provides a
ResyncCapability, which has a single method named "resync". It accepts a
Storable class type, a throttle parameter, and an optional filter. Consult the
<a href="http://carbonado.sourceforge.net/apidocs/com/amazon/carbonado/capability/ResyncCapability.html">Javadocs</a> for details.
</p>
<p>
In your application you might find it convenient to add an administrative
command to invoke the resync operation. This makes it easy to repair any
inconsistencies that might arise over time.
</p>
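<p>
Conceptually, a resync makes the replica agree with the master by inserting
missing records, overwriting stale ones, and deleting extras. The following
in-memory model shows only that idea; it is not how ReplicatedRepository is
implemented, and it ignores the throttle and filter parameters. Records are
modeled as hypothetical key/value strings.
</p>
<p><pre><![CDATA[
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;

// Conceptual model of a resync; NOT ReplicatedRepository's algorithm.
class ResyncModel {
    // Makes replica equal to master and returns how many repairs were applied.
    static int resync(Map<String, String> master, Map<String, String> replica) {
        int repairs = 0;
        // Insert records missing from the replica and overwrite stale ones.
        for (Map.Entry<String, String> e : master.entrySet()) {
            String key = e.getKey();
            if (!replica.containsKey(key) || !Objects.equals(replica.get(key), e.getValue())) {
                replica.put(key, e.getValue());
                repairs++;
            }
        }
        // Delete records that exist only in the replica.
        Iterator<String> it = replica.keySet().iterator();
        while (it.hasNext()) {
            if (!master.containsKey(it.next())) {
                it.remove();
                repairs++;
            }
        }
        return repairs;
    }
}
]]></pre></p>
<p>
Running the model a second time reports no repairs, which is the property that
makes a resync safe to invoke repeatedly from an administrative command.
</p>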
     </answer>
   </faq>

   <faq id="deadlock">
     <question>I sometimes see lock timeout errors and deadlocks. What is going on?</question>
     <answer>
<p>
A lock timeout may be caused by an operation that, for whatever reason, took
too long, or it may indicate a deadlock. By default, Carbonado uses a lock
timeout of 0.5 seconds for BDB-based repositories. You can change it by calling
setLockTimeout on the repository builder.
</p>
<p>
Deadlocks may be caused by:
</p>
<p>
<ol>
<li>Application lock acquisition order</li>
<li>BDB page split</li>
<li>Index update</li>
</ol>
</p>
<p>
In the first case, applications usually cause deadlocks by updating a record
within the same transaction that previously read it. This causes the read lock
to be upgraded to a write lock, which is inherently deadlock prone: if two
transactions each hold a read lock on the same record and both try to upgrade,
each waits for the other to release its read lock. To prevent this problem,
switch the transaction to update mode. This causes all acquired read locks to
be upgradable, usually by acquiring a write lock from the start.
</p>
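<p>
The upgrade problem is not specific to databases. The sketch below uses the
JDK's ReentrantReadWriteLock, which does not allow a read lock to be upgraded,
to show why acquiring the write lock from the start (the idea behind update
mode) avoids it. With a blocking lock() call instead of tryLock(), the upgrade
attempt would wait forever. LockUpgradeDemo is a hypothetical class that models
the locking pattern only; it does not use Carbonado's API.
</p>
<p><pre><![CDATA[
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrates read-to-write upgrade with a JDK lock rather than a database lock.
class LockUpgradeDemo {
    // Read first, then try to upgrade: the write lock cannot be granted while
    // this thread still holds the read lock, so tryLock() returns false.
    static boolean tryUpgrade(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            boolean upgraded = lock.writeLock().tryLock();
            if (upgraded) {
                lock.writeLock().unlock();
            }
            return upgraded;
        } finally {
            lock.readLock().unlock();
        }
    }

    // "Update mode": take the write lock from the start, so no upgrade is needed.
    static boolean writeFromStart(ReentrantReadWriteLock lock) {
        lock.writeLock().lock();
        try {
            // Reading and then writing under the same exclusive lock is safe.
            return lock.isWriteLockedByCurrentThread();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
]]></pre></p>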
<p>
Another cause of this deadlock is iterating over a cursor and updating entries
as you go. To resolve this, either copy the cursor entries to a list first, or
operate within a transaction that is in update mode.
</p>
<p>
The second case, BDB page split, is a problem in the regular BDB product, but
not in BDB-JE. When records are inserted into a BDB, it may need to rebalance
the b-tree structure. It does this by splitting a leaf node and updating the
parent node. Updating the parent node requires a write lock on it, but another
thread might hold a read lock on the parent while trying to lock the leaf node
being split, so each thread waits for the other.
</p>
<p>
There is no good solution to the BDB page split deadlock. Instead, your
application must be coded to catch deadlocks and retry transactions. These
deadlocks are more common when filling up a new BDB.
</p>
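<p>
Retrying can be factored into a small helper. This is a sketch under stated
assumptions: DeadlockException is a hypothetical stand-in for whatever
exception your repository reports for deadlocks, and the work passed in is
assumed to enter and exit its own transaction, so that each retry starts from
scratch.
</p>
<p><pre><![CDATA[
import java.util.concurrent.Callable;

// Hypothetical stand-in for the deadlock exception your repository throws.
class DeadlockException extends Exception {
    DeadlockException(String message) { super(message); }
}

class DeadlockRetry {
    // Runs the unit of work, retrying it when a deadlock is reported.
    // The work must enter and exit its own transaction so each attempt starts clean.
    static <T> T withRetry(Callable<T> work, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (DeadlockException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the deadlock
                }
                // Back off briefly so the competing transaction can finish.
                Thread.sleep(10L * attempt);
            }
        }
    }
}
]]></pre></p>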
<p>
The third case, index updates, occurs when one thread updates a record while
another thread is using an index to find that record. Carbonado's indexing
strategy could be coded to defer index updates when this happens, but it
currently does not. In the meantime, there is no general solution.
</p>
<p>
Lock timeouts (or locks not granted) may be caused by:
</p>
<p>
<ol>
<li>Failing to exit all transactions</li>
<li>Open cursors with REPEATABLE_READ isolation</li>
<li>Heavy concurrency</li>
</ol>
</p>
<p>
If a transaction is left open, the locks it acquired are never released, and
over time the database lock table fills up. When using BDB, the "db_stat -c"
command shows information on the lock table, and running "db_recover" can clear
any stuck locks. To avoid this problem, always run transactions within a
try-finally statement and exit the transaction in the finally block.
</p>
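<p>
The try-finally shape can be illustrated with a small model. LockTable,
ModelTxn and TxnPattern are hypothetical classes written for this sketch; in
Carbonado code the same shape is entering a transaction with
Repository.enterTransaction(), calling commit() at the end of the try block,
and calling exit() in the finally block. The contrast below shows that only the
finally version releases its locks when the work throws.
</p>
<p><pre><![CDATA[
// Hypothetical model of a lock table; not Carbonado or BDB code.
class LockTable {
    private int held;
    void acquire() { held++; }
    void release(int n) { held -= n; }
    int held() { return held; }
}

// Hypothetical transaction that, in this model, releases its locks only on exit().
class ModelTxn {
    private final LockTable table;
    private int locks;
    private boolean exited;

    ModelTxn(LockTable table) { this.table = table; }

    void lockRecord() { table.acquire(); locks++; }

    void commit() { /* no-op in this model; locks stay held until exit() */ }

    void exit() {
        if (!exited) {
            table.release(locks);
            locks = 0;
            exited = true;
        }
    }
}

class TxnPattern {
    // Recommended shape: commit at the end of the try, always exit in finally.
    static void update(LockTable table, boolean fail) {
        ModelTxn txn = new ModelTxn(table);
        try {
            txn.lockRecord();
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            txn.commit();
        } finally {
            txn.exit(); // runs even when an exception escapes
        }
    }

    // Anti-pattern: exit() is skipped when the work throws, leaking the lock.
    static void updateWithoutFinally(LockTable table, boolean fail) {
        ModelTxn txn = new ModelTxn(table);
        txn.lockRecord();
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
        txn.commit();
        txn.exit();
    }
}
]]></pre></p>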
<p>
By default, BDB transactions use the REPEATABLE_READ isolation level. This means
that all locks acquired while iterating cursors within the transaction are not
released until the transaction exits, which can cause the lock table to fill
up. To work around this, enter the transaction with an explicit isolation level
of READ_COMMITTED, which releases read locks sooner.
</p>
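<p>
The difference between the two isolation levels can be approximated by
counting read locks during a cursor scan. This is a simplified, hypothetical
model (IsolationModel is not part of Carbonado or BDB): it assumes one read lock
per record visited, held until exit under REPEATABLE_READ and released as soon
as the cursor moves on under READ_COMMITTED.
</p>
<p><pre><![CDATA[
// Hypothetical model of the peak number of read locks held during a cursor scan.
// Real lock managers are more complex; this only shows the growth pattern.
class IsolationModel {
    enum Isolation { READ_COMMITTED, REPEATABLE_READ }

    static int peakReadLocks(int recordsScanned, Isolation level) {
        int held = 0;
        int peak = 0;
        for (int i = 0; i < recordsScanned; i++) {
            held++;                       // lock the record the cursor lands on
            peak = Math.max(peak, held);
            if (level == Isolation.READ_COMMITTED) {
                held--;                   // released once the cursor moves on
            }
        }
        // Under REPEATABLE_READ every lock is still held until the transaction exits.
        return peak;
    }
}
]]></pre></p>
<p>
Under REPEATABLE_READ the peak grows with the size of the scan, which is how a
long iteration can fill the lock table.
</p>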
<p>
Applications with a high number of active threads can cause general lock
contention. BDB-JE uses row-level locks, so lock timeouts caused by contention
are infrequent. The regular BDB product uses page-level locks, which increases
the likelihood of lock contention.
</p>
     </answer>
   </faq>

   <faq id="subselect">
     <question>How do I perform a subselect?</question>
     <answer>
<p>
Carbonado query filters do not support subselects, although they can be
emulated. Suppose the query you wish to execute looks something like this in
SQL:
</p>
<p><pre>
select * from foo where foo.name in (select name from bar where ...)
</pre></p>
<p>
This can be emulated by querying Bar and, for each distinct name, fetching the matching Foo.
</p>
<p><pre>
// Note that the query is ordered by name, so duplicate names are adjacent.
Cursor&lt;Bar&gt; barCursor = barStorage.query(...).orderBy("name").fetch();
try {
    String lastNameSeen = null;
    while (barCursor.hasNext()) {
        Bar bar = barCursor.next();
        if (lastNameSeen != null &amp;&amp; lastNameSeen.equals(bar.getName())) {
            // Skip duplicate names.
            continue;
        }
        lastNameSeen = bar.getName();
        Foo foo = fooStorage.query("name = ?").with(lastNameSeen).tryLoadOne();
        if (foo != null) {
            // Got a result, do something with it.
            ...
        }
    }
} finally {
    // Always release the cursor, even if an exception is thrown.
    barCursor.close();
}
</pre></p>
<p>
For best performance, make sure that Foo has an index on its name property, since the loop performs one Foo lookup per distinct name.
</p>
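<p>
The deduplicate-then-look-up logic of the loop above can be checked in
isolation with an in-memory model. SubselectEmulation is hypothetical: a sorted
list of names stands in for the Bar cursor, and a map stands in for the indexed
Foo lookup. Bar names are assumed to be non-null.
</p>
<p><pre><![CDATA[
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// In-memory model of the emulated subselect. Names arrive sorted (like the
// orderBy("name") query), and fooByName stands in for an index on Foo.name.
class SubselectEmulation {
    static List<String> matchingFoos(List<String> sortedBarNames, Map<String, String> fooByName) {
        List<String> results = new ArrayList<>();
        String lastNameSeen = null;
        for (String name : sortedBarNames) {
            if (name.equals(lastNameSeen)) {
                continue; // duplicates are adjacent because the input is sorted
            }
            lastNameSeen = name;
            String foo = fooByName.get(name); // one lookup per distinct name
            if (foo != null) {
                results.add(foo);
            }
        }
        return results;
    }
}
]]></pre></p>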
<p>
You may track the feature request <a href="http://sourceforge.net/tracker/index.php?func=detail&amp;aid=1578197&amp;group_id=171277&amp;atid=857357">here</a>.
</p>
    </answer>
   </faq>

   <faq id="table-generation">
     <question>Does Carbonado support generating SQL tables from Storables?</question>
     <answer>
<p>
No, it does not. If you are using the JDBC repository, Carbonado instead
requires that your Storable definition match an existing table. When using a
repository that has no concept of tables, like the BDB repositories, the
Storable is the canonical definition. In that case, changes to the Storable
effectively change the "table": properties can be added and removed, and older
records can still be read.
</p>
<p>
Although it is technically feasible for Carbonado to support generating SQL
tables, Storable definitions are not expressive enough to cover all the
features that can go into a table. For example, you cannot currently define a
foreign key constraint in Carbonado.
</p>
     </answer>
   </faq>

   <faq id="isnull">
     <question>How do I query for "IS NULL"?</question>
     <answer>
<p>
Carbonado treats nulls as ordinary values wherever possible, so nothing special
needs to be done: just search for null like any other value. The query call
might look like this:
</p>
<p><pre>
Query&lt;MyType&gt; query = storage.query("value = ?").with(null);
Cursor&lt;MyType&gt; cursor = query.fetch();
...
</pre></p>
<p>
When using the JDBC repository, the generated SQL will contain the "IS NULL"
phrase in the WHERE clause.
</p>
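<p>
The generated SQL needs "IS NULL" because in SQL a comparison such as
"value = NULL" is never true. The hypothetical NullPredicate class below
sketches both the translation and the null-as-ordinary-value matching rule; it
is an illustration, not the JDBC repository's code.
</p>
<p><pre><![CDATA[
import java.util.Objects;

// Hypothetical helper; not the JDBC repository's actual SQL generator.
class NullPredicate {
    // SQL text for an equality filter on the given column.
    static String toSql(String column, Object value) {
        // "col = NULL" evaluates to UNKNOWN rather than true, so it never
        // selects a row; a null argument must become IS NULL instead.
        return value == null ? column + " IS NULL" : column + " = ?";
    }

    // Carbonado-style semantics: null compares like an ordinary value.
    static boolean matches(Object propertyValue, Object filterValue) {
        return Objects.equals(propertyValue, filterValue);
    }
}
]]></pre></p>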
     </answer>
   </faq>

   <faq id="sql-debugging">
     <question>How do I see generated SQL?</question>
     <answer>
       <p>
To see the SQL statements generated by the JDBC repository, you can install a
JDBC DataSource that logs all activity. The JDBC repository package provides
the LoggingDataSource class for this purpose. As a convenience, it can be
installed simply by calling setDataSourceLogging(true) on the
JDBCRepositoryBuilder.
</p>
<p>
Alternatively, you can call Query.printNative(), which by default prints the
native query to standard out. When using the JDBC repository, this will print
the SQL statement.
       </p>
     </answer>
   </faq>

   <faq id="mysql-increment">
     <question>How do I use MySQL auto-increment columns?</question>
     <answer>
       <p>
Carbonado version 1.1 has only limited support for MySQL. Version 1.2 (in the
1.2-dev branch) adds an @Automatic annotation that supports MySQL auto-increment
columns.
       </p>
     </answer>
   </faq>

   <faq id="unique">
     <question>Can I do the equivalent of a "unique" constraint?</question>
     <answer>
       <p>
The @AlternateKeys annotation is provided specifically for this purpose. Both
@PrimaryKey and @AlternateKeys define unique indexes. The only real difference
is that a Storable can have only one primary key, but many alternate keys.
       </p>
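<p>
The effect of declaring both kinds of keys can be modeled with two maps, one
per unique index. UniqueKeysModel is a hypothetical sketch of the behavior, not
Carbonado's implementation: an insert is rejected if either the primary key or
the alternate key is already taken, and in that case neither index changes.
</p>
<p><pre><![CDATA[
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a primary key and an alternate key are both unique indexes.
class UniqueKeysModel {
    record User(int id, String email) {}

    private final Map<Integer, User> byId = new HashMap<>();   // primary key index
    private final Map<String, User> byEmail = new HashMap<>(); // alternate key index

    // Returns false, leaving both indexes untouched, if either key is already taken.
    boolean insertIfUnique(User user) {
        if (byId.containsKey(user.id()) || byEmail.containsKey(user.email())) {
            return false;
        }
        byId.put(user.id(), user);
        byEmail.put(user.email(), user);
        return true;
    }
}
]]></pre></p>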
     </answer>
   </faq>

   <faq id="caching">
     <question>What kind of caching does Carbonado provide?</question>
     <answer>
       <p>
Carbonado does not require repository implementations to perform any
caching. If you're using just the JDBC repository, there's no cache. A
general-purpose caching repository was in development, but it was shelved
because there was no immediate need for it. The replicated repository, however,
can be considered a complete cache.
</p>
<p>
The only built-in caching is for join properties on Storable instances. The
join result is lazily stored in an internal field of the Storable instance, and
it is not shared with other Storable instances.
        </p>
     </answer>
   </faq>

   <faq id="join-cache">
     <question>How does one manually flush the Carbonado join cache?</question>
     <answer>

       <p>
The Carbonado join cache is a lazy-read cache, local to a Storable instance. It
is not a global write-through cache, and so no flushing is necessary.
</p>
<p>
The first time a join property is accessed, a reference to the result is saved
in the master Storable instance. This optimization makes the most sense when
filtering based on a join property: the query loads the join property, and
you'll likely want it too, so saving the reference avoids a double load.
       </p>
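<p>
The lazy, per-instance behavior can be sketched as follows. Order and its
customer loader are hypothetical, standing in for a generated Storable and its
join query. The first call to getCustomer() runs the loader and saves the
result; later calls on the same instance reuse it, while another instance
performs its own load.
</p>
<p><pre><![CDATA[
import java.util.function.Supplier;

// Hypothetical model of a per-instance lazy join cache; not generated Carbonado code.
class Order {
    private final Supplier<String> customerLoader; // stands in for the join query
    private String customer;                       // cached join result, local to this instance
    private boolean loaded;

    Order(Supplier<String> customerLoader) {
        this.customerLoader = customerLoader;
    }

    // First access runs the join; later accesses reuse the saved reference.
    String getCustomer() {
        if (!loaded) {
            customer = customerLoader.get();
            loaded = true;
        }
        return customer;
    }
}
]]></pre></p>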

     </answer>
   </faq>

   <faq id="evolution">
     <question>How can schemas evolve?</question>
     <answer>
       <p>
Independent repositories like BDB support automatic schema evolution. You may
freely add or remove non-primary-key properties and still load older
storables. Changes to primary key properties are not supported, since they
define a clustered index. Also, property data types cannot be changed, except
from a boxed type to its primitive counterpart (for example, Integer to int)
and vice versa.
</p>
<p>
Every storable persisted by Carbonado in BDB starts with a layout version,
which defines the set of properties encoded. Carbonado separately persists the
mapping from layout version to property set, such that when it decodes a
storable it knows what properties to expect.
</p>
<p>
When adding or removing properties, existing persisted storables are not
immediately modified. If you remove a property and add it back, you can recover
the property value still encoded in the existing storables. Property values are
not fully removed from an existing storable instance until it is explicitly
updated. At that point, the storable is encoded with the current layout version,
and the removed property values are lost.
</p>
<p>
When loading a storable which does not have a newly added property, the
property value is either null, 0, or false, depending on the data type. You can
call the isPropertyUninitialized method on the storable to determine whether
this value was actually stored or is just the default.
</p>
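<p>
The layout version scheme can be modeled in a few lines. LayoutModel is a
hypothetical sketch: real Carbonado storables use a binary encoding, whereas
here a stored record is just a layout version plus a list of values, and the
version-to-properties mapping is a map. Decoding against the current property
set yields a default for properties the stored layout lacks and reports them as
uninitialized, mirroring the isPropertyUninitialized method described above.
</p>
<p><pre><![CDATA[
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of layout-versioned records; not Carbonado's real encoding.
class LayoutModel {
    // A stored record: the layout version it was written with, plus its values
    // in that layout's property order.
    record Encoded(int layoutVersion, List<?> values) {}

    // The result of decoding a record against the current property set.
    record Decoded(Map<String, Object> values, Set<String> uninitialized) {
        boolean isPropertyUninitialized(String name) {
            return uninitialized.contains(name);
        }
    }

    // Layout version -> ordered property names, persisted separately from records.
    private final Map<Integer, List<String>> layouts = new HashMap<>();

    void defineLayout(int version, List<String> properties) {
        layouts.put(version, properties);
    }

    Decoded decode(Encoded rec, List<String> currentProperties) {
        List<String> stored = layouts.get(rec.layoutVersion());
        if (stored == null) {
            throw new IllegalArgumentException("unknown layout " + rec.layoutVersion());
        }
        Map<String, Object> values = new HashMap<>();
        Set<String> uninitialized = new HashSet<>();
        for (String prop : currentProperties) {
            int index = stored.indexOf(prop);
            if (index >= 0) {
                values.put(prop, rec.values().get(index));
            } else {
                // Not in the stored layout: use a default (null here; a primitive
                // property would get 0 or false) and remember that it is not real.
                values.put(prop, null);
                uninitialized.add(prop);
            }
        }
        // Stored values for properties outside currentProperties are skipped but
        // remain in rec, so re-adding such a property would recover them.
        return new Decoded(values, uninitialized);
    }
}
]]></pre></p>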
<p>
To change a property type to something that cannot be automatically converted,
perform the change in phases. First, define a new property with a different
name. Then load all the existing storables and update them, setting the new
property value. Next, remove the old property. To potentially free up storage,
you can update all the storables again. If you want the new property to have
the original name, repeat these steps in a similar fashion to rename it.
</p>
     </answer>
   </faq>

   <faq id="iterate-all">
     <question>How do I iterate over all storable types in a repository?</question>
     <answer>
       <p>
Given a repository and an appropriately set classpath, can we iterate through
all the various storables held in the repository without actually knowing what
the repository might hold in advance?
</p>
<p>
Repositories that implement StorableInfoCapability provide this
functionality. It is a capability because some repositories (JDBC) don't have
a registry of storables. BDB-based ones do, so this capability works for them.
</p>
     </answer>
   </faq>

   <faq id="index-integrity">
     <question>Are explicit transactions required to ensure index integrity?</question>
     <answer>
       <p>
The short answer is no -- index integrity is ensured automatically. More details follow:
</p>
<p>
When using the JDBC repository, it is up to the database vendor to ensure that
insert/update/delete operations include index updates within an implicit
auto-commit transaction. All the major database vendors do this properly
already, so nothing special needs to be done here.
</p>
<p>
When using a BDB-backed repository, it is up to Carbonado to ensure implicit
transactions are used. Carbonado sets up BDB in transactional mode, and there
is no Carbonado-level configuration to disable this, so you are always using
BDB with transactions. When you perform a lone Carbonado insert/update/delete
operation, Carbonado passes null to BDB for the transaction object, which
implies auto-commit. BDB then automatically enters a tiny transaction to
protect that small change.
</p>
<p>
If the Storable you're updating has any indexes on it, a Carbonado trigger is
installed that updates the affected indexes when you do an
insert/update/delete. The presence of the trigger changes how the
auto-generated Storable behaves. The insert/update/delete operation enters a
transaction automatically, and it doesn't commit until all triggers have
run. Index updates are therefore guarded by transactions, even if you don't
explicitly specify one. In addition, all changes made by your own triggers are
always guarded by a transaction.
</p>
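<p>
Why the trigger makes index updates safe can be shown with a small in-memory
model. IndexedStore is hypothetical: it saves the prior state to simulate
entering a transaction, runs the index update where a trigger would run, and
restores the saved state if the trigger fails, so a record never exists without
its index entry.
</p>
<p><pre><![CDATA[
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an insert whose index trigger runs inside the same
// transaction: the record and its index entry change together, or not at all.
class IndexedStore {
    final Map<Integer, String> records = new HashMap<>();   // primary storage: id -> name
    final Map<String, Integer> nameIndex = new HashMap<>(); // secondary index: name -> id

    void insert(int id, String name, boolean triggerFails) {
        // "Enter" a transaction by remembering the state to roll back to.
        Map<Integer, String> savedRecords = new HashMap<>(records);
        Map<String, Integer> savedIndex = new HashMap<>(nameIndex);
        try {
            records.put(id, name);
            // Index trigger: runs before the transaction commits.
            if (triggerFails) {
                throw new IllegalStateException("simulated trigger failure");
            }
            nameIndex.put(name, id);
            // Commit: nothing else to do in this in-memory model.
        } catch (RuntimeException e) {
            // Roll back both maps so storage and index stay consistent.
            records.clear();
            records.putAll(savedRecords);
            nameIndex.clear();
            nameIndex.putAll(savedIndex);
            throw e;
        }
    }
}
]]></pre></p>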
     </answer>
   </faq>

   <faq id="delete-from-cursor">
     <question>How do I delete Storables returned by a Cursor without deadlocks?</question>
     <answer>
       <p>
The cursor iteration and delete operations must be enclosed in the same
transaction. Auto-commit deletes while iterating over a cursor fail for some
databases, BDB and BDB-JE in particular. Although BDB supports a delete
operation on the cursor itself, the transaction requirement remains.
</p>
<p>
A workaround exists when using BDB-JE, which works only because BDB-JE uses
record-level locks. Calling Cursor.hasNext() forces the cursor to move past the
current record, releasing the lock on the record to be deleted. Native BDB uses
page locks, so this trick only works in the occasional case where the next
record is on another page.
</p>
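<p>
When the hasNext() trick frees the lock on the current record can be
summarized with a simplified model. LockGranularity is hypothetical and assumes
fixed-size pages with records numbered in key order; real B-tree pages vary in
size.
</p>
<p><pre><![CDATA[
// Hypothetical model of the Cursor.hasNext() workaround; not BDB code.
class LockGranularity {
    // Does moving the cursor from record number "current" to the next record
    // release the lock that covers "current"?
    static boolean moveReleasesLock(int current, int recordsPerPage, boolean recordLevelLocks) {
        if (recordLevelLocks) {
            return true;                 // BDB-JE: the lock covers only this record
        }
        int currentPage = current / recordsPerPage;
        int nextPage = (current + 1) / recordsPerPage;
        return nextPage != currentPage;  // page locks: released only on a page change
    }
}
]]></pre></p>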
<p>
The BDB-JE cursor implementation could be changed to automatically move to the
next record, but this reduces portability. Also, the cursor should not
automatically move past the current record when inside a transaction, because
that would allow another thread to sneak in and modify the record. An isolation
level of repeatable read would be required to keep the lock.
</p>
     </answer>
   </faq>

 </part>
</faqs>