<?xml version="1.0"?>
<faqs id="FAQ" title="Frequently Asked Technical Questions">
<part id="General">
<faq id="process-killed">
<question>What happens when a Carbonado process is killed while in the middle of a transaction?</question>
<answer>
<p>
Carbonado uses shutdown hooks to make sure that all in-progress transactions
are properly rolled back. If you hard-kill a process (kill -9), the shutdown
hooks never run. This is a problem when using BDB, because db_recover must
then be run to prevent future data corruption. BDB-JE is not affected,
however, as it automatically runs recovery upon restart.
</p>
</answer>
</faq>
<faq id="replicated-bootstrap">
<question>How do I bootstrap a replicated repository?</question>
<answer>
<p>
By running a resync operation programmatically. The ReplicatedRepository has a
ResyncCapability which has a single method named "resync". It accepts a
Storable class type, a throttle parameter, and an optional filter. Consult the
<a href="http://carbonado.sourceforge.net/apidocs/com/amazon/carbonado/capability/ResyncCapability.html">Javadocs</a> for more info.
</p>
<p>
In your application you might find it convenient to add an administrative
command to invoke the resync operation. This makes it easy to repair any
inconsistencies that might arise over time.
</p>
</answer>
</faq>
<faq id="deadlock">
<question>I sometimes see lock timeout errors and deadlocks. What is going on?</question>
<answer>
<p>
A lock timeout may be caused by an operation that took too long for whatever
reason, or it may indicate a deadlock. By default, Carbonado uses a lock
timeout of 0.5 seconds for BDB-based repositories. It can be changed by calling
setLockTimeout on the repository builder.
</p>
<p>
Deadlocks may be caused by:
</p>
<p>
<ol>
<li>Application lock acquisition order</li>
<li>BDB page split</li>
<li>Index update</li>
</ol>
</p>
<p>
In the first case, applications usually cause deadlock situations by updating a
record within the same transaction that previously read it. This causes the
read lock to be upgraded to a write lock, which is inherently deadlock
prone. To prevent this problem, switch the transaction to update mode. This
causes all acquired read locks to be upgradable, usually by acquiring a write
lock from the start.
</p>
<p>
Another cause of this deadlock is when you iterate over a cursor, updating
entries as you go. To resolve this, either copy the cursor entries to a list
first, or operate within a transaction which is in update mode.
</p>
<p>
The second case, BDB page split, is a problem in the regular BDB product.
It is not a problem with BDB-JE. When inserting records into a BDB,
it may need to rebalance the b-tree structure. It does this by splitting a leaf
node and updating the parent node. To update the parent node, a write lock must
be acquired but another thread might have a read lock on it while trying to
lock the leaf node being split.
</p>
<p>
There is no good solution to the BDB page split deadlock. Instead, your
application must be coded to catch deadlocks and retry transactions. These
deadlocks are more common when filling up a new BDB.
</p>
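<p>
As a rough illustration of the catch-and-retry approach, the sketch below
retries a unit of work a bounded number of times. The class DeadlockRetry and
the exception SimulatedDeadlockException are hypothetical stand-ins and are not
part of Carbonado. In a real application the catch clause would name the
deadlock exceptions your repository actually throws, and the work would enter
and exit its own transaction on every attempt.
</p>

```java
import java.util.function.Supplier;

/** Hypothetical retry helper for illustration; not part of Carbonado. */
final class DeadlockRetry {
    /** Stand-in for whatever deadlock exception the repository throws. */
    static final class SimulatedDeadlockException extends RuntimeException {
        SimulatedDeadlockException(String message) {
            super(message);
        }
    }

    /**
     * Runs the given work, retrying when a deadlock is reported.
     * The work is expected to enter and exit its own transaction on each call.
     */
    static <T> T run(Supplier<T> work, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SimulatedDeadlockException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (SimulatedDeadlockException e) {
                // Deadlock reported; remember it and try again.
                last = e;
            }
        }
        // Every attempt deadlocked, so report the last failure to the caller.
        throw last;
    }
}
```

<p>
Bounding the number of attempts keeps a persistent deadlock from turning into
an endless loop. A short pause between attempts may also help under heavy
contention.
</p>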
<p>
The third case, index updates, is caused by updating a record while another
thread is using the index to find the record. Carbonado's indexing strategy
could be coded to defer index updates when this happens, but it currently does
not. In the meantime, there is no general solution.
</p>
<p>
Lock timeouts (or locks not granted) may be caused by:
</p>
<p>
<ol>
<li>Failing to exit all transactions</li>
<li>Open cursors with REPEATABLE_READ isolation</li>
<li>Heavy concurrency</li>
</ol>
</p>
<p>
If a transaction is left open, the locks it acquired are never released. Over
time the database lock table fills up. When using BDB, the "db_stat -c"
command can show information on the lock table. Running "db_recover" can clear
any stuck locks. To avoid this problem, always run transactions within a
try-finally statement and exit the transaction in the finally section.
</p>
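<p>
The try-finally pattern can be sketched as follows. The Txn interface here is
a hypothetical stand-in with only commit and exit methods, so that the example
is self-contained. It is not Carbonado's own Transaction type, and the
TransactionScope class is likewise an assumption made for illustration.
</p>

```java
/** Hypothetical illustration of the try-finally pattern; not part of Carbonado. */
final class TransactionScope {
    /** Stand-in for a transaction handle that can be committed and exited. */
    interface Txn {
        void commit();
        void exit();
    }

    /**
     * Runs the work inside the given transaction. The commit only happens
     * if the work completes normally, but exit always happens, so locks held
     * by the transaction are released even when the work throws.
     */
    static void runInTransaction(Txn txn, Runnable work) {
        try {
            work.run();
            txn.commit();
        } finally {
            txn.exit();
        }
    }
}
```

<p>
If the work throws, commit is skipped and the exception still propagates to the
caller after exit has run.
</p>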
<p>
By default, BDB transactions have the REPEATABLE_READ isolation level. This
means that all locks acquired when iterating cursors within the transaction are
not released until the transaction exits. This can cause the lock table to fill
up. To work around this, enter the transaction with an explicit isolation level
of READ_COMMITTED, which releases read locks sooner.
</p>
<p>
Applications that have a high number of active threads can cause general lock
contention. BDB-JE uses row-level locks, and so lock timeouts caused by
contention are infrequent. The regular BDB product uses page-level locks, thus
increasing the likelihood of lock contention.
</p>
</answer>
</faq>
<faq id="subselect">
<question>How do I perform a subselect?</question>
<answer>
<p>
Carbonado query filters do not support subselects, although they can be
emulated. Suppose the query you wish to execute looks something like this in
SQL:
</p>
<p><pre>
select * from foo where foo.name in (select name from bar where ...)
</pre></p>
<p>
This can be emulated by querying bar, and for each result, fetching foo.
</p>
<p><pre>
// Note that the query is ordered by name, so duplicate names are adjacent.
Cursor<Bar> barCursor = barStorage.query(...).orderBy("name").fetch();
String lastNameSeen = null;
while (barCursor.hasNext()) {
    Bar bar = barCursor.next();
    // Skip names that were already handled.
    if (lastNameSeen != null && lastNameSeen.equals(bar.getName())) {
        continue;
    }
    lastNameSeen = bar.getName();
    Foo foo = fooStorage.query("name = ?").with(lastNameSeen).tryLoadOne();
    if (foo != null) {
        // Got a result, do something with it.
        ...
    }
}
</pre></p>
<p>
For best performance, you might want to make sure that Foo has an index on its name property.
</p>
<p>
You may track the feature request <a href="http://sourceforge.net/tracker/index.php?func=detail&aid=1578197&group_id=171277&atid=857357">here</a>.
</p>
</answer>
</faq>
<faq id="table-generation">
<question>Does Carbonado support generating SQL tables from Storables?</question>
<answer>
<p>
No, it does not. Carbonado instead requires that your Storable definition
matches a table, if using the JDBC repository. When using a repository that has
no concept of tables, like the BDB repositories, the Storable is the canonical
definition. In that case, changes to the Storable effectively change the
"table". In addition, properties can be added and removed, and older records
can still be read.
</p>
<p>
Although it is technically feasible for Carbonado to support generating SQL
tables, Storable definitions are not expressive enough to cover all the
features that can go into a table. For example, you cannot currently define a
foreign key constraint in Carbonado.
</p>
</answer>
</faq>
<faq id="isnull">
<question>How do I query for "IS NULL"?</question>
<answer>
<p>
Carbonado treats nulls as ordinary values wherever possible, so nothing special
needs to be done. That is, just search for null like any other value. The query
call might look like:
</p>
<p><pre>
Query<MyType> query = storage.query("value = ?").with(null);
Cursor<MyType> cursor = query.fetch();
...
</pre></p>
<p>
When using the JDBC repository, the generated SQL will contain the "IS NULL"
phrase in the WHERE clause.
</p>
</answer>
</faq>
<faq id="sql-debugging">
<question>How do I see generated SQL?</question>
<answer>
<p>
To see the SQL statements generated by the JDBC repository, you can install a
JDBC DataSource that logs all activity. Provided in the JDBC repository package
is the LoggingDataSource class, which does this. As a convenience, it can be
installed simply by calling setDataSourceLogging(true) on the
JDBCRepositoryBuilder.
</p>
</answer>
</faq>
<faq id="jdbc-indexes">
<question>What happens if JDBC repository cannot get index info?</question>
<answer>
<p>
The JDBC repository checks if the Storable alternate keys match those defined
in the database. To do this, it tries to get the index info. If the user
account does not have permissions, a message is logged and this check is
skipped. Skipping the check does no harm unless the alternate keys don't
match. A mismatch can cause unexpected errors when using the replicated
repository.
</p>
</answer>
</faq>
<faq id="mysql-increment">
<question>How do I use MySQL auto-increment columns?</question>
<answer>
<p>
As of 2006-10-23, Carbonado MySQL support is very thin. The @Sequence
annotation is intended to be used for mapping to auto-increment columns, if the
database does not support proper sequences. Until support is added,
auto-increment columns will not work.
</p>
</answer>
</faq>
<faq id="unique">
<question>Can I do the equivalent of a "unique" constraint?</question>
<answer>
<p>
The @AlternateKeys annotation is provided specifically for this purpose. Both
@PrimaryKey and @AlternateKeys define unique indexes. The only real difference
is that there can be only one primary, but many alternate keys are allowed.
</p>
</answer>
</faq>
<faq id="join-cache">
<question>How does one manually flush the Carbonado join cache?</question>
<answer>
<p>
The Carbonado join cache is a lazy-read cache, local to a Storable instance. It
is not a global write-through cache, and so no flushing is necessary.
</p>
<p>
The first time a join property is accessed, a reference is saved in the
master Storable instance. This optimization makes the most sense when filtering
based on a join property. The query loads the join property, and you'll likely
want it too. This avoids a double load.
</p>
</answer>
</faq>
<faq id="evolution">
<question>How can schemas evolve?</question>
<answer>
<p>
Independent repositories like BDB support automatic schema evolution. You may
freely add or remove non-primary key properties and still load older
storables. Changes to primary key properties are not supported, since they
define a clustered index. Also, property data types cannot be changed, except
that a boxed property may be changed to a non-boxed primitive or vice versa.
</p>
<p>
Every storable persisted by Carbonado in BDB starts with a layout version,
which defines the set of properties encoded. Carbonado separately persists the
mapping from layout version to property set, such that when it decodes a
storable it knows what properties to expect.
</p>
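<p>
To make the idea concrete, here is a simplified model of decoding by layout
version. It is only a sketch and does not reflect Carbonado's actual encoding.
The LayoutRegistry class, its methods, and the property names used with it are
all hypothetical.
</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of versioned layouts; not Carbonado's actual encoding. */
final class LayoutRegistry {
    // Kept separately from the records: layout version -> property names.
    private final Map<Integer, List<String>> layouts = new HashMap<>();

    /** Records which properties were encoded under the given layout version. */
    void register(int version, List<String> propertyNames) {
        layouts.put(version, new ArrayList<>(propertyNames));
    }

    /** Decodes the values of a record that was written with the given layout version. */
    Map<String, Object> decode(int version, List<Object> values) {
        List<String> names = layouts.get(version);
        if (names == null) {
            throw new IllegalArgumentException("Unknown layout version: " + version);
        }
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("Record does not match layout " + version);
        }
        // Pair each encoded value with the property name its layout expects.
        Map<String, Object> properties = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            properties.put(names.get(i), values.get(i));
        }
        return properties;
    }
}
```

<p>
In this model, a record written under an older layout decodes to fewer
properties than one written under a newer layout, which is why older storables
remain readable after properties are added.
</p>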
<p>
When adding or removing properties, existing persisted storables are not
immediately modified. If you remove a property and add it back, you can recover
the property value still encoded in the existing storables. Property values are
not fully removed from an existing storable instance until it is explicitly
updated. At this time, the layout version used is the current one, and the
removed property values are lost.
</p>
<p>
When loading a storable which does not have a newly added property, the
property value is either null, 0, or false, depending on the data type. You can
call the isPropertyUninitialized method on the storable to determine if this
default property value is real or not.
</p>
<p>
In order to change a property type to something that cannot be automatically
converted, the change must be performed in phases. First, define a new
property, with a different name. Then load all the existing storables and
update them, setting the new property value. Next, remove the old property. To
potentially free up storage you can update all the storables again. If you wish
the newly added property to retain the original name, follow these steps again
in a similar fashion to change it.
</p>
</answer>
</faq>
<faq id="iterate-all">
<question>How do I iterate over all storable types in a repository?</question>
<answer>
<p>
Given a repository and an appropriately set classpath, can we iterate through
all the various storables held in the repository without actually knowing what
the repository might hold in advance?
</p>
<p>
Repositories that implement StorableInfoCapability provide this
functionality. The reason it's a capability is that some repositories (JDBC)
don't have a registry of storables. BDB-based ones do, and so this capability
works.
</p>
</answer>
</faq>
</part>
</faqs>