Age | Commit message | Author |
|
The previous solution used concurrent GETs to the couch backend in a
pool of threads to implement the get_docs() and get_all_docs()
CouchDatabase backend methods.
This commit replaces those with a simpler implementation that uses the
`_all_docs` CouchDB view API. It passes all needed IDs to the view and
retrieves all documents with content in the same request.
A comparison between both implementations shows an improvement of at
least 15 times for large numbers of documents. The table below shows the
time taken by get_all_docs() for different numbers of documents,
comparing the threaded implementation with the _all_docs implementation:
+-------+-----------------+------------------+-------------+
| docs  | threads         | _all_docs        | improvement |
+-------+-----------------+------------------+-------------+
| 10    | 0.0728030204773 | 0.00782012939453 | 9.3         |
| 100   | 0.609349966049  | 0.0377721786499  | 16.1        |
| 1000  | 5.86522197723   | 0.370730876923   | 15.8        |
| 10000 | 66.1713931561   | 3.61764383316    | 18.3        |
+-------+-----------------+------------------+-------------+
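A minimal sketch of the single-request shape described in this commit,
assuming the standard CouchDB `_all_docs` endpoint called with
`include_docs=true` and a `keys` body. The helper names
(build_all_docs_request, extract_docs) are hypothetical and are not the
CouchDatabase code; no HTTP call is made here.

```python
import json


def build_all_docs_request(db_name, doc_ids):
    """Return (path, body) for one POST that asks for many docs at once."""
    # include_docs=true makes CouchDB return each document's content
    # alongside its row, so no follow-up GET per document is needed.
    path = "/%s/_all_docs?include_docs=true" % db_name
    body = json.dumps({"keys": list(doc_ids)})
    return path, body


def extract_docs(response_body):
    """Map doc id -> doc for every row that actually carries a document."""
    rows = json.loads(response_body)["rows"]
    # Rows for unknown keys carry an "error" field and no "doc";
    # rows for deleted documents carry "doc": null. Both are skipped.
    return {row["id"]: row["doc"] for row in rows if row.get("doc")}
```

The number of round trips stays at one regardless of how many IDs are
requested, which is consistent with the gap in the table growing with
the number of documents.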
|
|
The couch backend makes use of attachments and multipart structure for
writing the document to the couch database. For that to work, the order
in which attachments are described must match the actual order in which
attachments are written to the couch http stream.
This was not being properly taken care of, and eventually the json
serializer was arbitrarily ordering the attachment descriptions in a
way that didn't match the actual order in which attachments were written.
This commit fixes that by using the sort_keys parameter of json.dumps()
and making sure conflicts are always written before content.
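A small sketch of why sort_keys gives a stable order, assuming the two
attachments are keyed by names that sort with "conflicts" before
"content". The attachment names and field values below are illustrative
assumptions, not taken from the commit.

```python
import json

# Hypothetical attachment descriptions; only the key order matters here.
attachments = {
    "u1db_content": {"follows": True, "length": 11,
                     "content_type": "application/octet-stream"},
    "u1db_conflicts": {"follows": True, "length": 2,
                       "content_type": "application/octet-stream"},
}

# sort_keys=True makes the serialized description order deterministic.
serialized = json.dumps({"_attachments": attachments}, sort_keys=True)

# Writing the attachment bodies in the same sorted order keeps the
# multipart stream aligned with the description.
write_order = sorted(attachments)
```

Because "u1db_conflicts" sorts before "u1db_content", sorting the keys
also yields the conflicts-before-content order mentioned above.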
|
|
Design documents are slow and we already have alternatives to all the
uses we used to make of them, so this commit completely removes all
usage of design documents.
|
|
When compared to plain couch document get, the use of the simplest view
functions takes around double the time, while the use of the simplest
list function can take more than 8 times:
get 100 docs:
total: 0.440337 secs
mean: 0.004403
query 100 views:
total: 0.911425 secs
mean: 0.009114
query 100 lists:
total: 3.711537 secs
mean: 0.037115
Besides that, the current implementation of sync metadata storage over
couch depends on the timestamps of document puts, which can lead to
metadata corruption if the system clock is changed for any reason.
Because of these reasons, we seek to change the implementation of
database metadata. This commit implements the storage of transaction log
data on couch documents with special ids, in the form "gen-xxxxxxxxxx",
where the x's are replaced by the generation index.
Each generation document holds a dictionary containing the generation,
doc_id and transaction_id for the changed document. For each modified
document, a generation document is inserted holding the transaction
metadata.
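A minimal sketch of the generation document layout described above. It
assumes the ten x's stand for a zero-padded decimal index, which the
commit message does not state; the function names are hypothetical.

```python
def generation_doc_id(generation):
    """Build the special id for a generation document."""
    # Assumption: ten x's means a zero-padded, 10-digit decimal index.
    return "gen-%010d" % generation


def make_generation_doc(generation, doc_id, transaction_id):
    """Build the transaction-log entry stored for one modified document."""
    return {
        "_id": generation_doc_id(generation),
        "generation": generation,
        "doc_id": doc_id,
        "transaction_id": transaction_id,
    }
```

Under the zero-padding assumption, the ids sort lexicographically in
generation order, so the log can be read back in order without relying
on timestamps.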
|
|
modifying original PR [0] by cristoph to account for the recent
vendoring of l2db code, which means we no longer depend on u1db/dirspec.
I expect the whole mess about the venv setup to be further simplified
pretty soon, since we are going to merge most of the leap.* packages
into a couple of repos.
[0] https://github.com/leapcode/soledad/pull/327
|
|
- move tests to root directory
- split tests in different subdirectories
- setup a small package with common test dependencies in /testing/test_soledad
- add tox.ini that will:
  - install the test_soledad package and other test dependencies
  - install soledad common, client, server from the repository
  - run tests contained in /testing/tests directory using pytest
This commit also removes all oauth code from tests, as we have removed the
u1db dependency (by importing it into the repo and naming it l2db) and don't
need oauth at all right now.
|
|
From this moment on, we embed a fork of u1db called l2db.
|
|
This moves the reactor time to the loopingcall period.
This is necessary as the decryption is now deferred to a thread.
The test will exit before the task is executed otherwise.
|
|
It was breaking E126 and E202 before
|
|
EncryptedSyncTestCase.test_sync_very_large_files is still using an
excessive amount of memory on very slow machines (especially on old
spinning magnetic disks). This commit checks one doc at a time instead
of getting them all. More refinement is necessary for this test to pass
on any machine.
|
|
Old versions of pip do not accept the --trusted-host option and will complain
when trying to upgrade pip from wheel. To fix that we upgrade pip from the
usual location instead of doing it from wheel.
|
|
Pep8 was warning about the assignment of lambdas. These lambdas should
be partials.
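A short sketch of the kind of change this describes, using a made-up
notify() function; the names are hypothetical and only illustrate
replacing an assigned lambda (pycodestyle warning E731) with
functools.partial.

```python
from functools import partial


def notify(event, doc_id):
    """Hypothetical callback that formats an event for a document."""
    return "%s:%s" % (event, doc_id)


# Flagged form (E731): on_sent = lambda doc_id: notify("sent", doc_id)
# Equivalent partial, which binds the first positional argument:
on_sent = partial(notify, "sent")
```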
|
|
For the case where the user already has data synced, this commit will
migrate the docs_received table to have the column sync_id.
That is required by the refactoring in the previous commits.
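A minimal sketch of such a migration, using the standard sqlite3 module
as a stand-in for the SQLCipher database. Only the table name
docs_received and the column name sync_id come from the commit message;
the column type and the function name are assumptions.

```python
import sqlite3


def ensure_sync_id_column(conn):
    """Add docs_received.sync_id if an older database lacks it.

    Returns True when the column was added, False when it already existed.
    """
    # PRAGMA table_info yields one row per column; index 1 is the name.
    columns = [row[1] for row in
               conn.execute("PRAGMA table_info(docs_received)")]
    if "sync_id" in columns:
        return False
    # Assumption: a nullable TEXT column is enough for existing rows.
    conn.execute("ALTER TABLE docs_received ADD COLUMN sync_id TEXT")
    conn.commit()
    return True
```

Checking the existing columns first keeps the migration safe to run on
every startup, for both fresh and previously synced databases.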
|
|
This commit adds tests for doc ordering and encdecpool control
(start/stop). Also optimizes by deleting in batch and checking for a
sequence in memory before asking the local staging for documents.
|
|
The constructor method of Soledad was receiving two arguments for the
user id. One of them was optional, with None as default. This could
cause an inconsistent state with uuid set but userid unset.
This change removes the optional user_id argument from the
initialization method and returns the uuid if anyone calls the
Soledad.userid method.
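A toy sketch of the resulting contract, not the Soledad class itself.
The class name is hypothetical, and exposing userid as a property is an
assumption.

```python
class SoledadLike(object):
    """Hypothetical stand-in showing a single source of truth for the id."""

    def __init__(self, uuid):
        # Only one identifier is accepted, so uuid and userid cannot
        # disagree or be half-initialized.
        self._uuid = uuid

    @property
    def uuid(self):
        return self._uuid

    @property
    def userid(self):
        # Kept for existing callers; it simply returns the uuid.
        return self._uuid
```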
|
|
Shared db locking was used to avoid the case in which two different devices
try to store/modify remotely stored secrets at the same time. We want to
avoid remote locks because of the problems they create, and prefer to crash
locally.
For the record, we are currently using the user's password to encrypt the
secrets stored in the server, and while we continue to do this we will have to
re-encrypt the secrets and update the remote storage whenever the user changes
her password.
|
|
- Use dbsyncer (SQLCipherU1DBSync) instead of SQLCipherDatabase
as only the first one supports multiple threads while syncing
and is actually used by Soledad.sync
|
|
The database_security parameter was either undocumented or incompletely
documented. This commit adds more documentation to make it consistent
with the latest changes.
Closes #7689
|
|
All batching code has no effect by default with this commit. Since we
know that this is a dangerous new feature, we will enable it only on
our test servers and check it manually before making it the default
or adding more configuration features.
Use SyncTarget and the server conf file to enable it for testing.
|
|
The generation cache was removed to simplify processing and it should
not come back, but during a batch the server won't change its
generation. So a little trick is needed to hold that temporary
information until the batch finishes.
|
|
Batch support is optional. This commit adds a 'batching' configuration
option to disable it.
|
|
This commit adds consistency checking to batches. When a doc is needed
during a batched sync and it doesn't exist in the database yet, the
current code will make a partial batch to avoid processing it as if it
didn't exist.
|
|
Using the _bulk_docs API from CouchDB we can put all docs in a single
request. Also, prefetching all ids removes the need for HEAD
requests during the batch.
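A minimal sketch of building one `_bulk_docs` request, assuming the
standard CouchDB endpoint that takes a {"docs": [...]} body. It also
assumes the prefetch step yields a mapping from doc id to current
revision; the function name and that mapping are hypothetical, and no
HTTP call is made here.

```python
import json


def build_bulk_docs_request(db_name, docs, known_revs):
    """Return (path, body) for one POST that writes all docs.

    docs: list of dicts, each with an "_id" key.
    known_revs: prefetched mapping of doc id -> current revision.
    """
    prepared = []
    for doc in docs:
        doc = dict(doc)  # copy, so the caller's dicts are not mutated
        rev = known_revs.get(doc["_id"])
        if rev is not None:
            # Updating an existing doc needs its current revision; the
            # prefetched map replaces a per-document HEAD request.
            doc["_rev"] = rev
        prepared.append(doc)
    return "/%s/_bulk_docs" % db_name, json.dumps({"docs": prepared})
```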
|
|
Created two methods on the backend to start and finish a batch. A dict of
callbacks is available to defer actions for the last document, allowing
temporary (changing often) metadata to be recorded only once.
Using those methods we will also be able to put all docs in one go on
the CouchDatabase implementation, but that is another step.
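A toy sketch of the start/finish pattern with a dict of deferred
callbacks. The class, method, and key names are hypothetical, and the
in-memory `writes` list stands in for real database writes.

```python
class BatchingBackend(object):
    """Hypothetical backend that defers metadata writes during a batch."""

    def __init__(self):
        self.batching = False
        self.after_batch_callbacks = {}
        self.writes = []  # stand-in for actual database writes

    def start_batch(self):
        self.batching = True
        self.after_batch_callbacks = {}

    def record_generation(self, generation):
        def write():
            self.writes.append(("generation", generation))

        if self.batching:
            # Reusing the same key overwrites the previous callback, so
            # only the value from the last document is ever written.
            self.after_batch_callbacks["generation"] = write
        else:
            write()

    def end_batch(self):
        self.batching = False
        callbacks, self.after_batch_callbacks = self.after_batch_callbacks, {}
        for callback in callbacks.values():
            callback()
```

Keying the callbacks by name is what lets often-changing metadata
collapse into a single write at the end of the batch.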
|
|
In real usage the docs will arrive shuffled and the pool will be reused
after many decrypts. This test asserts that everything is left clean
between executions and no inconsistency is left over for the next run.
|