We have been discussing this merge for a while.
Its main goal is to simplify things: code navigation, but also
packaging.
The rationale is that the code is more cohesive in this way, and there's
only one source package to install.
Dependencies that are only for the server or the client will not be
installed by default, and they are expected to be provided by the
environment. There are setuptools extras defined for the client and the
server.
Debianization is still expected to split the single source package into
3 binaries.
Another advantage is that the documentation build can now install a single
package in a single step, and therefore include the docstrings in
the generated docs.
- Resolves: #8896
|
With this commit, all tests in the py34 tox environment
are collected.
|
Fixes setup.cfg by adding the current exclude rules, simplifies tox.ini to use
setup.cfg, and fixes all remaining issues.
|
|
Batching is slower than a usual insert for a single doc, so if a document
exceeds the buffer, commit the batch (if any) and write the huge document
with a traditional insert.
Refactor coming.
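A minimal sketch of the rule above, assuming a byte-sized buffer. `BatchedWriter`, `put`, and the `sent` log are hypothetical names for illustration, and flushing the batch when it would overflow is an assumption about how the buffer behaves; only the "huge doc gets a plain insert after flushing" rule comes from the commit.

```python
class BatchedWriter:
    """Hypothetical sketch: small docs are buffered into one batch, while a
    doc larger than the whole buffer is sent alone with a plain insert."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.batch = []        # pending (doc_id, content) pairs
        self.batch_bytes = 0
        self.sent = []         # log of operations, for illustration only

    def _commit_batch(self):
        # Send the pending batch, if there is one.
        if self.batch:
            self.sent.append(("batch", [doc_id for doc_id, _ in self.batch]))
            self.batch = []
            self.batch_bytes = 0

    def _single_insert(self, doc_id, content):
        self.sent.append(("insert", doc_id))

    def put(self, doc_id, content):
        size = len(content)
        if size > self.buffer_size:
            # Huge doc: commit what we have, then use a traditional insert.
            self._commit_batch()
            self._single_insert(doc_id, content)
            return
        if self.batch_bytes + size > self.buffer_size:
            # Assumption: a full buffer is flushed before adding more.
            self._commit_batch()
        self.batch.append((doc_id, content))
        self.batch_bytes += size

    def close(self):
        self._commit_batch()
```

For example, with a 10-byte buffer, two 3-byte docs are batched together, a 50-byte doc goes out as a single insert after that batch, and a later small doc starts a new batch.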
|
|
Batching is now decided by the server; this commit enables it.
|
|
Puts a file object in place of the doc json string if read_content is False;
otherwise it fetches and fills the content as usual. This is useful for improving
server throughput on the sync download stream, which receives a bulk-get without
attachments and consumes the file objects as they come.
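A small sketch of the eager/lazy switch, assuming the attachment is exposed as a file-like object (here an `io.BytesIO` standing in for what the couchdb library returns). `build_doc` and its dict layout are hypothetical, not the actual backend code.

```python
import io


def build_doc(doc_id, rev, content_bytes, read_content=True):
    """Hypothetical helper: return a doc whose content is either the decoded
    json string (eager) or a file object to be consumed later (lazy)."""
    stream = io.BytesIO(content_bytes)  # stands in for the couch attachment
    if read_content:
        content = stream.read().decode("utf-8")
    else:
        content = stream  # the caller reads it when it is ready to stream it
    return {"doc_id": doc_id, "rev": rev, "content": content}
```

The lazy variant lets a bulk-get return metadata for many docs quickly, while each body is read only when the stream reaches it.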
|
|
couchdb lib returns a file object representing the attachment. This
commit passes the attachment's read() output directly to the wsgi write()
call. The doc representation also uses 2 lines, separating metadata from content.
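A sketch of the two-line representation streamed through a WSGI `write` callable, assuming each doc arrives as a (metadata dict, file object) pair and that the content itself contains no raw newline (true for serialized json). `stream_docs` and the chunked copy are illustrative choices, not the server's actual code.

```python
import io
import json

CHUNK_SIZE = 64 * 1024  # assumption: copy attachments in 64 KiB chunks


def stream_docs(docs, write):
    """Write each doc as two lines: a json metadata line, then its content.

    `docs` yields (metadata, attachment) pairs, where `attachment` is a file
    object; `write` is the callable returned by WSGI start_response().
    """
    for metadata, attachment in docs:
        write(json.dumps(metadata).encode("utf-8") + b"\n")
        # Copy the attachment into write() instead of building a big string.
        while True:
            chunk = attachment.read(CHUNK_SIZE)
            if not chunk:
                break
            write(chunk)
        write(b"\n")
```

In a WSGI app, `write = start_response("200 OK", headers)` would be passed in; any callable that accepts bytes works for testing.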
|
|
We were using 1 transaction per doc, which makes bulk inserts very slow.
Reference:
http://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite
Code now uses 1 transaction for the whole sync.
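A minimal sketch with the standard `sqlite3` module. The `document` table and `insert_docs` are hypothetical; the point is that the connection's context manager wraps every insert in one transaction and commits once (or rolls back everything on error).

```python
import sqlite3


def insert_docs(conn, docs):
    """Insert all (doc_id, content) pairs inside a single transaction."""
    with conn:  # commits once on success, rolls back on any exception
        conn.executemany(
            "INSERT INTO document (doc_id, content) VALUES (?, ?)", docs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (doc_id TEXT PRIMARY KEY, content TEXT)")
insert_docs(conn, [("d%d" % i, "{}") for i in range(1000)])
```

A side effect of one transaction per sync is atomicity: if any insert fails, none of that sync's inserts are kept.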
|
|
Instead of getting the attachments as the generator runs, get_docs now
fetches them as needed. Also, deepcopy solves a memory issue where we were
unintentionally modifying the couchdb lib view while feeding it with
blobs.
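A sketch of the deepcopy fix, assuming view rows are plain dicts and `fetch_content` is a hypothetical callable that loads a doc's content. Each row is copied before its content is filled in, so the caller's view result never accumulates blobs, and the generator only fetches content when the next doc is requested.

```python
import copy


def docs_with_content(view_rows, fetch_content):
    """Yield a private copy of each view row with its content filled in."""
    for row in view_rows:
        doc = copy.deepcopy(row)  # never mutate the row owned by the view
        doc["content"] = fetch_content(doc["id"])  # fetched only when needed
        yield doc
```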
|
|
create_cmd lacked an explanation and check_schema_versions lacked
reasoning on why it defaults to False.
|
|
CouchServerState is spread across the test codebase, and this option is
intended to be used only on server startup. This commit makes it default
to False and explicitly sets it to True where it's necessary.
|
ensure_ddoc doesn't make sense anymore, as we don't have any ddoc other
than _security, which has its own method for setting. 'ensure_security'
is explicit and is set internally when a user is creating a database;
otherwise it is False, as it's only used during creation. This isn't
exposed outside the couch module, to avoid confusion.
This confusion was making create-user-db fail to create a security ddoc,
as it wasn't passing ensure_ddocs=True.
-- Resolves: #8388
|
|
If we create the gen document before saving the actual document in
couch, we may run into problems if more than one client is syncing and
trying to save documents with the same id at the same time.
By moving the gen document creation to after the actual document save in
couch, we rely on couch/u1db resolution of conflicts before actually
allocating a new generation, and the problem above doesn't occur.
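A sketch of the new ordering against a hypothetical in-memory store: the document save, which can fail with a conflict, happens first, and a generation is allocated only after it succeeds. `Store`, `save_doc`, `put_doc`, and `ConflictError` are illustrative names, not the couch backend's API.

```python
class ConflictError(Exception):
    pass


class Store:
    """In-memory stand-in for the couch database (hypothetical)."""

    def __init__(self):
        self.docs = {}      # doc_id -> current rev
        self.gen_docs = []  # allocated generation records

    def save_doc(self, doc_id, expected_rev, new_rev):
        # Mimics couch's optimistic concurrency: reject a stale revision.
        if self.docs.get(doc_id) != expected_rev:
            raise ConflictError(doc_id)
        self.docs[doc_id] = new_rev

    def put_doc(self, doc_id, expected_rev, new_rev):
        self.save_doc(doc_id, expected_rev, new_rev)  # may raise first
        # Only reached after a successful save, so a losing client never
        # allocates a generation for a change that was rejected.
        gen = len(self.gen_docs) + 1
        self.gen_docs.append({"gen": gen, "doc_id": doc_id})
        return gen
```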
|
The use of a lock to allocate the next generation of a change in couch
backend suffers from at least 2 problems:
1. all modifications to the couch database would have to be made through
a soledad server entrypoint, otherwise the lock would have no effect.
2. introducing a lock makes code uglier, harder to debug, and prone to
undesired blocks.
The solution implemented by this commit is not so elegant, but works for
what we need right now. Now, concurrent threads updating the couch
database will race for the allocation of a new generation, and retry
when they fail to do so.
There's little risk of getting blocked for too long in the while
loop because (1) there's always one thread that wins (which makes the
expected number of retries N/2 if N is the number of concurrent
threads), and (2) the number of concurrent attempts to update the user
database is limited by the number of devices syncing at the same time.
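A sketch of the race-and-retry loop. `FakeCouch` is a hypothetical stand-in for couch: creating a doc id that already exists fails, like a CouchDB PUT of a new document over an existing id returning 409 Conflict. Its internal lock only models that server-side atomicity; the client code in `allocate_gen` takes no lock and simply retries. The "gen-" id format follows the generation documents described in a later commit, with zero-padding as an assumption.

```python
import threading


class Conflict(Exception):
    pass


class FakeCouch:
    """Stand-in for couch: creating an existing doc id raises Conflict."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # models couch's atomic create only

    def create(self, doc_id, body):
        with self._lock:
            if doc_id in self._docs:
                raise Conflict(doc_id)
            self._docs[doc_id] = body

    def last_gen(self):
        with self._lock:
            return len(self._docs)  # gen docs here are contiguous from 1


def allocate_gen(db, doc_id):
    """Race for the next generation; on conflict, re-read and retry."""
    while True:
        gen = db.last_gen() + 1
        try:
            db.create("gen-%010d" % gen, {"gen": gen, "doc_id": doc_id})
            return gen
        except Conflict:
            continue  # another writer won this generation; try the next one
```

Every conflict means some other thread succeeded, so the loop always makes global progress, which is the argument in (1) above.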
|
|
The previous solution used concurrent gets to the couch
backend in a pool of threads to implement the get_docs() and
get_all_docs() CouchDatabase backend methods.
This commit replaces those with a simpler implementation using the
`_all_docs` couchdb view api. It passes all needed IDs to the view and
retrieves all documents with content in the same request.
A comparison between both implementations shows an improvement of at
least 15 times for large numbers of documents. The table below shows the
time taken by get_all_docs() for different numbers of documents, using the
thread pool implementation versus the _all_docs implementation:
+-------+-----------------+------------------+-------------+
| docs  | threads         | _all_docs        | improvement |
+-------+-----------------+------------------+-------------+
| 10    | 0.0728030204773 | 0.00782012939453 | 9.3         |
| 100   | 0.609349966049  | 0.0377721786499  | 16.1        |
| 1000  | 5.86522197723   | 0.370730876923   | 15.8        |
| 10000 | 66.1713931561   | 3.61764383316    | 18.3        |
+-------+-----------------+------------------+-------------+
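A sketch of the single-request approach using only the standard library. It relies on CouchDB's documented `POST /{db}/_all_docs` with a json body of `{"keys": [...]}` and the `include_docs=true` query parameter; the database URL and helper names are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def all_docs_request(db_url, doc_ids):
    """Build one POST to _all_docs that fetches every requested id at once."""
    query = urllib.parse.urlencode({"include_docs": "true"})
    body = json.dumps({"keys": list(doc_ids)}).encode("utf-8")
    return urllib.request.Request(
        "%s/_all_docs?%s" % (db_url.rstrip("/"), query),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def docs_from_response(payload):
    """Extract docs from an _all_docs response, skipping missing or
    deleted keys (rows with an "error" field or a null "doc")."""
    return [row["doc"] for row in payload["rows"] if row.get("doc")]
```

Sending it would be `urllib.request.urlopen(req)` followed by `json.load` on the response, replacing one GET per document with a single round trip.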
|
The couch backend makes use of attachments and the multipart structure for
writing the document to the couch database. For that to work, the order
in which attachments are described must match the actual order in which
attachments are written to the couch http stream.
This was not being properly taken care of, and eventually the json
serializer was arbitrarily ordering the attachments description in a
way that didn't match the actual order of attachment writing.
This commit fixes that by using the json.dumps() sort_keys parameter and
making sure conflicts are always written before content.
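A sketch of keeping the description and the stream aligned, assuming attachments named `u1db_content` and `u1db_conflicts` (an assumption for illustration) and CouchDB's multipart `"follows": true` stubs. With `sort_keys=True`, json emits the `_attachments` entries in sorted name order, so writing bodies in `sorted()` order matches it; "u1db_conflicts" sorts before "u1db_content", which puts conflicts first.

```python
import json


def multipart_parts(attachments):
    """Return (json_description, ordered_bodies) for a multipart PUT."""
    description = {
        name: {"follows": True,
               "content_type": "application/octet-stream",
               "length": len(body)}
        for name, body in attachments.items()
    }
    doc_json = json.dumps({"_attachments": description}, sort_keys=True)
    # Same ordering rule as sort_keys, so stream order matches the json.
    bodies = [attachments[name] for name in sorted(attachments)]
    return doc_json, bodies
```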
|
|
Design documents are slow and we already have alternatives to all the uses
we made of them, so this commit completely removes all usage of
design documents.
|
|
When compared to a plain couch document get, the simplest view
function takes around twice as long, while the simplest
list function can take more than 8 times as long:
get 100 docs:
  total: 0.440337 secs
  mean:  0.004403 secs
query 100 views:
  total: 0.911425 secs
  mean:  0.009114 secs
query 100 lists:
  total: 3.711537 secs
  mean:  0.037115 secs
Besides that, the current implementation of sync metadata storage over
couch depends on the timestamps of document puts, which can lead to
metadata corruption if the system clock is changed for any
reason.
For these reasons, we change the implementation of
database metadata. This commit implements the storage of transaction log
data on couch documents with special ids, in the form "gen-xxxxxxxxxx",
where the x's are replaced by the generation index.
Each generation document holds a dictionary containing the generation,
doc_id and transaction_id for the changed document. For each modified
document, a generation document is inserted holding the transaction
metadata.
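A sketch of building one generation document per modified document. Zero-padding the index to the ten x's of "gen-xxxxxxxxxx" is an assumption (it makes lexicographic id order, as used by `_all_docs`, match numeric order), and the field names are hypothetical.

```python
def gen_doc_id(generation):
    # Assumption: zero-pad to 10 digits so string order equals numeric order.
    return "gen-%010d" % generation


def make_gen_doc(generation, doc_id, transaction_id):
    """Build the transaction log document for one modified document."""
    return {
        "_id": gen_doc_id(generation),
        "gen": generation,
        "doc_id": doc_id,
        "trans_id": transaction_id,
    }
```

For example, generation 42 maps to the id "gen-0000000042".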
|
Modifies the original PR [0] by cristoph to account for the recent
vendoring of l2db code, which means we no longer depend on u1db/dirspec.
I expect the whole mess about the venv setup to be further simplified
pretty soon, since we are going to merge most of the leap.* packages
into a couple of repos.
[0] https://github.com/leapcode/soledad/pull/327
|
- move tests to root directory
- split tests into different subdirectories
- set up a small package with common test dependencies in /testing/test_soledad
- add a tox.ini that will:
  - install the test_soledad package and other test dependencies
  - install soledad common, client, server from the repository
  - run tests contained in the /testing/tests directory using pytest
This commit also removes all oauth code from tests, as we have removed the
u1db dependency (by importing it into the repo and naming it l2db) and don't
need oauth at all right now.
|
From this moment on, we embed a fork of u1db called l2db.
|