From 8e55aec94fe3e8c696540802320e84d44a1a629f Mon Sep 17 00:00:00 2001
From: drebs
Date: Thu, 28 Sep 2017 12:07:44 -0300
Subject: [doc] add blobs doc

---
 docs/reference/attachments.rst  |  52 ++++++-----------
 docs/reference/blobs.rst        | 125 ++++++++++++++++++++++++++++++++++++++++
 docs/reference/incoming_box.rst |   8 ++-
 3 files changed, 147 insertions(+), 38 deletions(-)
 create mode 100644 docs/reference/blobs.rst

(limited to 'docs/reference')

diff --git a/docs/reference/attachments.rst b/docs/reference/attachments.rst
index 9561edcf..29387b36 100644
--- a/docs/reference/attachments.rst
+++ b/docs/reference/attachments.rst
@@ -1,5 +1,3 @@
-.. _blobs-spec:
-
 Document attachments
 ====================
 
@@ -24,13 +22,22 @@ Document attachments were introduced as a means to efficiently store large
 payloads of binary data while avoiding the need to wait for their transfer to
 have access to the documents' contents.
 
+Server-side
+-----------
+
+In the server, attachments are stored as :ref:`blobs`. See
+:ref:`http-blobs-api` for more information on how to interact with the server
+using HTTP.
+
+The :ref:`IBlobsBackend ` interface is provided, so in the
+future there can be different ways to store attachments in the server side
+(think of a third-party storage, for example). Currently, the
+:ref:`filesystem-backend` is the only one that implements that interface.
+
 Client-side
 -----------
 
-In the client, attachments are stored as (SQLite) BLOBs in a separate SQLCipher
-database. Encryption of data before it's sent to the server is the same used
-for Soledad documents' content during usual synchronization process (AES-256
-GCM mode).
+In the client, attachments are relations between JSON documents and blobs.
 
 See :ref:`client-side-attachment-api` for reference.
 
@@ -52,52 +59,27 @@ the store that created it, and can put/get/delete an attachment:
 
         state = yield doc.get_attachment_state()
         dirty = yield doc.is_dirty()
+
         assert state == AttachmentStates.NONE
         assert dirty == False
 
         yield doc.put_attachment(open('hackers.txt'))
         state = yield doc.get_attachment_state()
         dirty = yield doc.is_dirty()
+
         assert state | AttachmentState.LOCAL
         assert dirty == True
 
         yield soledad.put_doc(doc)
         dirty = yield doc.is_dirty()
+
         assert dirty == False
 
         yield doc.upload_attachment()
         state = yield doc.get_attachment_state()
+
         assert state | AttachmentState.REMOTE
         assert state == AttachmentState.SYNCED
 
         fd = yield doc.get_attachment()
         assert fd.read() == open('hackers.txt').read()
-
-Server-side
------------
-
-In the server, a simple REST API is served by a `Twisted Resource
-`_
-and attachments are stored in the filesystem as they come in without
-modification.
-
-A token is used to allow listing, getting, putting and deleting attachments. It
-has to be added as an HTTP auth header, as in::
-
-    Authorization: Token
-
-Check out the :ref:`server-side-attachments-rest-api` for more information on
-how to interact with the server using HTTP.
-
-The :ref:`IBlobsBackend ` interface is provided, so in the
-future there can be different ways to store attachments in the server side
-(think of a third-party storage, for example). Currently, the
-:ref:`FilesystemBlobsBackend ` is the only backend
-that implements that interface.
-
-Some characteristics of the :ref:`FilesystemBlobsBackend
-` are:
-
-* Configurable storage path.
-* Quota support.
-* Username, blob_id and user storage directory sanitization.
diff --git a/docs/reference/blobs.rst b/docs/reference/blobs.rst
new file mode 100644
index 00000000..1150a34c
--- /dev/null
+++ b/docs/reference/blobs.rst
@@ -0,0 +1,125 @@
+.. _blobs:
+
+Blobs
+=====
+
+The first versions of Soledad used to store all data as JSON documents, which
+means that binary data was also being treated as strings.
+This has many drawbacks, such as increased memory usage and difficulty doing
+transfer and crypto in a proper binary pipeline.
+
+Starting with version **0.10.0**, Soledad has a proper blob infrastructure
+that decouples payloads from metadata both in storage and in the
+synchronization process.
+
+
+Server-side
+-----------
+
+Soledad Server provides two different REST APIs for interacting with blobs:
+
+* A public **HTTP Blobs API**, providing the *Blobs* service for Soledad Client
+  (i.e. actual users of the infrastructure).
+
+* A local **HTTP Incoming Box API**, providing the delivery part of the
+  :ref:`incoming-box` service, currently used for the MX mail delivery.
+
+Authentication is handled differently for each of the endpoints; see
+:ref:`authentication` for more details.
+
+
+.. _http-blobs-api:
+
+HTTP Blobs API
+~~~~~~~~~~~~~~
+
+The public endpoint provides the following REST API for interacting with the
+*Blobs* service:
+
+=========================== ========== ================================= ============================================
+path                        method     action                            accepted query string fields
+=========================== ========== ================================= ============================================
+``/blobs/{uuid}``           ``GET``    Get a list of blobs, optionally   ``namespace``, ``filter_flag``, ``order_by``
+                                       filtered by a flag.
+``/blobs/{uuid}/{blob_id}`` ``GET``    Get the contents of a blob.       ``namespace``
+``/blobs/{uuid}/{blob_id}`` ``PUT``    Create a blob. The content of the ``namespace``
+                                       blob should be sent in the body
+                                       of the request.
+``/blobs/{uuid}/{blob_id}`` ``POST``   Set the flags for a blob. A list  ``namespace``
+                                       of flags should be sent in the
+                                       body of the request.
+``/blobs/{uuid}/{blob_id}`` ``DELETE`` Delete a blob.                    ``namespace``
+=========================== ========== ================================= ============================================
+
+The Blobs service supports *namespaces*.
+All requests can be modified by the ``namespace`` query string parameter, and
+the results will be restricted to that namespace. When no namespace is
+explicitly given, the ``default`` namespace is used.
+
+When listing blobs, the results can be filtered by flag and/or ordered by date
+using the ``filter_flag`` and ``order_by`` query string parameters. The
+possible values for ``order_by`` are ``date`` or ``+date`` for increasing
+order, or ``-date`` for decreasing order.
+
+
+HTTP Incoming Box API
+~~~~~~~~~~~~~~~~~~~~~
+
+The local endpoint provides the following REST API for interacting with the
+:ref:`incoming-box` service:
+
+============================== ========== =================================
+path                           method     action
+============================== ========== =================================
+``/incoming/{uuid}/{blob_id}`` ``PUT``    Create an incoming blob. The content of the blob should be sent in the body of the request.
+============================== ========== =================================
+
+All blobs created using this API are inserted under the namespace ``MX`` and
+flagged as ``PENDING``.
+
+
+.. _filesystem-backend:
+
+Filesystem backend
+~~~~~~~~~~~~~~~~~~
+
+On the server side, all blobs are currently stored in the filesystem, under
+``/var/lib/soledad/blobs`` by default. Blobs are split into subdirectories
+according to the user's uuid, the namespace, and the 3-letter and 6-letter
+prefixes of the blob's id, to prevent too many files in the same directory.
+A second file with the extension ``.flags`` stores the flags for a blob.
+
+As an example, a ``PUT`` request to
+``/incoming/68625dcb68dab741adf29c7159ccff96/c56da69b25a9a11ec2f408a559ccffc6``
+would result in the following::
+
+    /var/lib/soledad/blobs
+    └── 68625dcb68dab741adf29c7159ccff96
+        └── MX
+            └── c56
+                └── c56da6
+                    ├── c56da69b25a9a11ec2f408a559ccffc6
+                    └── c56da69b25a9a11ec2f408a559ccffc6.flags
+
+
+Client-side
+-----------
+
+On the client side, blobs can be managed using the BlobManager API. The
+BlobManager is responsible for managing storage of blobs in both local and
+remote storage.
+
+All data is stored locally in the ``blobs`` table of a SQLCipher database
+called ``{uuid}_blobs.db`` that lies in the same directory as the Soledad
+Client's JSON documents database. Both databases are encrypted with the same
+symmetric secret. All actions performed locally are mirrored remotely using
+the :ref:`http-blobs-api`.
+
+The BlobManager supports *namespaces* and *flags* and can list local and remote
+blobs, possibly filtering by flags and ordering by date (increasingly or
+decreasingly). It has helper methods to send or fetch all missing blobs, thus
+aiding in synchronization of local and remote data.
+
+When uploading, the content of the blob is encrypted with a symmetric secret
+prior to being sent to the server. When downloading, the content of the blob is
+decrypted accordingly.
diff --git a/docs/reference/incoming_box.rst b/docs/reference/incoming_box.rst
index cbea6d32..04d3084c 100644
--- a/docs/reference/incoming_box.rst
+++ b/docs/reference/incoming_box.rst
@@ -1,5 +1,7 @@
-Soledad "Incoming Box" Specification
-====================================
+.. _incoming-box:
+
+Incoming Box
+============
 
 *A mechanism for Trusted Applications to write encrypted data for a given user into the Soledad Server, which will sync it to the client to be processed afterwards.*
 
@@ -38,7 +40,7 @@ NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
 Terminology
 -----------
 
-- ``Blob`` refers to an encrypted payload that is stored by soledad server as-is. It is assumed that the blob was end-to-end encrypted by an external component before reaching the server. (See the :ref:`Blobs Spec ` for more detail)
+- ``Blob`` refers to an encrypted payload that is stored by soledad server as-is. It is assumed that the blob was end-to-end encrypted by an external component before reaching the server. (See :ref:`blobs` for more detail)
 - A ``BlobsBackend`` implementation is a particular backend setup by the Soledad Server that stores all the blobs that a given user owns. For now, only a filesystem backend is provided.
 - An ``Incoming Message`` makes reference to the representation of an abstract entity that matches exactly one message item, no matter how it is stored (ie, docs vs. blobs, single blob vs chunked, etc). It can represent one Email Message, one URL, an uploaded File, etc. For the purpose of the email use case, an Incoming Message refers to the encrypted message that MX has delivered to the incoming endpoint, which is pgp-encrypted, and can have been further obfuscated.
 - By ``Message Processing`` we understand the sequence of downloading an incoming message, decrypting it, transform it in any needed way, and deleting the original incoming message.
--
cgit v1.2.3
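
The directory layout described in the "Filesystem backend" section of
blobs.rst can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not Soledad's implementation: the
function names blob_path and flags_path are hypothetical, and using "default"
as the directory name when no namespace is given is an assumption drawn from
the namespace description.

```python
import posixpath


def blob_path(base_dir, user_uuid, blob_id, namespace="default"):
    """Hypothetical helper mirroring the documented on-disk layout.

    Layout: {base}/{user uuid}/{namespace}/{3-char prefix}/{6-char prefix}/{blob id}
    The "default" directory name is an assumption, not confirmed by the docs.
    """
    return posixpath.join(
        base_dir, user_uuid, namespace, blob_id[:3], blob_id[:6], blob_id
    )


def flags_path(base_dir, user_uuid, blob_id, namespace="default"):
    """Hypothetical helper: the flags live next to the blob, with a .flags suffix."""
    return blob_path(base_dir, user_uuid, blob_id, namespace) + ".flags"


if __name__ == "__main__":
    # Same identifiers as the PUT example in the documentation.
    print(blob_path(
        "/var/lib/soledad/blobs",
        "68625dcb68dab741adf29c7159ccff96",
        "c56da69b25a9a11ec2f408a559ccffc6",
        namespace="MX",
    ))
```

With the identifiers from the documented PUT example, the two helpers return
the two file paths shown in the directory tree of that section.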
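
The query string parameters described in the "HTTP Blobs API" section of
blobs.rst can be illustrated with a small URL builder. This is a sketch, not
part of Soledad: the helper names and the base URL are hypothetical, and only
the paths, the parameter names and the accepted order_by values are taken from
the documentation. Authentication is left out.

```python
from urllib.parse import quote, urlencode

# Values for ``order_by`` listed in the HTTP Blobs API description.
ORDER_BY_VALUES = ("date", "+date", "-date")


def list_blobs_url(base_url, user_uuid, namespace=None, filter_flag=None,
                   order_by=None):
    """Hypothetical helper: build the URL for ``GET /blobs/{uuid}``."""
    if order_by is not None and order_by not in ORDER_BY_VALUES:
        raise ValueError("order_by must be one of %r" % (ORDER_BY_VALUES,))
    # Only send the parameters that were actually given; per the docs, an
    # omitted namespace means the ``default`` namespace is used.
    params = [
        (name, value)
        for name, value in (
            ("namespace", namespace),
            ("filter_flag", filter_flag),
            ("order_by", order_by),
        )
        if value is not None
    ]
    url = "%s/blobs/%s" % (base_url.rstrip("/"), quote(user_uuid, safe=""))
    # urlencode percent-encodes "+" as "%2B", so "+date" is not read as a space.
    return url + ("?" + urlencode(params) if params else "")


def blob_url(base_url, user_uuid, blob_id, namespace=None):
    """Hypothetical helper: build the URL for a single blob (GET/PUT/POST/DELETE)."""
    url = "%s/blobs/%s/%s" % (
        base_url.rstrip("/"), quote(user_uuid, safe=""), quote(blob_id, safe="")
    )
    return url + ("?" + urlencode([("namespace", namespace)]) if namespace else "")


if __name__ == "__main__":
    print(list_blobs_url("https://soledad.example.org", "some-uuid",
                         namespace="MX", filter_flag="PENDING",
                         order_by="-date"))
```

Keeping the listing URL separate from the single-blob URL reflects the table
in that section, where only the listing request accepts filter_flag and
order_by.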