krsd - A Kerberised RemoteStorage server implementation

Introduction

remotestorage is an open specification for personal data storage. It is supposed to replace the currently popular proprietary "cloud storage" protocols using an open standard and thereby promoting the separation of applications and their data on the web.

Overview

krsd brings two things:

  • an HTTP endpoint implementing remotestorage: /storage/{user}
  • an HTTP endpoint implementing webfinger: /.well-known/webfinger

User management is based on Kerberos identities. The SPNEGO mechanism specified in RFC 4559 is used to this end. Briefly put, this is a GSS-API exchange embedded as base64 in WWW-Authenticate and Authorization headers of the Negotiate type.
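
As an illustration of the header format only, the following minimal sketch locates the base64 token inside an Authorization header value. The function name negotiate_token is hypothetical and not taken from the krsd sources; decoding the token and feeding it to the GSS-API is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of krsd.
 * Given the value of an "Authorization" header, return a pointer to the
 * base64 text that follows the "Negotiate" scheme name, or NULL if the
 * header uses another scheme or carries no token.
 */
const char *negotiate_token(const char *header_value)
{
    static const char scheme[] = "negotiate";
    size_t i;
    const char *p;

    assert(header_value != NULL);

    /* HTTP authentication scheme names compare case-insensitively. */
    for (i = 0; scheme[i] != '\0'; i++) {
        if (tolower((unsigned char)header_value[i]) != scheme[i])
            return NULL;
    }

    /* At least one space must separate the scheme from the token. */
    p = header_value + i;
    if (*p != ' ')
        return NULL;
    while (*p == ' ')
        p++;

    return *p != '\0' ? p : NULL;
}
```

A request without a usable token would typically be answered with status 401 and a "WWW-Authenticate: Negotiate" header, after which the client retries with a token.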

User accounts are unrelated to accounts on the server hosting this service. Storage is available both to users and to services, and each is identified by its usual principal name, including forms like these:

xmpp/xmpp.arpa2.net@ARPA2.NET
john@ARPA2.NET
john/admin@ARPA2.NET

These forms are translated to paths on the local filesystem as described below, under "Storage system".

krsd is a fork of the useful work on rs-serve. The original and the derived work are located at:

https://github.com/remotestorage/rs-serve
https://github.com/arpa2/krsd

The name has been changed to avoid confusion among application users. It is not yet clear whether this project will live its own life or be integrated with the original branch. In the first case, a separate name brings clarity; in the second, it will have been harmless.

krsd is written entirely in C, using mostly POSIX library functions. It relies on a few portable libraries; see the list under "Dependencies" below. It does, however, currently use the signalfd() system call, which is only available on Linux. This is a solvable problem: if you want to run krsd on another system, please open an issue to ask for help.

remoteStorage

The currently implemented protocol version is "draft-dejong-remotestorage-01". It has been modified to permit implied, that is HTTP-level, authentication and authorisation. Demo code is available in a separate directory.

Currently the following features are supported:

  • CORS support for all verbs (TODO: May not work at present)
  • GET, PUT, DELETE requests on files and folders
  • Opaque version strings (in directory listings and "ETag" header)
  • Conditional GET, PUT and DELETE requests ("If-Match", "If-None-Match" headers)
  • Protection of all non-public paths via authentication by the browser at the HTTP level, and authorisation based on a .k5remotestorage file in the destination file system
  • Special handling of public paths (i.e. those starting with /public/), such that requests on non-directory paths succeed without authorization.
  • HEAD requests on files and folders with "Content-Length" header (not part of remotestorage-01, only enabled when --experimental flag is given)
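
To illustrate the conditional requests listed above, here is a minimal sketch of how "If-Match" and "If-None-Match" could be evaluated against a stored version string. The function preconditions_hold is hypothetical and not taken from krsd. As a simplification, each header is assumed to carry either "*" or a single ETag, compared byte for byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of krsd.
 * current_etag:  version string of the stored resource, NULL if it does
 *                not exist.
 * if_match:      raw "If-Match" header value, NULL if absent.
 * if_none_match: raw "If-None-Match" header value, NULL if absent.
 * Returns true when the request may proceed.
 */
bool preconditions_hold(const char *current_etag,
                        const char *if_match,
                        const char *if_none_match)
{
    /* An existing resource is assumed to have a non-empty version. */
    assert(current_etag == NULL || current_etag[0] != '\0');

    if (if_match != NULL) {
        /* If-Match needs an existing resource with a matching version. */
        if (current_etag == NULL)
            return false;
        if (strcmp(if_match, "*") != 0 && strcmp(if_match, current_etag) != 0)
            return false;
    }

    if (if_none_match != NULL && current_etag != NULL) {
        /* "*" fails on any existing resource, an ETag only on equality. */
        if (strcmp(if_none_match, "*") == 0)
            return false;
        if (strcmp(if_none_match, current_etag) == 0)
            return false;
    }

    return true;
}
```

A PUT with "If-None-Match: *" then only succeeds when nothing is stored under the path yet, which is the usual way to avoid overwriting an existing file.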

webfinger

The webfinger implementation only serves information about remotestorage and is currently not extensible. The hostname part of user addresses is expected to be the hostname set for the rs-serve instance. This currently defaults to "local.dev" and can be overridden with the --hostname option. Virtual hosting (== hosting storage for multiple domains from a single instance) is currently not supported.

The pathname returned permits krsd to parse out two components of the path under which it stores data: /storage/domain.tld/user/ must be at the beginning of the URI if it is to be accepted as a remoteStorage URI on krsd. Anything after this is taken as a path into the storage structures used.
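
The parsing described above could look roughly like the sketch below. The type storage_uri and the function parse_storage_uri are hypothetical names, and the sketch assumes that the user component is a single path level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the original path, no copies. */
struct storage_uri {
    const char *domain;
    size_t domain_len;
    const char *user;
    size_t user_len;
    const char *rest;   /* path inside the storage, starts with '/' */
};

/*
 * Hypothetical helper, not part of krsd.
 * Splits "/storage/<domain>/<user>/<rest>" into its components.
 * Returns 0 on success, -1 if the path does not have that shape.
 */
int parse_storage_uri(const char *path, struct storage_uri *out)
{
    static const char prefix[] = "/storage/";
    const char *slash;

    assert(path != NULL && out != NULL);

    if (strncmp(path, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    /* First component after the prefix: the domain. */
    out->domain = path + sizeof(prefix) - 1;
    slash = strchr(out->domain, '/');
    if (slash == NULL || slash == out->domain)
        return -1;
    out->domain_len = (size_t)(slash - out->domain);

    /* Second component: the user. */
    out->user = slash + 1;
    slash = strchr(out->user, '/');
    if (slash == NULL || slash == out->user)
        return -1;
    out->user_len = (size_t)(slash - out->user);

    /* Everything from the slash after the user is the storage path. */
    out->rest = slash;
    return 0;
}
```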

authorization

This version of rs-serve cannot employ the original authorization backend and frontend, because it removes the implicit bearer flow described by OAuth2, for reasons of security. Instead, it relies on the implicit and time-constrained access provided by Kerberos.

It might have been possible to rely on a bearer token obtained from a Kerberos-specific OAuth2 node, which would have required minimal or no changes to rs-serve. The downside would be that the intermediate bearer tokens provide access for all time, whereas Kerberos tickets passed directly put a time constraint on the exchange. (This point, however, is negotiable. It seems that OAuth2 permits it. Encryption to the resource and information-rich handling in the IDP would be possible, in a site managed and controlled by the user's side of things. Let's talk?)

Another option might have been to package Kerberos tickets in bearer tokens, but that would have introduced an extra intermediate party with no added value but unpacking / repacking the token, so it was deemed less attractive.

The current implementation does not constrain access to particular users or groups of users. It is very likely that such a facility will be added in a future version, possibly based on central management infrastructure.

The --auth-uri option now causes a warning, but is not fatal.

Storage system

The payload data of the remotestorage endpoint is stored on the local filesystem within the respective user's target directory. Determining this directory is done by splitting the principal name at the @ sign and reshaping it according to these steps:

  1. Initially, the pathname of a realm is a fixed string that is shared by all principals.

    Example 0a.

    A fallback default could be /var/lib/krs/

    Example 0b.

    When using OpenAFS as a backing store for this service, the starting point /afs/ may be more useful.

  2. Firstly, the realm is removed and translated into its related domain name, which should be supported in the surrounding infrastructure. In case of generic hosting, the surrounding infrastructure could be the DNS, with DNSSEC validated for sites that use it. The result from this mapping is the DNS domain name, represented as lowercase characters without a trailing dot.

    Example 1a.

    ARPA2.NET is set up in the local infrastructure, and its domain name is found to be arpa2.net, which is used as a pathname component.

    Example 1b.

    ARPA2.NET is not set up in the local infrastructure, so it is looked up in DNS under the same domain name, arpa2.net. If that site thought it sufficiently important to sign with DNSSEC, then this will be validated. What is looked up under the domain name is a TXT record named _kerberos, whose content should match the realm name literally, case included. In case of a mismatch, the user is refused service. Since the realm is being looked up, rather than a DNS hostname, there will be no trace up to parent domains.

    We now append the DNS name to the installation directory as another subdirectory level.

  3. Secondly, the local part (before the @ sign) is treated literally as a path, with one or more levels. Note that this means that a slash is treated as a directory separator. The last level is, once again, a directory name.

    Example 2a.

    john@ARPA2.NET will append john/ to the path found so far.

    Example 2b.

    john/admin@ARPA2.NET will append john/admin/ to the path found so far.

    Example 2c.

    http/chitchat.arpa2.net will append http/chitchat.arpa2.net/ to the path found so far.

  4. Thirdly, a fixed directory name such as remotestorage/ is appended to the path name. This is done to separate remote storage from local storage and from other remote storage components, and to make all the other directories unavailable. This is of no use to local storage, but it is immensely useful when dealing with storage on an OpenAFS share that is also employed for other purposes.

The result is a concrete path into a filesystem. A few complete examples may be helpful at this point.

Example. The user john@ARPA2.NET could be found in DNS zone arpa2.net, and dependent on local settings his files could end up in mounted locations like:

/afs/arpa2.net/john/remotestorage/...
/var/lib/krs/arpa2.net/john/remotestorage/...

Example. After changing user-ID to john/admin under the same realm, the files accessible to John are found in places like:

/afs/arpa2.net/john/admin/remotestorage/...
/var/lib/krs/arpa2.net/john/admin/remotestorage/...

Example. When a server is not acting on behalf of a user (through S4U with Constrained Delegation) but in its own name, the location depends on its principal name. For example, xmpp/xmpp.arpa2.net@ARPA2.NET could be found on:

/afs/arpa2.net/xmpp/xmpp.arpa2.net/remotestorage/...
/var/lib/krs/arpa2.net/xmpp/xmpp.arpa2.net/remotestorage/...
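
The translation steps and examples above can be summarised in a short sketch. The function principal_to_path is hypothetical and not taken from krsd. It assumes that the realm maps to its own lowercased form as a domain name and skips the infrastructure or DNS check of the realm. The rejection of ".." is a precaution added by the sketch and is not described in this document.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of krsd.
 * Builds "<base><domain>/<local part>/remotestorage/" from a principal
 * name such as "john/admin@ARPA2.NET". base must end in a slash.
 * Returns 0 on success, -1 on a malformed principal or a short buffer.
 */
int principal_to_path(const char *base, const char *principal,
                      char *out, size_t out_size)
{
    char domain[256];
    const char *at;
    size_t i, realm_len;
    int n;

    assert(base != NULL && principal != NULL && out != NULL);

    /* Split at the @ sign; both sides must be non-empty. */
    at = strrchr(principal, '@');
    if (at == NULL || at == principal || at[1] == '\0')
        return -1;

    /* Precaution of this sketch: no absolute or parent-directory parts. */
    if (principal[0] == '/' || strstr(principal, "..") != NULL)
        return -1;

    /* Assumption: the domain name is the lowercased realm name. */
    realm_len = strlen(at + 1);
    if (realm_len >= sizeof(domain))
        return -1;
    for (i = 0; i < realm_len; i++)
        domain[i] = (char)tolower((unsigned char)at[1 + i]);
    domain[realm_len] = '\0';

    /* The local part is used literally, slashes included. */
    n = snprintf(out, out_size, "%s%s/%.*s/remotestorage/",
                 base, domain, (int)(at - principal), principal);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Called with the base "/var/lib/krs/" and the principal "john@ARPA2.NET", the sketch yields /var/lib/krs/arpa2.net/john/remotestorage/, in line with the first example above.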

The filesystem path is configured in the webfinger profile, and may or may not relate to the Kerberos Principal Name used to access the resource. For full flexibility, the remotestorage directory should hold a file named .k5remotestorage which must hold the Kerberos Principal Name on a line of its own, if it is to have read/write access to anything underneath this directory.
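
A check against such a file might look like the following sketch. The function principal_listed is a hypothetical name. The sketch assumes one principal per line and treats a missing or unreadable file as "no access"; the actual krsd behaviour may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of krsd.
 * Returns true if `principal` appears on a line of its own in the file
 * at acl_path (for example ".../remotestorage/.k5remotestorage").
 */
bool principal_listed(const char *acl_path, const char *principal)
{
    char line[1024];
    bool at_line_start = true;
    bool found = false;
    FILE *f;

    assert(acl_path != NULL && principal != NULL);

    f = fopen(acl_path, "r");
    if (f == NULL)
        return false;   /* assumption: no file means no access */

    while (!found && fgets(line, sizeof(line), f) != NULL) {
        size_t len = strlen(line);
        /* The chunk ends a line if it has a newline or hit end of file. */
        bool ends_line = (len > 0 && line[len - 1] == '\n') || feof(f);

        line[strcspn(line, "\r\n")] = '\0';

        /* Only whole lines count, never fragments of overlong lines. */
        if (at_line_start && ends_line && strcmp(line, principal) == 0)
            found = true;
        at_line_start = ends_line;
    }

    fclose(f);
    return found;
}
```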

The reliance on filesystem paths implies a few noteworthy restrictions:

  • The remotestorage endpoint cannot be used to store both a directory and a file under the same path (ignoring the trailing slash). That means you cannot store /foo/bar/baz and /foo/bar, but only one of them. This is a natural restriction of traditional filesystems, that is currently well adhered to by all apps using remotestorage (as far as I know).
  • MIME types may not be exact for files that were added "out-of-band", that is not added via the remotestorage protocol, but by copying to the remotestorage/ directory by other means. krsd stores MIME type and character encoding under the "user.mime_type" and "user.charset" extended attributes, given these are supported by the underlying filesystem. When these attributes aren't set, a MIME type is guessed using libmagic, which may not always yield desirable results. (for example an empty file, created using "touch" will be transmitted via remotestorage with a Content-Type header of "inode/x-empty; charset=binary") If even libmagic fails to make sense of a file, the Content-Type is set to "application/octet-stream; charset=binary".
  • Filesystem privileges must be set up to grant access to the user. In the case of OpenAFS, this will be based on the user, and krsd will act on behalf of the user when storing files. In this case, Constrained Delegation must permit S4U2Proxy use, and the principal ticket for the user must probably be created as proxiable. The details of this have not been ironed out yet.
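
The Content-Type lookup described in the MIME type restriction above can be sketched as follows. This is not the krsd code: it uses the Linux getxattr() call from <sys/xattr.h> instead of libattr, the function names are hypothetical, and the libmagic step is left out, so the sketch goes straight from missing attributes to the final fallback. The rejection of line breaks in attribute values is a precaution added by the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Read one extended attribute as a NUL-terminated string. */
static int read_attr(const char *path, const char *name,
                     char *buf, size_t size)
{
    ssize_t n = getxattr(path, name, buf, size - 1);

    if (n <= 0)
        return -1;              /* unset, unsupported or empty */
    buf[n] = '\0';              /* attribute values are not terminated */

    /* Precaution: never let a stored value inject extra header lines. */
    if (strpbrk(buf, "\r\n") != NULL)
        return -1;
    return 0;
}

/*
 * Hypothetical helper, not part of krsd.
 * Writes a Content-Type value for `path` into `out`.
 */
void content_type_for(const char *path, char *out, size_t out_size)
{
    char mime[128];
    char charset[64];

    assert(path != NULL && out != NULL && out_size > 0);

    if (read_attr(path, "user.mime_type", mime, sizeof(mime)) == 0 &&
        read_attr(path, "user.charset", charset, sizeof(charset)) == 0) {
        snprintf(out, out_size, "%s; charset=%s", mime, charset);
        return;
    }

    /* krsd would ask libmagic at this point; the sketch skips that. */
    snprintf(out, out_size, "application/octet-stream; charset=binary");
}
```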

Installing

These steps should enable you to install krsd.

Dependencies

  • GNU make
  • pkg-config (or tweak the Makefile)
  • gcc
  • libc
  • libevent (>= 2.0)
  • libmagic
  • libattr
  • BerkeleyDB

On Debian based systems, this should give you all you need:

apt-get install build-essential libevent-dev libmagic-dev libattr1-dev libssl-dev libdb-dev pkg-config

If you want to develop, you may also want debug symbols and valgrind (required by leakcheck.sh script):

apt-get install libevent-dbg valgrind

Getting the code

Given you are reading this file, you probably have the code already, but just to be sure:

Currently the krsd code is hosted on github.

You can browse it online, at:

https://github.com/arpa2/krsd

or clone it using git:

git clone https://github.com/arpa2/krsd.git

Note that krsd itself is a clone of rs-serve, with the distinctive feature that krsd introduces Kerberos authentication.

Building

Given you have all dependencies installed, simply run

make

and you should be good to go.

Installing system-wide

To install the krsd binary to /usr/bin, run

make install

as a privileged user.

To install somewhere else, tweak the Makefile first.

This will also install an init script to /etc/init.d/krsd and a default configuration to /etc/default/krsd.

On Debian based systems (i.e. when "update-rc.d" is present), "make install" will also install the krsd init script into /etc/rc*.d/.

Setting options

There are a variety of options.

If you want to use the init script, you can set options in /etc/default/krsd, otherwise just pass them on the command line.

Run:

krsd --help

to get a list of supported options.

Integrating authorization

To integrate an authorization endpoint, you need to do two things:

  • configure endpoint URI

    Set the --auth-uri option to a printf style format string. "%s" will be replaced with the username.

  • configure your authorization endpoint to manage krsd tokens

    krsd doesn't care where tokens come from, but it needs to know them to decide whether a given request is authorized or not. It maintains an internal store for authorizations (i.e. structures of [user-name, token, scopes]), which must be managed from the outside.

    The tools to do this are:

    • rs-add-token:

      Usage: rs-add-token <user> <token> <scope1> [<scope2> ... <scopeN>]

      • <user> is the login name of the user (krsd must be able to resolve it using getpwnam() in order to find the home directory)
      • <token> is the token string authenticating future requests. For krsd it is an opaque string.
      • <scope1>..<scopeN> are scope strings in the same form as described in draft-dejong-remotestorage-01, Section 9.
    • rs-remove-token:

      Usage: rs-remove-token <user> <token>

      <user> and <token> must both be given. If the token cannot be found, rs-remove-token terminates with non-zero status.

    • rs-list-tokens:

      Lists all currently installed tokens and their respective scopes.

      The output format is primarily meant for (human) debugging and subject to change.
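
As an illustration of the --auth-uri format string mentioned above, the sketch below substitutes the username for the first "%s". The function expand_auth_uri and the URI in the usage note are hypothetical. The sketch deliberately avoids passing the configured string to printf() as a format, so that stray conversion specifiers in the option value cannot cause harm; whether krsd does the same is not stated in this document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of krsd.
 * Replaces the first "%s" in fmt with user and writes the result to out.
 * Returns 0 on success, -1 if fmt has no "%s" or out is too small.
 */
int expand_auth_uri(const char *fmt, const char *user,
                    char *out, size_t out_size)
{
    const char *marker;
    int n;

    assert(fmt != NULL && user != NULL && out != NULL);

    marker = strstr(fmt, "%s");
    if (marker == NULL)
        return -1;

    /* Text before the marker, the username, then the text after it. */
    n = snprintf(out, out_size, "%.*s%s%s",
                 (int)(marker - fmt), fmt, user, marker + 2);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

With the made-up format string "https://auth.example/authorize/%s" and the user "john", the result is "https://auth.example/authorize/john".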

Contributing

Download or fork now

Download now from GitHub or fork the code. Have fun!