Handling binary data
The majority of data stored in FHIR is structured data in a (more or less) human-readable format. However, some use cases require binary data. Examples include avatars, PDF reports, signatures and binary diagnostic data.
FHIR and Fire Arrow provide a number of options to store such data.
Uploading data as extension
The simplest way to store binary data is the valueBase64Binary property of DomainResource.extension, which most FHIR resources inherit. An extension is always a system-specific customization, so other systems will not natively support data stored in one. Another disadvantage is that the data is inlined into the resource: every read, search or update request transfers the resource together with all of its binary data, which makes this approach practical only for very small amounts of data.
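As a minimal sketch of what inlining means in practice, the following builds a Patient with a binary payload in an extension. The extension URL and the stand-in payload are assumptions made for illustration only; they are not defined by FHIR or Fire Arrow.

```python
import base64

def make_binary_extension(url: str, payload: bytes) -> dict:
    """Build an extension element that carries binary data inline."""
    return {
        "url": url,  # hypothetical, system-specific extension URL
        "valueBase64Binary": base64.b64encode(payload).decode("ascii"),
    }

# 3072 bytes of stand-in binary data (for example a scanned signature).
signature = bytes(range(256)) * 12

patient = {
    "resourceType": "Patient",
    "id": "1234",
    "extension": [
        make_binary_extension(
            "https://example.org/fhir/StructureDefinition/signature",
            signature,
        )
    ],
}

# Base64 turns every 3 bytes into 4 characters, so the inlined value is
# one third larger than the raw data: 3072 bytes become 4096 characters.
encoded_size = len(patient["extension"][0]["valueBase64Binary"])
```

Every request that returns this Patient carries those 4096 characters, whether or not the caller needs the signature.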
Uploading data as Attachments
A slightly different variant is to use an Attachment, such as Patient.photo. Attachment is a standard FHIR data type that other FHIR systems understand, so it does not suffer from being a proprietary extension. An Attachment may store the data inline in the entity itself (at the considerable expense of increasing the parent resource's size) or specify an external URL.
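The two Attachment variants can be compared with a short sketch. The field names contentType, data and url are part of the FHIR Attachment data type; the example URL and the stand-in image bytes are assumptions.

```python
import base64
import json

def inline_attachment(content_type: str, payload: bytes) -> dict:
    """Attachment that embeds the data in the parent resource."""
    return {
        "contentType": content_type,
        "data": base64.b64encode(payload).decode("ascii"),
    }

def linked_attachment(content_type: str, url: str) -> dict:
    """Attachment that only points at data stored elsewhere."""
    return {"contentType": content_type, "url": url}

# Stand-in image: a PNG signature followed by 1000 zero bytes.
photo_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 1000

patient_inline = {
    "resourceType": "Patient",
    "id": "1234",
    "photo": [inline_attachment("image/png", photo_bytes)],
}
patient_linked = {
    "resourceType": "Patient",
    "id": "1234",
    # Hypothetical external location of the same image.
    "photo": [linked_attachment("image/png", "https://example.org/files/avatar.png")],
}

# The serialized size of the linked variant is independent of the image size.
inline_len = len(json.dumps(patient_inline))
linked_len = len(json.dumps(patient_linked))
```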
Uploading data as Binary
The Binary entity is FHIR's solution for storing binary files in the FHIR database without bloating another entity. A Binary can be managed as an independent resource on the FHIR server and linked by reference from supported entities (or through DomainResource.extension.valueReference).
The downside of this approach is that every client requesting the data needs to understand FHIR, which prevents using these resources directly in a web browser, for example.
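To illustrate the extra work this puts on the client, here is a sketch of what a consumer has to do when it receives a Binary as FHIR JSON: parse the resource, check its type and decode the payload. The helper names are hypothetical; contentType and data are the standard Binary fields.

```python
import base64
import json
from typing import Tuple

def make_binary(content_type: str, payload: bytes) -> dict:
    """Build a standalone Binary resource."""
    return {
        "resourceType": "Binary",
        "contentType": content_type,
        "data": base64.b64encode(payload).decode("ascii"),
    }

def extract_payload(resource_json: str) -> Tuple[str, bytes]:
    """Return (content type, raw bytes) from a Binary serialized as FHIR JSON."""
    resource = json.loads(resource_json)
    if resource.get("resourceType") != "Binary":
        raise ValueError("not a Binary resource")
    return resource["contentType"], base64.b64decode(resource["data"])

report = b"%PDF-1.7 stand-in report content"
wire_format = json.dumps(make_binary("application/pdf", report))

content_type, payload = extract_payload(wire_format)
```

A browser `<img>` or PDF viewer cannot perform these steps on its own, which is the limitation described above.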
Uploading data as separate files
Fire Arrow supports a hybrid of the Binary and Attachment approaches. Wherever an Attachment is supported, a client can upload the binary data via the UploadFile mutation.
mutation($data: Upload!) {
  UploadFile(forEntity: "Patient/1234", data: $data) {
    url
  }
}
UploadFile stores the uploaded file on external storage, associates it with a specific FHIR resource and, on completion, returns a unique file identifier for the uploaded data (the url field in the mutation above). The uploading client can store this identifier in Attachment.url. When a client later reads the Attachment.url field of the corresponding FHIR resource, Fire Arrow validates the access request and, if the request is permitted, returns a public, pre-signed URL to the file.
The pre-signed URL can be used directly in browsers, for example to render images in web pages or to show PDF files in the browser's integrated viewer. Because the signature is only valid for a limited period, the URL stops working after a short time, which limits exposure of the stored data even if the URL leaks.
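The following sketch shows how a client might prepare the form fields for the UploadFile mutation. It assumes that the Upload scalar follows the widely used GraphQL multipart request specification (an `operations` field, a `map` field and one part per file); this is an assumption, not something confirmed here for Fire Arrow. The helper name is hypothetical.

```python
import json
from typing import Dict

UPLOAD_MUTATION = """
mutation($data: Upload!) {
  UploadFile(forEntity: "Patient/1234", data: $data) {
    url
  }
}
"""

def build_multipart_fields(query: str, variable_name: str) -> Dict[str, str]:
    """Build the 'operations' and 'map' form fields for a single-file upload."""
    # The file variable is sent as null; the server fills it from the file part.
    operations = {"query": query, "variables": {variable_name: None}}
    # File part "0" is mapped onto the variable that expects the upload.
    file_map = {"0": ["variables." + variable_name]}
    return {
        "operations": json.dumps(operations),
        "map": json.dumps(file_map),
    }

fields = build_multipart_fields(UPLOAD_MUTATION, "data")
```

Under that assumption, these two fields would be sent together with the file itself as part "0" in a multipart/form-data POST to the GraphQL endpoint, and the url value in the response is what the client stores in Attachment.url.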
Configuring Azure Blob Storage
Set the following values in config.json:
storage.enabled: true
storage.backend: azure_blob
storage.azure_blob.connection_string: Connection string for storage account
storage.azure_blob.container: Container name
storage.azure_blob.block_size: Maximum size of each block in bytes (optional)
Azure Blob Storage should be configured with current defaults (HTTPS only, no anonymous access).
Files that are larger than the configured maximum block size are split into chunks of the configured block size. Choosing the block size is a trade-off: smaller blocks mean more network calls and lower throughput due to per-request overhead, while the block size cannot exceed the maximum that the Azure service accepts. The default value assumes that Fire Arrow is located close to the blob storage service and benefits from using the largest supported block size (currently 4MB).
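The effect of the block size on the number of network calls can be worked out with a small sketch. The function below is illustrative and not Fire Arrow's implementation; it also assumes that 4MB means 4 * 1024 * 1024 bytes.

```python
from typing import List, Tuple

MIB = 1024 * 1024

def split_into_blocks(file_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover file_size bytes in block_size chunks."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]

# A 10 MiB file with 4 MiB blocks needs three uploads: 4 MiB, 4 MiB and 2 MiB.
large_blocks = split_into_blocks(10 * MIB, 4 * MIB)

# The same file with 1 MiB blocks needs ten uploads.
small_blocks = split_into_blocks(10 * MIB, 1 * MIB)
```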
Configuring S3 storage
Set the following values in config.json:
storage.enabled: true
storage.backend: s3
storage.s3.endpoint: Endpoint URL of the bucket
storage.s3.bucket: Bucket name
storage.s3.region: Region of the bucket
storage.s3.access_key: Access key to the bucket
storage.s3.secret_key: Secret key to the bucket
Make sure the S3 bucket does not allow public access.
General settings
Link expiration can be set as a generic value for both backends:
storage.link_expiration_seconds: Validity duration in seconds of each generated pre-signed URL.
The link expiration time should be short and ideally should not exceed the lifetime of the auth token.
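Before deploying, it can help to check that all settings for the chosen backend are present. The sketch below works on a flat mapping of the dotted setting names listed above; how those names are nested inside config.json is not specified here, so the flat layout, the helper name and the example values are assumptions.

```python
from typing import Dict, List

# Settings that must be set per backend, taken from the lists above.
# storage.azure_blob.block_size is optional and therefore not included.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "azure_blob": [
        "storage.azure_blob.connection_string",
        "storage.azure_blob.container",
    ],
    "s3": [
        "storage.s3.endpoint",
        "storage.s3.bucket",
        "storage.s3.region",
        "storage.s3.access_key",
        "storage.s3.secret_key",
    ],
}

def missing_storage_keys(settings: dict) -> List[str]:
    """Return the required storage settings that are absent or empty."""
    if not settings.get("storage.enabled"):
        return []  # storage is disabled, nothing to check
    backend = settings.get("storage.backend")
    if backend not in REQUIRED_KEYS:
        raise ValueError("unknown storage backend: %r" % (backend,))
    return [key for key in REQUIRED_KEYS[backend] if not settings.get(key)]

# Hypothetical, incomplete S3 configuration.
settings = {
    "storage.enabled": True,
    "storage.backend": "s3",
    "storage.s3.endpoint": "https://s3.example.org",
    "storage.s3.bucket": "fire-arrow-files",
    "storage.s3.region": "eu-central-1",
    "storage.link_expiration_seconds": 300,
}
missing = missing_storage_keys(settings)
```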
Special considerations when updating an entity
When a client updates an entity, it no longer has access to the internal storage locator URL. That URL can only be submitted during entity creation; when the client subsequently retrieves the entity to update it, it receives a pre-signed URL from Fire Arrow instead. If the client then wrote the pre-signed URL back unchanged, it would overwrite the internal storage locator URL, and Fire Arrow would permanently store a pre-signed URL that soon expires, after which access to the file would be lost.
To counter this problem, Fire Arrow will intercept URLs pointing to its internal storage on entity updates. If it encounters a pre-signed URL that matches a file belonging to this entity, it will internally convert it back to the original, internal storage locator URL.
For security reasons, this action is only performed on updates and only for files that belong to the entity being updated. If a client tries to inject a URL pointing to a file that belongs to a different entity, the update action will be rejected.
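The rules above can be summarized in a small sketch. This is not Fire Arrow's implementation: the storage host, the internal locator scheme, the ownership table and all names are hypothetical, chosen only to make the three cases (own file, foreign file, unrelated URL) concrete.

```python
from typing import Dict
from urllib.parse import urlsplit

STORAGE_HOST = "files.example.org"   # hypothetical host of the file storage
LOCATOR_SCHEME = "internal-file://"  # hypothetical internal locator format

class ForeignFileError(Exception):
    """Raised when an update references a file owned by another entity."""

def restore_locator(url: str, entity: str, owners: Dict[str, str]) -> str:
    """Map a pre-signed URL written back on update to its internal locator.

    owners maps a file key to the entity the file was uploaded for.
    """
    parts = urlsplit(url)
    if parts.hostname != STORAGE_HOST:
        return url  # not a storage URL, keep what the client sent
    file_key = parts.path.lstrip("/")  # signature and expiry live in the query
    if owners.get(file_key) != entity:
        raise ForeignFileError("%s does not belong to %s" % (file_key, entity))
    return LOCATOR_SCHEME + file_key

owners = {
    "Patient/1234/avatar.png": "Patient/1234",
    "Patient/9999/scan.pdf": "Patient/9999",
}

restored = restore_locator(
    "https://files.example.org/Patient/1234/avatar.png?sig=abc&expires=1700000000",
    "Patient/1234",
    owners,
)
```

In this sketch, updating Patient/1234 with a URL that points at Patient/9999/scan.pdf raises ForeignFileError, which corresponds to the rejected update described above.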