Introduction
So far we have learned how to upload assets when their size and other properties are already known, but what if we want to upload an asset as it is being recorded, rendered, or streamed?
This guide covers how to upload a real-time asset. The Real-time Uploads API enables assets to be playable in Frame.io seconds after they are done being recorded.
Demo video
If you would like to get a quick preview of this API in action, take a look at our video demo!
Watch as a render is uploaded out of Adobe Media Encoder in real time, with the video being playable in Frame.io 5 seconds after the render completes. With real-time uploads your integration can be more responsive than ever!
What will I need?
If you haven’t read the Implementing C2C: Setting Up guide, give it a quick glance before moving on!
You will also need the access_token
you received while following the C2C hardware or C2C Application authentication and authorization guide.
In this guide we will be using a test asset found at this Frame.io link. Hit the Download
button to download it. Using the example file will help you match all the values in our example curl
commands.
It is also recommended that you read the How to: Upload (Basic) guide first, as this guide assumes you are familiar with the basic upload flow.
Create a real-time asset
Uploading an asset in real time begins with how that asset is created. When creating your asset, is_realtime_upload
must be set to true
, and filesize
must be elided or set to null
. Normally, filesize
is required, but if you are uploading a file as it is being created, that obviously can't be known.
{
curl -X POST https://api.frame.io/v2/devices/assets \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"name": "C2C_TEST_CLIP.mp4",
"filetype": "video/mp4",
"is_realtime_upload": true
}
__JSON__
} | python -m json.tool
Docs for /v2/devices/assets
can be found here; the endpoint is identical to the /v2/assets
endpoint.
When uploading a real-time asset, an extension is required. However, some uploaders may not know the filename before the asset data has finished being generated. In those instances, the extension
field may be used to supply an extension in the format of .[extension]
, like '.mp4'
. When the final filename is not yet known, prefer supplying extension rather than name, as doing so allows you to update the asset name later.
Let's send our payload. We'll get this response:
{
"id": "{asset_id}",
"name": "C2C_TEST_CLIP.mp4"
}
The response for real-time assets is significantly stripped down compared to regular asset creation. Note in particular that upload_urls
is missing. For real-time assets, we will be generating our upload URLs on-demand!
Requesting upload URLs
Let's request the URL for the first half of our file, exactly 10,568,125 bytes. We'll need to include the id
from above as the asset_id
in the URL below:
{
curl -X POST https://api.frame.io/v2/devices/assets/{asset_id}/realtime_upload/parts \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"parts": [
{
"number": 1,
"size": 10568125,
"is_final": false
}
]
}
__JSON__
} | python -m json.tool
Docs for /v2/devices/assets/{asset_id}/realtime_upload/parts
can be found here.
This request will fetch a single URL. Let's break down the request fields:
- parts: A list of upload parts we wish to generate upload URLs for. As this is a list, we can batch-request URLs if desired to be more efficient.
  - number: The part number/index, starting at 1. Part numbers may be skipped and may be uploaded in any order, but will be used to concatenate the final file in sequential order. Cannot be greater than 10,000 (the maximum number of parts that AWS allows).
  - size: The size of the part in bytes. If the size does not abide by the AWS multipart upload restrictions, an error will be returned.
  - is_final: Whether this URL is for the final part of the file.
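As an illustration of these constraints, a minimal client-side check might look like the sketch below. The helper is illustrative only, not part of the API, and the 5 GiB ceiling is AWS's general per-part limit.
MIN_PART_SIZE = 5_242_880      # 5 MiB minimum for non-final parts
MAX_PART_COUNT = 10_000        # highest part number AWS allows
MAX_PART_SIZE = 5 * 1024 ** 3  # 5 GiB maximum for any single part

def check_part(number: int, size: int, is_final: bool) -> None:
    """Raise if a part description would be rejected."""
    if not 1 <= number <= MAX_PART_COUNT:
        raise ValueError("part number must be between 1 and 10,000")
    if size > MAX_PART_SIZE:
        raise ValueError("part size exceeds the 5 GiB AWS limit")
    if size < MIN_PART_SIZE and not is_final:
        raise ValueError("non-final parts must be at least 5 MiB")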
When we make the request, we should get a response that looks like so:
{
"upload_urls": [
"https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_01_path]"
]
}
The upload_urls
list will be in the same order as the parts
request field.
Now we can upload our data just like a basic upload:
head -c 10568125 ~/Downloads/C2C_TEST_CLIP.mp4 | \
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_01_path] \
--include \
--header 'content-type: video/mp4' \
--header 'x-amz-acl: private' \
--data-binary @-
Next, we need to request the URL for our second, and final, part:
{
curl -X POST https://api.frame.io/v2/devices/assets/{asset_id}/realtime_upload/parts \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"asset_filesize": 21136250,
"parts": [
{
"number": 2,
"size": 10568125,
"is_final": true
}
]
}
__JSON__
} | python -m json.tool
Note that is_final is set to true. If this field is never set to true on the final part, Frame.io will wait indefinitely for it, and the asset will never become playable in Frame.io.
Also note the asset_filesize
field which contains the full filesize for the upload. This field is required when an object in parts
has is_final
set to true
.
asset_filesize
may be supplied with any request, but MUST be supplied with the final part request.
We will get another response payload like before:
{
"upload_urls": [
"https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_02_path]"
]
}
... which we can use to upload the second half of our file:
tail -c 10568125 ~/Downloads/C2C_TEST_CLIP.mp4 | \
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_02_path] \
--include \
--header 'content-type: video/mp4' \
--header 'x-amz-acl: private' \
--data-binary @-
When the final part is uploaded, it kicks off a process in our backend to stitch the parts together into a single file. This process will only wait a 60-second grace period for all other parts to finish uploading. For this reason, we recommend waiting to upload the final part until all other parts have been uploaded.
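For example, a rough sketch of that ordering, where upload_fn is a hypothetical callable that PUTs a single part's bytes to its presigned URL:
from concurrent.futures import ThreadPoolExecutor

def upload_in_safe_order(non_final_parts, final_part, upload_fn):
    # Upload all non-final parts first; leaving the `with` block waits
    # for every queued upload to finish (or raise).
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(upload_fn, non_final_parts))
    # Only now send the final part, which triggers stitching in the backend.
    upload_fn(final_part)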
That's it! Navigate to Frame.io and you will see that your asset has been uploaded. Your first real-time upload is complete! 🎉🎉🎉
Asset names
As discussed above, it is possible to supply an extension
field and omit the name
field on asset creation when the full asset name is not known:
{
curl -X POST https://api.frame.io/v2/devices/assets \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"extension": ".mp4",
"filetype": "video/mp4",
"is_realtime_upload": true
}
__JSON__
} | python -m json.tool
Response:
{
"id": "{asset_id}",
"name": "[new file].mp4"
}
When an asset has been created this way, it will be automatically assigned the name [new file].[ext]
. You may supply an asset_name
field at any point when requesting upload URLs to update it:
{
curl -X POST https://api.frame.io/v2/devices/assets/{asset_id}/realtime_upload/parts \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"asset_name": "C2C_TEST_CLIP.mp4",
"asset_filesize": 21136250,
"parts": [
{
"number": 2,
"size": 10568125,
"is_final": true
}
]
}
__JSON__
} | python -m json.tool
The asset name will only be updated if the asset still has its default name; otherwise, the asset_name field is ignored. This includes the case where the name has been changed in the Frame.io UI.
Batching URLs
It is recommended that you request URLs for as many parts as you currently have data for, rather than requesting a single URL per request. This strategy, while a little more complex logically, ensures that files with enough data for thousands of parts are handled efficiently when the upload cannot keep pace with the rate at which data is being generated.
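A minimal sketch of that strategy might look like this. The part_size_for sizing function is assumed here; a concrete version is developed later in this guide, and the dictionary keys mirror the fields of the parts request above.
def describe_available_parts(next_part_number, available_bytes, part_size_for):
    """Describe as many full-sized parts as the buffered data allows.

    Returns a list of part descriptions suitable for a single
    realtime_upload/parts request, plus the next unused part number.
    """
    parts = []
    while available_bytes >= part_size_for(next_part_number):
        size = part_size_for(next_part_number)
        parts.append({"number": next_part_number, "size": size, "is_final": False})
        available_bytes -= size
        next_part_number += 1
    return parts, next_part_number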
Minimum part size and media file headers.
The minimum part size for a non-final part is determined by AWS, and is 5 MiB (5,242,880 bytes). Some file formats may require a header at the front of the file that is not written until the rest of the file data has been rendered. Thus, we have a problem: if we need to delay writing the header until all other data has been written, and it is smaller than the minimum 5,242,880 bytes, then we will not be able to upload it as part_number=1
.
We suggest that in these cases, you hold back the first 5,242,880 bytes of your media data, and begin uploading parts with part_number=2
. Once your data has finished being generated, you can prepend the header to this held-back chunk of media data, request a URL for part_number=1
, then upload the part. This ensures that your first chunk will always meet the minimum part size requirement.
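A minimal sketch of this strategy, assuming rendered bytes arrive in arbitrary-sized chunks (the class and method names here are illustrative, not part of the API):
HOLD_BACK_BYTES = 5_242_880  # AWS minimum size for a non-final part

class HeaderHoldBack:
    """Hold back the first 5,242,880 bytes until the header is known."""

    def __init__(self) -> None:
        self._held_back = b""

    def write(self, chunk: bytes) -> bytes:
        # Fill the hold-back buffer first; anything left over can be
        # uploaded immediately as part_number 2, 3, ...
        needed = HOLD_BACK_BYTES - len(self._held_back)
        if needed > 0:
            self._held_back += chunk[:needed]
            chunk = chunk[needed:]
        return chunk

    def build_part_one(self, header_bytes: bytes) -> bytes:
        # Called once the render is finished and the header can be written;
        # the result always meets the 5,242,880-byte minimum.
        return header_bytes + self._held_back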
Implementing a scaling part size formula.
The AWS limit for a single file is 5 TiB (5,497,558,138,880 bytes). The maximum number of parts a file can be split into is 10,000.
We want to make sure that we can upload the maximum filesize with the number of parts available to us, especially since — with real-time uploads — we can't know the filesize ahead of time. Make our parts too small, and we may blow through all 10,000 of them far too quickly, while also leaving a healthy chunk of bytes on the table. But make our parts too large, and we lose the ability to start uploading small files before they are completed.
Let's look at some hard numbers. If we request the minimum part size of 5,242,880 bytes for all 10,000 parts, we can upload a total of 52,428,800,000 bytes (52.4 GB), or roughly 1% of the maximum filesize available to us.
But if we were to evenly distribute the total filesize, each part would be 5,497,558,138,880 / 10,000
bytes, or ~550 MB. That's a very large payload if our file only ends up being 80 MB; we wouldn't gain any benefit from uploading it in real time.
For small files we want to keep payload size close to the minimum so we can keep pace with the file's creation as closely as possible. But for large files, we don't want to run out of parts before we have hit the 5 TiB limit.
This calls for a more complex approach than just arbitrarily assigning a payload size.
Our suggested formula
We suggest the following formula as a starting place, expressed in Python code and using 64-bit, double-precision floats:
import math
from typing import Callable
# The minimum size, in bytes, for a single, non-final part upload.
MINIMUM_PART_SIZE = 5_242_880
# The maximum number of parts allowed for a single AWS upload.
MAXIMUM_PART_COUNT = 10_000
# The maximum size, in bytes, for an AWS upload.
MAXIMUM_FILE_SIZE = 5_497_558_138_880
# The data rate at which every part is an equal size, and could not
# be any uniformly larger without violating the maximum total file
# size if 10_000 parts were to be uploaded. It works out to
# ~549.8 MB per payload. By enforcing this we actually never need
# to check if a part exceeds the maximum allowed part size, as our
# parts will never exceed ~549.8 MB.
MAXIMUM_DATA_RATE = MAXIMUM_FILE_SIZE // MAXIMUM_PART_COUNT
def part_size(part_number: int, format_bytes_per_second: int) -> int:
"""
Returns the payload size for a specific part number given the file's
expected data rate.
"""
if part_number < 1:
raise ValueError("part_number must be greater than 0")
if part_number > 10_000:
raise ValueError("part_number must be less than 10,000")
# Make sure we never go above the maximum data rate or fall below the
# minimum part size, even if the data rate is lower.
data_rate = min(format_bytes_per_second, MAXIMUM_DATA_RATE)
data_rate = max(data_rate, MINIMUM_PART_SIZE)
# Calculate a scalar given our data rate. We will explain this step
# further on in the guide.
scalar = -(2 * (125 * data_rate - 68_719_476_736)) / 8_334_583_375
part_size = math.floor(scalar * pow(part_number, 2)) + data_rate
return part_size
... where part_number
is between 1
and 10_000
, inclusive, and format_bytes_per_second
is the average number of bytes your file is expected to consume per second. We'll go over how the formula was reached further on.
The scalar
variable and calculation might be a little perplexing at first glance, but it is a mathematical tool that ensures no matter what value we use for format_bytes_per_second
, if we feed all allowed part_number
values from 1
to 10_000
into the function, we will receive a set of values that totals to our 5 TiB filesize limit, or as close to it as possible. We show our work further on for how we came to this formula.
By using floor rounding, we leave some bytes on the table, but ensure that ordinary rounding across 10,000 parts does not accidentally cause us to exceed our maximum allowed filesize. At most 10,000 bytes, or 10 KB, will be left on the table this way, which is an acceptable tradeoff.
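As a quick sanity check of that claim, summing the part_size function above over all 10,000 parts at the minimum 5 MiB data rate lands just shy of the limit:
# Assumes the part_size() function and constants defined above.
total = sum(part_size(n, 5_242_880) for n in range(1, 10_001))
print(total)                      # should match the 10,000-part row of the first table below
print(MAXIMUM_FILE_SIZE - total)  # comfortably under 10,000 bytes left on the table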
The important characteristics of this formula are:
- When uploading 10,000 parts, the total amount of data uploaded will be within 10 KB of our 5 TiB limit.
- Optimizes for smaller, more efficient payloads at the beginning to increase responsiveness for short and medium-length clips.
- Very long clips will have reduced responsiveness between the end of a file being written and it becoming playable in Frame.io.
The tradeoff between the second and third points is mitigated by the fact that most clips will never reach the size where point three comes into play. We are trading increased responsiveness for MOST files for decreased responsiveness for very few.
A more advanced and efficient version of our formula (that generates a part_size_calculator
function with our static scalar and data rate precomputed and baked in) might look like this:
def create_part_size_calculator(format_bytes_per_second: int) -> Callable[[int], int]:
"""
Returns a function that takes in a `part_number` and returns a
`part_size` based on `data_rate`.
"""
# Make sure we never go above the maximum data rate or fall below the
# minimum part size, even if the data rate is lower.
data_rate = min(format_bytes_per_second, MAXIMUM_DATA_RATE)
data_rate = max(data_rate, MINIMUM_PART_SIZE)
static_scalar = -(2 * (125 * data_rate - 68_719_476_736)) / 8_334_583_375
def part_size_calculator(part_number: int) -> int:
"""Calculates size in bytes of upload for `part_number`."""
if part_number < 1:
raise ValueError("part_number must be greater than 0")
if part_number > 10_000:
raise ValueError("part_number must be less than 10,000")
return math.floor(static_scalar * pow(part_number, 2) + data_rate)
return part_size_calculator
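Usage is straightforward; the example values here line up with the web-format table in the next section:
# Build a calculator for a format at (or below) the minimum data rate,
# then size a couple of parts.
calculate_part_size = create_part_size_calculator(5_242_880)
calculate_part_size(1)      # 5,242,896 bytes (~5.2 MB)
calculate_part_size(1_000)  # 21,575,817 bytes (~21.6 MB)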
How the formula performs.
Let's examine the output characteristics of the formula above over several common file types.
Example 1: Web format
For web-playable formats with a rate of ~5.3MB/s or less (most H.264/H.265/HEVC files), we will get a payload-size progression that looks like so:
Total Parts | Payload Bytes | Payload MB | Total File Bytes | Total File GB |
---|---|---|---|---|
1 | 5,242,896 | 5.2 MB | 5,242,896 | 0.0 GB |
1,000 | 21,575,817 | 21.6 MB | 10,695,361,357 | 10.7 GB |
5,000 | 413,566,329 | 413.6 MB | 706,957,655,928 | 707.0 GB |
10,000 | 1,638,536,679 | 1,638.5 MB | 5,497,558,133,921 | 5,497.6 GB |
Table columns key
- Total Parts: the total number of file parts uploaded to AWS.
- Payload Bytes: the size of the AWS PUT payload when part_number is equal to Total Parts.
- Payload MB: As Payload Bytes, but in megabytes.
- Total File Bytes: the total number of bytes uploaded for the file when Total Parts sequential parts have been uploaded.
- Total File GB: As Total File Bytes, but in GB.
These values are nicely balanced for real-time uploads, especially of web-playback codecs like H.264; most such files will be under 10.7 GB, and will therefore complete within 1,000 parts with a payload size that never exceeds 21.6 MB.
If we chewed halfway through our parts, the payload size would still never exceed 413.6 MB, and the upload would total 707 GB, more than enough for the vast majority of web files.
It is only once we near the end of our allowed part count that the payload size begins to balloon. However, it never exceeds 1.7 GB, well below the AWS limit of 5 GiB per part.
Example 2: ProRes 422 LT
ProRes 422 LT has a data rate of 102 Mbps and generates a table like so:
Total Parts | Payload Bytes | Payload MB | Total File Bytes | Total File GB |
---|---|---|---|---|
1 | 12,750,016 | 12.8 MB | 12,750,016 | 0.0 GB |
1,000 | 28,857,758 | 28.9 MB | 18,127,308,783 | 18.1 GB |
5,000 | 415,443,954 | 415.4 MB | 735,107,948,432 | 735.1 GB |
10,000 | 1,623,525,817 | 1,623.5 MB | 5,497,558,133,958 | 5,497.6 GB |
This table reveals useful properties compared to the web example above. Within the first 1,000 parts, we are able to upload roughly 7.4 GB more of the file. Larger initial payloads mean we will not need to request URLs as quickly at the beginning, making the upload more efficient for the higher data rate. Our payload size at the tail of the upload process remains large.
Example 3: Camera RAW
Finally, let's try a camera RAW format that has a data rate of 280 MB/s. With data coming this fast, trying to upload in 5 MiB chunks at the beginning just doesn't make sense:
Total Parts | Payload Bytes | Payload MB | Total File Bytes | Total File GB |
---|---|---|---|---|
1 | 280,000,008 | 280.0 MB | 280,000,008 | 0.3 GB |
1,000 | 288,091,460 | 288.1 MB | 282,701,200,139 | 282.7 GB |
5,000 | 482,286,516 | 482.3 MB | 1,737,245,341,542 | 1,737.2 GB |
10,000 | 1,089,146,065 | 1,089.1 MB | 5,497,558,133,870 | 5,497.6 GB |
Not only are early payloads more efficient, but we are saving over half a gig at the upper end, which will make those network calls less susceptible to adverse network events.
Showing our work
Before we pull everything together into an example uploader, let's see how we arrived at our formula.
What we needed to do was come up with a formula that traded large, heavy payloads at the end of our allowed parts — which most uploads will never reach — for light, efficient payloads near the beginning, where every upload can take advantage. At the same time, we wanted to ensure that our algorithm will land in the ballpark of the 5 TiB filesize limit right at part number 10,000.
It was time to break out some calculus.
We want our part sizes to grow faster and faster as the part number climbs, so our formula should probably look something like:
n^2
... where n
is the part number. We also want to ensure each part is, at minimum, the data rate for our formula, which we will call r
:
n^2 + r
Now we need a way to compute the sum of this formula over the first 10,000 natural numbers (1, 2, 3, ...). The sigma symbol, Σ, denotes summation. Let's add it to our formula:
Σ(n^2 + r)
... and redefine n
as the series of natural numbers between 1 and 10,000, inclusive.
The equation is not very useful to us yet. It has the right intuitive shape, but if we set n=10,000
and r=5,242,880
like we want to, it just spits out a result: 385,812,135,000
(385 GB). Not only is the result far below our filesize limit of 5 TiB, there is also no way to manipulate the formula to land on the result we actually want.
Let's give ourselves a dial to spin:
Σ(xn^2 + r)
... where x
is a scalar we can solve for to get 5 TiB as the result. Now we can set the equation equal to our filesize limit and solve for x
:
Σ(xn^2 + r) = 5,497,558,138,880
Often, summations must be solved iteratively, as in a for
or while
loop. But it turns out there is a perfect formula for us: a known way of cheaply computing the sum of the squares of the first n
natural numbers:
Σn^2 = n(n+1)(2n+1)/6
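A quick check in Python confirms the identity, and also reproduces the 385,812,135,000 figure from above:
n, r = 10_000, 5_242_880
# The brute-force sum and the closed form agree.
assert sum(i * i for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
# Adding r for each of the 10,000 parts reproduces the earlier result.
print(n * (n + 1) * (2 * n + 1) // 6 + r * n)  # 385,812,135,000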
Rearranging it into a polynomial makes it easier to look at:
Σn^2 = (2n^3 + 3n^2 + n)/6
We can add our variables, x
and r
, to both sides:
Σ(xn^2 + r) = x(2n^3 + 3n^2 + n)/6 + rn
And finally we set our new formula equal to 5 TiB:
x(2n^3 + 3n^2 + n)/6 + rn = 5,497,558,138,880
Now all we need to do is solve for x
by setting n=10,000
, our total part count. This will give us a way to compute a static scalar for a given data rate.
Rather than doing this by hand, let's plug it into Wolfram Alpha:
x = -(2 (125 r - 68719476736)) / 8334583375
Now we're getting somewhere! If our data rate was the minimum part size (5 MiB), we would get a static scalar of:
136,128,233,472 / 8,334,583,375
In computerland, this represents a float64 value of 16.33293799427617
. Our formula to determine part size in this instance would be:
s = 16.33293799427617n^2 + 5,242,880
Where s
is our part size.
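As a quick cross-check of the algebra, the solved form and a direct rearrangement of our 5 TiB equation agree on that value:
r, n = 5_242_880, 10_000
# The Wolfram Alpha form of the scalar...
solved = -(2 * (125 * r - 68_719_476_736)) / 8_334_583_375
# ...and a direct rearrangement of x(2n^3 + 3n^2 + n)/6 + rn = 5 TiB.
direct = 6 * (5_497_558_138_880 - r * n) / (2 * n**3 + 3 * n**2 + n)
print(solved, direct)  # both print ~16.33293799427617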
We still have one more problem. In the real world, we can't have a payload with non-whole bytes. We need to round each value. We'll use Python, and round down:
math.floor(16.33293799427617 * pow(part_number, 2)) + 5_242_880
We have arrived at a concrete example of the original function given in this guide.
Building a basic uploader
Let’s take a look at some simple Python-like pseudocode for uploading a file being rendered in real time, using everything we have learned in this guide:
import math
from datetime import datetime, timezone
from typing import Callable
# The minimum size, in bytes, for a single, non-final part upload.
MINIMUM_PART_SIZE = 5_242_880
# The maximum number of parts allowed for a single AWS upload.
MAXIMUM_PART_COUNT = 10_000
# The maximum size, in bytes, for an AWS upload.
MAXIMUM_FILE_SIZE = 5_497_558_138_880
# The data rate at which every part is an equal size, and could not
# be any uniformly larger without violating the maximum total file
# size if 10_000 parts were to be uploaded. It works out to
# ~549.8 MB per payload. By enforcing this we actually never need
# to check if a part exceeds the maximum allowed part size, as our
# parts will never exceed ~549.8 MB.
MAXIMUM_DATA_RATE = MAXIMUM_FILE_SIZE // MAXIMUM_PART_COUNT
def create_part_size_calculator(format_bytes_per_second: int) -> Callable[[int], int]:
"""
Returns a function that takes in a `part_number` and returns a
`part_size` based on `data_rate`.
"""
...
def upload_render(data_stream: DataStream, channel: int = 0) -> None:
"""
Uploads an asset for data_stream, which is a custom IO class that pulls remaining
upload data from an internal buffer or file, depending on how well the upload is
keeping pace with the render.
Uploads to `channel`
"""
asset = c2c.asset_create(
extension=data_stream.extension,
filetype=data_stream.mimetype,
channel=channel,
offset=datetime.now(timezone.utc) - data_stream.created_at()
)
calculate_part_size = create_part_size_calculator(data_stream.data_rate())
next_part_number = 1
while True:
next_payload_size = calculate_part_size(next_part_number)
# Waits until one or more chunks worth of data is ready for upload. Cache
# whether our data stream has completed writing the file, and the current
# number of bytes we have remaining to upload at this time.
available_bytes, stream_complete = data_stream.wait_for_available_data(
minimum_bytes=next_payload_size
)
# Build the list of parts to request based on our available data.
parts = []
while available_bytes > 0:
payload_size = calculate_part_size(next_part_number)
if available_bytes < payload_size and not stream_complete:
break
payload_size = min(payload_size, available_bytes)
parts.append(
c2c.RealtimeUploadPart(
part_number=next_part_number,
part_size=payload_size,
is_final=False
)
)
available_bytes -= payload_size
next_part_number += 1
# If our stream is done writing, mark the last part as final.
if stream_complete:
parts[-1].is_final = True
# Create the part URLs using the C2C endpoint.
response = c2c.create_realtime_parts(
asset_id=asset.id,
asset_name=None if not stream_complete else data_stream.filename,
asset_filesize=None if not stream_complete else data_stream.size(),
parts=parts
)
# Upload each part to its URL.
for part, part_url in zip(parts, response.upload_urls):
part_data = data_stream.read(bytes=part.size)
c2c.upload_chunk(part_data, part_url, data_stream.mimetype)
if stream_complete:
break
The code above only demonstrates the basic flow of uploading a file in real time. In reality, this logic will need to be enhanced with error handling and advanced upload techniques.