Introduction
So far we have learned how to upload assets when their size and other properties are already known, but what if we want to upload an asset as it is being recorded, rendered, or streamed?
This guide covers how to upload a real-time asset. The Real-time Uploads API enables assets to be playable in Frame.io seconds after they are done being recorded.
Demo video
If you would like to get a quick preview of this API in action, take a look at our video demo!
Watch as a render is uploaded out of Adobe Media Encoder in real time, with the video being playable in Frame.io 5 seconds after the render completes. With real-time uploads your integration can be more responsive than ever!
What will I need?
If you haven’t read the Implementing C2C: Setting Up guide, give it a quick glance before moving on!
You will also need the access_token
you received while following the C2C hardware or C2C Application authentication and authorization guide.
In this guide we will be using a test asset found at this Frame.io link. Hit the Download
button to download it. Using the example file will help you match all the values in our example curl
commands.
It is also recommended that you read the How to: Upload (Basic) guide first, as this guide assumes you are familiar with the basic upload flow.
Create a real-time asset
Uploading an asset in real time begins with how that asset is created. When creating your asset, is_realtime_upload
must be set to true
, and filesize
must be elided or set to null
. Normally, filesize
is required, but if you are uploading a file as it is being created, that obviously can't be known.
{
curl -X POST https://api.frame.io/v2/devices/assets \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"name": "C2C_TEST_CLIP.mp4",
"filetype": "video/mp4",
"is_realtime_upload": true
}
__JSON__
} | python -m json.tool
Docs for /v2/devices/assets
can be found here; the endpoint is identical to the /v2/assets
endpoint.
When uploading a real-time asset, an extension is required. However, some uploaders may not know the filename before the asset data has finished being generated. In those instances, the extension
field may be used to supply an extension in the format of .[extension]
, like '.mp4'
. When the final filename is not yet known, prefer supplying extension rather than name, as doing so allows you to update the asset name later.
Let's send our payload. We'll get this response:
{
"id": "{asset_id}",
"name": "C2C_TEST_CLIP.mp4"
}
The response for real-time assets is significantly stripped down compared to regular asset creation. Note in particular that upload_urls
is missing. For real-time assets, we will be generating our upload URLs on-demand!
Requesting upload URLs
Let's request the URL for the first half of our file, exactly 10,568,125 bytes. We'll need to include the id
from above as the asset_id
in the URL below:
{
curl -X POST https://api.frame.io/v2/devices/assets/{asset_id}/realtime_upload/parts \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"parts": [
{
"number": 1,
"size": 10568125,
"is_final": false
}
]
}
__JSON__
} | python -m json.tool
Docs for /v2/devices/assets/{asset_id}/realtime_upload/parts
can be found here.
This request will fetch a single URL. Let's break down the request fields:
- parts: A list of upload parts we wish to generate upload URLs for. As this is a list, we can batch-request URLs if desired to be more efficient.
  - number: The part number/index, starting at 1. Part numbers may be skipped and may be uploaded in any order, but will be used to concatenate the final file in sequential order. Cannot be greater than 10,000 (the maximum number of parts that AWS allows).
  - size: The size of the part in bytes. If the size does not abide by the AWS multipart upload restrictions, an error will be returned.
  - is_final: Whether this URL is for the final part of the file.
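As an illustration of these constraints, a minimal client-side check might look like the sketch below. The helper is illustrative only, not part of the API, and the 5 GiB ceiling is AWS's general per-part limit.
MIN_PART_SIZE = 5_242_880      # 5 MiB minimum for non-final parts
MAX_PART_COUNT = 10_000        # highest part number AWS allows
MAX_PART_SIZE = 5 * 1024 ** 3  # 5 GiB maximum for any single part

def check_part(number: int, size: int, is_final: bool) -> None:
    """Raise if a part description would be rejected."""
    if not 1 <= number <= MAX_PART_COUNT:
        raise ValueError("part number must be between 1 and 10,000")
    if size > MAX_PART_SIZE:
        raise ValueError("part size exceeds the 5 GiB AWS limit")
    if size < MIN_PART_SIZE and not is_final:
        raise ValueError("non-final parts must be at least 5 MiB")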
When we make the request, we should get a response that looks like so:
{
"upload_urls": [
"https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_01_path]"
]
}
The upload_urls
list will be in the same order as the parts
request field.
Now we can upload our data just like a basic upload:
head -c 10568125 ~/Downloads/C2C_TEST_CLIP.mp4 | \
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_01_path] \
--include \
--header 'content-type: video/mp4' \
--header 'x-amz-acl: private' \
--data-binary @-
Next, we need to request the URL for our second, and final, part:
{
curl -X POST https://api.frame.io/v2/devices/assets/{asset_id}/realtime_upload/parts \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"asset_filesize": 21136250,
"parts": [
{
"number": 2,
"size": 10568125,
"is_final": true
}
]
}
__JSON__
} | python -m json.tool
Note that is_final is set to true. If this field is never set to true on the final part, Frame.io will wait indefinitely for it, and the asset will never become playable in Frame.io.
Also note the asset_filesize
field which contains the full filesize for the upload. This field is required when an object in parts
has is_final
set to true
.
asset_filesize
may be supplied with any request, but MUST be supplied with the final part request.
We will get another response payload like before:
{
"upload_urls": [
"https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_02_path]"
]
}
... which we can use to upload the second half of our file:
tail -c 10568125 ~/Downloads/C2C_TEST_CLIP.mp4 | \
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part_02_path] \
--include \
--header 'content-type: video/mp4' \
--header 'x-amz-acl: private' \
--data-binary @-
When the final part is uploaded, it kicks off a process in our backend to stitch the parts together into a single file. This process will only wait a 60-second grace period for all other parts to finish uploading. For this reason, we recommend waiting to upload the final part until all other parts have been uploaded.
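For example, a rough sketch of that ordering, where upload_fn is a hypothetical callable that PUTs a single part's bytes to its presigned URL:
from concurrent.futures import ThreadPoolExecutor

def upload_in_safe_order(non_final_parts, final_part, upload_fn):
    # Upload all non-final parts first; leaving the `with` block waits
    # for every queued upload to finish (or raise).
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(upload_fn, non_final_parts))
    # Only now send the final part, which triggers stitching in the backend.
    upload_fn(final_part)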
That's it! Navigate to Frame.io and you will see that your asset has been uploaded. Your first real-time upload is complete! 🎉🎉🎉
Asset names
As discussed above, it is possible to supply an extension
field and omit the name
field on asset creation when the full asset name is not known:
{
curl -X POST https://api.frame.io/v2/devices/assets \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"extension": ".mp4",
"filetype": "video/mp4",
"is_realtime_upload": true
}
__JSON__
} | python -m json.tool
Response:
{
"id": "{asset_id}",
"name": "[new file].mp4"
}
When an asset has been created this way, it will be automatically assigned the name [new file].[ext]
. You may supply an asset_name
field at any point when requesting upload URLs to update it:
{
curl -X POST https://api.frame.io/v2/devices/assets/{asset_id}/realtime_upload/parts \
--header 'Authorization: Bearer [access_token]' \
--header 'Content-Type: application/json' \
--header 'x-client-version: 2.0.0' \
--data-binary @- <<'__JSON__'
{
"asset_name": "C2C_TEST_CLIP.mp4",
"asset_filesize": 21136250,
"parts": [
{
"number": 2,
"size": 10568125,
"is_final": true
}
]
}
__JSON__
} | python -m json.tool
The asset name will only be updated if the asset still has its default name; otherwise, the asset_name field is ignored. This includes the case where the name has been changed in the Frame.io UI.
Batching URLs
It is recommended that you request URLs for as many parts as you currently have data for, rather than requesting a single URL per request. This strategy, while a little more complex logically, ensures that files with enough data for thousands of parts are handled efficiently when the upload cannot keep pace with the rate at which data is being generated.
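A minimal sketch of that strategy might look like this. The part_size_for sizing function is assumed here; a concrete version is developed later in this guide, and the dictionary keys mirror the fields of the parts request above.
def describe_available_parts(next_part_number, available_bytes, part_size_for):
    """Describe as many full-sized parts as the buffered data allows.

    Returns a list of part descriptions suitable for a single
    realtime_upload/parts request, plus the next unused part number.
    """
    parts = []
    while available_bytes >= part_size_for(next_part_number):
        size = part_size_for(next_part_number)
        parts.append({"number": next_part_number, "size": size, "is_final": False})
        available_bytes -= size
        next_part_number += 1
    return parts, next_part_number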
Minimum part size and media file headers.
The minimum part size for a non-final part is determined by AWS, and is 5 MiB (5,242,880 bytes). Some file formats may require a header at the front of the file that is not written until the rest of the file data has been rendered. Thus, we have a problem: if we need to delay writing the header until all other data has been written, and it is smaller than the minimum 5,242,880 bytes, then we will not be able to upload it as part_number=1
.
We suggest that in these cases, you hold back the first 5,242,880 bytes of your media data, and begin uploading parts with part_number=2
. Once your data has finished being generated, you can prepend the header to this held-back chunk of media data, request a URL for part_number=1
, then upload the part. This ensures that your first chunk will always meet the minimum part size requirement.
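A minimal sketch of this strategy, assuming rendered bytes arrive in arbitrary-sized chunks (the class and method names here are illustrative, not part of the API):
HOLD_BACK_BYTES = 5_242_880  # AWS minimum size for a non-final part

class HeaderHoldBack:
    """Hold back the first 5,242,880 bytes until the header is known."""

    def __init__(self) -> None:
        self._held_back = b""

    def write(self, chunk: bytes) -> bytes:
        # Fill the hold-back buffer first; anything left over can be
        # uploaded immediately as part_number 2, 3, ...
        needed = HOLD_BACK_BYTES - len(self._held_back)
        if needed > 0:
            self._held_back += chunk[:needed]
            chunk = chunk[needed:]
        return chunk

    def build_part_one(self, header_bytes: bytes) -> bytes:
        # Called once the render is finished and the header can be written;
        # the result always meets the 5,242,880-byte minimum.
        return header_bytes + self._held_back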
Implementing a scaling part size formula.
The AWS limit for a single file is 5 TiB (5,497,558,138,880 bytes). The maximum number of parts a file can be split into is 10,000.
We want to make sure that we can upload the maximum filesize with the number of parts available to us, especially since — with real-time uploads — we can't know the filesize ahead of time. Make our parts too small, and we may blow through all 10,000 of them far too quickly, while also leaving a healthy chunk of bytes on the table. But make our parts too large, and we lose the ability to start uploading small files before they are completed.
Let's look at some hard numbers. If we request the minimum part size of 5,242,880 bytes for all 10,000 parts, we can upload a total of 52,428,800,000 bytes (52.4 GB), or roughly 1% of the maximum filesize available to us.
But if we were to evenly distribute the total filesize, each part would be 5,497,558,138,880 / 10,000
bytes, or ~550 MB. That's a very large payload if our file only ends up being 80 MB; we wouldn't gain any benefit from uploading it in real time.
For small files we want to keep payload size close to the minimum so we can keep pace with the file's creation as closely as possible. But for large files, we don't want to run out of parts before we have hit the 5 TiB limit.
This calls for a more complex approach than just arbitrarily assigning a payload size.
Our suggested formula
We suggest the following formula as a starting place, expressed in Python code and using 64-bit, double-precision floats:
import math
from typing import Callable
# The minimum size, in bytes, for a single, non-final part upload.
MINIMUM_PART_SIZE = 5_242_880
# The maximum number of parts allowed for a single AWS upload.
MAXIMUM_PART_COUNT = 10_000
# The maximum size, in bytes, for an AWS upload.
MAXIMUM_FILE_SIZE = 5_497_558_138_880
# The data rate at which every part is an equal size, and could not
# be any uniformly larger without violating the maximum total file
# size if 10_000 parts were to be uploaded. It works out to
# ~549.8 MB per payload. By enforcing this we actually never need
# to check if a part exceeds the maximum allowed part size, as our
# parts will never exceed ~549.8 MB.
MAXIMUM_DATA_RATE = MAXIMUM_FILE_SIZE // MAXIMUM_PART_COUNT
def part_size(part_number: int, format_bytes_per_second: int) -> int:
"""
Returns the payload size for a specific part number given the file's
expected data rate.
"""
if part_number < 1:
raise ValueError("part_number must be greater than 0")
if part_number > 10_000:
raise ValueError("part_number must be less than 10,000")
# Make sure we never go above the maximum data rate or fall below the
# minimum part size, even if the data rate is lower.
data_rate = min(format_bytes_per_second, MAXIMUM_DATA_RATE)
data_rate = max(data_rate, MINIMUM_PART_SIZE)
# Calculate a scalar given our data rate. We will explain this step
# further on in the guide.
scalar = -(2 * (125 * data_rate - 68_719_476_736)) / 8_334_583_375
part_size = math.floor(scalar * pow(part_number, 2)) + data_rate
return part_size
... where part_number
is between 1
and 10_000
, inclusive, and format_bytes_per_second
is the average number of bytes your file is expected to consume per second. We'll go over how the formula was reached further on.
The scalar
variable and calculation might be a little perplexing at first glance, but it is a mathematical tool that ensures no matter what value we use for format_bytes_per_second
, if we feed all allowed part_number
values from 1
to 10_000
into the function, we will receive a set of values that totals to our 5 TiB filesize limit, or as close to it as possible. We show our work further on for how we came to this formula.
By using floor rounding, we leave some bytes on the table, but ensure that ordinary rounding across 10,000 parts does not accidentally cause us to exceed our maximum allowed filesize. At most 10,000 bytes, or 10 KB, will be left on the table this way, which is an acceptable tradeoff.
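As a quick sanity check of that claim, summing the part_size function above over all 10,000 parts at the minimum 5 MiB data rate lands just shy of the limit:
# Assumes the part_size() function and constants defined above.
total = sum(part_size(n, 5_242_880) for n in range(1, 10_001))
print(total)                      # should match the 10,000-part row of the first table below
print(MAXIMUM_FILE_SIZE - total)  # comfortably under 10,000 bytes left on the table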
The important characteristics of this formula are:
- When uploading 10,000 parts, the total amount of data uploaded will be within 10 KB of our 5 TiB limit.
- Optimizes for smaller, more efficient payloads at the beginning to increase responsiveness for short and medium-length clips.
- Very long clips will have reduced responsiveness between the end of a file being written and it becoming playable in Frame.io.
The tradeoff between the second and third points is mitigated by the fact that most clips will never reach the size where point three comes into play. We are trading increased responsiveness for MOST files for decreased responsiveness for very few.
A more advanced and efficient version of our formula (that generates a part_size_calculator
function with our static scalar and data rate precomputed and baked in) might look like this:
def create_part_size_calculator(format_bytes_per_second: int) -> Callable[[int], int]:
"""
Returns a function that takes in a `part_number` and returns a
`part_size` based on `data_rate`.
"""
# Make sure we never go above the maximum data rate or fall below the
# minimum part size, even if the data rate is lower.
data_rate = min(format_bytes_per_second, MAXIMUM_DATA_RATE)
data_rate = max(data_rate, MINIMUM_PART_SIZE)
static_scalar = -(2 * (125 * data_rate - 68_719_476_736)) / 8_334_583_375
def part_size_calculator(part_number: int) -> int:
"""Calculates size in bytes of upload for `part_number`."""
if part_number < 1:
raise ValueError("part_number must be greater than 0")
if part_number > 10_000:
raise ValueError("part_number must be less than 10,000")
return math.floor(static_scalar * pow(part_number, 2) + data_rate)
return part_size_calculator
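Usage is straightforward; the example values here line up with the web-format table in the next section:
# Build a calculator for a format at (or below) the minimum data rate,
# then size a couple of parts.
calculate_part_size = create_part_size_calculator(5_242_880)
calculate_part_size(1)      # 5,242,896 bytes (~5.2 MB)
calculate_part_size(1_000)  # 21,575,817 bytes (~21.6 MB)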
How the formula performs.
Let's examine the output characteristics of the formula above over several common file types.
Example 1: Web format
For web-playable formats with a rate of ~5.3MB/s or less (most H.264/H.265/HEVC files), we will get a payload-size progression that looks like so:
Total Parts | Payload Bytes | Payload MB | Total File Bytes | Total File GB |
---|---|---|---|---|
1 | 5,242,896 | 5.2 MB | 5,242,896 | 0.0 GB |
1,000 | 21,575,817 | 21.6 MB | 10,695,361,357 | 10.7 GB |
5,000 | 413,566,329 | 413.6 MB | 706,957,655,928 | 707.0 GB |
10,000 | 1,638,536,679 | 1,638.5 MB | 5,497,558,133,921 | 5,497.6 GB |
Table columns key
- Total Parts: the total number of file parts uploaded to AWS.
- Payload Bytes: the size of the AWS PUT payload when part_number is equal to Total Parts.
- Payload MB: As Payload Bytes, but in megabytes.
- Total File Bytes: the total number of bytes uploaded for the file when Total Parts sequential parts have been uploaded.
- Total File GB: As Total File Bytes, but in GB.
These values are nicely balanced for real-time uploads, especially of web-playback codecs like H.264; most such files will be under 10.7 GB, and will therefore complete within 1,000 parts with a payload size that never exceeds 21.6 MB.
If we chewed halfway through our parts, the payload size would still never exceed 413.6 MB, and the upload would total 707 GB, more than enough for the vast majority of web files.
It is only once we near the end of our allowed part count that the payload size begins to balloon. However, it never exceeds 1.7 GB, well below the AWS limit of 5 GiB per part.
Example 2: ProRes 422 LT
ProRes 422 LT has a data rate of 102 Mbps and generates a table like so:
Total Parts | Payload Bytes | Payload MB | Total File Bytes | Total File GB |
---|---|---|---|---|
1 | 12,750,016 | 12.8 MB | 12,750,016 | 0.0 GB |
1,000 | 28,857,758 | 28.9 MB | 18,127,308,783 | 18.1 GB |
5,000 | 415,443,954 | 415.4 MB | 735,107,948,432 | 735.1 GB |
10,000 | 1,623,525,817 | 1,623.5 MB | 5,497,558,133,958 | 5,497.6 GB |
This table reveals useful properties compared to the web example above. Within the first 1,000 parts, we are able to upload roughly 7.4 GB more of the file. Larger initial payloads mean we will not need to request URLs as quickly at the beginning, making the upload more efficient for the higher data rate. Our payload size at the tail of the upload process remains large.
Example 3: Camera RAW
Finally, let's try a camera RAW format that has a data rate of 280 MB/s. With data coming this fast, trying to upload in 5 MiB chunks at the beginning just doesn't make sense:
Total Parts | Payload Bytes | Payload MB | Total File Bytes | Total File GB |
---|---|---|---|---|
1 | 280,000,008 | 280.0 MB | 280,000,008 | 0.3 GB |
1,000 | 288,091,460 | 288.1 MB | 282,701,200,139 | 282.7 GB |
5,000 | 482,286,516 | 482.3 MB | 1,737,245,341,542 | 1,737.2 GB |
10,000 | 1,089,146,065 | 1,089.1 MB | 5,497,558,133,870 | 5,497.6 GB |
Not only are early payloads more efficient, but we are saving over half a gig at the upper end, which will make those network calls less susceptible to adverse network events.
Showing our work
Before we pull everything together into an example uploader, let's see how we arrived at our formula.
What we needed to do was come up with a formula that traded large, heavy payloads at the end of our allowed parts — which most uploads will never reach — for light, efficient payloads near the beginning, where every upload can take advantage. At the same time, we wanted to ensure that our algorithm will land in the ballpark of the 5 TiB filesize limit right at part number 10,000.
It was time to break out some calculus.
We want our part sizes to grow faster and faster as the part number climbs, so our formula should probably look something like:
n^2
... where n
is the part number. We also want to ensure each part is, at minimum, the data rate for our formula, which we will call r
:
n^2 + r
Now we need a way to compute the sum of this formula over the first 10,000 natural numbers (1, 2, 3, ...). The sigma symbol, Σ, denotes summation. Let's add it to our formula:
Σ(n^2 + r)
... and redefine n
as the series of natural numbers between 1 and 10,000, inclusive.
The equation is not very useful to us yet. It has the right intuitive shape, but if we set n=10,000
and r=5,242,880
like we want to, it just spits out a result: 385,812,135,000
(385 GB). Not only is the result far below our filesize limit of 5 TiB, there is also no way to manipulate the formula to land on the result we actually want.
Let's give ourselves a dial to spin:
Σ(xn^2 + r)
... where x
is a scalar we can solve for to get 5 TiB as the result. Now we can set the equation equal to our filesize limit and solve for x
:
Σ(xn^2 + r) = 5,497,558,138,880
Often, summations must be solved iteratively, as in a for
or while
loop. But it turns out there is a perfect formula for us: a known way of cheaply computing the sum of the squares of the first n
natural numbers:
Σn^2 = n(n+1)(2n+1)/6
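A quick check in Python confirms the identity, and also reproduces the 385,812,135,000 figure from above:
n, r = 10_000, 5_242_880
# The brute-force sum and the closed form agree.
assert sum(i * i for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
# Adding r for each of the 10,000 parts reproduces the earlier result.
print(n * (n + 1) * (2 * n + 1) // 6 + r * n)  # 385,812,135,000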
Rearranging it into a polynomial makes it easier to look at:
Σn^2 = (2n^3 + 3n^2 + n)/6
We can add our variables, x
and r
, to both sides:
Σ(xn^2 + r) = x(2n^3 + 3n^2 + n)/6 + rn
And finally we set our new formula equal to 5 TiB:
x(2n^3 + 3n^2 + n)/6 + rn = 5,497,558,138,880
Now all we need to do is solve for x
by setting n=10,000
, our total part count. This will give us a way to compute a static scalar for a given data rate.
Rather than doing this by hand, let's plug it into Wolfram Alpha:
x = -(2 (125 r - 68719476736)) / 8334583375
Now we're getting somewhere! If our data rate was the minimum part size (5 MiB), we would get a static scalar of:
136,128,233,472 / 8,334,583,375
In computerland, this represents a float64 value of 16.33293799427617
. Our formula to determine part size in this instance would be:
s = 16.33293799427617n^2 + 5,242,880
Where s
is our part size.
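As a quick cross-check of the algebra, the solved form and a direct rearrangement of our 5 TiB equation agree on that value:
r, n = 5_242_880, 10_000
# The Wolfram Alpha form of the scalar...
solved = -(2 * (125 * r - 68_719_476_736)) / 8_334_583_375
# ...and a direct rearrangement of x(2n^3 + 3n^2 + n)/6 + rn = 5 TiB.
direct = 6 * (5_497_558_138_880 - r * n) / (2 * n**3 + 3 * n**2 + n)
print(solved, direct)  # both print ~16.33293799427617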
We still have one more problem. In the real world, we can't have a payload with non-whole bytes. We need to round each value. We'll use Python, and round down:
math.floor(16.33293799427617 * pow(part_number, 2)) + 5_242_880
We have arrived at a concrete example of the original function given in this guide.
Building a basic uploader
Let’s take a look at some simple Python-like pseudocode for uploading a file being rendered in real time, using everything we have learned in this guide:
import math
from datetime import datetime, timezone
from typing import Callable
# The minimum size, in bytes, for a single, non-final part upload.
MINIMUM_PART_SIZE = 5_242_880
# The maximum number of parts allowed for a single AWS upload.
MAXIMUM_PART_COUNT = 10_000
# The maximum size, in bytes, for an AWS upload.
MAXIMUM_FILE_SIZE = 5_497_558_138_880
# The data rate at which every part is an equal size, and could not
# be any uniformly larger without violating the maximum total file
# size if 10_000 parts were to be uploaded. It works out to
# ~549.8 MB per payload. By enforcing this we actually never need
# to check if a part exceeds the maximum allowed part size, as our
# parts will never exceed ~549.8 MB.
MAXIMUM_DATA_RATE = MAXIMUM_FILE_SIZE // MAXIMUM_PART_COUNT
def create_part_size_calculator(format_bytes_per_second: int) -> Callable[[int], int]:
"""
Returns a function that takes in a `part_number` and returns a
`part_size` based on `data_rate`.
"""
...
def upload_render(data_stream: DataStream, channel: int = 0) -> None:
"""
Uploads an asset for data_stream, which is a custom IO class that pulls remaining
upload data from an internal buffer or file, depending on how well the upload is
keeping pace with the render.
Uploads to `channel`
"""
asset = c2c.asset_create(
extension=data_stream.extension,
filetype=data_stream.mimetype,
channel=channel,
offset=datetime.now(timezone.utc) - data_stream.created_at()
)
calculate_part_size = create_part_size_calculator(data_stream.data_rate())
next_part_number = 1
while True:
next_payload_size = calculate_part_size(next_part_number)
# Waits until one or more chunks worth of data is ready for upload. Cache
# whether our data stream has completed writing the file, and the current
# number of bytes we have remaining to upload at this time.
available_bytes, stream_complete = data_stream.wait_for_available_data(
minimum_bytes=next_payload_size
)
# Build the list of parts to request based on our available data.
parts = []
while available_bytes > 0:
payload_size = calculate_part_size(next_part_number)
if available_bytes < payload_size and not stream_complete:
break
payload_size = min(payload_size, available_bytes)
parts.append(
c2c.RealtimeUploadPart(
part_number=next_part_number,
part_size=payload_size,
is_final=False
)
)
available_bytes -= payload_size
next_part_number += 1
# If our stream is done writing, mark the last part as final.
if stream_complete:
parts[-1].is_final = True
# Create the part URLs using the C2C endpoint.
response = c2c.create_realtime_parts(
asset_id=asset.id,
asset_name=None if not stream_complete else data_stream.filename,
asset_filesize=None if not stream_complete else data_stream.size(),
parts=parts
)
# Upload each part to its URL.
for part, part_url in zip(parts, response.upload_urls):
part_data = data_stream.read(bytes=part.size)
c2c.upload_chunk(part_data, part_url, data_stream.mimetype)
if stream_complete:
break
The code above only demonstrates the basic flow of uploading a file in real time. In reality, this logic will need to be enhanced with error handling and advanced upload techniques.