Flat File Ingestion Compared to API Ingestion

Unit21 offers two different data ingestion pathways: API and Flat-File. The former is generally a good fit for customers with complicated data transformation requirements or who want to use our Real-Time Rules product. For more straightforward use cases, the latter approach reduces the amount of engineering resources needed to integrate and can provide a quicker integration process.

Overview of Our Data Ingestion Pathways

Unit21 offers two different ways for our users to bring their data into the system:

  • API Ingestion - An ETL-driven approach where you transform your data into the Unit21 schema and send it to us via our API.
  • Flat-File Ingestion - An ELT-driven approach where you send us your raw data via flat files, uploaded either to an S3 bucket or directly through our Dashboard. We then transform it (within certain limits) into the Unit21 schema for you. Note that while we support the majority of common transformations, more complex data schemas may require some additional work on your side before you send us the data.

Both of these ingestion pathways are equally supported and robustly tested by the Unit21 team. Both pathways also adhere to the standard Unit21 service level agreements (SLAs), guaranteeing you priority assistance when you need it. Our Implementation Team can provide support with either pathway as well, so we’ve written this guide to help you decide which pathway makes the most sense given your needs.

API Ingestion Pathway

The Unit21 API is a standard REST API that is exposed to our customers securely over HTTPS. We offer extensive documentation and a regularly updated API reference on our documentation hub.

Ingesting data into Unit21 via our API is an ETL-driven approach, meaning that you must extract your data from your database(s), transform it into a schema that Unit21 can understand, and load it into our system via the API.

To assist with the transformation step, we provide a data schema mapping template (displayed on the right), which our Implementation Team will use to help you determine which fields need to be extracted and how they must be transformed to compose the API request.
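As a rough illustration of the transform step, the sketch below maps one extracted database row onto a request body using a field-mapping table. The source columns (`txn_id`, `amount_str`, `created_at`, `customer_id`) and target fields (`event_id`, `amount`, `event_time`, `sender_entity_id`) are hypothetical placeholders, not the actual Unit21 schema; your mapping template defines the real field names and formats.

```python
from datetime import datetime
from typing import Any, Callable

# Hypothetical mapping: target field -> (source column, converter).
# The real target field names come from your schema mapping template.
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "event_id": ("txn_id", str),
    "amount": ("amount_str", float),
    # ISO 8601 string with a UTC offset -> Epoch seconds
    "event_time": ("created_at", lambda s: int(datetime.fromisoformat(s).timestamp())),
    "sender_entity_id": ("customer_id", str),
}


def transform_row(row: dict[str, Any]) -> dict[str, Any]:
    """Transform one extracted database row into an API request body (the 'T' in ETL)."""
    return {target: convert(row[source]) for target, (source, convert) in FIELD_MAP.items()}


raw_row = {
    "txn_id": 98765,
    "amount_str": "125.50",
    "created_at": "2024-01-15T10:30:00+00:00",
    "customer_id": 42,
}
request_body = transform_row(raw_row)
# {'event_id': '98765', 'amount': 125.5, 'event_time': 1705314600, 'sender_entity_id': '42'}
```

Keeping the mapping in a table rather than scattered through code makes it easy to review against the mapping template with the Implementation Team.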

The primary advantage of the API Ingestion Pathway is its technical flexibility, assuming you have the requisite engineering resources and/or desire to take advantage of it. For the majority of our customers, our Flat-File Ingestion Pathway is able to easily transform customer data into the Unit21 schema without hiccups, but for more complicated use cases, it may make more sense for you to handle the transformation step on your side before sending us the data—in which case, the API may be the preferred route.

One distinction between the API and Flat-File Ingestion Pathways is how often you send us the data. Some of our customers using the former simply chain their Unit21 API calls into their existing transaction processing logic. This means as soon as they process a new transaction, they send it to Unit21 shortly thereafter. These transactions are queued for data ingestion immediately.
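A minimal sketch of this chaining pattern is shown below. It assumes a hypothetical `send_event` callable that wraps the HTTP call to the Unit21 API (endpoint, authentication, and payload format are not shown). The transaction is forwarded right after your own processing completes; as a design assumption, a failed send is kept in a retry queue rather than failing the transaction itself.

```python
from collections import deque
from typing import Any, Callable

Event = dict[str, Any]


class TransactionProcessor:
    """Processes transactions and forwards each one to Unit21 immediately afterward."""

    def __init__(self, send_event: Callable[[Event], None]) -> None:
        # send_event is a hypothetical wrapper around the Unit21 API request.
        self._send_event = send_event
        self.retry_queue: deque = deque()

    def process(self, txn: Event) -> Event:
        result = {**txn, "status": "processed"}  # stand-in for your existing logic
        try:
            # Chained call: send as soon as your own processing completes.
            self._send_event(result)
        except Exception:
            # Don't fail the transaction because the monitoring call failed;
            # keep the event so it can be retried later.
            self.retry_queue.append(result)
        return result
```

For example, `TransactionProcessor(my_client.send).process({"id": "t1", "amount": 10})` would process the transaction and forward it in the same code path, where `my_client.send` is again a hypothetical client method.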

In contrast, users of the Flat-File Ingestion Pathway typically send their data payloads a few times a day. This is usually not a problem because of the way our detection modeling system works: many of our customers have rules that run on a semi-frequent basis (e.g., every 24 hours), so there’s no need to send transactions continuously.

Our Implementation Team can help you think through your operational needs and how frequently you will need to run detection models on your data. The API Ingestion Pathway makes more sense for the subset of our customers that need to run these models very frequently. Likewise, customers that may want to use our Real-Time Rules (RTR) product down the road will need to integrate with us via API, so it may make sense to do so now if that is of interest.

Flat-File Ingestion Pathway

Whereas the API is ETL-first, requiring customers to transform their data before they send it to Unit21, the Flat-File Ingestion Pathway (FFIP) takes a different approach: it’s ELT-first and allows you to simply send your raw data with minimal transformations to Unit21 and let us handle the remaining transformation work for you.

Flat-File Ingestion consists of the following steps:

  1. You work with our Implementation Team to determine the data requirements for the project and extract the applicable data from your system. This step is identical to the API Ingestion Pathway’s extract step.
  2. You send us an initial set of data in CSV or JSON format. Most of the time, this data can be in its raw format, but occasionally you will need to perform some light transformations ahead of time (see the Limitations of the Flat-File Ingestion Pathway section below for details). You can upload the data directly through our web app, programmatically through our API’s Import endpoint (documented in our API reference), which provides a pre-signed URL to an Amazon S3 bucket, or by giving Unit21 shared access to an S3 bucket that you own. We recommend uploading data via the UI for testing purposes, then creating an automated data pipeline via pre-signed URL generation or S3 bucket sharing.
  3. Our Implementation Team works with you to transform the data using a graphical interface in the Unit21 Dashboard called the Data Mapping UI. This no-code tool requires significantly less engineering time than a traditional, code-based API integration. Using the initial dataset, the Data Mapping UI develops a schema mapping that can be used for future transformations of any data you send us via the Import endpoint.
  4. You continually load data files into the Unit21 system—through the Dashboard or programmatically—and our system automatically transforms and ingests them for you.
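For the programmatic route in step 2, uploading to a pre-signed S3 URL is typically a plain HTTP PUT of the file bytes. The sketch below uses only Python’s standard library and assumes you have already obtained `presigned_url` from the Import endpoint (how that URL is requested is not shown). The `Content-Type` header is an assumption: if the URL was signed with specific headers, your request must send matching ones, so check the Import endpoint documentation.

```python
import urllib.request
from pathlib import Path


def build_upload_request(presigned_url: str, file_path: str) -> urllib.request.Request:
    """Build an HTTP PUT request that uploads a flat file to a pre-signed S3 URL."""
    data = Path(file_path).read_bytes()
    return urllib.request.Request(
        presigned_url,
        data=data,
        method="PUT",
        # Assumed header; send whatever headers the pre-signed URL was generated with.
        headers={"Content-Type": "text/csv"},
    )


def upload_file(presigned_url: str, file_path: str, timeout: float = 60.0) -> int:
    """Upload the file and return the HTTP status code (urlopen raises on 4xx/5xx)."""
    request = build_upload_request(presigned_url, file_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the upload logic easy to test and lets you add retries around `upload_file` in your scheduler.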

The biggest advantage of using the Flat-File Ingestion Pathway is that it requires less engineering investment from your team. Because it is an ELT-driven approach, we are able to assist you with the transformation step using our Data Mapping UI (displayed above) and save you the equivalent effort. In addition to mapping between two different schemas, we can apply a variety of transformations to your data, including casting strings to numbers, converting between datetime formats, and prefixing/postfixing values as needed.
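To make these transformation types concrete, here is a rough Python approximation of three of them: casting a string to a number, converting Epoch seconds to an ISO 8601 datetime, and adding a prefix/postfix. These functions are illustrative stand-ins written for this guide, not the Data Mapping UI’s actual implementation; in practice you configure these transformations in the Dashboard without writing code.

```python
from datetime import datetime, timezone
from typing import Callable


def cast_to_number(value: str) -> float:
    """Cast a numeric string (e.g., '125.50') to a number."""
    return float(value.strip())


def epoch_seconds_to_iso(value: str) -> str:
    """Convert an Epoch-seconds string to an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()


def add_prefix(value: str, prefix: str) -> str:
    return prefix + value


def add_postfix(value: str, postfix: str) -> str:
    return value + postfix


def apply_steps(value: str, steps: list[Callable[[str], str]]) -> str:
    """Apply transformations in order, feeding each output into the next step."""
    for step in steps:
        value = step(value)
    return value


# Chain a prefix and a postfix onto a raw ID value: "42" -> "acct_42_us"
account_id = apply_steps("42", [lambda v: add_prefix(v, "acct_"), lambda v: add_postfix(v, "_us")])
```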

In many cases, setting up an integration using FFIP with a modern database is as straightforward as creating a view with the subset of data you’d like to extract, writing a simple SQL query to pull the requisite data, and exporting the results to a CSV file that you send to us via S3. Unit21’s Data Ingestion & Management (DIM) Team, which built FFIP, has found that an automated process for sending these flat files to our S3 bucket can be set up in less than a single engineer’s day of work, an order of magnitude faster than an equivalent API-based integration.
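The view, query, and export flow described above might look like the following. It uses Python’s built-in `sqlite3` and `csv` modules against an in-memory database as a stand-in for your production database; the table, view, and column names are hypothetical. The resulting CSV text would then be written to a file and uploaded to S3.

```python
import csv
import io
import sqlite3


def export_view_to_csv(conn: sqlite3.Connection, view: str) -> str:
    """Run a simple SELECT against a view and return the results as CSV text."""
    # The view name is a trusted constant here, never user input.
    cursor = conn.execute(f"SELECT * FROM {view}")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)
    return buffer.getvalue()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (id INTEGER, customer_id INTEGER, amount REAL, internal_note TEXT);
    INSERT INTO transactions VALUES (1, 42, 125.5, 'not exported'), (2, 7, 80.0, 'not exported');
    -- The view exposes only the subset of columns you want to send.
    CREATE VIEW u21_transactions AS SELECT id, customer_id, amount FROM transactions;
    """
)
csv_text = export_view_to_csv(conn, "u21_transactions")
```

Putting the column selection in a view keeps the export query trivial and lets you change what you send without touching the pipeline code.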

Another advantage of the Flat-File Ingestion Pathway is that it is better suited than the API to loading historical data via a large backfill. Our API enforces a rate limit that prevents you from sending more than 200 records per second, so loading historical data through the API requires you to build network throttling logic during the initial integration, whereas FFIP automatically throttles large flat files and handles ingestion for you. This means that during an initial integration with Unit21, you can backfill all of your data in one go without having to throttle it yourself.
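For comparison, here is the kind of client-side throttling an API backfill needs in order to stay under the 200 records per second limit. This is a simple fixed-window sketch: `send_batch` is a hypothetical callable standing in for your API client, and the clock and sleep functions are injectable so the logic can be tested without actually waiting. Confirm the exact rate-limit semantics (per record versus per request, how bursts are counted) against the API reference before relying on it.

```python
import time
from typing import Callable, Sequence


def throttled_backfill(
    records: Sequence[dict],
    send_batch: Callable[[Sequence[dict]], None],
    max_per_second: int = 200,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send records in chunks of at most max_per_second, one chunk per one-second window."""
    sent = 0
    for start in range(0, len(records), max_per_second):
        window_start = clock()
        batch = records[start:start + max_per_second]
        send_batch(batch)  # hypothetical API client call
        sent += len(batch)
        # Wait out the rest of the window before sending the next chunk.
        elapsed = clock() - window_start
        more_to_send = start + max_per_second < len(records)
        if more_to_send and elapsed < 1.0:
            sleep(1.0 - elapsed)
    return sent
```

With FFIP, none of this is needed: you upload the file and the pipeline paces ingestion itself.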

We should mention that while the Data Mapping UI is quite advanced and being improved all the time, we can’t handle every possible use case. Please be aware that more complicated transforms might require light data engineering work on your side before you send us the flat file. For the majority of our customers, though, the Data Mapping UI is more than adequate to meet their needs.

Limitations of the Flat-File Ingestion Pathway

While the Flat-File Ingestion Pathway is designed to minimize the transformation work required for integration, it is not intended to address all use cases. When customer-facing transformation work is required, it’s usually because the customer has a particularly complex data model, a unique business model, or requires a set of transformations not yet supported by the Data Mapping UI.

An overview of some of the edge cases we’ve encountered with our Flat-File Ingestion Pathway (FFIP) is listed below:

  • JOINs between data from different streams. FFIP does not support performing a JOIN between data from different streams. If you require a JOIN to successfully ingest your data into Unit21, this will have to be done as a pre-transformation step before sending your batch data to us.
    • In practice, because our API and FFIP automatically upsert records that share the same primary key, we do support a “naive” form of JOIN. For example, one stream can ingest an entity and provide its core information, while another stream enriches that same entity (using the same entity_id) with additional data.
    • This workaround does not work if you’re not able to use the same primary key for the object in both streams, and it must specifically be the designated external_id primary key that Unit21 specifies in our API spec. This means that even if you have a secondary unique identifier present in both streams, you cannot upsert using that secondary key as the index unless it is the entity_id, instrument_id, or event_id that was used to create that object in Unit21.
    • This means that the most important pre-transformation step you must take is ensuring that you choose a primary key for each object type and consistently include it wherever you want to reference that object across your various FFIP streams.
  • Some constraints around our Real-Time Rules product. Our Real-Time Rules product allows you to pass a request to the Unit21 API and determine if a transaction should be blocked or accepted within milliseconds. The specific detection models used to power this decision response rely upon aggregations from the customer and transactional data you send to Unit21. These aggregations are only as up-to-date as the frequency with which you ingest your data into Unit21, so depending upon your business requirements, you may find our Real-Time Rules product to be better supported by an API integration where you stream live data updates to Unit21 continuously.
  • Robust, but preset transformation options. We include a variety of preset transformations to help transform your data into its intended destination format, but we cannot anticipate every possible use case. Here are the preset transformations we currently support via our no-code Data Mapping UI:
    • Case format
      • UPPER_CASE or CamelCase
    • Prefix
      • Add a string prefix
    • Postfix
      • Add a string postfix
    • Datetime format
      • Cast from Epoch seconds, Epoch milliseconds, ISO 8601, or a user-specified strptime format, or automatically attempt to cast using Python’s built-in auto-cast functionality
    • Find/Replace
      • A simple string match-and-replace function (first occurrence or global)
    • MD5 Hash
      • Hash the origin field using the MD5 algorithm
    • Cast to Number
      • Cast from a string to a number
    • Cast to Boolean
      • Cast from a string to a boolean
    • ID Clean
      • Replaces whitespace, parentheses, and comma characters with an underscore to generate an ID-like value (e.g., “Smith, John” → “Smith_John”)
    • JSON Loads
      • Attempts to load a string field as a JSON object
    • Value Map
      • Replaces each value in a list of matched values with its corresponding replacement value
    • Math
      • Accepts the origin field as an operand and allows you to specify a second, fixed operand as well as choose from a list of operators (+, -, *, /) to perform basic math
    • Remove escape sequences
      • Removes all occurrences of an escape sequence (e.g., carriage return) from an origin value
  • Complex filtering of rows within a stream. Our Row Filtering feature allows for including/excluding rows on the basis of a single key value, but we don’t support filtering ingested data with more complex conditional logic.
  • No symmetric encryption. While the API supports symmetric encryption, FFIP does not currently support end-to-end encryption. However, all files are encrypted in transit (via SSL/TLS) and stored in AWS, which adheres to its own rigorous, industry-standard security protocols.
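The “naive” JOIN described in the first bullet above can be pictured as a keyed upsert: records from two streams that share the same primary key merge into one object, while a record that only carries a secondary identifier ends up as a separate object. The sketch below models this with an in-memory dictionary. It illustrates the concept only, and it assumes later values overwrite earlier ones field by field, which you should confirm for your own streams.

```python
from typing import Any

Record = dict[str, Any]


def upsert(store: dict[str, Record], record: Record, key: str = "entity_id") -> None:
    """Insert a new object, or merge fields into the existing object with the same key."""
    object_id = record[key]
    store.setdefault(object_id, {}).update(record)


store: dict[str, Record] = {}
# Stream 1: core customer data.
upsert(store, {"entity_id": "cust_42", "first_name": "John", "email": "j@example.com"})
# Stream 2: enrichment keyed on the same entity_id, merged into the same object.
upsert(store, {"entity_id": "cust_42", "risk_tier": "low"})
# Stream 3: keyed on a secondary identifier (the email), so it cannot match cust_42
# and creates a separate object instead.
upsert(store, {"entity_id": "j@example.com", "risk_tier": "high"})
```

This is why choosing one primary key per object type, and including it consistently in every stream, is the key pre-transformation step.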

Please note that we’re continually adding features to and updating FFIP. When we discover that multiple customers are running into the same limitation, we prioritize our roadmap accordingly. Feel free to share your feedback or unique use case with your Unit21 contact, and they’ll pass it on to the engineering team.

Concluding Thoughts

If you’re not sure whether the API or Flat-File is the right way to go, we encourage you to start with the latter! It will require less engineering work on your side, and you can always switch to an API-based integration if needed down the road. In the event your data requires light transformation work before sending it to Unit21, you can also take a hybrid approach: perform the transformations yourself and then still send the data via flat file.

Here’s a quick summary of the two ingestion pathways as a recap of what this document covered:

API INGESTION PATHWAY

A more custom integration process that provides technical flexibility, but is more time-consuming for your team

  • Requires more engineering effort to build the integration
  • Requires you to provide the network throttling logic if you’re rate-limited (e.g., you want to send in more than 200 records/sec during a backfill)
  • Potentially supports faster aggregations for powering rules with our Real-Time Rules product (depending on exactly how you set up your API integration)
  • Maintains backwards compatibility and provides a stable integration experience
  • Adheres to the standard Unit21 SLAs
  • Strong support and guidance from the Unit21 Implementation Team
  • Support for symmetric encryption (via the fernet specification)

FLAT-FILE INGESTION PATHWAY

A potentially more lightweight integration process for customers with a straightforward data model

  • Less engineering effort required overall for most of our customers
  • Automatically handles the network throttling logic for you, without any intervention required (e.g., you can just upload a 300 MB file, and FFIP will automatically throttle the ingestion process for you)
  • Supports our Real-Time Rules product but with increased latency in updating your rules’ aggregations
  • Maintains backward compatibility and provides a stable integration experience
  • Adheres to the standard Unit21 SLAs
  • Strong support and guidance from the Unit21 Implementation Team
  • No support for symmetric encryption

Regardless of which ingestion pathway you choose, our Implementation Team is here to guide you through Unit21’s implementation process. Please reach out to your Implementation Manager if you have specific questions that are not answered in this guide.