Presentation: This article was presented at the DC API User Group on January 5th 2016. The presentation version is available on Google Slides.

I do a lot of work reading from and writing to APIs, and the projects have many similarities. One system has data that another system could benefit from, and I develop the strategy and code to get the data from System A to System B. In this post I share the common formula, its variations, and the best practices I've learned.

The basic model of a system integration is that System B needs data that System A has. System A may or may not need data back from System B. Sometimes there is no System B at all, and you are just interacting with System A to perform a process periodically. In all of these cases you need to consider seven common areas:

  • What are the systems and where do they live?
  • How will you interface with the system(s)?
  • How often will the integration run, and what triggers it?
  • Mapping the data, and which system owns each part of it
  • What business logic or data transformations are needed, and where they will live
  • Feedback loops and error handling
  • Testing

Systems

The systems I see include any and all of the following:

  • Customer Relationship Management
  • Enterprise Resource Planning
  • Accounting / Billing Systems
  • Point of Sale Systems
  • System Logs
  • Data Warehouses
  • Telematics/Tracking Devices
  • Subscriber Databases
  • Custom Processes

The things that concern me most about these systems are whether they are Windows-based or Unix-based and whether they are hosted externally or behind a firewall. I adjust the code I use based on the operating system it will run on, and I also consider who will maintain the system in the long run. In-house systems add the consideration of security policies across disparate systems. Hosted systems still have security policies, just in a different form.

Interfaces

My preferred interface with any system is a well-documented, stable API. Sometimes, especially with enterprise software, you run into APIs that are incomplete or poorly documented, and some systems have no API at all. When an API is unavailable or incomplete, you can go directly to the database. The big pitfall here is that if the underlying database schema changes as part of a software update, your code will break. Another option is to check whether the software's existing reporting features can export to flat files. Flat files are not ideal, but in many cases existing processes are already built on them, and leveraging those existing systems is less time-consuming than conquering a new API.
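When a flat-file export is the interface, it helps to check the header row before trusting the data, so that a changed report layout stops the run with a clear message instead of quietly loading shifted columns. The sketch below is a minimal example that assumes a CSV export and uses Python's standard `csv` module; the column names in `EXPECTED_COLUMNS` and the `SchemaChangedError` class are hypothetical.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable

# Hypothetical columns that the integration depends on in System A's export.
EXPECTED_COLUMNS = {"order_number", "customer_id", "total"}


class SchemaChangedError(Exception):
    """Raised when the export no longer contains the columns we rely on."""


def read_export(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse a CSV export and fail loudly if expected columns are missing."""
    reader = csv.DictReader(lines)
    found = set(reader.fieldnames or [])  # fieldnames is None for an empty file
    missing = EXPECTED_COLUMNS - found
    if missing:
        raise SchemaChangedError(f"export is missing columns: {sorted(missing)}")
    return [dict(row) for row in reader]


if __name__ == "__main__":
    sample = io.StringIO("order_number,customer_id,total\n1001,C42,19.99\n")
    for record in read_export(sample):
        print(record)
```

Failing on a missing column is a deliberate choice: a run that stops is noticed right away, while a run that loads misaligned data may not be.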

Frequency and Triggers

How frequently the integration needs to run, and whether the data is pushed or polled, are the next big considerations. Any time data is pushed to my systems, I write an API to receive it. I may change the architecture slightly depending on the volume and frequency of the data, but I always develop an API for receiving pushed data. When my systems need to poll for data, frequency matters. If the integration needs to run in "near real time", meaning every few seconds to every minute, I write a service or daemon that is always running. If the process can be scheduled to run anywhere from every 5 minutes to once a month, then a console application scheduled to run on an interval will suffice and provides some flexibility to re-run it if necessary.
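The two polling styles can share the same job code and differ only in how it is invoked. This is a minimal sketch, assuming a single program with a hypothetical `--service` flag: without the flag it runs the job once and exits, leaving scheduling to an external tool; with it, it loops as a long-running service. `sync_once` is a hypothetical placeholder for the actual System A to System B transfer.

```python
import argparse
import time
from typing import Callable, Optional


def sync_once() -> int:
    """Hypothetical job: pull new records from System A and send them to System B.

    Stubbed out here; returns the number of records processed.
    """
    return 0


def run_service(
    job: Callable[[], int],
    interval_seconds: float,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Service mode: run `job` repeatedly, pausing `interval_seconds` between runs.

    max_runs=None loops forever; a finite value (and a fake `sleep`) makes it testable.
    Returns the number of times the job ran.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


def main() -> None:
    parser = argparse.ArgumentParser(description="Sync System A to System B")
    parser.add_argument("--service", action="store_true",
                        help="stay running and poll every --interval seconds")
    parser.add_argument("--interval", type=float, default=30.0,
                        help="seconds between polls in service mode")
    args = parser.parse_args()
    if args.service:
        run_service(sync_once, args.interval)
    else:
        # Scheduled mode: run once and exit. An external scheduler such as cron
        # or Windows Task Scheduler decides when it runs next, and re-running
        # by hand is just a matter of invoking the program again.
        print(f"processed {sync_once()} records")


if __name__ == "__main__":
    main()
```

Assuming the script is saved as `sync.py` (a hypothetical name), a cron entry could call `python sync.py` every 15 minutes, while `python sync.py --service --interval 10` would cover the near-real-time case.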

[Figure: Basic Integration]

Data Ownership, Mapping and Transformation

After you have figured out the best type of integration to write and which languages and frameworks you are going to use, the next big step is to document the data mappings. An order number in System A may not be the same as an order number in System B, and all parties must agree on what the data looks like in both systems. Document any translations that need to happen. I use the following spreadsheet format, with sample data showing how data in System A may need to be translated into data in System B.

| Source Field | Source Example | Destination Field | Destination Example | Transformation Rule |
|---|---|---|---|---|
| Line Item # | R0005678001O | Warehouse ID | R | First character of source Line Item |
| Line Item # | R0005678001O | Order Number | 0005678 | Characters 2-8 of source Line Item |
| Line Item # | R0005678001O | Line Item # | 001 | Characters 9-11 of source Line Item |
| Line Item # | R0005678001O | Order Type | O or P | Last character of source Line Item. O is for Order, P is for Pickup. |
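Transformation rules like these translate directly into a small parsing function. This is a minimal Python sketch that assumes the source line item is always exactly 12 characters, with the order number in characters 2-8 and the line item number in characters 9-11, which are the positions that produce the sample values. The `LineItem` class and its field names are illustrative, not taken from either system.

```python
from dataclasses import dataclass

# Last-character codes from the mapping: O is for Order, P is for Pickup.
ORDER_TYPES = {"O": "Order", "P": "Pickup"}


@dataclass(frozen=True)
class LineItem:
    warehouse_id: str
    order_number: str
    line_item: str
    order_type: str


def parse_line_item(source: str) -> LineItem:
    """Split System A's composite line item (e.g. "R0005678001O") into System B fields."""
    if len(source) != 12:  # assumed fixed width
        raise ValueError(f"expected 12 characters, got {len(source)}: {source!r}")
    order_type = source[11]  # last character
    if order_type not in ORDER_TYPES:
        raise ValueError(f"unknown order type {order_type!r} in {source!r}")
    return LineItem(
        warehouse_id=source[0],    # character 1
        order_number=source[1:8],  # characters 2-8
        line_item=source[8:11],    # characters 9-11
        order_type=order_type,     # "O" (Order) or "P" (Pickup)
    )


print(parse_line_item("R0005678001O"))
# LineItem(warehouse_id='R', order_number='0005678', line_item='001', order_type='O')
```

Rejecting unexpected lengths and order types up front means malformed source data is reported immediately rather than being split at the wrong positions.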

Deciding which system is the system of record for each piece of data is also crucial. A real-world example I have encountered is a customer's address stored in a CRM system, while the geocoded latitude and longitude used for shipping are calculated in another system. The two need to stay in sync: the CRM is the system of record for changes to the address, and the geocoded lat/long is sent back to the CRM after geocoding.
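One way to keep field ownership explicit is to record it as data that the sync code consults. This is a minimal sketch of the address and geocode example; the field names, the `FIELD_OWNER` map, and the `"crm"`/`"geocoder"` labels are hypothetical. Each field is taken only from its owning system, and a change to any CRM-owned address field flags the record for re-geocoding.

```python
from __future__ import annotations

from typing import Any

# Hypothetical field names: each field maps to its system of record.
FIELD_OWNER = {
    "street": "crm",
    "city": "crm",
    "postal_code": "crm",
    "latitude": "geocoder",
    "longitude": "geocoder",
}
ADDRESS_FIELDS = [f for f, owner in FIELD_OWNER.items() if owner == "crm"]


def merge_customer(crm: dict[str, Any], geocoder: dict[str, Any]) -> dict[str, Any]:
    """Build the synced record, taking each field only from the system that owns it.

    A stale lat/long still stored in the CRM is ignored; the geocoder's value wins.
    """
    sources = {"crm": crm, "geocoder": geocoder}
    return {field: sources[owner].get(field) for field, owner in FIELD_OWNER.items()}


def address_changed(old: dict[str, Any], new: dict[str, Any]) -> bool:
    """True when a CRM-owned address field changed, so the address must be re-geocoded."""
    return any(old.get(f) != new.get(f) for f in ADDRESS_FIELDS)
```

The merged record is what gets written back to the CRM, so the CRM ends up holding the geocoder's lat/long without ever overwriting the geocoder's copy of the address.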

Feedback Loops and Error Handling

How errors will be handled, and what feedback loops exist to correct issues, is a critical consideration as well. I have seen (and built) systems where data goes into a black hole and issues are not noticed until a downstream process uses the data incorrectly, or perhaps they are never noticed at all. Wherever possible, enforce strict data quality rules so that there is immediate feedback when an issue occurs and it can be corrected and/or retried. Error data needs to be succinct and actionable. If a process generates floods of non-actionable error data, it will most likely get ignored.
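A minimal sketch of strict validation with actionable errors: each problem names the record, the field, and what is wrong, and invalid records are set aside for correction or retry instead of disappearing. The validation rules and field names here (`order_number`, `quantity`) are hypothetical examples, not rules from any particular system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RecordError:
    """One succinct, actionable problem: which record, which field, what is wrong."""
    record_id: str
    field: str
    problem: str

    def __str__(self) -> str:
        return f"record {self.record_id}: {self.field} {self.problem}"


def validate_order(order: dict) -> list[RecordError]:
    """Apply hypothetical data quality rules to one order record."""
    record_id = str(order.get("order_number") or "<missing>")
    errors = []
    if not order.get("order_number"):
        errors.append(RecordError(record_id, "order_number", "is required"))
    qty = order.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
        errors.append(RecordError(record_id, "quantity",
                                  f"must be a positive integer, got {qty!r}"))
    return errors


def partition(orders: list[dict]) -> tuple[list[dict], list[RecordError]]:
    """Separate loadable records from rejected ones instead of silently dropping bad data."""
    valid, errors = [], []
    for order in orders:
        problems = validate_order(order)
        if problems:
            errors.extend(problems)
        else:
            valid.append(order)
    return valid, errors


if __name__ == "__main__":
    batch = [
        {"order_number": "0005678", "quantity": 2},
        {"order_number": "0005679", "quantity": 0},
    ]
    loaded, rejected = partition(batch)
    print(f"{len(loaded)} loaded, {len(rejected)} rejected")
    for err in rejected:
        print(err)  # record 0005679: quantity must be a positive integer, got 0
```

Keeping one short line per problem makes the error report something a person can act on, rather than a stack trace or a dump of the whole record.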

Testing and Client Acceptance

It should go without saying that testing is important, but in many software projects testing is the first thing to get axed when budgets or timelines run short. In data integrations, testing will take more time than usual, and you will almost certainly run into unexpected data permutations. Test-driven development and iterative development, so that testing happens as you go, will help, but the most important thing from a developer's perspective is to cover yourself: have a defined test plan that gets signed off on as part of the client acceptance process.

I have a newsletter...

Many of my posts end up in Digital Ambit's monthly newsletter. It is the best way to keep up with what Dagny and I are doing in the business world. I appreciate your support and will only send you things we think are valuable.