First Factory

Drive the bus: How to use the event bus pattern for your ETLs

March 25, 2021


AWS services offer many benefits, including solutions that simplify extracting data from different sources and loading it into a new destination. The event bus is one of many architectural patterns, and it can be divided into three major components: the event source, the event listener and the bus channel.

The event source is where the information comes from, and it normally triggers whatever process should start. The event listener subscribes to the events coming from the source and is in charge of processing that information. Lastly, the bus channel is in charge of transferring information between multiple parties, and it can be implemented in various ways using different data structures.
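To make the three components concrete, here is a minimal in-memory sketch in Python. It is not an AWS implementation: the `EventBus` class, its `subscribe`/`publish`/`dispatch` methods and the `customer.updated` topic are hypothetical names chosen for illustration, and a Python `deque` stands in for the bus channel.

```python
from collections import deque
from typing import Callable, Dict, List

Listener = Callable[[dict], None]


class EventBus:
    """The bus channel: carries events from sources to subscribed listeners."""

    def __init__(self) -> None:
        self._queue: deque = deque()  # FIFO buffer of pending (topic, payload) pairs
        self._listeners: Dict[str, List[Listener]] = {}

    def subscribe(self, topic: str, listener: Listener) -> None:
        self._listeners.setdefault(topic, []).append(listener)

    def publish(self, topic: str, payload: dict) -> None:
        # Event sources only know the bus, never the listeners themselves.
        self._queue.append((topic, payload))

    def dispatch(self) -> int:
        """Deliver every queued event to its listeners; return the number of deliveries."""
        delivered = 0
        while self._queue:
            topic, payload = self._queue.popleft()
            for listener in self._listeners.get(topic, []):
                listener(payload)
                delivered += 1
        return delivered


received = []
bus = EventBus()
bus.subscribe("customer.updated", received.append)  # event listener
bus.publish("customer.updated", {"id": 7})          # event source
delivered = bus.dispatch()
```

Because the source only calls `publish`, listeners can be added or removed without touching the code that produces events.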

An ETL, or “extract, transform and load,” is a procedure for copying data from one or more sources into a destination system that represents the data differently from the source. Transforming the information could involve multiple steps, such as encoding, sorting, aggregating, changing formats and combining information. The information can also come from different sources, which means we can combine options like microservices, text files, databases and much more. This can be done using tools like SQL Server Integration Services (SSIS) or CData Sync.
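A minimal Python sketch of the three ETL phases, assuming two hypothetical sources (a CSV text export and a JSON feed from a web service, both inlined as strings) and a plain list standing in for the destination table. The transform step combines the two sources, aggregates order counts, changes formats and sorts the result.

```python
import csv
import io
import json

# Hypothetical source data: a CSV text export and a JSON feed from a web service.
CSV_SOURCE = "id,name,country\n1,Ana,CR\n2,Bob,US\n"
JSON_SOURCE = '[{"id": 1, "orders": 3}, {"id": 2, "orders": 5}, {"id": 1, "orders": 2}]'


def extract():
    customers = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    orders = json.loads(JSON_SOURCE)
    return customers, orders


def transform(customers, orders):
    # Aggregate order counts per customer id.
    totals = {}
    for order in orders:
        totals[order["id"]] = totals.get(order["id"], 0) + order["orders"]
    # Combine both sources and change formats (string ids to ints, names upper-cased).
    rows = []
    for customer in customers:
        cid = int(customer["id"])
        rows.append({
            "customer_id": cid,
            "name": customer["name"].upper(),
            "total_orders": totals.get(cid, 0),
        })
    return sorted(rows, key=lambda row: row["customer_id"])


def load(rows, destination):
    destination.extend(rows)  # stand-in for an INSERT into the target database
    return len(rows)


destination_table = []
loaded = load(transform(*extract()), destination_table)
```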

As an example, imagine working on a complete rewrite of an existing system that has multiple data sources, such as web services, XML feeds or text files. It’s important that the data used for development is as close to the original data as possible, because it gives a better picture of how the new application behaves compared to the original.

As is common practice, this project would be released in phases, and the development process would follow an agile methodology such as Scrum. This makes it easier to pull information from the source only as it is needed, and it means the first couple of sprints would let the team focus on just a few modules at a time.

An advantage of creating a process like the one described above is that the same structure can be used to manipulate data for many different purposes. One of the core concepts of an ETL is transforming information, so if there are limits on what information the developers are allowed to see, you can use this process to change some values for testing purposes.
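For example, the transform step could mask fields the team shouldn't see in clear text. This sketch assumes a hypothetical `SENSITIVE_FIELDS` set and replaces those values with a truncated SHA-256 digest, so the same input always maps to the same masked value and joins across tables still line up. An unsalted hash of a guessable value such as an email address can be reversed by brute force, so treat this as a way to produce test data, not as a security control.

```python
import hashlib

# Hypothetical set of fields the development team should not see in clear text.
SENSITIVE_FIELDS = {"email", "phone"}


def mask_value(value: str) -> str:
    # Deterministic: the same input always yields the same masked output.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"masked-{digest}"


def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields masked."""
    return {
        key: mask_value(str(value)) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }


original = {"id": 1, "name": "Ana", "email": "ana@example.com", "phone": "555-0100"}
masked = mask_record(original)
```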

Now that you know what kind of environment you will work in, it’s time to design the ETL that pulls the information. Start by selecting the trigger that will execute the entire process, which could be an action performed by a user or a cron expression. Next, define the route and the stops it will make. A stop can be a new data source from which to pull information, meaning new passengers board the bus, or a destination data source, where information is dropped off. You will then need to define the “passengers,” meaning which data points are fetched at the starting point, which ones are dropped off at a specific stop, and whether new passengers are picked up along the way.
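The route metaphor can be sketched in Python as a list of stops that pick up or drop off passengers (data points). Everything here is hypothetical: the `Stop` class, the `drive` function and the stop names are illustrative, and the lambdas stand in for calls to real data sources. In AWS, the trigger that calls something like `drive` could be an Amazon EventBridge schedule such as `cron(0 2 * * ? *)`, which fires every day at 02:00 UTC.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Stop:
    name: str
    # Source stop: returns new passengers to board, given who is already on the bus.
    pick_up: Optional[Callable[[Dict], Dict]] = None
    # Destination stop: names of passengers to drop off here.
    drop_off: List[str] = field(default_factory=list)


def drive(route: List[Stop], passengers: Dict, destinations: Dict[str, Dict]) -> Dict:
    """Visit each stop in order and return the passengers still on the bus."""
    for stop in route:
        if stop.pick_up is not None:
            passengers.update(stop.pick_up(passengers))
        for key in stop.drop_off:
            destinations.setdefault(stop.name, {})[key] = passengers.pop(key)
    return passengers


route = [
    Stop("customers_api", pick_up=lambda p: {"customer": {"id": 7, "name": "Ana"}}),
    Stop("orders_db", pick_up=lambda p: {"orders": [101, 102]} if "customer" in p else {}),
    Stop("warehouse", drop_off=["customer", "orders"]),
]
destinations: Dict[str, Dict] = {}
still_on_bus = drive(route, {"run_id": "2021-03-25"}, destinations)
```

The second stop only picks up orders when a customer is already on board, which mirrors the idea that later stops depend on the passengers collected earlier.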

The final step is more oriented toward the design of the system and the important considerations that should be taken care of in this process. These include:

  • Data integrity: It’s important to map which data points need to be processed first in order to maintain the sequence of events. The whole process can be understood as a database transaction: each step or stop assumes that the previous ones have executed successfully, so if a problem occurs, an exception is raised and needs to be handled gracefully. There are multiple strategies, ranging from a complex rollback to a simple retry policy.
  • Detailed logging: Since you are moving to a serverless architecture with microservices, it’s important to log enough meaningful detail to help developers debug any problem in the future. This is where Amazon CloudWatch steps in, since it is the service that will become one of your main allies in solving problems.
  • Concurrency problems: This is possibly the most difficult part of the whole process, because you need to think ahead about scenarios in which the same resource can be accessed by multiple data points or multiple stops within the same execution. For instance, if a microservice uploads images to Amazon S3 and serves them through a CDN like Amazon CloudFront, cache invalidations might be required, and those are subject to service limits.
  • Access restrictions: This is a general consideration in every software development process, but it is important to validate that the system has the right permissions to access all of the information that is needed.
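The retry and rollback strategies, together with structured logging, can be sketched as follows. This is a minimal Python illustration under stated assumptions: each stop is modeled as a hypothetical `(action, rollback)` pair, failed actions are retried with exponential backoff, and if a stop still fails, the stops that already completed are undone in reverse order. Log lines are printed as JSON because Lambda sends a function's standard output to CloudWatch Logs, where JSON fields are easier to search.

```python
import json
import time


def log_event(**fields) -> None:
    # One JSON object per line; in Lambda, stdout ends up in CloudWatch Logs.
    print(json.dumps(fields, sort_keys=True))


def run_with_retry(action, attempts: int = 3, base_delay: float = 0.0):
    """Simple retry policy: retry the action, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            log_event(level="warning", step=action.__name__, attempt=attempt, error=str(exc))
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


def run_route(steps) -> bool:
    """Run (action, rollback) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    try:
        for action, rollback in steps:
            run_with_retry(action)
            completed.append(rollback)
            log_event(level="info", step=action.__name__, status="ok")
    except Exception:
        for rollback in reversed(completed):
            rollback()
        log_event(level="error", status="rolled_back", undone=len(completed))
        return False
    return True


# Example route with hypothetical steps: orders fails once, invoices always fails.
state = []
calls = {"orders": 0}


def load_customers():
    state.append("customers")


def load_orders():
    calls["orders"] += 1
    if calls["orders"] < 2:
        raise RuntimeError("temporary timeout")
    state.append("orders")


def load_invoices():
    raise RuntimeError("destination rejected the batch")


ok = run_route([
    (load_customers, lambda: state.remove("customers")),
    (load_orders, lambda: state.remove("orders")),
    (load_invoices, lambda: None),
])
```

Here `base_delay` defaults to 0 so the example runs instantly; a real stop would use a positive delay. The orders step recovers on its second attempt, but the invoices step never succeeds, so both earlier loads are rolled back.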

Now that the ETL process has been defined, here is what AWS provides for us through three popular services:

  • The first service we will highlight is Amazon SQS (Simple Queue Service), a message queuing service that enables you to decouple and scale microservices, distributed systems and serverless applications.
  • The second service is AWS Lambda, which lets you run code without provisioning or managing servers, and you pay only for the compute time you consume.
  • The third and final service is Amazon CloudWatch, which provides data and actionable insights to monitor your applications, optimize resources and get a unified view of the overall health of your systems.

Combining the concept of a stop described earlier with the services above, we can now define the composition of a stop.

In short, each stop receives its data from an event message queue (SQS), uses a code handler (Lambda) to process that data and, finally, interacts with outside resources. These could be the databases where we store information or the different sources, including microservices, from which we pull information.
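A single stop could then be sketched as a Python Lambda handler that receives a batch of SQS messages. The `Records`, `messageId` and `body` keys follow the shape of the SQS event that Lambda passes to the function, and returning `batchItemFailures` tells Lambda which messages should stay in the queue to be retried, provided `ReportBatchItemFailures` is enabled on the event source mapping. The `process` function and the `DESTINATION` list are hypothetical stand-ins for the outside resources; a real stop might write to a database or forward passengers to the next stop's queue with boto3's `send_message`.

```python
import json

# Hypothetical outside resource for this stop (a database table in a real deployment).
DESTINATION = []


def process(passengers: dict) -> None:
    """Hypothetical stop logic: validate the passengers and drop them at the destination."""
    if "customer_id" not in passengers:
        raise ValueError("missing customer_id")
    DESTINATION.append({"customer_id": passengers["customer_id"], "source": "stop-1"})


def handler(event, context):
    """Lambda entry point for one stop, invoked with a batch of SQS messages."""
    failures = []
    for record in event.get("Records", []):
        try:
            # json.JSONDecodeError is a subclass of ValueError, so bad JSON is caught too.
            process(json.loads(record["body"]))
        except (ValueError, KeyError) as exc:
            # Lambda writes stdout to CloudWatch Logs.
            print(json.dumps({"messageId": record["messageId"], "error": str(exc)}))
            failures.append({"itemIdentifier": record["messageId"]})
    # Only the listed messages are retried; the rest are removed from the queue.
    return {"batchItemFailures": failures}


# Local usage with a trimmed-down sample event (real events carry more fields).
sample_event = {"Records": [
    {"messageId": "m-1", "body": json.dumps({"customer_id": 7})},
    {"messageId": "m-2", "body": "{}"},
]}
result = handler(sample_event, None)
```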

Having a mapped route for how information moves makes data exchange between all of the microservices easier to manage, and it gives everyone a clear understanding of how the services interact with each other.

AWS provides a variety of services that can satisfy many different requirements. In this case, we leverage their flexibility and scalability to build a system that speeds up the process of extracting information for your projects.
