A high-load system: 2 billion messages per day, built in 7 months

Artem Lebedev, Delivery Director of Format Koda, shares his experience of designing high-load systems for transmitting, storing and processing large amounts of data, and talks about the project’s specifics, the tools used and best practices for applying them.

In today’s era of ubiquitous digitalization, many companies have to handle large amounts of data, and the systems that do so must withstand heavy loads.

The question arises: how can systems stay operational and meet the required SLAs?

Our company has extensive experience with high-end systems, whose main criteria are high load, high availability, fault tolerance and low latency. Consider an example from our practice: building a high-load system that receives and processes 2 billion messages per day.

Approaches and technologies

Beyond the standard baseline characteristics of such a system, the team implemented three further requirements: 24/7 availability, horizontal scalability and real-time analytics.

To implement this project, we used Hadoop, the HBase database, ClickHouse and the Kafka message broker. This stack let us run several stages of checks on incoming messages and filter out those that did not meet the customer’s criteria, all within an SLA of under 100 ms.
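
The staged checks described above can be illustrated with a minimal in-process sketch. It is not the project's code: the message shape, the three checks and the size limit are all hypothetical, and the broker is replaced by a plain list so the example runs on its own. It only shows the general pattern of running checks in order, dropping a message at the first failure and measuring the time spent against a 100 ms budget.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


# Hypothetical message shape; the real schema is not described in the article.
@dataclass(frozen=True)
class Message:
    sender: str
    payload: str


Check = Callable[[Message], bool]


# Hypothetical checks; the customer's actual criteria are not public.
def has_sender(msg: Message) -> bool:
    return bool(msg.sender.strip())


def payload_not_empty(msg: Message) -> bool:
    return bool(msg.payload)


def payload_within_limit(msg: Message, limit: int = 1024) -> bool:
    return len(msg.payload) <= limit


CHECKS: list[Check] = [has_sender, payload_not_empty, payload_within_limit]
SLA_SECONDS = 0.100  # the 100 ms budget mentioned in the text


def accept(msg: Message, checks: Iterable[Check] = CHECKS) -> bool:
    """Return True only if every check passes; all() stops at the first failure."""
    return all(check(msg) for check in checks)


def filter_messages(messages: Iterable[Message]) -> Iterator[tuple[Message, float]]:
    """Yield each accepted message with the seconds spent checking it."""
    for msg in messages:
        started = time.perf_counter()
        ok = accept(msg)
        elapsed = time.perf_counter() - started
        if ok:
            yield msg, elapsed


incoming = [
    Message("alice", "hello"),
    Message("", "no sender"),
    Message("bob", ""),
    Message("carol", "x" * 2000),
]
accepted = list(filter_messages(incoming))
for msg, elapsed in accepted:
    status = "within SLA" if elapsed < SLA_SECONDS else "SLA exceeded"
    print(msg.sender, status)
```

In a real deployment each stage would more likely be a separate consumer reading from and writing to broker topics; keeping the checks as small independent functions is what makes that split straightforward.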

One of the key parameters of the system is the volume of data it can process. Therefore, in addition to regular testing, the team ran load-testing sessions during development. The results showed a maximum load of about 23,000 messages per second, which is 20% above the customer’s requirements. Over more than two years of commercial operation, the availability of our system has significantly exceeded international standards for such systems, despite 80 releases in that time.
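
As a back-of-the-envelope check, the figures above can be related to each other with simple arithmetic. The article does not state the customer's required rate directly, so the "implied requirement" below is derived from the 20% figure and should be read as an approximation, not a quoted number.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_messages = 2_000_000_000   # from the article
measured_peak_per_sec = 23_000   # from the article (approximate)
headroom = 0.20                  # measured peak is 20% above the requirement

# Average rate if 2 billion messages were spread evenly over a day.
average_per_sec = daily_messages / SECONDS_PER_DAY

# Requirement implied by "20% higher than the customer's requirements" (derived).
implied_requirement_per_sec = measured_peak_per_sec / (1 + headroom)

# Daily volume if the measured peak were sustained for 24 hours.
peak_per_day = measured_peak_per_sec * SECONDS_PER_DAY

print(round(average_per_sec))              # 23148
print(round(implied_requirement_per_sec))  # 19167
print(peak_per_day)                        # 1987200000
```

In other words, a sustained rate of about 23,000 messages per second corresponds to roughly 2 billion messages per day.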

Effective team building

We managed to speed up development significantly: systems like this usually take about a year and a team of at least 10-12 people. We cut that to 7 months and delivered the entire project with a team of only 6 people.

This result was achieved thanks to:
– making full use of agile methodologies as the main element of planning and development;
– applying the principle of continuous integration;
– reducing management overhead on the development team: the tech lead also acted as Scrum master;
– daily interaction with the customer;
– a separate test environment that was a complete copy of the production one;
– prompt response to occasional incidents on weekends and at night.

The system has now been in commercial operation for more than 2 years. Since the release, the amount of data has grown more than fivefold and has reached about 3,000 TB.

Specifics. Challenges. Solutions.

I would like to say a few words about the project’s harder problems. One typical difficulty was working with personal data. We had no direct access to the customer’s databases: all development was carried out in a separate, isolated environment, and the customer performed the releases itself. As a result, our testers, in addition to standard functional testing, had to prepare extremely detailed instructions.

Another challenge came midway through development, when the customer decided to change the format of incoming data without moving the release date. These days this is a fairly common situation: external circumstances change constantly and companies have to adapt to them. It was important for us, as developers, to help them do so.

Now for the solutions. We agreed with the customer on a single final release with no intermediate ones, which saved a considerable amount of time that would otherwise have gone into planning, builds and so on. In addition, thanks to a well-thought-out microservice architecture, we were able to replace the original modules of the system quickly and precisely and to add new rules without affecting the overall functionality. The system currently has more than three hundred modules.
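
The idea of swapping a module when the input format changes can be sketched at the scale of a single process. This is an illustration only, not the project's architecture: the registry, the format names and the record fields are hypothetical. The point is that downstream code depends on a stable record shape, so adding a parser for a new format does not touch it.

```python
import json
from typing import Callable, Dict

# A parser turns one raw input line into a normalized record.
Parser = Callable[[str], Dict[str, str]]

# Hypothetical registry of parsers, keyed by input format name.
PARSERS: Dict[str, Parser] = {}


def register(format_name: str) -> Callable[[Parser], Parser]:
    """Decorator that adds a parser to the registry under the given name."""
    def decorator(func: Parser) -> Parser:
        PARSERS[format_name] = func
        return func
    return decorator


@register("csv")
def parse_csv_line(raw: str) -> Dict[str, str]:
    # Original (hypothetical) format: "sender,payload"
    sender, payload = raw.split(",", 1)
    return {"sender": sender, "payload": payload}


@register("json")
def parse_json(raw: str) -> Dict[str, str]:
    # New (hypothetical) format added later, without changing handle().
    data = json.loads(raw)
    return {"sender": str(data["sender"]), "payload": str(data["payload"])}


def handle(raw: str, format_name: str) -> str:
    """Downstream logic sees only the normalized record, never the raw format."""
    record = PARSERS[format_name](raw)
    return f"{record['sender']}: {record['payload']}"


old = handle("alice,hello", "csv")
new = handle('{"sender": "alice", "payload": "hello"}', "json")
print(old)
print(new)
```

In a microservice setup the same boundary would typically be a service or topic contract; the registry here merely stands in for that contract.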

An important factor in system performance that deserves special attention is fine-tuning the hardware, which directly affects overall speed. Recommended hardware parameters often fall short of the criteria we set for the system during development, so we also provide the necessary support for hardware configuration.

How to choose a contractor

When choosing a contractor for similar tasks, I recommend paying attention to the following points. First, the company’s portfolio should include successfully completed high-load projects. Second, look for proven experience with agile methodologies, as well as experience working in closed customer environments without direct access to databases.

Do not be afraid of complex tasks – for each of them there is an optimal solution.
