What is time-series data?
Time-series data is everywhere, and usually in significant volumes. It consists of data points, each typically representing an event, that are tracked and aggregated continuously over a period of time. Any data point with a timestamp can be considered time-series data: monitoring data (performance metrics and logs), financial data, health data, temperature data, and so on.
Time-series data has exploded in popularity, and the value of tracking and analyzing how things change
over time has become evident in every industry: DevOps and IT monitoring, industrial manufacturing,
financial trading and risk management, sensor data, ad tech, application eventing, smart home systems,
autonomous vehicles, and more. [6]
In Conversational eCommerce, the system will interact with multiple users simultaneously. Each interaction will generate valuable information that the system can store, mostly the messages exchanged between the system and the user. The amount of conversational data to be held will therefore be vast, and we must ensure that the storage system can perform at scale.
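To make the idea concrete, here is a minimal sketch of how a single conversational message could be modelled as a time-series data point. The `MessageEvent` class, its field names, and the sample messages are assumptions made for illustration; they are not an iFETCH schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MessageEvent:
    """One conversational data point (hypothetical schema)."""
    timestamp: datetime      # when the message was sent (UTC)
    conversation_id: str     # which dialogue the message belongs to
    sender: str              # "user" or "system"
    text: str                # message content


def conversation_timeline(events, conversation_id):
    """Return the messages of one conversation in chronological order."""
    selected = [e for e in events if e.conversation_id == conversation_id]
    return sorted(selected, key=lambda e: e.timestamp)


# Illustrative data only: events may arrive out of order.
events = [
    MessageEvent(datetime(2022, 4, 20, 10, 0, 5, tzinfo=timezone.utc),
                 "c1", "system", "Here are some red dresses."),
    MessageEvent(datetime(2022, 4, 20, 10, 0, 0, tzinfo=timezone.utc),
                 "c1", "user", "Show me red dresses."),
    MessageEvent(datetime(2022, 4, 20, 10, 0, 2, tzinfo=timezone.utc),
                 "c2", "user", "Hello"),
]
timeline = conversation_timeline(events, "c1")
```

The timestamp is what turns each message into a time-series point: every query about a conversation (its order, its duration, its pace) is ultimately a query over that field.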
What is a time-series database?
A time-series database is purpose-built and optimized for collecting, storing, retrieving and processing time-series data. Such databases are designed to work at scale, with vast volumes of data being added. They are optimized to deal with large volumes of inserts and typically have built-in functions and operations for time-series data analysis.
Time-series databases have key architectural design properties that make them very different from other
databases. These include time-stamp data storage and compression, data lifecycle management,
data summarization, ability to handle large time-series dependent scans of many records, and time
series aware queries. [1]
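Two of the properties named above, data summarization and data lifecycle management, can be illustrated with a small in-memory sketch. This is not how any particular time-series database implements them; the function names (`bucket_start`, `downsample_mean`, `apply_retention`) and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_start(ts, width):
    """Floor a timezone-aware timestamp to the start of its fixed-width bucket."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + ((ts - epoch) // width) * width


def downsample_mean(points, width):
    """Summarize (timestamp, value) points as one mean value per bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[bucket_start(ts, width)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def apply_retention(points, now, max_age):
    """Keep only points no older than max_age, as a retention policy would."""
    return [(ts, v) for ts, v in points if now - ts <= max_age]


# Illustrative data only.
base = datetime(2022, 4, 20, 10, 0, tzinfo=timezone.utc)
points = [
    (base + timedelta(seconds=10), 2.0),
    (base + timedelta(seconds=50), 4.0),
    (base + timedelta(seconds=70), 10.0),
]
per_minute = downsample_mean(points, timedelta(minutes=1))
recent = apply_retention(points, now=base + timedelta(seconds=90),
                         max_age=timedelta(seconds=60))
```

A common pattern is to combine the two: keep raw points for a short period and retain only the summarized buckets for the long term.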
Current trends in the database industry show that time-series databases are the fastest-growing database category and have gained more popularity than any other database category over the last two years. [2]
This shows that their usage is increasing and that ever more people are choosing them to store and analyse this type of data. Two possible reasons for this increased popularity are:
- Increasing usage of time-series data, so the need to store it is becoming a growing necessity.
- Increasing number of business use cases that benefit from this technology.
Image extracted from [2].
Why should we use a time-series database to store time-series data? What are the benefits?
Storing time-series data in a time-series database brings several benefits in both the short and the long term. Some of those benefits are listed below [8].
- Time-series databases are optimized to handle a massive volume of inserts; they typically assume an append-only workload, which significantly improves insert performance.
- Time-series databases can improve performance at scale, including higher ingest rates and faster queries.
- Time-series databases can offer efficient data storage.
- Time-series databases make it easy to measure how data changes over time and to run time-series data analysis.
- Time-series databases have short response times, which enables real-time analytics.
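The append-only assumption from the list above can be sketched in a few lines. Because points arrive in time order, the store stays sorted without extra work, and a time-range query reduces to two binary searches. The `AppendOnlySeries` class is a toy written for this illustration, not the storage engine of any real database, and it simply rejects out-of-order writes, which real systems handle in their own ways.

```python
import bisect


class AppendOnlySeries:
    """Toy in-memory series that only accepts points in non-decreasing time order."""

    def __init__(self):
        self._timestamps = []   # sorted by construction
        self._values = []

    def append(self, timestamp, value):
        # Simplification: late (out-of-order) points are rejected.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("out-of-order write rejected in this sketch")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


# Illustrative data only: integer timestamps stand in for real clock values.
series = AppendOnlySeries()
for t, v in [(1, 10.0), (2, 12.5), (5, 11.0), (9, 15.0)]:
    series.append(t, v)
window = series.range(2, 9)
```

Appending to the end of a sorted list costs amortized constant time, whereas inserting at an arbitrary position would require shifting existing entries. That is the intuition behind the insert-performance benefit.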
Which databases can handle time-series data?
Currently, there are several ways to store chronological data: in non-relational databases such as MongoDB and Cassandra, or in relational databases like PostgreSQL. Additionally, there are extensions such as TimescaleDB, which builds on PostgreSQL and allows us to operate over chronological data.
Although some of the databases mentioned above were not explicitly developed for time-series analysis, and are therefore not optimized for it, they have many characteristics worth studying, which we present next.
Database Category | Database Name | Database principal features | Performance and Scalability | Drawbacks |
---|---|---|---|---|
NoSQL | MongoDB | | | |
NoSQL | Cassandra | | | |
SQL | PostgreSQL | | | |
Time series | TimescaleDB | | | |
Why should we use a time-series database to store Conversational data?
There are many challenges when working with time-series data. One of them is the massive amount of data collected at any given time, which is inherent to this type of data. The use case should therefore drive the choice of technology: is the goal only to collect data, or also to perform advanced analytics or machine learning over it?
Being aware of the full potential of the data, and of how it is collected and stored, is crucial to extracting the most value from it. Conversational data within the iFETCH project can be used in several different ways, from tracking the progress of conversations and improving them using machine learning to predicting how customer tastes evolve over the dialogues.
The possibility of doing advanced analytics in real time is another area of great interest. In the iFETCH context, real-time analytics can bring several benefits, such as identifying the most and least popular products for a specific period. Traditionally this is done after the fact, which is also beneficial for our business. Nevertheless, doing this analysis in real time, during peak season for example, is even more valuable: it lets us see which products are being viewed the most and which will run out of stock, and even monitor each component of the system to identify which one is struggling. Time-series databases support real-time analysis, a valuable feature.
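As a rough illustration of the "most popular products for a specific period" idea, the sketch below keeps a sliding window of product views and reports the current top items. The `SlidingWindowPopularity` class, the product names, and the integer timestamps (in seconds) are assumptions for illustration. It also assumes that views are recorded in time order. A time-series database would typically offer this kind of windowed aggregation as a query rather than as application code.

```python
from collections import Counter, deque


class SlidingWindowPopularity:
    """Count product views over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()      # (timestamp, product_id), oldest first
        self._counts = Counter()

    def record_view(self, timestamp, product_id):
        self._events.append((timestamp, product_id))
        self._counts[product_id] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop views that have fallen out of the window ending at `now`.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_product = self._events.popleft()
            self._counts[old_product] -= 1
            if self._counts[old_product] == 0:
                del self._counts[old_product]

    def top(self, n):
        """Return the n most viewed products in the current window."""
        return self._counts.most_common(n)


# Illustrative data only.
tracker = SlidingWindowPopularity(window_seconds=60)
for ts, product in [(0, "dress"), (10, "shoes"), (20, "dress"),
                    (65, "shoes"), (70, "shoes")]:
    tracker.record_view(ts, product)
top_products = tracker.top(2)
```

After the last view at second 70, only the views from seconds 20, 65 and 70 remain in the window, so the ranking reflects recent interest rather than the whole history.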
Traditional relational databases are not designed for time-series data, although they can be optimised for it. Non-relational databases typically scale better and so tend to be used to store time-series data, which makes NoSQL databases an option as well. In either case, the scalability and performance of the database should be taken into account.
Benchmarks published by the vendors of two of the most popular time-series databases, InfluxDB and TimescaleDB, showed both databases outperforming Cassandra and MongoDB on time-series data across several parameters [3][4][5][6]. In addition, the TimescaleDB benchmark [7] showed TimescaleDB outperforming PostgreSQL on time-series workloads. However, those benchmarks show only that time-series databases can achieve faster queries, use less disk space and sustain greater write throughput than the competing databases. Also, because these databases are built for time-series data, no complex implementation is needed to collect, retrieve and analyse the data.
Considering all of the arguments presented above, we are confident that using a time-series database to store conversational data is the way to go in terms of both scalability and performance. In addition, the complete feature set of time-series databases will enable us to extract the full potential of the data.
References
[1] “Time Series Database (TSDB) Guide: Influxdb.” InfluxData, March 9, 2022. https://www.influxdata.com/time-series-database/.
[2] “Engines Ranking per Database Model Category.” DB-ENGINES. Accessed April 20, 2022. https://db-engines.com/en/ranking_categories.
[3] “InfluxDB Outperforms Cassandra by 4.5x.” InfluxData, February 9, 2021. https://www.influxdata.com/resources/benchmarking-influxdb-vs-cassandra-for-time-series-data-metrics-and-management/.
[4] Churilo, Chris. “InfluxDB Is 2.4x Faster vs. Mongodb for time-series Workloads.” InfluxData, March 24, 2020. https://www.influxdata.com/blog/influxdb-is-27x-faster-vs-mongodb-for-time-series-workloads/.
[5] Hampton, Lee. “Benchmarking Cassandra vs. Timescaledb for Time-Series Data.” Timescale Blog, November 4, 2020. https://www.timescale.com/blog/time-series-data-cassandra-vs-timescaledb-postgresql-7c2cc50a89ce/.
[6] Kiefer, Rob. “MongoDB Time-Series - A NoSQL vs. SQL Database Comparison.” Timescale Blog, January 7, 2022. https://www.timescale.com/blog/how-to-store-time-series-data-mongodb-vs-timescaledb-postgresql-a73939734016/.
[7] Kiefer, Rob. “TimescaleDB vs. Postgresql for Time-Series.” Timescale Blog, January 14, 2022. https://www.timescale.com/blog/timescaledb-vs-6a696248104e/.
[8] Kulkarni, Ajay. “What Is Time-Series Data and Why Do I Need A Time-Series Database?” Timescale Blog, January 7, 2022. https://www.timescale.com/blog/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563/.
[9] “Hypertables.” TimescaleDB - Timeseries database for PostgreSQL. Accessed April 20, 2022. https://docs.timescale.com/timescaledb/latest/overview/core-concepts/hypertables-and-chunks/#hypertable-benefits.