Traditional relational databases are row-oriented: each row has a row ID, and all the fields of a row are stored together in the table. Without a suitable index, a lookup in a row-oriented database reads every row in full, regardless of which columns you require.
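To make the difference in layout concrete, here is a minimal sketch in Python. The table, its column names, and both helper functions are hypothetical and only illustrate the idea; real engines work on disk pages and compressed blocks, not Python lists.

```python
# Hypothetical three-column table in a row-oriented layout:
# all fields of a row are stored together.
rows = [
    {"id": 1, "name": "keyboard", "price": 30.0},
    {"id": 2, "name": "monitor", "price": 180.0},
    {"id": 3, "name": "mouse", "price": 15.0},
]

# The same data in a column-oriented layout: one list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def avg_price_row_store(table):
    # Every full row is visited even though only "price" is needed.
    return sum(row["price"] for row in table) / len(table)


def avg_price_column_store(table):
    # Only the "price" column is read; "id" and "name" are never touched.
    prices = table["price"]
    return sum(prices) / len(prices)
```

Both functions return the same average, but the column-oriented version touches a single column, which is why aggregation queries tend to favor that layout.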
The primary aim of search is to retrieve the most relevant matches for the user's query while excluding other general content from the website. The system should be able to narrow the search using ranges (price, dates, sizes, etc.), sorting (by popularity, date, or price), and filtering (including only the desired parameters). For web apps where information changes dynamically (prices, description details, availability of goods), near real-time updates are extremely important; for example, eCommerce sites and booking engines need them to show which goods and services are currently in stock. Search engines can also recommend the most interesting products or information to improve the user experience.
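The combination of text matching, range narrowing, filtering, and sorting can be sketched in a few lines of Python. The catalog, the field names, and the `search` function below are assumptions made for illustration; a real search engine would use an inverted index and relevance scoring rather than a linear scan.

```python
def search(products, text, min_price=0.0, max_price=float("inf"),
           only_in_stock=True):
    """Return products matching `text`, most popular first."""
    needle = text.lower()
    hits = [
        p for p in products
        if needle in p["name"].lower()                  # text match
        and min_price <= p["price"] <= max_price        # range
        and (p["in_stock"] or not only_in_stock)        # filter
    ]
    return sorted(hits, key=lambda p: p["popularity"], reverse=True)  # sort


# Hypothetical catalog data.
catalog = [
    {"name": "Trail running shoes", "price": 89.0, "popularity": 120, "in_stock": True},
    {"name": "Road running shoes", "price": 140.0, "popularity": 300, "in_stock": True},
    {"name": "Running socks", "price": 12.0, "popularity": 80, "in_stock": True},
    {"name": "Running jacket", "price": 95.0, "popularity": 200, "in_stock": False},
]

results = search(catalog, "running", min_price=50, max_price=150)
```

Here the socks fall outside the price range and the jacket is out of stock, so only the two pairs of shoes remain, ordered by popularity.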
Time series data is a set of values organized by time. Examples of time series data include sensor data, stock quotes, clickstream data, and application telemetry data. Time series data can be analyzed to find historical trends, to trigger real-time alerts, or to build predictive models.
Time series data represents how a resource or process changes over time. Data is time-stamped, but more importantly, time is the most significant axis for viewing or analyzing data. Typically, time series data arrives in chronological order and is usually treated as an insert rather than an update to the database. For this reason, change is measured over time, allowing you to look back and predict future changes. Therefore, time series data is best viewed with scatter or line charts.
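The insert-only, time-ordered access pattern described above can be illustrated with a small in-memory structure. This is a sketch under simplifying assumptions: the `TimeSeries` class and its method names are hypothetical, timestamps are plain integers, and out-of-order points are simply rejected.

```python
import bisect


class TimeSeries:
    """Append-only series of (timestamp, value) points kept in time order."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        # Points arrive in chronological order, so every write is an insert
        # at the end; existing points are never updated.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("out-of-order point")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def range(self, start, end):
        """Return the points with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


cpu = TimeSeries()
for ts, value in [(0, 0.20), (60, 0.25), (120, 0.90), (180, 0.30)]:
    cpu.append(ts, value)
```

Because the timestamps stay sorted, a time-range lookup such as `cpu.range(60, 180)` needs only two binary searches and a slice.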
Choose a time series solution when you need to ingest data whose strategic value lies in how it changes over a period of time, and when you mostly insert new records and update existing ones rarely, if ever. You can use this information to detect anomalies, visualize trends, and compare current data with historical data, among other things. This type of architecture is also well suited to predictive modeling and forecasting, since it keeps the historical record of changes over time, which can be fed into any number of forecasting models. Time series offer the following advantages: they clearly represent how a resource or process changes over time; they help to quickly detect changes across a number of related sources, making anomalies and emerging trends easy to highlight; and they are well suited to predictive modeling and forecasting.
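As one possible illustration of comparing current data with historical data, the sketch below flags a reading as anomalous when it deviates strongly from the readings just before it. The function name, the window size, and the threshold are assumptions chosen for the example, not a recommended configuration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of values that deviate from the preceding
    `window` values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical temperature readings with one obvious spike.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 20.2]
spikes = find_anomalies(readings)
```

Only the spike at index 6 is flagged; the readings that follow it are compared against a window that already contains the spike, so they are not reported.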
OpenTSDB is a distributed and scalable time series database. It introduces concepts such as metrics and tags, and defines a set of data models for time series scenarios. It uses HBase as its underlying storage layer and adopts a special row key design, based on the characteristics of time series workloads, to make aggregating and querying time series data more efficient.
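The idea behind such a row key can be sketched as follows. This is a simplified, human-readable approximation and not OpenTSDB's actual byte format: it assumes that a key combines the metric name, the timestamp aligned to the start of the hour, and the sorted tags, so that all points of one series within the same hour share a row. The function names are hypothetical.

```python
def row_key(metric, timestamp, tags):
    """Build a simplified, readable row key (not the real byte encoding)."""
    base_hour = timestamp - (timestamp % 3600)  # align to the start of the hour
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}|{base_hour:010d}|{tag_part}"


def column_offset(timestamp):
    """Seconds since the start of the hour; locates a point inside its row."""
    return timestamp % 3600


# Two points of the same series, a few minutes apart (Unix seconds).
tags = {"host": "web01", "dc": "lga"}
key_a = row_key("sys.cpu.user", 1356998400 + 125, tags)
key_b = row_key("sys.cpu.user", 1356998400 + 900, tags)
```

Both points map to the same key, so a range scan over one metric and time window reads neighboring rows instead of scattered ones.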
TimescaleDB is a SQL time series database that uses a specific schema and splits tables into fragments by time. It is built on PostgreSQL.
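Splitting a table into fragments by time can be illustrated in plain Python. The sketch below is not TimescaleDB's implementation; the fragment interval, the function names, and the in-memory dictionary are assumptions that only show why time-based fragments let a range query skip most of the data.

```python
from collections import defaultdict

CHUNK_SECONDS = 7 * 24 * 3600  # hypothetical fragment interval of one week


def chunk_start(timestamp, interval=CHUNK_SECONDS):
    """Start of the fragment that a timestamp falls into."""
    return timestamp - (timestamp % interval)


def partition(rows, interval=CHUNK_SECONDS):
    """Group (timestamp, value) rows into fragments keyed by start time."""
    chunks = defaultdict(list)
    for ts, value in rows:
        chunks[chunk_start(ts, interval)].append((ts, value))
    return dict(chunks)


def query(chunks, start, end, interval=CHUNK_SECONDS):
    """Return rows with start <= timestamp < end, scanning only the
    fragments that overlap the requested range."""
    hits = []
    for key in sorted(chunks):
        if key + interval <= start or key >= end:
            continue  # fragment lies entirely outside the range
        hits.extend((ts, v) for ts, v in chunks[key] if start <= ts < end)
    return hits


# Small interval so the example is easy to follow.
chunks = partition([(5, 1), (150, 2), (199, 3), (250, 4)], interval=100)
```

With an interval of 100, a query for the range 120 to 210 skips the first fragment entirely and reads only the two that overlap the range.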
Microsoft Azure Time Series Insights (preview) provides a fully managed, end-to-end storage and query solution for highly contextualized, IoT-scale data optimized for time series. Its powerful visualization enables interactive ad hoc data analysis and asset-based data reporting. In addition, the service supports warm data analysis and raw data analysis by data type. Users are billed separately for the storage and queries they use.
Amazon Timestream is applicable in scenarios such as IoT and application operations. It provides an adaptive query processing engine to analyze data quickly, and it automatically summarizes, retains, tiers, and compresses data. Timestream is a serverless service, which keeps administration efficient, and users are billed separately for writes, stored data, and data scanned by queries.
| | Column-Oriented DB | Search DB | Time-Series DB |
|---|---|---|---|
| Benefits | High performance on aggregation queries (such as COUNT, SUM, AVG, MIN, MAX); highly efficient data compression and/or partitioning; true scalability and fast data loading for Big Data; accessible by many third-party BI analytics tools; fairly simple systems administration | Full-text search (by simple words and phrases or multiple forms of a word or phrase); multifield search; highlighting (a visual indicator of the searched words); search by synonyms; autocomplete suggestions; faceted search (counting operations); fuzzy search (algorithms such as the Levenshtein algorithm, to detect typos or misspellings); geospatial search (finding an object such as a location by its latitude and longitude) | Includes functions and operations common to time series data analysis, such as data retention policies, continuous queries, and flexible time aggregations; can handle data at large scale |
| Advantages for certain types of systems | Data warehouses and business intelligence; customer relationship management (CRM); library card catalogs; ad hoc query systems | Near real-time; high scalability; works not only as an indexer but also as a database; visualization of the data; machine learning for anomaly and outlier detection; fast search | Monitoring software systems (virtual machines, containers, services, applications); monitoring physical systems (equipment, machinery, connected devices, the environment, our homes, our bodies); asset tracking applications (vehicles, trucks, physical containers, pallets); financial trading systems (classic securities, newer cryptocurrencies); eventing applications (tracking user/customer interaction data); business intelligence tools (tracking key metrics and the overall health of the business) |
| Disadvantages | Transactions are to be avoided or are simply not supported; queries with table joins can reduce performance; record updates and deletes reduce storage efficiency; effective partitioning/indexing schemes can be difficult to design | System configuration is not very intuitive; many updates in the database might affect the retrieval of results; the date converge needs to be reviewed, with many matching problems for recent databases | A TSDB needs to handle writing and reading data in parallel at scale; TSDBs are sometimes complicated to use at first, especially when the DB is open source |
| Databases | SAP Sybase IQ; Infobright; Vertica (HP); ParAccel; Microsoft SQL Server | Elasticsearch; Solr; Splunk; Sphinx | TimescaleDB; Timestream; Microsoft Azure Time Series Insights (preview); OpenTSDB |
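
The fuzzy search entry in the comparison table mentions the Levenshtein algorithm for detecting typos. As a rough illustration, the sketch below computes the Levenshtein (edit) distance with the standard dynamic-programming recurrence and uses it to match a misspelled query against a list of terms. The `fuzzy_match` helper and its `max_distance` parameter are hypothetical; real search engines use much faster index-based techniques.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string `a` into string `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, terms, max_distance=1):
    """Return the terms within `max_distance` edits of the query."""
    q = query.lower()
    return [t for t in terms if levenshtein(q, t.lower()) <= max_distance]


matches = fuzzy_match("serch", ["search", "sketch", "solr"])
```

With a maximum distance of one edit, the misspelled query "serch" matches only "search".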
References