High-frequency time series database

In summary, the original poster is choosing between MongoDB, Kyoto Cabinet, and HDF5 for storing high-frequency time series data. They expect to insert a large amount of data per day and are asking for advice from people with experience of these options. A reply suggests that HDF5 is the more versatile storage scheme and wonders how popular Kyoto Cabinet is.
  • #1
meanrev
I'm choosing a database to store high-frequency time series data in and have narrowed it down to MongoDB, Kyoto Cabinet or HDF5.

I will be inserting 1200 rows of 8 entries each per second, which I estimate will accumulate to about 5 GB of data per day.
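As a rough sanity check on that figure, here is a back-of-the-envelope sketch. The post does not state the size of an entry or how many hours per day data arrives, so the 8-byte entry and the 24-hour day below are assumptions, and `daily_bytes` is just an illustrative helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # assumption: data arrives around the clock


def daily_bytes(rows_per_second, entries_per_row, bytes_per_entry,
                seconds=SECONDS_PER_DAY):
    """Raw payload size per day, ignoring index and file-format overhead."""
    return rows_per_second * entries_per_row * bytes_per_entry * seconds


# 1200 rows/s and 8 entries per row come from the post;
# 8 bytes per entry (e.g. a double) is an assumption.
raw = daily_bytes(1200, 8, 8)
print(f"{raw / 1e9:.2f} GB per day before any overhead")
```

With 8-byte entries recorded around the clock this comes to roughly 6.6 GB per day, so 5 GB is the right order of magnitude; smaller entry types or a shorter recording window would bring it down.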

Does anyone have experience with these three who could help me make the decision?

Thanks!
 
  • #2
Well, I don't know much about databases; but I will give one silly opinion...

The thing is, I have been trying to learn various things available within Python, and one of the things I ran into was precisely HDF5. When I read about HDF5, I understood that it is a storage scheme and not necessarily a database (i.e., there is no database server running with its own intelligence to answer queries or return sets or anything like that).

As for the other two choices that you mention, I just quickly read the main webpages, and it looks like MongoDB is a real database (it requires a server) and Kyoto Cabinet is not; like HDF5, it is just a storage scheme.

So, my first opinion, if you need speed, is to forget about using a real database and stick to a storage scheme...so, Kyoto or HDF5.

It seems Kyoto talks about one key-value pair per line...that does not seem too impressive as a storage scheme...but maybe that's where the speed comes from.
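To make the key-value idea concrete, here is a minimal sketch of how one row of 8 entries could be packed into a single key and value. It uses a plain Python `dict` as a stand-in for the store and is not Kyoto Cabinet's API; the microsecond timestamp key, the 8-double row layout, and the names `encode` and `decode` are all assumptions for illustration.

```python
import struct

# Assumption: the key is a timestamp in microseconds, packed big-endian so
# that byte-wise key order matches chronological order.
KEY = struct.Struct(">Q")
# Assumption: each row holds eight 8-byte floats.
VALUE = struct.Struct("<8d")


def encode(ts_us, entries):
    """Pack one row into a (key, value) pair of bytes."""
    return KEY.pack(ts_us), VALUE.pack(*entries)


def decode(key, value):
    """Unpack a (key, value) pair back into a timestamp and a list of floats."""
    return KEY.unpack(key)[0], list(VALUE.unpack(value))


store = {}  # stand-in for an on-disk key-value store
key, value = encode(1_700_000_000_000_000, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
store[key] = value
```

Each record is 8 bytes of key plus 64 bytes of value, and an ordered store could then scan a time range by walking keys in byte order.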

HDF5, from what I remember, is actually rather versatile as far as what it can store.

The Kyoto site does not look like much...how popular is this?

Just because I learned about HDF5 before I ever heard about Kyoto, it sounds like HDF5 is more popular within the scientific/engineering community...

Anyway, that's my un-educated opinion.

gsal
 

1. What is a high-frequency time series database?

A high-frequency time series database is a specialized type of database designed to store and retrieve large volumes of time-stamped data in real time. It is optimized for storing and processing data points that arrive at short intervals, typically milliseconds to seconds apart.

2. What are the benefits of using a high-frequency time series database?

The main benefit of using a high-frequency time series database is its ability to handle large volumes of data at a high speed. This makes it ideal for use cases such as financial data analysis, real-time monitoring, and internet of things (IoT) applications. Additionally, these databases often have built-in features for data compression, aggregation, and analysis, making it easier for scientists to extract insights from the data.

3. How is a high-frequency time series database different from a traditional database?

A traditional database is designed for general-purpose data storage and retrieval, whereas a high-frequency time series database is specifically optimized for handling time-stamped data. This means that a high-frequency time series database can store and retrieve data points at a much faster rate, and also has specialized features for managing time-series data, such as data interpolation and data gap filling.
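As an illustration of what gap filling by interpolation means, here is a minimal pure-Python sketch. It is not taken from any particular database; the function name `fill_gaps`, the integer timestamps, and the choice of linear interpolation are assumptions made for the example.

```python
def fill_gaps(samples, step):
    """Fill missing timestamps on a regular grid by linear interpolation.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    step:    expected spacing between consecutive timestamps.
    """
    if not samples:
        return []
    filled = [samples[0]]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        t = t0 + step
        while t < t1:
            # Point on the straight line between the two known neighbours.
            v = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            filled.append((t, v))
            t += step
        filled.append((t1, v1))
    return filled


# Timestamps 1 and 2 are missing and get interpolated values.
print(fill_gaps([(0, 0.0), (3, 3.0), (4, 5.0)], step=1))
```

Linear interpolation is only one possible policy; carrying the last known value forward is another common choice when the signal is a price or a state rather than a smooth measurement.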

4. What types of data can be stored in a high-frequency time series database?

A high-frequency time series database can store any type of data that is time-stamped, such as stock market data, sensor readings, website traffic, or weather data. It is also capable of storing data from multiple sources simultaneously and can handle different data formats, including structured, semi-structured, and unstructured data.

5. How can a high-frequency time series database be queried?

A high-frequency time series database can be queried using a specialized query language, which is optimized for time-series data. This query language allows scientists to retrieve specific data points or perform complex analytical queries on the stored data. Some high-frequency time series databases also support standard database query languages such as SQL, making it easier for scientists familiar with these languages to work with the data.
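For the SQL case, a time-range query and a simple per-second aggregation might look like the sketch below. It uses Python's built-in `sqlite3` module purely as a convenient stand-in, not as one of the databases discussed in the thread; the `ticks` table, its columns, and the millisecond timestamps are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: ts is a timestamp in milliseconds, price is one measured value.
conn.execute("CREATE TABLE ticks (ts INTEGER NOT NULL, price REAL NOT NULL)")
conn.execute("CREATE INDEX idx_ticks_ts ON ticks (ts)")  # speeds up range scans
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?)",
    [(1000, 10.0), (1500, 11.0), (2000, 12.0), (2500, 13.0), (3000, 14.0)],
)


def query_range(conn, start, end):
    """Return all rows with start <= ts < end, in time order."""
    cur = conn.execute(
        "SELECT ts, price FROM ticks WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    )
    return cur.fetchall()


def per_second_average(conn):
    """Downsample to one average price per whole second."""
    cur = conn.execute(
        "SELECT ts / 1000 AS sec, AVG(price) FROM ticks GROUP BY sec ORDER BY sec"
    )
    return cur.fetchall()


print(query_range(conn, 1500, 2501))
print(per_second_average(conn))
```

The index on `ts` is what keeps range queries cheap as the table grows; dedicated time series stores usually build this kind of time ordering into their storage layout.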
