Looking for advice in clusterization

  • Thread starter Frank Einstein
  • Tags
    Time series
  • #1
Frank Einstein
TL;DR Summary
I need to know how to cluster data measured at different time instants.
Hello everyone. I have a machine with a series of sensors. All sensors send a signal each minute. I want to know if any of those sensors are redundant. The data is available as an Excel file, where the columns are the variables and the rows are the measurements. I have 1000 rows.

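To do this, I have used DBSCAN in Python as follows:

Data clusterization:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# data: table loaded from the Excel file, shape (1000 rows, n_sensors columns)
scaler = StandardScaler()
data_normalized = scaler.fit_transform(data)  # standardize each sensor (column)
data_normalized = data_normalized.T  # transpose so that each sensor is one sample
dbscan = DBSCAN(eps=15, min_samples=2)
clusters = dbscan.fit_predict(data_normalized)  # one label per sensor; -1 means noise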

However, I think there has to be a better way to find relationships between the variables (each sensor, i.e. each column of the data file).

Could someone please point me towards a methodology more suitable for my goals?
Any answer is appreciated.
Thanks for reading.
Best regards.
  • #2
Dale
You can just look at the correlation matrix. If two inputs are highly correlated then you can probably drop one.
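For anyone who wants to try the correlation-matrix idea, here is a minimal sketch using pandas. The helper name `correlated_pairs`, the 0.95 cutoff, and the synthetic sensor columns are all assumptions made for illustration; the thread does not specify a threshold, and the real data would come from the Excel file instead.

```python
import numpy as np
import pandas as pd


def correlated_pairs(data: pd.DataFrame, threshold: float = 0.95):
    """Return (sensor_a, sensor_b, r) for column pairs with |r| >= threshold.

    `threshold` is an illustrative choice, not a value from the thread.
    """
    corr = data.corr()  # Pearson correlation between columns (sensors)
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skip the diagonal
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic stand-in for the 1000-row sensor table (hypothetical data).
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
demo = pd.DataFrame({
    "s1": base,
    "s2": 2.0 * base + 0.01 * rng.normal(size=1000),  # nearly a rescaled copy of s1
    "s3": rng.normal(size=1000),                       # unrelated sensor
})

for a, b, r in correlated_pairs(demo):
    print(f"{a} and {b}: r = {r:.4f}")
```

Each reported pair is a candidate for dropping one of the two sensors. Note that Pearson correlation only captures linear relationships, so a pair that is related nonlinearly could still be missed by this check.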
  • #3
Dale said:
You can just look at the correlation matrix. If two inputs are highly correlated then you can probably drop one.
Thanks. I can calculate them with ease as well.