Finding Linearly Dependent Rows in a Large Matrix

  • Context: Graduate 
  • Thread starter Thread starter Big Data Alan
  • Start date Start date
  • Tags Tags
    Linearly Matrix
Click For Summary

Discussion Overview

The discussion revolves around identifying linearly dependent rows in a large, sparse symmetric matrix, particularly in the context of analyzing network traffic data represented in matrix form. Participants explore methods for determining linear independence and the implications of round-off errors in this analysis.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant inquires about methods to identify sets of linearly dependent rows in a large symmetric matrix, expressing interest in a continuous function to represent linear independence.
  • Another participant suggests that for a full matrix, identifying linear dependence is challenging without significant computational resources, but proposes using eigenvalues or singular values for sparse matrices to find almost linearly dependent rows.
  • A third participant clarifies that their matrix is sparse and mentions using Mahout's Singular Value Decomposition program, noting its handling of near-zero eigenvalues and its potential insights into linear dependence.
  • A participant provides context for their large matrix, explaining it represents network traffic data with unique IP addresses, and outlines specific questions they hope to answer using eigenvectors related to network structure and behavior.
  • The same participant discusses the nature of their data, emphasizing that matrix elements are integers and mentioning concerns about round-off errors due to data sampling and packet loss.

Areas of Agreement / Disagreement

Participants appear to agree on the challenges of analyzing large matrices, particularly regarding computational limits. However, there are differing views on the methods and tools suitable for identifying linear dependence, and the discussion remains unresolved regarding the best approach.

Contextual Notes

Participants reference specific computational tools and methods, but there are limitations related to the size of the matrix and the nature of the data that may affect the analysis. The discussion does not resolve the effectiveness of the proposed methods or the implications of round-off errors.

Who May Find This Useful

This discussion may be useful for researchers or practitioners working with large data matrices, particularly in fields such as network analysis, data science, or applied mathematics, who are interested in linear dependence and eigenvalue analysis.

Big Data Alan
Messages
3
Reaction score
0
I have a large real symmetric square matrix (with millions of rows/columns). How can I identify the sets of rows that are linearly dependent?

More generally, can I determine linear independence of rows with a continuous function where, say, the function is 1.0 for a row that is linearly independent and 0.0 when it is linearly dependent. I am interested in identifying rows that are almost linearly dependent. For example, say that due to round-off error rows A and B are identical in all columns except that they differ by 1 part per million in one column's entries.
 
Physics news on Phys.org
If your matrix with 1,000,000 rows is full, the answer is "with great difficulty", unless you have access to a world-class supercomputing facility.

If the matrix is sparse, a good way would be to find the smallest eigenvalues or singular values. The corresponding vectors give you linear combinations that are "almost" linearly dependent.
 
Thanks. I should have mentioned that it is a very sparse matrix. Most rows have just a handful of non-zero entries although a few rows are exceptionally dense. I have been using the Mahout's Singular Value Decomposition program http://mahout.apache.org to find the eigenvectors. This program rejects the near-zero eigenvalue (in its "cleaning" stage) but I will check it out to see what it can tell me about the linearly dependent rows.
 
In response to the questions posted by Windowmaker - which I received notice of but cannot find within this thread: What are you doing with a matrix that big? Is it the first LA course?

I am afraid that I have strayed far from my physics education. I have a collection of nearly 2 billion packets that transited the gateways connecting our corporate network to the Internet over a few days. These involve 2.4 million unique IP addresses. So I created a 2.4M by 2.4M matrix where the indices map to the IP addresses and the elements are the number of packets exchanged between them. Perhaps surprisingly you can find individual eigenvectors fairly quickly but finding all of them is not practical. The first few eigenvectors primarily involve sums and differences of the key nodes (in terms of numbers of IPs they are in contact with and the numbers of packets exchanged).

The sorts of questions I hope the eigenvectors will answer are:
1) Which are the principal nodes in the network?
2) Given a node, which nodes are sending similar numbers of packets to the same other nodes? (For example, I have noticed similar network attacks emanating from multiple IPs and would like an algorithmic approach to identifying the clones/copy-cats.)
3) Which nodes share a structural similarity? (For example, given the IP address of the DNS server can I figure out which one is the backup DNS server?)

Note that the matrix elements are all integers; my comment about round-off error was really about minor differences in packets counts due to a) time-limited data sample, b) lost packets being resent, ...
 

Similar threads

  • · Replies 8 ·
Replies
8
Views
3K
  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
Replies
1
Views
4K
Replies
2
Views
2K
  • · Replies 7 ·
Replies
7
Views
2K
  • · Replies 4 ·
Replies
4
Views
7K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K