When to Use Dense Array Observations?
To simplify, I have the test code below:
from scipy.sparse import csr_matrix, dok_array, issparse
import numpy as np
from tqdm import tqdm

X = np.load('dense.npy')
# convert it to csr sparse matrix
#X = csr_matrix(X)
print(repr(X))
n = X.shape[0]
with tqdm(total=n*(n-1)//2) as pbar:
    cooccur = dok_array((n, n), dtype='float32')
    for i in range(n):
        for j in range(i+1, n):
            u, v = X[i], X[j]
            if issparse(u):
                u = u.toarray()[0]
                v = v.toarray()[0]
            #import pdb; pdb.set_trace()
            m = u - v
            min_uv = u - np.maximum(m, 0)
            val = np.sum(min_uv - np.abs(m) * min_uv)
            pbar.update()
Case 1: Run the test code as is. The run takes 2 min 54 s.
Case 2: Uncomment the line X = csr_matrix(X) (just for comparison). The run takes 1 min 56 s.
It is strange that the operation on a dense array takes longer. The array was subsampled for this test; there is a significant run-time difference between sparse and dense arrays for the original, due to the large number of iterations.
I put the code into a function and used a line profiler to examine the time usage. My findings are:
1. Slicing is much slower for sparse matrices.
2. The three lines above the last line are significantly faster in Case 2.
3. The total runtime is shorter for Case 2, even though slicing and converting to a dense vector takes extra time.
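A minimal sketch of how the three arithmetic lines could be isolated and timed on their own, without the loop or the slicing. The function name pair_score and the random stand-in vectors are hypothetical; the real test would pass rows of the array loaded from dense.npy.

```python
import timeit
import numpy as np

def pair_score(u, v):
    """The three arithmetic lines from the test code, unchanged."""
    m = u - v
    min_uv = u - np.maximum(m, 0)   # equals the elementwise minimum of u and v
    return np.sum(min_uv - np.abs(m) * min_uv)

# Hypothetical stand-in vectors; substitute rows of the real array here.
rng = np.random.default_rng(0)
u = rng.random(10_000)
v = rng.random(10_000)

# Time only the arithmetic, averaged over many calls.
calls = 200
elapsed = timeit.timeit(lambda: pair_score(u, v), number=calls)
print(f"{elapsed / calls * 1e6:.1f} microseconds per call")
```

Running the same harness once on a row sliced from the dense array and once on a row produced by toarray()[0] would show whether the arithmetic itself, and not the slicing, accounts for the gap.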
I am perplexed as to why these three lines took different amounts of time in Case 1 and Case 2 when they involve exactly the same numpy vectors in both cases. Any explanations?
The dense.npy file is uploaded here to reproduce the observation.
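One way to check whether the vectors really are "exactly the same" in both cases is to compare their dtype and memory layout, since a row sliced from a dense array is a view that inherits the parent's layout, while toarray()[0] returns a row from a freshly allocated array. The sketch below is only a diagnostic under that assumption, not a confirmed explanation of the timings. The helper describe and the Fortran-ordered stand-in array are hypothetical; the layout of the real dense.npy is not known from the post.

```python
import numpy as np
from scipy.sparse import csr_matrix

def describe(vec):
    """Collect properties that commonly affect NumPy arithmetic speed."""
    return {
        "dtype": str(vec.dtype),
        "c_contiguous": bool(vec.flags["C_CONTIGUOUS"]),
        "strides": vec.strides,
    }

# Hypothetical stand-in for dense.npy: a small Fortran-ordered array.
X = np.asfortranarray(np.random.default_rng(0).random((4, 6)))

dense_row = X[0]                             # Case 1: a view into X
sparse_row = csr_matrix(X)[0].toarray()[0]   # Case 2: a newly allocated row

dense_info = describe(dense_row)
sparse_info = describe(sparse_row)
print("Case 1:", dense_info)
print("Case 2:", sparse_info)
```

If the two reports differ for the real data, the vectors are equal in value but not in layout, which would be worth ruling out before looking elsewhere.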
Sparse Matrices For Efficient Machine Learning Pipelines
What are Sparse Matrices?
There are two common matrix types, dense and sparse. The major difference is that most entries of a sparse matrix are zero, while a dense matrix has few or no zeros. Here is an example of a sparse matrix with four rows and four columns.
[Image: a sparse matrix with four rows and four columns. Source: Wikipedia]
In the above matrix, 12 of the 16 entries are zeros. That raises a simple question:
Can we store only the non-zero values to compress the matrix while still following our regular machine learning pipelines?
The simple answer is: yes, we can!
How so? Let me explain. We can easily convert highly sparse matrices to compressed sparse row matrices, CSR matrices for short. CSR is not the only format, but it is a good fit when we need to apply matrix operations and access the matrix efficiently. The common storage formats for sparse matrices include:
- Dictionary of keys (DOK)
- List of Lists (LIL)
- Coordinate list (COO)
- Compressed row storage (CRS), another name for CSR
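To put a rough number on the idea of storing only the non-zero values, the sketch below compares the memory used by a dense array with the memory used by the three arrays of its CSR counterpart. The matrix size and the number of non-zeros are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# A 1000 x 1000 matrix with at most 5000 non-zero entries (illustrative sizes).
dense = np.zeros((1000, 1000), dtype=np.float64)
rows = rng.integers(0, 1000, size=5000)
cols = rng.integers(0, 1000, size=5000)
dense[rows, cols] = 1.0

csr = sparse.csr_matrix(dense)

dense_bytes = dense.nbytes
# CSR keeps three 1-D arrays: the values, their column indices, and row pointers.
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print(f"dense: {dense_bytes} bytes")
print(f"CSR:   {csr_bytes} bytes for {csr.nnz} stored values")
```

The saving shrinks as the matrix fills up, because CSR stores a column index alongside every value.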
One drawback of sparse matrices is that accessing individual elements becomes more complex. Here is a quick guide for selecting a suitable data structure for your use case.
If your concern is efficient modification, use DOK, LIL, or COO. These formats are typically used to construct matrices.
If your concern is efficient access and matrix operations, use CSR or CSC.
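The guide above suggests a two-step workflow: build the matrix in a format that is cheap to modify, then convert it once for computation. A minimal sketch, assuming LIL for construction and CSR for the arithmetic (the variable names are illustrative):

```python
import numpy as np
from scipy import sparse

# Step 1: build incrementally in LIL format, which supports cheap item assignment.
builder = sparse.lil_matrix((4, 4), dtype=np.int64)
builder[0, 0] = 1
builder[1, 1] = 1
builder[1, 2] = 2
for col, value in enumerate([2, 1, 1, 1]):
    builder[3, col] = value

# Step 2: convert once to CSR for fast row access and matrix operations.
csr = builder.tocsr()

# A matrix-vector product with a vector of ones gives the row sums.
product = csr @ np.ones(4, dtype=np.int64)
print(product)
```

Assigning single elements directly to a CSR matrix also works, but SciPy warns that changing its sparsity structure is expensive, which is why the construction step uses a different format.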
For simplicity, let's dive into an example. We start with a small matrix of four rows and four columns and convert it into a CSR matrix. Here, I'm using the scipy.sparse module.
import numpy as np
from scipy import sparse

# create the matrix with numpy
m = np.array([[1, 0, 0, 0],
              [0, 1, 2, 0],
              [0, 0, 0, 0],
              [2, 1, 1, 1]])

# convert the numpy array into a scipy csr_matrix
csr_m = sparse.csr_matrix(m)
An important point to note here is that while our original matrix stores data in a 2-D array, the converted CSR matrix stores them in three 1-D arrays.
Value array
As the name implies, this stores all non-zero elements in the original matrix. The length of the array equals the number of non-zero entries in the original matrix. In our example, we have 7 non-zero elements. Hence, the length of the value array is 7.
Column index array
This array stores the column indices of elements in the value array. (Note that zero-based indices are used here)
Row index array
This array stores a running count of the nonzero values, row by row. row_index_array[j] encodes the total number of nonzeros above row j, so the first element is always 0, and the nonzeros of row j occupy positions row_index_array[j] up to (but not including) row_index_array[j + 1] in the value array. The last element equals the number of non-zero elements in the original matrix. The length is m + 1, where m is the number of rows in the original matrix.
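The three arrays described above can be inspected directly. In scipy.sparse they are exposed as the attributes data (value array), indices (column index array), and indptr (row index array). The sketch below prints them for the example matrix and uses them to read back the entries of a single row; the helper row_entries is a hypothetical name introduced for illustration.

```python
import numpy as np
from scipy import sparse

m = np.array([[1, 0, 0, 0],
              [0, 1, 2, 0],
              [0, 0, 0, 0],
              [2, 1, 1, 1]])
csr_m = sparse.csr_matrix(m)

print(csr_m.data)     # value array: 1, 1, 2, 2, 1, 1, 1
print(csr_m.indices)  # column index array: 0, 1, 2, 0, 1, 2, 3
print(csr_m.indptr)   # row index array: 0, 1, 3, 3, 7

def row_entries(csr, j):
    """Return (column, value) pairs of row j using only the three arrays."""
    start, end = csr.indptr[j], csr.indptr[j + 1]
    return list(zip(csr.indices[start:end].tolist(),
                    csr.data[start:end].tolist()))

print(row_entries(csr_m, 1))  # row 1 holds value 1 in column 1 and value 2 in column 2
print(row_entries(csr_m, 2))  # row 2 is all zeros, so the list is empty
```

Note how the empty third row shows up as a repeated value in the row index array: both its start and end pointers are 3.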
Well, now we have successfully converted our original matrix into CSR format. With the above explanation, I hope you understand how CSR matrices work under the hood.