Choice of Pipelines for Data Analysis

In summary, the conversation asks whether TensorFlow (TF) would reproduce the results of a traditional calculation such as linear or multilinear regression. The original poster assumes that the main variable affecting the outcome is the choice of activation functions. The reply is that, given the same model structure, TF should converge on the same coefficients, whereas a neural network with hidden layers produces a non-linear fit that in most cases performs better. The conversation also mentions working through a Keras tutorial on a fuel efficiency dataset and provides resources for further learning, including the books "Hands On ML with Scikit-Learn, TF, and Keras" and "The 100 pg ML book."
  • #1
WWGD
TL;DR Summary
What rules of thumb are there for choosing a pipeline?
Hi,
Say I have some data to process and I am trying, say, linear/multilinear regression. I know how to do this within Python Pandas, and I can learn how to do it with TensorFlow (TF). Would TF produce the same output given the "right" choice of activation functions*? Or would it output a model that is somehow "more general"?

* I assume this is the only/main variable affecting this choice and not other variables such as choice of metrics, sessions, etc.
 
  • #2
Given the same model structure (choice of metrics, properly normalized data, loss function), I don't see why TF should not converge on the same coefficients as a traditional calculation. However, with a neural network we introduce hidden layers with non-linear activations that create a non-linear fit, which in most cases will perform better.

Have you worked through the Keras tutorial on the fuel efficiency dataset?
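As a rough illustration of the convergence point above, here is a minimal sketch in plain Python rather than TF. It fits one weight and one bias by gradient descent on mean squared error, which is roughly what a single linear unit with no hidden layers would do, and compares the result with the closed-form least-squares fit. The toy data, learning rate, step count, and function names are all assumptions made up for this example.

```python
# Hypothetical toy data: y = 3x + 2 plus small fixed deviations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
deviations = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0]
ys = [3.0 * x + 2.0 + d for x, d in zip(xs, deviations)]


def closed_form_fit(xs, ys):
    """Ordinary least squares for one feature (the 'traditional calculation')."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gradient_descent_fit(xs, ys, lr=0.02, steps=5000):
    """Single linear unit (identity activation) trained on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(2.0 * r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(2.0 * r for r in residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w_cf, b_cf = closed_form_fit(xs, ys)
w_gd, b_gd = gradient_descent_fit(xs, ys)
print("closed form:     ", w_cf, b_cf)
print("gradient descent:", w_gd, b_gd)
```

With the same loss and the same (linear) model, both routes should land on essentially the same slope and intercept. The difference only appears once hidden layers with non-linear activations are added, because the model is then no longer a linear regression.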
 
  • #3
Does TF stand for TensorFlow or The f$%*? ;). Thanks for your answer; I will look up the link.
 
  • #4
WWGD said:
Does TF stand for TensorFlow or The f$%*? ;).
I must admit to having used the words "why won't you converge you f$%*?" or similar on a number of occasions.
 

1. What is a pipeline in data analysis?

A pipeline in data analysis is a sequence of steps or processes that are used to transform raw data into meaningful insights. It involves collecting, cleaning, and organizing data, applying statistical and machine learning techniques, and visualizing the results.

2. Why is choosing the right pipeline important in data analysis?

Choosing the right pipeline is important because it can greatly impact the accuracy and reliability of the results. Different pipelines may produce different outcomes, so it is crucial to select the one that is most suitable for the specific dataset and research question.

3. What factors should be considered when selecting a pipeline for data analysis?

Some important factors to consider when selecting a pipeline for data analysis include the type and size of the dataset, the research question, the available tools and resources, and the desired outcome. It is also important to consider the expertise and experience of the data analyst in using different pipelines.

4. What are some commonly used pipelines in data analysis?

Some commonly used pipelines in data analysis include the ETL (extract, transform, load) pipeline, which involves extracting data from various sources, transforming it into a usable format, and loading it into a database for analysis. The CRISP-DM (Cross-Industry Standard Process for Data Mining) pipeline is another popular approach, which involves six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
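The ETL steps described above can be sketched with the Python standard library alone. This is only an illustrative toy, not a reference implementation: the inline CSV text, the column names, the `cars` table, and the three function names are hypothetical, and an in-memory SQLite database stands in for a real analysis database.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for an external data source.
RAW_CSV = """name,mpg,weight_lb
car_a,30.5,2100
car_b,,3200
car_c,18.0,4100
"""

LB_TO_KG = 0.45359237


def extract(text):
    """Extract: read the raw rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing mpg value and convert pounds to kilograms."""
    cleaned = []
    for row in rows:
        if not row["mpg"]:
            continue  # skip incomplete records
        cleaned.append(
            (row["name"], float(row["mpg"]), float(row["weight_lb"]) * LB_TO_KG)
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cars (name TEXT, mpg REAL, weight_kg REAL)"
    )
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM cars").fetchone()[0]
loaded_names = [r[0] for r in conn.execute("SELECT name FROM cars ORDER BY name")]
print(row_count, loaded_names)
```

Keeping each stage as its own function makes it easier to swap one step (for example, a different cleaning rule) without touching the others, which is the practical sense in which a pipeline can be changed midway through an analysis.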

5. Is it possible to change the pipeline during the data analysis process?

Yes, it is possible to change the pipeline during the data analysis process. This may be necessary if the initial pipeline is not producing the desired results or if new information or tools become available. However, it is important to carefully evaluate the potential impact of changing the pipeline and to document any changes made for transparency and reproducibility.
