What is the Best Way to Get Started in Data Mining?

  • Context: Undergrad 
  • Thread starter: Alternamaton
  • Tags: Data
SUMMARY

The best way to get started in data mining involves a solid foundation in undergraduate mathematics and statistics, complemented by some physics or engineering. Reilly Atkinson emphasizes the importance of practical experience and mentorship, noting that real problems seldom resemble academic course problems. Key statistical techniques mentioned include factor analysis, regression, and cluster analysis, which he used to analyze large business datasets. For beginners, starting with business statistics books and learning calculus as needed is recommended.

PREREQUISITES
  • Undergraduate-level mathematics, including calculus
  • Statistical analysis techniques such as regression and factor analysis
  • Basic knowledge of physics or engineering principles
  • Understanding of business problems and how to quantify them
NEXT STEPS
  • Study business statistics through practical textbooks
  • Learn differential and integral calculus with a focus on word problems
  • Explore advanced statistical methods like time series analysis
  • Seek mentorship from experienced data analysts or professionals in the field
USEFUL FOR

Individuals interested in data mining, aspiring data analysts, students in mathematics or statistics, and professionals looking to enhance their analytical skills in business contexts.

Alternamaton
Hi all,

A while back I set out to learn Calculus, but I quickly lost interest. I realize now that my problem was my failure to set a clear goal. I was just reading through chapters of online textbooks and doing exercises, but I never felt like I was getting any closer to anything. Sure, I was "learning Calculus", but there's really no end to Calculus, or math in general for that matter. You can "learn" it forever, all day just sitting there reading textbooks.

So I asked myself why I felt the need to learn Calculus, and the answer is that I feel like I am not equipped to analyze the world at large adequately. I constantly wonder about the causes of things, particularly in the context of large groups of people. When I realized this, I realized that the real reason that I need to learn Calculus is so that I can use statistical methods to analyze data about people.

This brings us to the subject of this thread: data mining. I'll admit up front that I haven't "done my research" regarding the topic--that is, I haven't scoured the web for resources on it before resorting to asking for information on a forum. However, I think that such an approach is backwards, anyway; if you are interested in learning about a topic, it is much, much more efficient to find an expert in that topic and ask him to point you in the right direction than it is to wander around blindly, reading this article and that book on the subject at random. For instance, I have played and studied chess for several years, and I know for a fact that someone would be better off asking me where to start in terms of learning the game than they would be wandering the internet reading articles and ebooks on chess.

Anyway, with that disclaimer aside, can anyone with experience in the field of data mining or statistical analysis of data "point me in the right direction"? In my own opinion, the best course of action would be to learn differential and integral calculus by doing mostly word problems (I am a huge believer in word problems when it comes to really learning math), and then to do the same for statistical analysis.

Can anyone modify or flesh out this plan for me (preferably with useful links :D)?

Thank you for your time.
 
Data mining is just a fancy, marketing-generated name for (statistical) analysis. For centuries, going back at least to Tycho Brahe, Copernicus, Kepler, and on and on, people have been doing data mining. Planck's invention of the quantum was based on data mining.

In more modern times, data mining is primarily focused on finding patterns in data. Finding patterns in the spending of credit card users, or temporal patterns in renewals of magazine subscriptions, are typical business data-mining problems. The credit card work aimed to find patterns that might be used to better target the marketing of automobiles. The subscription work was oriented toward finding optimal times for mailing subscription renewal notices. I've been involved in these and many other similar problems for almost 40 years. Most of us who were doing this kind of work from the 1970s on never knew we were doing data mining; we called it statistical analysis or just plain analysis.

The credit card problem, with a sample of over a million card holders spanning three years, required factor analysis, some regression, and ultimately cluster analysis. We started with neural networks, but they didn't work well. This problem took three PhDs almost a year to solve. The plain fact was that most people tended to exhibit similar spending patterns -- we finally used some non-linear transformations to tease out distinct patterns. The subscription problem was solved using various forms of regression -- LMS and logistic.
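As a rough illustration of why a non-linear transformation can help before clustering, here is a minimal Python sketch. It is not the analysis described above: the spending figures are invented, and both the log transform and the small one-dimensional k-means helper (`kmeans_1d`, a hypothetical name) are assumptions chosen only to keep the example short.

```python
import math

def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on a list of floats (assumes k >= 2).

    Initial centers are spread evenly across the sorted values so the
    result is deterministic. Returns (centers, labels).
    """
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[i * (n - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members
                               else centers[j])
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers, labels

# Invented monthly card spend (dollars) for eight card holders.
spend = [20, 30, 40, 60, 400, 600, 900, 5000]

# On the raw scale the single very large spender dominates the distances.
_, raw_labels = kmeans_1d(spend, k=2)

# On a log scale, ratios matter instead of absolute differences.
_, log_labels = kmeans_1d([math.log(v) for v in spend], k=2)

print("raw:", raw_labels)  # → raw: [0, 0, 0, 0, 0, 0, 0, 1]
print("log:", log_labels)  # → log: [0, 0, 0, 0, 1, 1, 1, 1]
```

With the raw figures, everyone except the largest spender lands in one cluster; after the log transform, the low and high spenders separate. Real work on data like this would involve many variables and far more care in choosing the transformation.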

To do this kind of work you need a sophisticated knowledge of math, preferably through differential equations; a strong level of comfort with statistics, including time series analysis; and the ability to turn business problems into quantitative form. That last part is not always easy: if it were not for clients, the work would be simple.

The best data analysts I've ever encountered were trained as either physicists or engineers -- solving problems with math is central to these disciplines.

The best route to success: solid knowledge of undergraduate math and statistics, some physics or engineering, and a business course or two. Then, often the best way to learn is to find a mentor -- a boss, a friend, whatever. Ultimately the kind of work that goes with data mining requires a strong intuition -- the real problems are seldom similar to the academic course problems.

On your own, start with a book(s) on business statistics; keep it practical. You do need calculus to get a good grounding in statistics, but learn it as needed.
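One place where calculus shows up early in a business statistics book is simple least-squares regression: the slope and intercept come from setting the derivatives of the summed squared errors to zero. A minimal Python sketch of the resulting formulas follows; the data and the function name `fit_line` are made up for illustration and are not taken from any particular textbook.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    The formulas below are what you get by differentiating
    sum((y - intercept - slope * x) ** 2) with respect to slope and
    intercept and setting both derivatives to zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: advertising spend (in $1000s) vs. units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 15.0, 21.0, 24.0, 28.0]

slope, intercept = fit_line(ad_spend, units)
print(f"units = {intercept:.1f} + {slope:.1f} * ad_spend")
```

Working a small example like this by hand and then checking it in code is in the spirit of the word-problem approach mentioned in the original question.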

Regards,

Reilly Atkinson
 