Segue for Presentation: SQL Server/Database and Data Mining

  • #1
WWGD
Science Advisor
Gold Member
Hi All,
I need to give a presentation for a job interview that moves from SQL Server and general databases into data mining. Any ideas? My first thought is to use SSAS (Analysis Services) and SSDT (Data Tools) from SQL Server for data mining, but this does not seem clear enough to me. I am not sure whether one actually uses SQL or T-SQL on either of these platforms. What language are the machine-learning (ML) programs written in?
Any other suggestions?
Thanks.
 

Answers and Replies

  • #2
Read up on Apache Spark and how it distributes the work of machine learning.

https://en.m.wikipedia.org/wiki/Apache_Spark

You might get some general data mining insight from this IBM Redbook

http://jliusun.bradley.edu/~jiangbo/Redbooks/sg245252IMGuide.pdf

Data mining uses different strategies to find trends in data. One such strategy is to train a neural net to identify customers with a certain trait or behavior using other customers who've demonstrated that behavior or have that trait.

As an example, suppose we have a database of a bank's customers. We want to find out which of them are dissatisfied with the bank and are thinking of leaving. We train a neural net on customers who have already left and ask it to score the remaining customers. We then select the customers with the highest scores and try to market a bank product to them to keep them as customers.
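To make the train-then-score idea concrete, here is a minimal sketch in plain Python. It is not what any particular mining tool does: it uses a single logistic unit (the simplest possible stand-in for a neural net), and the feature names, customer names, and numbers are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=2000, lr=0.5):
    """Fit one logistic unit with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def score(model, x):
    """Return a 0..1 score; higher means more like the customers who left."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical features scaled to 0..1: (complaint rate, drop in balance).
history = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]
left_bank = [1, 1, 1, 0, 0, 0]  # 1 = customer left, 0 = customer stayed

model = train(history, left_bank)

# Score the remaining customers and pick the two most at risk.
current = {"alice": (0.85, 0.9), "bob": (0.1, 0.1), "carol": (0.5, 0.6)}
scores = {name: score(model, x) for name, x in current.items()}
at_risk = sorted(scores, key=scores.get, reverse=True)[:2]
print(at_risk)
```

A real project would use a proper library and far more data, but the shape is the same: fit on past leavers, score everyone else, then rank.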

SQL is used to extract the customers into a file and the mining tools process the file outputting a score. SQL is used to add the score back to the database. SQL is used to extract the high scoring customers for our marketing campaign.
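The round trip described above might look roughly like the following sketch. It uses Python's built-in `sqlite3` instead of SQL Server so that it is self-contained; the table, the column names, and the `mining_tool` function are hypothetical placeholders, and the scoring rule is only a stand-in for whatever a real mining tool would produce.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, complaints INTEGER, churn_score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, complaints) VALUES (?, ?, ?)",
    [(1, "alice", 5), (2, "bob", 0), (3, "carol", 3)],
)

# Step 1: SQL extracts the customers into a file (an in-memory CSV here).
extract = io.StringIO(newline="")
csv.writer(extract).writerows(conn.execute("SELECT id, complaints FROM customers"))

# Step 2: the mining tool processes the file and outputs a score per customer.
def mining_tool(complaints):
    # Placeholder rule, NOT a real model: more complaints means a higher score.
    return min(complaints / 5.0, 1.0)

extract.seek(0)
scored = [
    (mining_tool(int(complaints)), int(cust_id))
    for cust_id, complaints in csv.reader(extract)
]

# Step 3: SQL adds the scores back to the database.
conn.executemany("UPDATE customers SET churn_score = ? WHERE id = ?", scored)

# Step 4: SQL extracts the high-scoring customers for the marketing campaign.
high = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE churn_score >= 0.5 "
        "ORDER BY churn_score DESC"
    )
]
print(high)
```

The 0.5 cutoff is arbitrary; in practice the threshold would depend on how many customers the campaign can afford to contact.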

Apache Spark could be used to manage the mining process in a more efficient, distributed fashion.
 
  • #3
WWGD
Thanks, Jedi. I assume that if we have an OLAP setup we would want to denormalize, while if we have an OLTP setup we would want to eliminate redundancy and keep a normalized database? EDIT: I will owe you if I get my (entry-level) big data job.
 
  • #4
Yes, that's basically it. We found that running SQL queries to collect the data from the various star-schema tables during mining was far slower than first building a flat, denormalized file of data to mine. I think this is still true, and Apache Spark takes a similar approach as it distributes the data across the machines on the network.
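As a rough illustration of flattening a star schema before mining, the sketch below joins one fact table to its dimension tables once, up front, and stores the result as a single wide table. It uses Python's built-in `sqlite3` for convenience; the table and column names are invented for the example and do not come from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A tiny, hypothetical star schema: one fact table, two dimensions.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (customer_id INTEGER, product_id INTEGER, amount REAL);

INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'business');
INSERT INTO dim_product VALUES (10, 'checking'), (20, 'mortgage');
INSERT INTO fact_sales VALUES (1, 10, 50.0), (1, 20, 900.0), (2, 10, 75.0);

-- Do the joins once and keep the flat, denormalized result.
CREATE TABLE mining_flat AS
SELECT f.customer_id, c.segment, p.product_name, f.amount
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_id = f.customer_id
JOIN dim_product AS p ON p.product_id = f.product_id;
""")

# The mining step now reads one table and needs no joins at all.
flat_rows = conn.execute(
    "SELECT customer_id, segment, product_name, amount "
    "FROM mining_flat ORDER BY customer_id, product_name"
).fetchall()
for row in flat_rows:
    print(row)
```

The trade-off is the usual one: the flat table repeats dimension values on every row, which costs storage but spares the mining pass from repeating the joins.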
 
  • #7
I'm no longer in the data mining area. I moved on to scientific programming a few years ago but we're looking at using Apache Spark for a project. However, who knows what'll happen next.

Cheers, take care, and good luck with the job interview. Try not to get bogged down in the details of their questions, and answer honestly and confidently. They can't expect you to know everything about data mining, but just knowing the terms and strategies will convince them.

Remember, when you don't know something, say so, and then say you'll definitely review or research it. Try to turn the interview into a dialog rather than a question-and-answer session, with you offering suggestions on how you can help them with their work.
 
