Segue for Presentation: SQL Server/Database and Data Mining

WWGD · Aug 3, 2017

Hi All,
I need to give a presentation for a job interview that leads from SQL Server and general databases into data mining. Any ideas? My first thought is to use SSAS (Analysis Services) and SSDT (Data Tools) from SQL Server for data mining, but this is not clear enough to me. I am not sure whether one actually uses SQL or T-SQL on either of these platforms. What language are the machine-learning (ML) programs written in?
Any other suggestions?
Thanks.

jedishrfu · Aug 5, 2017

Read up on Apache Spark and how it distributes the work of machine learning.

https://en.m.wikipedia.org/wiki/Apache_Spark

You might get some general data mining insight from this IBM Redbook:

http://jliusun.bradley.edu/~jiangbo/Redbooks/sg245252IMGuide.pdf

Data mining uses different strategies to find trends in data. One such strategy is to train a neural net to identify customers with a certain trait or behavior using other customers who've demonstrated that behavior or have that trait.

As an example, suppose we have a database of a bank's customers and want to find out which of them are dissatisfied and thinking of leaving. We train a neural net on customers who have already left and ask it to score the remaining customers. We then select the customers with the highest scores and market a bank product to them to keep them as customers.

SQL is used to extract the customers into a file, and the mining tools process the file and output a score. SQL is then used to add the score back to the database and to extract the high-scoring customers for our marketing campaign.
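The extract, score, write-back, and select steps described above can be sketched end to end. This is only a minimal illustration under stated assumptions: SQLite from the Python standard library stands in for SQL Server, a tiny logistic-regression scorer stands in for the neural net, and the `customers` table, its columns, and the sample rows are all hypothetical. It also assumes a past cohort with known outcomes (`has_left` is 1 or 0) for training, with current customers marked by a NULL `has_left`.

```python
import math
import sqlite3


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_scorer(features, labels, epochs=2000, lr=0.1):
    """Fit logistic-regression weights with stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def run_churn_pipeline(threshold=0.5):
    # Hypothetical schema and data; has_left is NULL for current customers.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, complaints REAL, "
        "balance_drop REAL, has_left INTEGER, churn_score REAL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, complaints, balance_drop, has_left) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, 4, 0.8, 1), (2, 3, 0.6, 1), (3, 5, 0.9, 1),
            (4, 0, 0.1, 0), (5, 1, 0.0, 0), (6, 0, 0.2, 0),
            (7, 4, 0.75, None), (8, 0, 0.1, None), (9, 2, 0.5, None),
        ],
    )

    # Step 1: SQL extracts the labeled customers for training.
    labeled = conn.execute(
        "SELECT complaints, balance_drop, has_left FROM customers "
        "WHERE has_left IS NOT NULL"
    ).fetchall()
    weights, bias = train_scorer([r[:2] for r in labeled], [r[2] for r in labeled])

    # Step 2: the mining step scores the customers whose outcome is unknown.
    current = conn.execute(
        "SELECT id, complaints, balance_drop FROM customers WHERE has_left IS NULL"
    ).fetchall()
    scored = [
        (sigmoid(bias + weights[0] * c + weights[1] * d), cid)
        for cid, c, d in current
    ]

    # Step 3: SQL writes the scores back to the database.
    conn.executemany("UPDATE customers SET churn_score = ? WHERE id = ?", scored)

    # Step 4: SQL selects the high scorers for the marketing campaign.
    return conn.execute(
        "SELECT id, churn_score FROM customers WHERE churn_score > ? "
        "ORDER BY churn_score DESC",
        (threshold,),
    ).fetchall()


high_scorers = run_churn_pipeline()
```

`high_scorers` is a list of `(id, churn_score)` rows. In a real setup the scoring step would be the mining tool itself, and only the three SQL steps around it would look like this.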

Apache Spark could be used to manage the mining process in a more efficient, distributed fashion.

WWGD · Aug 6, 2017

Thanks, Jedi. I assume that with an OLTP setup we would keep a normalized database to eliminate redundancy, while for OLAP or data mining we would want to denormalize? EDIT: I will owe you if I get my (entry-level) big data job.

jedishrfu · Aug 6, 2017

Yes, that's basically it. We found that running SQL queries to collect all the data from the various star schema tables during mining was far slower than first building a flat, denormalized file of data to mine. I think this is still true, and that Apache Spark relies on the same idea as it distributes the data across the machines on the network.
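As a rough sketch of what building such a flat file looks like, the star schema is joined once and the result is written out as one wide row per fact. SQLite from the Python standard library stands in for the real database here, and the tables `fact_sales`, `dim_customer`, and `dim_product`, their columns, and their rows are all hypothetical.

```python
import csv
import io
import sqlite3


def build_star_schema(conn):
    """Create a toy star schema: one fact table and two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (customer_id INTEGER, product_id INTEGER, amount REAL);
        INSERT INTO dim_customer VALUES (1, 'North'), (2, 'South');
        INSERT INTO dim_product VALUES (10, 'Loan'), (20, 'Savings');
        INSERT INTO fact_sales VALUES (1, 10, 500.0), (1, 20, 120.0), (2, 20, 75.5);
    """)


def flatten_to_csv(conn, out):
    """Join the star schema once and write the denormalized rows as CSV."""
    cursor = conn.execute("""
        SELECT f.customer_id AS customer_id,
               c.region      AS region,
               p.category    AS category,
               f.amount      AS amount
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_product  AS p ON p.product_id  = f.product_id
        ORDER BY f.customer_id, f.product_id
    """)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # header row
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
buffer = io.StringIO()  # stands in for the flat file on disk
row_count = flatten_to_csv(conn, buffer)
flat_lines = buffer.getvalue().splitlines()
```

The joins are paid for once, up front. The mining step then reads `flat_lines` (or the file on disk) sequentially instead of re-running the joins for every pass over the data.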

jedishrfu · Aug 6, 2017

This might help with your presentation:

http://www.sqlservercentral.com/blo...duction-to-sql-server-data-mining-algorithms/

WWGD · Aug 6, 2017

Excellent, thanks again. Hope to see you at some Big Data conference; I will recognize you from your PF avatar ;).

jedishrfu · Aug 6, 2017

I'm no longer in the data mining area. I moved on to scientific programming a few years ago, but we're looking at using Apache Spark for a project. Who knows what'll happen next.

Cheers, take care, and good luck with the job interview. Try not to get bogged down in the details of their questions, and answer honestly and confidently. They can't expect you to know everything about data mining, but just knowing the terms and strategies will convince them.

Remember, when you don't know something, say so, and then say you'll definitely review or research it. Try to turn the interview into a dialog rather than a question-and-answer session, with you offering suggestions on how you can help them with their work.
