
Segue for Presentation: SQL Server/Database and Data Mining

  1. Aug 3, 2017 #1

    WWGD

    Science Advisor
    Gold Member

    Hi All,
    I need to give a presentation for a job interview that moves from SQL Server and general databases into data mining. Any ideas? My first thought is to cover the use of SSAS (Analysis Services) and SSDT (Data Tools) from SQL Server in data mining, but this does not seem clear enough to me. I am not sure whether one actually uses SQL or T-SQL on either of these platforms. What language are the machine-learning (ML) programs written in?
    Any other suggestions?
    Thanks.
     
  3. Aug 5, 2017 #2

    jedishrfu

    Staff: Mentor

    Read up on Apache Spark and how it distributes the work of machine learning.

    https://en.m.wikipedia.org/wiki/Apache_Spark

    You might get some general data mining insight from this IBM Redbook

    http://jliusun.bradley.edu/~jiangbo/Redbooks/sg245252IMGuide.pdf

    Data mining uses different strategies to find trends in data. One such strategy is to train a neural net to identify customers with a certain trait or behavior using other customers who've demonstrated that behavior or have that trait.

    As an example, say we have a database of a bank's customers. We want to find out which of them are dissatisfied with the bank and are thinking of leaving. We train a neural net using customers who have already left and ask it to score the remaining customers. Then we select the customers with the highest scores and try to market a bank product to them to keep them as customers.
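
    A minimal sketch of that train-then-score idea, using a single sigmoid unit (the simplest possible "neural net", equivalent to logistic regression) rather than a real mining tool. The two features (complaint count, balance in thousands), the customer values, and the function names are all made up for illustration.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=1000, lr=0.1):
    """Fit one sigmoid unit with full-batch gradient descent on log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this customer (predicted minus actual).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def score(model, x):
    """Return a churn score in (0, 1); higher means more likely to leave."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training data: (complaints, balance in thousands).
# Label 1 = customer left the bank, 0 = customer stayed.
history = [(3, 0.2), (4, 0.5), (2, 0.1), (0, 5.0), (1, 3.0), (0, 2.0)]
left = [1, 1, 1, 0, 0, 0]

model = train(history, left)

# Score two remaining (hypothetical) customers.
unhappy = score(model, (3, 0.3))
content = score(model, (0, 4.0))
print(unhappy > content)
```

    The customer with several complaints and a low balance gets the higher score, so that is who the retention campaign would target first.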

    SQL is used to extract the customers into a file and the mining tools process the file outputting a score. SQL is used to add the score back to the database. SQL is used to extract the high scoring customers for our marketing campaign.
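
    The extract, score, write-back, and select steps could be sketched roughly like this. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and score_customer is a hypothetical placeholder for whatever the mining tool actually computes; the table and column names are assumptions too.

```python
import sqlite3


def score_customer(balance, complaints):
    """Hypothetical stand-in for the mining tool: higher means more likely to leave."""
    return complaints * 10.0 - balance / 1000.0


def run_pipeline(conn, top_n):
    # Step 1: SQL extracts the customers to be scored.
    rows = conn.execute(
        "SELECT id, balance, complaints FROM customers"
    ).fetchall()

    # Step 2: the mining tool processes the extract and outputs a score per customer.
    scored = [
        (score_customer(balance, complaints), cid)
        for cid, balance, complaints in rows
    ]

    # Step 3: SQL adds the scores back to the database.
    conn.executemany(
        "UPDATE customers SET churn_score = ? WHERE id = ?", scored
    )
    conn.commit()

    # Step 4: SQL extracts the high-scoring customers for the campaign.
    return conn.execute(
        "SELECT id, churn_score FROM customers "
        "ORDER BY churn_score DESC LIMIT ?",
        (top_n,),
    ).fetchall()


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, balance REAL, complaints INTEGER, churn_score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, balance, complaints) VALUES (?, ?, ?)",
    [(1, 5000.0, 0), (2, 200.0, 3), (3, 1500.0, 1)],
)

targets = run_pipeline(conn, 2)
print([cid for cid, _ in targets])
```

    In a real setup step 2 would usually run outside the database, on a file exported by step 1, which is why the scores have to be loaded back in before the final query.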

    Apache Spark could be used to manage the mining process in a more efficient, distributed fashion.
     
    Last edited: Aug 6, 2017
  4. Aug 6, 2017 #3

    WWGD

    Science Advisor
    Gold Member

    Thanks, Jedi. I assume that if we have an OLTP setup we would want to denormalize, while if we have an OLAP setup we may want to eliminate redundancy and keep a normalized database? EDIT: I will owe you if I get my (entry-level) big data job.
     
    Last edited: Aug 6, 2017
  5. Aug 6, 2017 #4

    jedishrfu

    Staff: Mentor

    Yes, that's basically it. We found that running SQL queries to collect all the data from the various star schema tables while data mining was far slower than making a flat, denormalized file of data to mine. I think this is still true, and Apache Spark works this way as it distributes the data across the networked machines.
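
    As a rough illustration of that flattening step, the sketch below joins a tiny star schema once and writes the result out as a single denormalized CSV for the mining tool to read. It uses Python's sqlite3 and csv modules; the fact and dimension tables, their columns, and the sample rows are all invented for the example.

```python
import csv
import io
import sqlite3

# A made-up star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (customer_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_product VALUES (10, 'Loan'), (20, 'Card');
INSERT INTO fact_sales VALUES (1, 10, 250.0), (2, 20, 75.0), (1, 20, 30.0);
""")


def flatten(conn):
    """Join the star schema once and return the result as flat CSV text."""
    cur = conn.execute("""
        SELECT f.customer_id AS customer_id,
               c.region      AS region,
               p.category    AS category,
               f.amount      AS amount
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_product  AS p ON p.product_id = f.product_id
        ORDER BY f.customer_id, f.product_id
    """)
    header = [col[0] for col in cur.description]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(cur)  # one wide row per fact, no joins needed downstream
    return buf.getvalue()


flat_csv = flatten(conn)
print(flat_csv)
```

    The joins are paid for once, up front. After that the mining pass only does sequential reads over one wide file, which is also easy to split into chunks across machines.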
     
  6. Aug 6, 2017 #5

    jedishrfu

    Staff: Mentor

  7. Aug 6, 2017 #6

    WWGD

    Science Advisor
    Gold Member

  8. Aug 6, 2017 #7

    jedishrfu

    Staff: Mentor

    I'm no longer in the data mining area. I moved on to scientific programming a few years ago but we're looking at using Apache Spark for a project. However, who knows what'll happen next.

    Cheers, take care. Good luck with the job interview. Try not to get bogged down in the details of their questions, and answer honestly and confidently. They can't expect you to know everything about data mining, but just knowing the terms and strategies will go a long way toward convincing them.

    Remember, when you don't know something, say so, and then say you'll definitely review or research it. Try to turn the interview into a dialog instead of a question-and-answer session, with you offering suggestions on how you can help them with their work.
     