Segue for Presentation: SQL Server/Database and Data Mining

Discussion Overview

The discussion revolves around preparing a presentation on SQL Server, databases, and data mining for a job interview. Participants explore various tools, strategies, and concepts related to data mining, including the use of SQL and machine learning.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Homework-related

Main Points Raised

  • One participant suggests using SSAS and SSDT from SQL Server for data mining but expresses uncertainty about the clarity of these tools and the programming languages involved.
  • Another participant recommends researching Apache Spark for its capabilities in distributing machine learning tasks and mentions the use of SQL for data extraction and scoring in data mining processes.
  • A participant discusses the difference between OLTP and OLAP setups, suggesting that denormalization may be beneficial for OLTP while normalization is preferred for OLAP.
  • It is noted that SQL queries can be slower when collecting data from star schema tables compared to using a flat denormalized file for mining, which aligns with Apache Spark's distributed approach.
  • Several participants share links to resources that could assist in the presentation preparation, focusing on SQL Server data mining algorithms.
  • A participant reflects on their transition from data mining to scientific programming but acknowledges the potential use of Apache Spark in future projects.
  • Advice is given regarding interview strategies, emphasizing the importance of engaging in dialogue and being honest about knowledge gaps.

Areas of Agreement / Disagreement

Participants express varying opinions on the best practices for data mining and the use of SQL versus other tools. There is no consensus on the optimal approach or the clarity of certain concepts, indicating that multiple competing views remain.

Contextual Notes

Participants discuss the implications of different database setups (OLTP vs. OLAP) and the efficiency of various data mining strategies, but these discussions are contingent on specific use cases and assumptions that are not fully resolved.

Who May Find This Useful

Individuals preparing for job interviews in data-related fields, particularly those focused on SQL Server, data mining, and machine learning, may find this discussion beneficial.

WWGD
Hi All,
I need to give a presentation for a job interview covering SQL Server and general databases, leading into data mining. Any ideas? My first thought is the use of SSAS (Analysis Services) and SSDT (Data Tools) from SQL Server for data mining, but this is not clear enough to me. I am not sure whether one actually uses SQL or T-SQL on either of these platforms. What language are the machine-learning (ML) programs written in?
Any other suggestions?
Thanks.
 
Read up on Apache Spark and how it distributes the work of machine learning.

https://en.m.wikipedia.org/wiki/Apache_Spark

You might get some general data mining insight from this IBM Redbook

http://jliusun.bradley.edu/~jiangbo/Redbooks/sg245252IMGuide.pdf

Data mining uses different strategies to find trends in data. One such strategy is to train a neural net to identify customers with a certain trait or behavior using other customers who've demonstrated that behavior or have that trait.

As an example, suppose we have a database of a bank's customers. We want to find out which of them are dissatisfied with the bank and are thinking of leaving. We train a neural net on customers who have already left and ask it to score the remaining customers. We then select the customers with the highest scores and try to market a bank product to them to keep them as customers.
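To make the train-then-score idea concrete, here is a minimal sketch in plain Python. It swaps the neural net for a single logistic neuron (the simplest possible case) trained by gradient descent. The features, customer names, and numbers are all made up for illustration and do not come from any real data set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Score one customer: a value near 1 means 'looks like someone who left'."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(rows, labels, epochs=2000, lr=0.1):
    """Fit a single logistic neuron with per-sample gradient descent on log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Hypothetical features: (complaints in the last year, fraction of balance withdrawn)
history = [(0, 0.1), (1, 0.2), (0, 0.0), (4, 0.9), (5, 0.8), (3, 0.7)]
left_bank = [0, 0, 0, 1, 1, 1]  # 1 = customer left the bank

weights, bias = train(history, left_bank)

# Score the customers who are still with the bank and rank them by risk.
current = {"alice": (0, 0.05), "bob": (4, 0.85), "carol": (2, 0.4)}
scores = {name: predict(weights, bias, x) for name, x in current.items()}
at_risk = sorted(scores, key=scores.get, reverse=True)
print(at_risk)
```

A real project would use a proper ML library and far more data; the point is only that the model learns from customers who left and then ranks the ones who remain.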

SQL is used to extract the customers into a file and the mining tools process the file outputting a score. SQL is used to add the score back to the database. SQL is used to extract the high scoring customers for our marketing campaign.
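The three SQL steps above can be sketched with Python's built-in sqlite3 module standing in for SQL Server. The table, its columns, the scoring rule, and the 0.5 cutoff are hypothetical, and the extract is kept in memory rather than written to a file to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, complaints INTEGER, churn_score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, complaints) VALUES (?, ?, ?)",
    [(1, "alice", 0), (2, "bob", 4), (3, "carol", 2)],
)

# Step 1: SQL extracts the customers to be mined.
extracted = conn.execute("SELECT id, complaints FROM customers").fetchall()

# Step 2: the mining tool scores each row.
# This function is a placeholder for a trained model, not a real algorithm.
def mining_tool_score(complaints):
    return min(1.0, complaints / 5.0)

scored = [(mining_tool_score(complaints), cid) for cid, complaints in extracted]

# Step 3: SQL writes the scores back to the database.
conn.executemany("UPDATE customers SET churn_score = ? WHERE id = ?", scored)

# Step 4: SQL pulls the high-scoring customers for the marketing campaign.
targets = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE churn_score >= ? "
        "ORDER BY churn_score DESC",
        (0.5,),
    )
]
print(targets)
```

Note that SQL never does the mining here; it only moves data in and out of the tool that does.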

Apache Spark could be used to manage the mining process in a more efficient, distributed fashion.
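The snippet below is not Spark code. It is only a rough sketch of the partition-and-combine idea behind distributed scoring, using Python's standard thread pool on one machine. The scoring rule and customer data are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def score_partition(partition):
    """Score one chunk of customers; the rule is a placeholder for a real model."""
    return [(cid, min(1.0, complaints / 5.0)) for cid, complaints in partition]

def split(rows, n_parts):
    """Deal the rows into n_parts roughly equal partitions."""
    return [rows[i::n_parts] for i in range(n_parts)]

# Hypothetical (customer_id, complaints) pairs.
customers = [(cid, cid % 6) for cid in range(1, 13)]
partitions = split(customers, 3)

# Each worker scores its own partition independently; results are merged after.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(score_partition, partitions))

scores = dict(pair for part in results for pair in part)
print(len(scores))
```

A cluster framework applies the same pattern across many machines and also handles data placement and failures, which this sketch ignores.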
 
Thanks, Jedi. I assume that if we have an OLTP setup we would want to denormalize, while if we have an OLAP setup we may want to eliminate redundancy and keep a normalized database? EDIT: I will owe you if I get my (entry-level) big data job.
 
Yes, that's basically it. We found that running SQL queries to collect all the data from the various star schema tables during mining was far slower than building a flat, denormalized file of data to mine. I think this is still true, and that Apache Spark relies on the same idea when it distributes the data across the networked machines.
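As a rough illustration of building that flat file, the sketch below joins a tiny star schema once and writes the result out as CSV, so the mining step never has to touch the joins again. It uses sqlite3 in place of SQL Server, and the table and column names are invented for the example.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE fact_sales   (customer_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'alice', 'north'), (2, 'bob', 'south');
INSERT INTO dim_product  VALUES (10, 'checking'), (20, 'savings');
INSERT INTO fact_sales   VALUES (1, 10, 50.0), (2, 10, 75.0), (2, 20, 20.0);
""")

# Join the fact table to its dimensions once, up front.
cursor = conn.execute("""
SELECT c.name AS name, c.region AS region, p.product AS product, f.amount AS amount
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_id = f.customer_id
JOIN dim_product  AS p ON p.product_id  = f.product_id
ORDER BY c.name, p.product
""")
header = [col[0] for col in cursor.description]
flat_rows = cursor.fetchall()

# Write the denormalized rows as CSV (to a string here; a real job would use a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(flat_rows)
flat_csv = buffer.getvalue()
print(flat_csv)
```

Each output row repeats the customer and product attributes, which is exactly the redundancy that makes the file quick to scan and easy to split across workers.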
 
I'm no longer in the data mining area. I moved on to scientific programming a few years ago, but we're looking at using Apache Spark for a project. Who knows what'll happen next.

Cheers, take care. Good luck with the job interview. Try not to get bogged down in the details of their questions, and answer honestly and confidently. They can't expect you to know everything about data mining, but just knowing the terms and strategies will convince them.

Remember: when you don't know something, say so, and then say you'll definitely review or research it. Try to turn the interview into a dialog rather than a question-and-answer session, with you offering suggestions on how you can help them with their work.
 
