Projects involving data science


Discussion Overview

The discussion revolves around project ideas in data science and machine learning, particularly using Python. Participants explore potential projects suitable for building a portfolio, focusing on the application of data manipulation and exploratory data analysis techniques.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Homework-related

Main Points Raised

  • One participant seeks project ideas to enhance their CV, expressing uncertainty about how to progress beyond basic data manipulation and exploratory data analysis.
  • Another participant suggests projects involving acoustics, such as voice recognition or animal sound recognition, highlighting the availability of relevant datasets.
  • Image character recognition is mentioned as a common project, with a reference to a video explaining neural networks.
  • The original poster expresses concern about the complexity of neural networks and seeks clarification on what employers look for in personal projects, including the necessary skills and project complexity.
  • A later reply emphasizes that the goal of data science is to draw conclusions from data, suggesting that projects should demonstrate the ability to analyze data and derive meaningful insights, rather than just presenting differences in data points.
  • Participants discuss the importance of identifying trends in datasets and reporting on their implications as a key component of data science projects.

Areas of Agreement / Disagreement

Participants present multiple competing views on project ideas and the skills necessary for data science, with no consensus on a single approach or project type. The discussion remains unresolved regarding the complexity and specific requirements for projects.

Contextual Notes

Participants express varying levels of experience and comfort with different aspects of data science, indicating that assumptions about prior knowledge may affect the discussion. The complexity of projects and the expectations of employers are not fully defined, leaving room for interpretation.

Who May Find This Useful

Individuals interested in starting projects in data science or machine learning, particularly those looking to build a portfolio for job applications.

EngWiPy
Hello,

I am trying to do some data science/machine learning projects using Python, but I am not sure what to do. I downloaded a very simple dataset from the WHO and I am trying to do something with it, but most (actually all) of what I can do with it is data manipulation and exploratory data analysis (histograms, scatter plots, etc.). I need these projects for my CV, since I don't have previous experience in the field. Any suggestions would be highly appreciated.

Thanks
 
jedishrfu
Do something with acoustics, like voice recognition or animal sound recognition. There are a lot of acoustic datasets that could be used.

Image character recognition is a common project. Here's a video on how neural nets work that uses character recognition as its example.
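To get a feel for what "character recognition" means in code before tackling neural nets, here is a minimal sketch that swaps in a much simpler technique: 1-nearest-neighbour matching on tiny bitmaps. The 3x5 glyphs in `TEMPLATES` and the helper names are hypothetical. A real project would use a proper image dataset and, eventually, a trained model such as the neural net in the video.

```python
# Hypothetical 3x5 glyphs ("#" = ink, "." = background) used as training templates.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def to_vector(bitmap):
    """Flatten rows of '#'/'.' characters into a flat list of 0/1 pixels."""
    return [1 if ch == "#" else 0 for row in bitmap for ch in row]


def hamming(a, b):
    """Count the pixels that differ between two equal-length vectors."""
    return sum(x != y for x, y in zip(a, b))


def classify(bitmap, templates=TEMPLATES):
    """Return the label of the closest template (1-nearest-neighbour)."""
    v = to_vector(bitmap)
    return min(templates, key=lambda label: hamming(v, to_vector(templates[label])))


# A "7" with one pixel flipped in the bottom row.
noisy_seven = ["###", "..#", ".#.", ".#.", "##."]
print(classify(noisy_seven))
```

The noisy 7 is still classified as "7", because it differs from that template by one pixel and from the "0" and "1" templates by several.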

 
Thanks. I am still a novice in the field, and these things seem a little complicated for me right now. Neural networks/deep learning is a topic of its own. This leads me to the following questions: What are employers looking for in personal projects? What skills do I need to demonstrate? How complex should my project be? Thanks in advance.
 
The ultimate goal of a field like data science is to draw conclusions from data. Ideally, you will be able to draw a conclusion or make a recommendation from the data you have.

As an example, let's say the dataset is related to insurance: age, current estimated risk, number of accidents, cost, type of vehicle, etc.

An employer would want to see that you can take a block of data and draw a conclusion from it. It is not enough to say "here is the difference in the number of accidents between 85- and 75-year-old drivers." You need to say something like "The price of insurance for 75-year-olds needs to go up, because the profit left after paying out for accidents is not high enough." You need to develop a program or method that essentially reaches that conclusion for you: a way to evaluate, in code, whether an age group or vehicle group is profitable enough. A good way to start is to look at histograms and scatter plots, then think of ways to identify those trends automatically.
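A minimal sketch of turning the insurance example into code, assuming each record carries an age group, the premium collected and the claims paid. The records, the 15% `target_margin` and the function names are all hypothetical, chosen only to show the step from a summary statistic to a recommendation.

```python
from collections import defaultdict

# Hypothetical policy records: (age_group, annual_premium, claims_paid).
# Invented for illustration; substitute the columns of your own dataset.
POLICIES = [
    ("75+", 1200.0, 1500.0),
    ("75+", 1200.0, 300.0),
    ("75+", 1200.0, 1400.0),
    ("40-74", 800.0, 0.0),
    ("40-74", 800.0, 600.0),
    ("40-74", 800.0, 100.0),
]


def margin_by_group(policies):
    """Profit margin per group: (total premiums - total claims) / total premiums."""
    premiums = defaultdict(float)
    claims = defaultdict(float)
    for group, premium, paid in policies:
        premiums[group] += premium
        claims[group] += paid
    return {g: (premiums[g] - claims[g]) / premiums[g] for g in premiums}


def recommend(policies, target_margin=0.15):
    """Turn each group's margin into a plain-language recommendation."""
    lines = []
    for group, margin in sorted(margin_by_group(policies).items()):
        if margin < target_margin:
            lines.append(f"{group}: margin {margin:.0%} below target, consider raising the premium")
        else:
            lines.append(f"{group}: margin {margin:.0%} meets target")
    return lines


for line in recommend(POLICIES):
    print(line)
```

With these made-up records the "75+" group keeps about 11% of its premiums after claims and is flagged, while "40-74" keeps about 71%. The point is that the program, not the analyst eyeballing a chart, produces the recommendation.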

TL;DR: with your dataset, develop ways to identify trends, then think about what those trends mean, and report on them.
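One simple way to identify trends automatically is to fit a least-squares line and check how closely the data follow it (Pearson's r). The sketch below assumes numeric x values with at least two distinct entries. The 0.7 cut-off for calling something a trend is an arbitrary assumption, and a real analysis would also consider non-linear patterns and sample size.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares slope and Pearson r of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, r


def describe_trend(name_x, name_y, xs, ys, min_abs_r=0.7):
    """Report a linear trend in words, or say there is no clear one."""
    slope, r = linear_trend(xs, ys)
    if abs(r) < min_abs_r:
        return f"No clear linear trend in {name_y} vs {name_x} (r = {r:.2f})"
    direction = "rises" if slope > 0 else "falls"
    return f"{name_y} {direction} by {abs(slope):.3g} per unit of {name_x} (r = {r:.2f})"


print(describe_trend("age", "rate", [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(describe_trend("x", "y", [1, 2, 3, 4], [5, 1, 5, 1]))
```

The first call reports "rate rises by 2 per unit of age (r = 1.00)"; the second, whose values just alternate, reports no clear trend. Deciding what a detected trend means for the business is still your job.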