Personal projects can substantially increase your chances of breaking into data science and machine learning.
The first thing you need is a clearly defined problem statement. This might sound like an easy step, but it's where most people get stuck. When I talk to aspiring data scientists about the importance of building a project portfolio, their very first response is "OK, but what project do I work on?"
I want to address this issue by introducing a simple framework for generating as many project ideas as you need. Throughout this process, remember that there are no good or bad projects.
3 steps for generating project ideas
List your interests and hobbies (List 1)
List the technical skills you want to learn (List 2)
Map the items in List 1 to the items in List 2
With n items in List 1 and m items in List 2, you get m x n pairings, each one a potential project idea.
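The mapping step is just a Cartesian product of the two lists. Here is a minimal Python sketch of that idea, not a definitive tool: the function name pair_ideas is hypothetical, and the sample lists hold only the items named in this post (my real lists are longer).

```python
from itertools import product


def pair_ideas(interests, techniques):
    """Return one labeled prompt for every (interest, technique) pairing."""
    return [
        f"[{interest} x {technique}]"
        for interest, technique in product(interests, techniques)
    ]


# Sample lists (assumption: only the items named in this post).
interests = ["Reading", "Writing", "Football", "Board games"]
techniques = [
    "Large pre-trained language models",
    "Generative models in computer vision",
    "Recommender systems",
]

ideas = pair_ideas(interests, techniques)

# 4 interests x 3 techniques gives 12 pairings to brainstorm from.
print(len(ideas))
for idea in ideas:
    print(idea)
```

Each printed label is only a prompt. You still have to think about what a project at that intersection would look like, and some pairings will spark several ideas while others spark none.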
Here is how I generate project ideas:
Step 1: List your interests
Here are a few items on my list:
Reading
Writing
Football
Board games
...
Step 2: List the tools/techniques you'd like to learn about
This can be a list of broad areas in machine learning/data science that you're curious about such as natural language processing, computer vision, or recommender systems. Or it could be a specific tool, technology, or model architecture that you've recently learned about.
Here are a few items on my list:
Large pre-trained language models
Generative models in computer vision
Recommender systems
...
Step 3: Matching interests with techniques
Focus on how the tools and technologies in List 2 can be used to improve the activities in List 1.
At this stage, you should come up with as many ideas as possible. Do not limit yourself. There are no bad ideas. You don't need to think about feasibility, practicality, or difficulty at this point.
A few of my ideas:
[Reading x Recommender systems] A newsletter recommender system that recommends newsletters based on your current subscriptions.
[Board games x Recommender systems] A board game recommender system based on your ratings of board games or on the text of their descriptions and rules.
[Football x Generative models] A generative model for creating unique, imaginary football scenes.
[Writing x Language models] An encoder-decoder language model for converting prose to poetry and vice versa.
...