Before The Outset
I’ve never been great at complex data manipulation, but writing algorithms and building web applications? That is a big Yes.
→ This is not a guide to becoming a full Data Science Engineer; I’m just sharing what I started with in this field.
→ This is not the only path either.
Instead, you can consider it a beginner-friendly study plan for those coming from a Software Development background who are working to become a Junior Data Science Engineer.
The voice of a beginner for other beginners. 🙂
What is Data Science?
Data science is a multidisciplinary blend of data inference, algorithm development, and technology to solve analytically complex problems.
“We have lots of data – now what?” (How can we unlock real value from our data?)
For me, it is everything we do with data to solve problems and produce business value or growth.
Kinds of Data
As we said at the top, it is about manipulating data most of the time, but what kind of data can we manipulate?
On the internet or in large enterprise applications there is a lot of data coming from different sources such as social media, calls to action, simple forms, log data, transactions, emails …
Almost everything we do online or elsewhere requires filling in data, and this data comes in different types:
- Text
- Photo
- Audio
- Video
- …
Also in different formats:
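- Structured data: data with a defined organization that makes further querying and/or analysis easy, such as the tables stored in a Relational Database Management System or in XLS files.
- Semi-structured and unstructured data: semi-structured data (JSON or XML files, for example) carries its own field names without a fixed table layout, while unstructured data has no predefined format at all (free text, photos, audio, video), the opposite of the first.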
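To make the difference concrete, here is a minimal sketch in Python (the language used later in this post). The sample record and the review text are made up for illustration; only the standard `json` module is used.

```python
import json

# Data with named fields (here a JSON document): we can ask for a value by key.
raw_json = '{"name": "Ada", "country": "CM", "age": 31}'
record = json.loads(raw_json)
print(record["country"])  # CM

# Free text has no fields, so we have to search or parse it ourselves.
review = "Great phone, but the battery dies after lunch."
mentions_battery = "battery" in review.lower()
print(mentions_battery)  # True
```

The more organization the data already has, the less work it takes to pull a specific value out of it.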
Programming Languages for Data Science
There are a lot of languages for Data Sciencing (hahaha); some of them are much more popular and widely used than others.
Here are a few: R, Python, Java …
Job Posting by Indeed

Job Seekers by Indeed

Google Trends

Percentage of search interest in R and Python for Data Science: R in Red and Python in sky Blue
Intro to Python for Data Science
I started Data Science with a free and very educational certification course (linked just below). It helped me get introduced to this field so I could start a new path in my career.
The course
Learn Python for Data Science – Online Course
DataCamp’s Intro to Python course teaches you how to use Python programming for data science with interactive video…
What I learned there
- Manipulating Python lists in depth
- Manipulating NumPy arrays
- Subsetting NumPy arrays
- Subsetting 2D NumPy arrays
- Simple exploration of data
- Basic statistics
I did a lot of exercises and earned a lot of XP on this free course. I will show you some examples; it is not possible to put them all here, and that is not the intent of this post:
- 4700 XP Earned
- 1 Course Completed
- 57 Exercises Aced
1. Python List
- Create a list
mySimpleList = [12, 43, 54, 34, 90]  # Simple list, all items of the same type
myWeirdList = ['a', 43, 54, 'c', 90]  # Different types of items
- Print a list
print(mySimpleList)  # Assuming the list has already been created
- List of list
countries = [["Cameroon", "CM"], ["Nigeria", "NG"], ["France", "FR"], ["Gabon", "GA"]]
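Getting items back out of a list of lists takes two indexes. Here is a small sketch using the same `countries` list; the loop at the end is just one possible way to walk through it.

```python
countries = [["Cameroon", "CM"], ["Nigeria", "NG"], ["France", "FR"], ["Gabon", "GA"]]

# The first index picks the inner list, the second picks the item inside it.
print(countries[1])      # ['Nigeria', 'NG']
print(countries[1][0])   # Nigeria
print(countries[-1][1])  # GA

# Loop over the inner lists and unpack each [name, code] pair.
for name, code in countries:
    print(code, "->", name)
```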
- Type of a Variable
To print the type of a variable, pass it to the built-in type() function and print the result, for example print(type(mySimpleList)).
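A quick sketch of what the built-in `type()` function reports for a few values. The list is the one created earlier; the other values are arbitrary examples.

```python
mySimpleList = [12, 43, 54, 34, 90]

print(type(mySimpleList))     # <class 'list'>
print(type(mySimpleList[0]))  # <class 'int'>
print(type("CM"))             # <class 'str'>
print(type(3.5))              # <class 'float'>

# isinstance() is handy when you want a yes/no answer instead.
print(isinstance(mySimpleList, list))  # True
```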
- Index

numbers = [1, 2, 3, 4, 5, 6]
print(numbers[4], numbers[-2])  # 5 5: index 4 and index -2 point to the same item
- Subsetting list
Subsetting (slicing) always returns a list. The first index is included in the result and the last is not.
print(numbers[1:4])  # [2, 3, 4]: from index 1 to index 3 included
print(numbers[:4])   # [1, 2, 3, 4]: from the start to index 3 included
print(numbers[1:])   # [2, 3, 4, 5, 6]: from index 1 to the end
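One thing worth checking for yourself: the list you get back from a slice is a new list, so changing it does not touch the original. A minimal sketch, with a sample list assumed for illustration:

```python
numbers = [1, 2, 3, 4, 5, 6]  # sample list, assumed for illustration

first_three = numbers[:3]  # a new list: [1, 2, 3]
first_three[0] = 100       # change the slice...

print(first_three)  # [100, 2, 3]
print(numbers)      # [1, 2, 3, 4, 5, 6]  ...and the original is untouched

# A slice returns a list even for one item; a single index returns the item itself.
print(numbers[1:2])  # [2]
print(numbers[1])    # 2
```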
2. Numpy Array
- Install Numpy: pip3 install numpy
- Import Numpy
There are several ways to import Python packages/functions; let’s focus on these two for this post:
import numpy        # Here we will address the NumPy array with numpy.array
import numpy as np  # Address the NumPy array with np.array
- From list to numpy Array
# Using the countries list declared above
countries_np_array = np.array(countries)
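Once the list is converted, a few attributes help confirm what you got. This is a small sketch; `shape` and `ndim` are standard attributes of a NumPy array.

```python
import numpy as np

countries = [["Cameroon", "CM"], ["Nigeria", "NG"], ["France", "FR"], ["Gabon", "GA"]]
countries_np_array = np.array(countries)

print(type(countries_np_array))  # <class 'numpy.ndarray'>
print(countries_np_array.shape)  # (4, 2): 4 rows, 2 columns
print(countries_np_array.ndim)   # 2
print(countries_np_array[0, 0])  # Cameroon
```

Note the `[row, column]` indexing with a comma, which a plain list of lists does not support.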

- Subsetting NumPy array
print(countries_np_array[:, 1])  # ['CM' 'NG' 'FR' 'GA']: all rows, second column (every country code)
age_array = np.array([2, 4, 6, 8])
age_selector = age_array >= 4  # boolean array: False, True, True, True
# Now use this selector to index the array
print(age_array[age_selector])  # [4 6 8]
Note: NumPy does not allow multiple types in one array and will force all items to the same type.
age_array = np.array([True, 4, False, 8])
print(age_array)  # [1 4 0 8]
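To see which type NumPy settled on, you can look at the array's `dtype`. The sketch below uses made-up values; the exact integer and text widths can vary by platform, so it prints only the general kind where that matters.

```python
import numpy as np

ints_and_bools = np.array([True, 4, False, 8])
print(ints_and_bools.dtype.kind)    # i  (everything became an integer)

ints_and_floats = np.array([1, 2.5, 3])
print(ints_and_floats.dtype)        # float64

numbers_and_text = np.array([1, 2, "three"])
print(numbers_and_text.dtype.kind)  # U  (everything became text)
print(numbers_and_text)             # ['1' '2' 'three']
```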
- Operation over collections
age = [2, 4, 6, 8]
div = [2, 2, 2, 4]
age_array = np.array(age)  # NumPy array of ages
div_array = np.array(div)  # NumPy array of divisors
print(age / div)  # Dividing Python lists fails:
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: unsupported operand type(s) for /: 'list' and 'list'
print(age_array / div_array)  # Computes without issue, item by item
# result: [1. 2. 3. 2.]
Data Science deals with a lot of information to analyze, sort, and otherwise process, so we often need to run mathematical operations over whole collections quickly.
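For comparison, here is roughly what the same division looks like with plain Python lists next to the NumPy version. The list comprehension is my own illustration, not something from the course.

```python
import numpy as np

age = [2, 4, 6, 8]
div = [2, 2, 2, 4]

# Plain Python: pair the items up and divide them one by one.
list_result = [a / d for a, d in zip(age, div)]
print(list_result)  # [1.0, 2.0, 3.0, 2.0]

# NumPy: the same division written as a single expression.
array_result = np.array(age) / np.array(div)
print(array_result.tolist())  # [1.0, 2.0, 3.0, 2.0]

# An operation with a single number applies to every item.
times_ten = np.array(age) * 10
print(times_ten.tolist())  # [20, 40, 60, 80]
```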
3. Little stats with Numpy
- Average: np.mean(your_numpy_array_or_axis)
- Median: np.median(your_numpy_array_or_axis)
- Standard Deviation: np.std(your_numpy_array_or_axis)
Suppose we have an array representing the grades of 3 students of a class in two courses (French and English):
import numpy as np
student_grades = np.array([[12, 16], [15.5, 9], [5, 16]])
# Average of the students' grades in French
# Here we select all the rows and the French axis (the first column)
french_average = np.average(student_grades[:, 0])
print(french_average)  # result: 10.833333333333334
# Standard deviation of the students' grades in English
# We select all the rows and the English axis (the second column)
english_std = np.std(student_grades[:, 1])
print(english_std)  # result: 3.299831645537221
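Instead of selecting one column at a time, these functions also accept an `axis` argument that computes every column (or every row) in one call. A small sketch on the same grades array; the variable names are mine.

```python
import numpy as np

student_grades = np.array([[12, 16], [15.5, 9], [5, 16]])

# axis=0 walks down the rows: one result per course (column).
course_means = np.mean(student_grades, axis=0)
course_medians = np.median(student_grades, axis=0)
# axis=1 walks across the columns: one result per student (row).
student_means = np.mean(student_grades, axis=1)

print([f"{m:.2f}" for m in course_means])  # ['10.83', '13.67']
print(course_medians.tolist())             # [12.0, 16.0]
print(student_means.tolist())              # [14.0, 12.25, 10.5]
```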
Got my first certification in this field 🙂

There are also a lot of interesting community courses available, though they don't come with a certificate:
Free Data Science and Analysis Training Courses | DataCamp
Are you looking to build your data analysis skill set? Try one of our free open courses and see why over 460,000 data…
Let’s Work together

Ping me if you are looking for or would like to have a partner to study and master Data Science with, I’m available for peer learning.
I think that starting with practice, through a simple and detailed course with a good scope, is a worthwhile way to learn new things.
Thrilled to have it and excited to learn more using online resources and other posts about Data Science. I would also like to know how you started, or what advice you can give to a beginner like me.
Thanks for reading this post, share if you enjoyed it.
Follow me on LinkedIn and Twitter.