'Data' is one of the most commonly used words in every field these days. When you hear the term "Data Science" (also called Data-Driven Science) for the first time, you might think it is as complex as rocket science.
What is data science?
Data Science is the study of information: where it comes from, what it says, and how it can be tuned into a form that suits our needs. Information can come from anywhere, such as an Excel file, a database, a real-time feed, or any other source. It can be structured or unstructured, but every piece of information arriving at the central location conveys something, and that is where one of the crucial parts lies: "what it says". After analyzing the data, we must be able to visualize it using tools such as Excel, Power BI, or Tableau. Finally, from the visualizations we build, we can extract the essential statistics and anticipate trends.
Working with the Data:
Let's assume we have the sales details of tea and coffee for a vendor at two different public locations. The stall owner wants to extract some essential statistics from those sales. Below is a sample screenshot of the data.
Using the above data, we can work out a few more interesting things. Let's clean it up and get a clear picture of what exactly the data is saying. Perform the steps below and then look at the data again.
Add a "Total No of Units" column (=C+D) that sums the coffee and tea units sold on that day.
Then add one more column, "Revenue", which multiplies "Total No of Units" by "Price".
Then add a "Day" column (=TEXT(WEEKDAY(A),"dddd")) to the right of the Date column.
After adding these columns, the data starts to say more. Please find below the screenshot of the same.
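For readers who prefer code to spreadsheet formulas, the three column additions above can be sketched in Python. This is only an illustration under stated assumptions: the rows are invented (the original data lives in the screenshot), and the field names `coffee`, `tea`, `temp_c`, `price`, and `location` are hypothetical stand-ins for the sheet's columns.

```python
from datetime import date

# Hypothetical sample rows, invented for illustration only.
# Assumed columns: date, location, coffee units, tea units, temperature, unit price.
rows = [
    {"date": date(2023, 1, 2), "location": "Park", "coffee": 30, "tea": 20, "temp_c": 12, "price": 2.0},
    {"date": date(2023, 1, 7), "location": "Station", "coffee": 45, "tea": 35, "temp_c": 8, "price": 2.0},
]

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def add_derived_columns(row):
    """Return a copy of the row with total units, revenue, and day name added."""
    out = dict(row)
    out["total_units"] = row["coffee"] + row["tea"]      # mirrors =C+D
    out["revenue"] = out["total_units"] * row["price"]   # units multiplied by price
    out["day"] = DAY_NAMES[row["date"].weekday()]        # name of the weekday
    return out


prepped = [add_derived_columns(r) for r in rows]
for r in prepped:
    print(r["date"], r["day"], r["total_units"], r["revenue"])
```

The original rows are left untouched, so the raw and the prepped data can be compared side by side.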
Now, we can draw different types of conclusions from the prepped data.
Descriptive, like the total units sold, the revenue generated, etc.
Associative, like how many units are sold relative to temperature: what the unit count is when the temperature is high and what it is when the temperature is low.
Comparative, like how many coffee units were sold versus tea units.
Predictive, like how many units I can expect to sell on a weekday and how many on a weekend (using the Day column).
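As a rough sketch of the associative, comparative, and predictive questions above, the snippet below works on a handful of invented rows. The temperature cut-off `TEMP_THRESHOLD` is an assumption chosen for the example, and the "predictive" part is only a naive baseline that uses past averages as a rough expectation, not a real forecasting model.

```python
from datetime import date
from statistics import mean

# Hypothetical rows: (date, coffee units, tea units, temperature in degrees C).
rows = [
    (date(2023, 1, 2), 30, 20, 12),   # Monday
    (date(2023, 1, 3), 25, 15, 18),   # Tuesday
    (date(2023, 1, 7), 45, 35, 8),    # Saturday
    (date(2023, 1, 8), 40, 30, 20),   # Sunday
]

TEMP_THRESHOLD = 15  # assumed cut-off between "low" and "high" temperature


def total(row):
    """Total units (coffee + tea) for one row."""
    return row[1] + row[2]


# Associative: average units sold on cooler vs warmer days.
avg_units_low_temp = mean(total(r) for r in rows if r[3] < TEMP_THRESHOLD)
avg_units_high_temp = mean(total(r) for r in rows if r[3] >= TEMP_THRESHOLD)

# Comparative: coffee units vs tea units.
coffee_total = sum(r[1] for r in rows)
tea_total = sum(r[2] for r in rows)

# Predictive (naive baseline): past weekday and weekend averages.
avg_weekday = mean(total(r) for r in rows if r[0].weekday() < 5)
avg_weekend = mean(total(r) for r in rows if r[0].weekday() >= 5)

print("low temp:", avg_units_low_temp, "high temp:", avg_units_high_temp)
print("coffee:", coffee_total, "tea:", tea_total)
print("weekday:", avg_weekday, "weekend:", avg_weekend)
```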
These are a few conclusions. We can explore the same data further by adding a few more things, like
Total units sold for all the days
Total revenue generated
Average units of Coffee and Tea sold per day
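The three summary figures listed above are simple aggregations. A minimal sketch, again on invented rows with a hypothetical `(coffee, tea, price)` layout:

```python
from statistics import mean

# Hypothetical rows: (coffee units, tea units, unit price), one row per day.
rows = [(30, 20, 2.0), (24, 16, 2.0), (45, 36, 2.0)]

# Total units sold across all days.
total_units = sum(coffee + tea for coffee, tea, _ in rows)

# Total revenue: each day's units multiplied by that day's price.
total_revenue = sum((coffee + tea) * price for coffee, tea, price in rows)

# Average units of coffee and of tea sold per day.
avg_coffee = mean(coffee for coffee, _, _ in rows)
avg_tea = mean(tea for _, tea, _ in rows)

print(total_units, total_revenue, avg_coffee, avg_tea)
```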
Exploring the Data:
Up to this point, everything looks fine, but the table is mostly numbers. A reader who wants to get an idea of the data has to go through the whole table, and one who is not interested might simply move on. We can analyze a bit more using conditional formatting to find out
the day we generated the most revenue (the cell that is completely filled in the Revenue column is the one with the highest revenue)
the top x and bottom x days by units sold (the top x days are filled with green and the bottom x days with red; 'x' is a variable that depends on the requirement)
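The two conditional-formatting checks above boil down to a maximum and a ranking. The sketch below shows that logic on invented rows; `top_and_bottom` is a hypothetical helper name, and it does not reproduce Excel's colouring, only the selection of days.

```python
# Hypothetical rows: (date label, total units, revenue).
rows = [
    ("2023-01-02", 50, 100.0),
    ("2023-01-03", 40, 80.0),
    ("2023-01-07", 80, 160.0),
    ("2023-01-08", 70, 140.0),
    ("2023-01-09", 55, 110.0),
]


def top_and_bottom(rows, x):
    """Return (top x rows, bottom x rows) ranked by total units sold."""
    if x <= 0:
        return [], []
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    # If 2 * x exceeds the number of rows, the two groups overlap.
    return ranked[:x], ranked[-x:]


# The day with the highest revenue (the fully filled cell in the sheet).
best_revenue_day = max(rows, key=lambda r: r[2])

top, bottom = top_and_bottom(rows, 2)
print("highest revenue:", best_revenue_day[0])
print("top 2 (green):", [r[0] for r in top])
print("bottom 2 (red):", [r[0] for r in bottom])
```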
Data Visualization:
So far we have done a good amount of analysis on the data provided. We can visualize it to make it more interesting. Here, I am using Power BI and Excel to visualize the data. Let's start visualizing the given data across the different aspects of our analysis.
The graph below shows the sales of tea and coffee with respect to temperature.
One more analysis is the revenue generated over the period for which data is available. The graph below shows the same.
Tea and coffee sales over time
In the same way, we can create many other visualizations depending on the tool we use. We can also perform slice-and-dice operations and create pivot tables and pivot charts.
Below are sample pivot table views from the available data.
Sales and revenue by day:
Sales and revenue by day, location-wise:
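The two pivot views above amount to grouping rows by a key and summing units and revenue within each group. A minimal sketch of that idea, assuming invented rows and a hypothetical `pivot` helper (this is not how Excel builds pivot tables internally, just the same aggregation):

```python
from collections import defaultdict

# Hypothetical prepped rows: (day, location, total units, revenue).
rows = [
    ("Monday", "Park", 50, 100.0),
    ("Monday", "Station", 60, 120.0),
    ("Saturday", "Park", 80, 160.0),
    ("Saturday", "Station", 90, 180.0),
    ("Monday", "Park", 40, 80.0),
]


def pivot(rows, key):
    """Sum units and revenue for every group produced by key(row)."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        cell = totals[key(row)]
        cell[0] += row[2]   # total units
        cell[1] += row[3]   # revenue
    return {group: tuple(sums) for group, sums in totals.items()}


# Sales and revenue by day.
by_day = pivot(rows, key=lambda r: r[0])

# Sales and revenue by day, location-wise.
by_day_location = pivot(rows, key=lambda r: (r[0], r[1]))

for group, (units, revenue) in by_day_location.items():
    print(group, units, revenue)
```

Passing the grouping as a function keeps one helper for both views: changing the key is the code equivalent of dragging another field into the pivot table's rows.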
In this document, I have covered Data Science fundamentals, which include the study of information and analyzing, tuning, and visualizing it. In the next part of this blog series, I will provide a basic introduction to statistics (core Data Science concepts).