Data visualization is often the first step in any type of data analysis. This course will teach you several essential data visualization techniques, when to use them, and how to implement them with Python and Matplotlib.
At the core of data science and data analytics is a thorough knowledge of data visualization. In this course, Introduction to Data Visualization with Python, you'll learn how to use several essential data visualization techniques to answer real-world questions. First, you'll explore techniques including scatter plots. Next, you'll discover line charts and time series. Finally, you'll learn what to do when your data is too big. When you're finished with this course, you'll have a foundational knowledge of data visualization that will help you as you move forward to analyze your own data.
YK Sugi is the creator of CS Dojo, a popular programming education YouTube channel. He is also active on Twitter and Instagram @ykdojo.
Section Introduction Transcripts
Course Introduction Hi, everyone; my name is YK, and welcome to Introduction to Data Visualization with Python. I've worked at various software companies as a data scientist and a software developer. I've had many data analysis projects, and I used Python in many of them. This is a course I wish I had had before working on those projects. Not only will it show you how to use Python to visualize data, but it will also give you the fundamentals of data visualization in general. Each module from Module 4, the Histogram module, to Module 8, which covers what to do when your data is too big, is centered around a single topic, and each of these modules has two components. The first video explains why that particular tool or topic is important and when you should use it. The videos after that show how to implement it in Python. So if you'd like a quick overview of the kinds of analysis you can conduct with data visualization, you can watch just the first video in each of these modules. If you're only interested in how to implement it in Python, you can watch the full-length videos after that. Of course, you can watch both. Module 3, the module right after this one, will introduce you to the Python libraries we're going to use throughout this course, and Module 9 will give you some example problems to practice what you've learned. Feel free to use them as you see fit. As you go through this course, I'd highly recommend practicing what you learn with either your own data or the data that I'm going to provide. If you have comments or questions about the course material, feel free to post them so learners can help each other.
Examining Relationships in Data with Scatter Plots In this module, we're going to examine when and how to use scatter plots. Scatter plots give you a convenient way to visualize how two numeric variables in your data are related. Here's a simple example of a scatter plot. It shows how weight and height are related in a hundred people. You can see that the taller someone is, the more he or she tends to weigh, but there's some variation. This is a very simple example, but when should you use scatter plots in a real-world situation? Suppose you're a sales executive at a Fortune 500 company, in charge of your company's entire sales force of about 200 salespeople, and you want to improve their performance. In particular, you want to see how each salesperson's amount of experience is related to his or her performance. One way to analyze this is with a scatter plot. You can plot years of experience at your company, say, Company X, on the x-axis, and the sales volume for the last quarter in dollars on the y-axis. Then you might see a graph like this. From this chart, you can see that, in general, the more experience a salesperson has, the more sales he or she brings in. But you can also see a few outliers who have very little experience compared to other salespeople but still delivered extraordinary performance in the last quarter. Once you identify who they are, you might interview them to see what they do differently that allowed them to do so well. Whatever their techniques are, perhaps you can spread that knowledge to other salespeople to increase sales for the entire company. So, in summary, scatter plots help you better understand the relationship between two numeric variables, for example, weight and height, or years of experience and sales volume. In the process, they sometimes help you find outliers as well.
And in the next video, we're going to see how to create scatter plots using Matplotlib.
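As a preview, here is a minimal sketch of the sales scatter plot described above, using Matplotlib and NumPy. All data here is synthetic: `make_sales_data` is a hypothetical helper that generates made-up salespeople and plants three low-experience, high-sales outliers. Flagging outliers as points more than `k` residual standard deviations above a least-squares line (`find_outliers`, also hypothetical) is just one simple heuristic chosen for illustration, not necessarily the approach the course takes.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def make_sales_data(n=200, seed=0):
    """Return synthetic (years, sales) arrays; every number here is made up."""
    rng = np.random.default_rng(seed)
    years = rng.uniform(0, 20, n)
    # Assumed trend: more experience -> more sales, plus random variation.
    sales = 50_000 + 10_000 * years + rng.normal(0, 20_000, n)
    # Plant three hypothetical newcomers with unusually high sales.
    years = np.append(years, [1.0, 1.5, 2.0])
    sales = np.append(sales, [400_000, 400_000, 400_000])
    return years, sales


def find_outliers(years, sales, k=3.0):
    """Indices of points more than k residual std devs above a linear fit."""
    slope, intercept = np.polyfit(years, sales, 1)  # degree-1 least squares
    residuals = sales - (slope * years + intercept)
    return np.where(residuals > k * residuals.std())[0]


def plot_experience_vs_sales(years, sales):
    """Scatter every salesperson, then overlay flagged outliers in red."""
    fig, ax = plt.subplots()
    ax.scatter(years, sales, alpha=0.6, label="Salesperson")
    idx = find_outliers(years, sales)
    ax.scatter(years[idx], sales[idx], color="red", label="Possible outlier")
    ax.set_xlabel("Years of experience at Company X")
    ax.set_ylabel("Sales volume last quarter ($)")
    ax.legend()
    return fig, ax
```

Calling `plot_experience_vs_sales(*make_sales_data())` returns the figure and axes; save the figure with `fig.savefig(...)`, or drop the `matplotlib.use("Agg")` line and call `plt.show()` to view it interactively. The red points are the salespeople you might want to interview.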
Comparing Data with Bar Graphs A bar graph, of course, is a convenient way to compare numeric values across several groups. Let's see an example of when to use a bar graph. Suppose you're working in the finance division of a company that provides software as a service. You're trying to better understand the company's finances. The company has several software-as-a-service products, but you're not sure which ones matter most to the company's bottom line. In a situation like this, a bar graph could be useful. Let's say you decide to use a bar graph to compare the revenue from the different products your company offers. You might have products for email marketing, client management, survey creation, and so on. You can create a bar graph that shows the revenue from each product like this. From this graph, you can see that most of your revenue comes from the email marketing product and the client management product. But you might ask: what about costs? You can show those in a bar graph as well. Once you have a graph of revenue and cost by product, it might look like this. From this graph, you can see that the client management product is the most profitable one, more so than the email marketing product. You can also see that the document editing product is actually losing money, because its cost is higher than its revenue. If you had this information only in, say, a table, it would be much harder to grasp the situation. Putting this kind of information in a bar graph helps you understand it more intuitively. So, to summarize, bar graphs help you compare numeric values, and you can use them to compare several numeric values per group as well, for example, revenue and cost. Putting data in a bar graph sometimes gives you new insights that you might have missed otherwise. The same is true of data visualization tools in general.
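Here is a minimal Matplotlib sketch of the grouped revenue-and-cost bar graph described above. The product names come from the example; the dollar figures are invented placeholders chosen only to mirror the narrative, not real data. Offsetting each pair of bars by half a bar width on either side of a shared tick is one common way to draw grouped bars; the course's own code may differ.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical figures in millions of dollars, invented for illustration.
products = ["Email marketing", "Client management", "Survey creation", "Document editing"]
revenue = np.array([9.0, 8.0, 2.0, 1.5])
cost = np.array([6.0, 3.0, 1.0, 2.0])


def plot_revenue_and_cost(products, revenue, cost, width=0.4):
    """Draw revenue and cost bars side by side for each product."""
    x = np.arange(len(products))  # one tick position per product
    fig, ax = plt.subplots()
    ax.bar(x - width / 2, revenue, width, label="Revenue")
    ax.bar(x + width / 2, cost, width, label="Cost")
    ax.set_xticks(x)
    ax.set_xticklabels(products, rotation=20, ha="right")
    ax.set_ylabel("USD (millions)")
    ax.legend()
    fig.tight_layout()
    return fig, ax


# Profit per product: a negative value means the product loses money.
profit = revenue - cost
```

With these placeholder numbers, `profit` comes out to 3.0, 5.0, 1.0, and -0.5, so client management shows the largest margin and document editing the only loss, matching the pattern described in the transcript.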