# Visualize matrix python

## How To Visualize Sparse Matrix in Python?

When you work with sparse matrix data structures from SciPy in Python, you may sometimes want to visualize a sparse matrix. A quick visualization can reveal the pattern of non-zero entries and show how “sparse” the matrix is. It is also a great sanity check.

One way to visualize a sparse matrix is with a 2D plot. Python’s matplotlib has a dedicated function called `spy` for visualizing sparse matrices. `spy` is very similar to matplotlib’s `imshow`, which is great for plotting a matrix or an array as an image. `imshow` expects a dense array, while `spy` also accepts a SciPy sparse matrix directly.

Let us first load the modules needed to create a sparse matrix and visualize it. We will use the sparse module in SciPy to create the matrix and matplotlib’s pyplot to visualize it.

    import matplotlib.pyplot as plt
    import scipy.sparse as sparse

Let us create a simple sparse matrix, here a diagonal sparse matrix with ones along the diagonal, using the `sparse.eye` function. We can then call `spy` with the sparse matrix as its argument.

    # create a sparse diagonal matrix with ones on the diagonal
    A = sparse.eye(100)
    # visualize the sparse matrix with spy
    plt.spy(A)

This creates a 2D image with blue squares representing non-zero elements and white for zero elements. Since our matrix is diagonal, we see a blue line along the diagonal.

Let us now create a sparse matrix with a specific density:

    # create a sparse matrix with specific density
    A = sparse.random(100, 100, density=0.01)
    # visualize the sparse matrix with spy
    plt.spy(A)

This visualizes a 100×100 sparse matrix with a density of 1%.

You can see that the blue squares are fairly big. We can control their size with the `markersize` argument, as shown below. This helps us get a real sense of the actual sparsity of the matrix.

    A = sparse.random(100, 100, density=0.01)
    plt.spy(A, markersize=4)

Here is a sparse matrix of the same size and density (the code draws a new random matrix), plotted with a smaller markersize. The smaller blue squares representing the non-zero elements give a better sense of the sparsity.

Let us create a bigger sparse matrix of dimension 10,000 × 10,000 with a density of 0.00001.

    A = sparse.random(10000, 10000, density=0.00001)
    plt.spy(A, markersize=1)

With the smaller markersize of 1, the `spy` visualization of this sparse matrix is much clearer.
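To put a number on what the plots suggest, the density can also be computed directly. The following is a minimal sketch, assuming SciPy's `nnz` attribute (the count of stored entries); the `random_state` seed and the `csr` format are additions for reproducibility, not part of the original example.

```python
import scipy.sparse as sparse

# Same kind of matrix as above; the seed is an assumption added for repeatability.
A = sparse.random(100, 100, density=0.01, format="csr", random_state=0)

rows, cols = A.shape
# nnz counts the stored entries; density is their share of all cells.
density = A.nnz / (rows * cols)
print(f"{A.nnz} stored entries, density = {density:.4f}")
```

A density printed this way is a useful cross-check when large markers make a plot look denser than the matrix really is.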


Source: https://cmdlinetips.com/2019/02/how-to-visualize-sparse-matrix-in-python/

Say you have a very rectangular 2D array `data`, whose columns and rows correspond to very specific sampling locations `x` and `y`. That is, the entry `data[i, j]` corresponds to some measurement taken at `x[j]` and `y[i]`.

Matlab’s `imagesc` shows you this quite meaningfully:

    x = linspace(-100, -10, 10);
    y = [-8 -3];
    data = randn(numel(y), numel(x));
    figure()
    imagesc(x, y, data)
    export_fig('matlab.png')

The two left-most matrix elements’ y-positions are indeed at -8 and -3, as specified in `y` (although the effect is obscured because of all the extra tick marks). Each matrix element’s horizontal position falls exactly on a multiple of 10, as in the `x` vector.

Furthermore, the matrix is stretched to cover the figure window, causing non-square matrix elements—very valuable when you want to observe the behavior of the full dataset. A given rectangular matrix element (‘matel’?) has the same flat color over its extent.

In Python and Matplotlib, an image like this is a little harder to obtain, because by default, Matplotlib’s `imshow` forces square pixels. If you ask for rectangular pixels via `aspect='auto'`, it may interpolate the underlying array (depending on the default interpolation setting), so each matrix element has a blend of colors. Finally, if you specify `extent`, it takes these limits to mean the farthest edges of the image shown, rather than the centers of the edge matrix elements, so matrix elements do not line up with the sampling locations in `x` and `y`.

Here’s how to fix all these issues:

    import numpy as np
    import matplotlib.pyplot as plt

    def extents(f):
        # extend half a sample spacing beyond the first and last locations
        delta = f[1] - f[0]
        return [f[0] - delta / 2, f[-1] + delta / 2]

    x = np.linspace(-100, -10, 10)
    y = np.array([-8, -3.0])
    data = np.random.randn(y.size, x.size)

    plt.imshow(data, aspect='auto', interpolation='none',
               extent=extents(x) + extents(y), origin='lower')
    plt.savefig('py.png')

We write a custom function `extents` that takes a vector containing sampling locations, and converts it into a two-element list suitable for being used in `imshow`’s `extent` keyword argument. We call this function on both `x` and `y` before giving their combination to `imshow`. We also specify that no interpolation should happen, and request automatic aspect ratios for rectangular matrix elements. Finally, we request that the image be flipped vertically via `origin='lower'` (in Matlab, `axis xy` instead of `axis ij`).

With these tweaks, we get a visualization with the same useful properties as `imagesc`:

1. rectangular pixels,
2. flat color over a single matrix element,
3. x and y axes that correspond to specified sampling locations, and last but definitely not least,
4. origin that matches these axes.
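As a quick check of the half-spacing logic, the sketch below evaluates the same `extents` helper on the sampling vectors from the example. It assumes evenly spaced sampling locations, which is what the helper itself relies on.

```python
import numpy as np

def extents(f):
    # Assumes f is evenly spaced; pads by half a spacing on each side.
    delta = f[1] - f[0]
    return [f[0] - delta / 2, f[-1] + delta / 2]

x = np.linspace(-100, -10, 10)   # spacing of 10
y = np.array([-8, -3.0])         # spacing of 5

x_edges = extents(x)   # [-105.0, -5.0]
y_edges = extents(y)   # [-10.5, -0.5]
print(x_edges, y_edges)
```

The image edges sit half a spacing outside the first and last samples, so each cell is centered on its sampling location.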

## How to plot a 2D matrix in Python with colorbar Matplotlib?

To plot a 2D matrix in Python with colorbar, we can use numpy to create a 2D array matrix and use that matrix in the imshow() method.

### Steps

• Create data2D using numpy.

• Use imshow() method to display data as an image, i.e., on a 2D regular raster.

• Create a colorbar with the colorbar() method, passing it the ScalarMappable image returned by imshow().

• To display the figure, use show() method.

### Example

    import numpy as np
    from matplotlib import pyplot as plt

    plt.rcParams["figure.figsize"] = [7.00, 3.50]
    plt.rcParams["figure.autolayout"] = True

    data2D = np.random.random((50, 50))
    im = plt.imshow(data2D, cmap="copper_r")
    plt.colorbar(im)
    plt.show()

Source: https://www.tutorialspoint.com/how-to-plot-a-2d-matrix-in-python-with-colorbar-matplotlib

## How To Visualize Sparse Matrix in Python using Matplotlib?

| Property | Description |
| --- | --- |
| agg_filter | a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array |
| alpha | float or None |
| animated | bool |
| antialiased | bool |
| clip_box | Bbox |
| clip_on | bool |
| clip_path | Patch or (Path, Transform) or None |
| color | color |
| contains | callable |
| dash_capstyle | {'butt', 'round', 'projecting'} |
| dash_joinstyle | {'miter', 'round', 'bevel'} |
| dashes | sequence of floats (on/off ink in points) or (None, None) |
| data | (2, N) array or two 1D arrays |
| drawstyle | {'default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'} |
| figure | figure |
| fillstyle | {'full', 'left', 'right', 'bottom', 'top', 'none'} |
| gid | str |
| in_layout | bool |
| label | object |
| linestyle | {'-', '--', '-.', ':', '', (offset, on-off-seq), ...} |
| linewidth | float |
| marker | marker style |
| markeredgecolor | color |
| markeredgewidth | float |
| markerfacecolor | color |
| markerfacecoloralt | color |
| markersize | float |
| markevery | None or int or (int, int) or slice or List[int] or float or (float, float) |
| path_effects | Abstract path effects |
| picker | float or callable[[Artist, Event], Tuple[bool, dict]] |
| pickradius | float |
| rasterized | bool or None |
| sketch_params | (scale: float, length: float, randomness: float) |
| snap | bool or None |
| solid_capstyle | {'butt', 'round', 'projecting'} |
| solid_joinstyle | {'miter', 'round', 'bevel'} |
| transform | matplotlib.transforms.Transform |
| url | str |
| visible | bool |
| xdata | 1D array |
| ydata | 1D array |
| zorder | float |

Source: https://www.geeksforgeeks.org/how-to-visualize-sparse-matrix-in-python-using-matplotlib/

## Chapter 4. Visualization with Matplotlib

We’ll now take an in-depth look at the Matplotlib tool for visualization in Python. Matplotlib is a multiplatform data visualization library built on NumPy arrays, and designed to work with the broader SciPy stack. It was conceived by John Hunter in 2002, originally as a patch to IPython for enabling interactive MATLAB-style plotting via gnuplot from the IPython command line. IPython’s creator, Fernando Perez, was at the time scrambling to finish his PhD, and let John know he wouldn’t have time to review the patch for several months. John took this as a cue to set out on his own, and the Matplotlib package was born, with version 0.1 released in 2003. It received an early boost when it was adopted as the plotting package of choice of the Space Telescope Science Institute (the folks behind the Hubble Telescope), which financially supported Matplotlib’s development and greatly expanded its capabilities.

One of Matplotlib’s most important features is its ability to play well with many operating systems and graphics backends. Matplotlib supports dozens of backends and output types, which means you can count on it to work regardless of which operating system you are using or which output format you wish. This cross-platform, everything-to-everyone approach has been one of the great strengths of Matplotlib. It has led to a large userbase, which in turn has led to an active developer base and Matplotlib’s powerful tools and ubiquity within the scientific Python world.

In recent years, however, the interface and style of Matplotlib have begun to show their age. Newer tools like ggplot and ggvis in the R language, along with web visualization toolkits based on D3js and HTML5 canvas, often make Matplotlib feel clunky and old-fashioned. Still, I’m of the opinion that we cannot ignore Matplotlib’s strength as a well-tested, cross-platform graphics engine. Recent Matplotlib versions make it relatively easy to set new global plotting styles (see “Customizing Matplotlib: Configurations and Stylesheets”), and people have been developing new packages that build on its powerful internals to drive Matplotlib via cleaner, more modern APIs—for example, Seaborn (discussed in “Visualization with Seaborn”), ggplot, HoloViews, Altair, and even Pandas itself can be used as wrappers around Matplotlib’s API. Even with wrappers like these, it is still often useful to dive into Matplotlib’s syntax to adjust the final plot output. For this reason, I believe that Matplotlib itself will remain a vital piece of the data visualization stack, even if new tools mean the community gradually moves away from using the Matplotlib API directly.

Before we dive into the details of creating visualizations with Matplotlib, there are a few useful things you should know about using the package.

### Importing matplotlib

Just as we use the `np` shorthand for NumPy and the `pd` shorthand for Pandas, we will use some standard shorthands for Matplotlib imports:
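The import listing is not present in this copy of the text. A minimal sketch of the conventional aliases, assumed here because they are the ones most Matplotlib code uses:

```python
# Conventional aliases (an assumption; the original listing is not shown here).
import matplotlib as mpl
import matplotlib.pyplot as plt

print(mpl.__version__)
```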

The `plt` interface is what we will use most often, as we’ll see throughout this chapter.

### Setting Styles

We will use the `plt.style` directive to choose appropriate aesthetic styles for our figures. Here we will set the `classic` style, which ensures that the plots we create use the classic Matplotlib style:
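A short sketch of selecting a stylesheet, assuming the `classic` style name; the listing itself is missing from this copy. `plt.style.available` lists the style names installed on your system.

```python
import matplotlib.pyplot as plt

# Switch all subsequent figures in this session to the classic look.
plt.style.use("classic")

# The available style names can be inspected before choosing one.
has_classic = "classic" in plt.style.available
print(has_classic)
```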

Throughout this section, we will adjust this style as needed. Note that the stylesheets used here are supported as of Matplotlib version 1.5; if you are using an earlier version of Matplotlib, only the default style is available. For more information on stylesheets, see “Customizing Matplotlib: Configurations and Stylesheets”.

### show() or No show()? How to Display Your Plots

A visualization you can’t see won’t be of much use, but just how you view your Matplotlib plots depends on the context. The best use of Matplotlib differs depending on how you are using it; roughly, the three applicable contexts are using Matplotlib in a script, in an IPython terminal, or in an IPython notebook.

### Plotting from a script

If you are using Matplotlib from within a script, the function `plt.show()` is your friend. `plt.show()` starts an event loop, looks for all currently active figure objects, and opens one or more interactive windows that display your figure or figures.

So, for example, you may have a file called myplot.py containing the following:

You can then run this script from the command-line prompt, which will result in a window opening with your figure displayed:

$ python myplot.py

The `plt.show()` command does a lot under the hood, as it must interact with your system’s interactive graphical backend. The details of this operation can vary greatly from system to system and even installation to installation, but Matplotlib does its best to hide all these details from you.

One thing to be aware of: the `plt.show()` command should be used only once per Python session, and is most often seen at the very end of the script. Multiple `show()` commands can lead to unpredictable backend-dependent behavior, and should mostly be avoided.

### Plotting from an IPython shell

It can be very convenient to use Matplotlib interactively within an IPython shell (see Chapter 1). IPython is built to work well with Matplotlib if you specify Matplotlib mode. To enable this mode, you can use the `%matplotlib` magic command after starting `ipython`:

At this point, any `plt` plot command will cause a figure window to open, and further commands can be run to update the plot. Some changes (such as modifying properties of lines that are already drawn) will not draw automatically; to force an update, use `plt.draw()`. Using `plt.show()` in Matplotlib mode is not required.

### Plotting from an IPython notebook

The IPython notebook is a browser-based interactive data analysis tool that can combine narrative, code, graphics, HTML elements, and much more into a single executable document (see Chapter 1).

Plotting interactively within an IPython notebook can be done with the `%matplotlib` command, and works in a similar way to the IPython shell. In the IPython notebook, you also have the option of embedding graphics directly in the notebook, with two possible options:

• `%matplotlib notebook` will lead to interactive plots embedded within the notebook

• `%matplotlib inline` will lead to static images of your plot embedded in the notebook

For this book, we will generally opt for `%matplotlib inline`:

After you run this command (it needs to be done only once per kernel/session), any cell within the notebook that creates a plot will embed a PNG image of the resulting graphic (Figure 4-1):

### Saving Figures to File

One nice feature of Matplotlib is the ability to save figures in a wide variety of formats. You can save a figure using the `savefig()` command. For example, to save the previous figure as a PNG file, you can run this:
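The listing is missing from this copy, so the following is only a sketch of saving a figure. The sine and cosine curves and the temporary output directory are assumptions made to keep the example self-contained, and the `Agg` backend is selected only so it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()
plt.plot(x, np.sin(x), "-")
plt.plot(x, np.cos(x), "--")

# Written to a temporary directory here; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "my_figure.png")
fig.savefig(out_path)
print(out_path)
```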

We now have a file called my_figure.png in the current working directory:

-rw-r--r-- 1 jakevdp staff 16K Aug 11 10:59 my_figure.png

To confirm that it contains what we think it contains, let’s use the IPython `Image` object to display the contents of this file (Figure 4-2):

In `savefig()`, the file format is inferred from the extension of the given filename. Depending on what backends you have installed, many different file formats are available. You can find the list of supported file types for your system by using the following method of the figure object:
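The method call itself is not shown in this copy. A minimal sketch, assuming the canvas method `get_supported_filetypes()`, which returns a dictionary like the one printed below; the exact keys depend on the backend in use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt

fig = plt.figure()
# Maps file extensions to human-readable format names.
filetypes = fig.canvas.get_supported_filetypes()
print(sorted(filetypes))
```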

Out: {'eps': 'Encapsulated Postscript', 'jpeg': 'Joint Photographic Experts Group', 'jpg': 'Joint Photographic Experts Group', 'pdf': 'Portable Document Format', 'pgf': 'PGF code for LaTeX', 'png': 'Portable Network Graphics', 'ps': 'Postscript', 'raw': 'Raw RGBA bitmap', 'rgba': 'Raw RGBA bitmap', 'svg': 'Scalable Vector Graphics', 'svgz': 'Scalable Vector Graphics', 'tif': 'Tagged Image File Format', 'tiff': 'Tagged Image File Format'}

Note that when saving your figure, it’s not necessary to use `plt.show()` or related commands discussed earlier.

A potentially confusing feature of Matplotlib is its dual interfaces: a convenient MATLAB-style state-based interface, and a more powerful object-oriented interface. We’ll quickly highlight the differences between the two here.

### MATLAB-style interface

Matplotlib was originally written as a Python alternative for MATLAB users, and much of its syntax reflects that fact. The MATLAB-style tools are contained in the pyplot (`plt`) interface. For example, the following code will probably look quite familiar to MATLAB users (Figure 4-3):

It’s important to note that this interface is stateful: it keeps track of the “current” figure and axes, which are where all `plt` commands are applied. You can get a reference to these using the `plt.gcf()` (get current figure) and `plt.gca()` (get current axes) routines.
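A sketch of the stateful style, assuming a two-panel sine and cosine figure (the original listing is missing from this copy). Each `plt` call acts on whichever figure and axes are current.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

plt.figure()            # becomes the current figure
plt.subplot(2, 1, 1)    # (rows, columns, panel number): first panel is current
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2)    # second panel is now current
plt.plot(x, np.cos(x))

fig = plt.gcf()         # the current figure
ax = plt.gca()          # the current axes, here the second panel
```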

While this stateful interface is fast and convenient for simple plots, it is easy to run into problems. For example, once the second panel is created, how can we go back and add something to the first? This is possible within the MATLAB-style interface, but a bit clunky. Fortunately, there is a better way.

### Object-oriented interface

The object-oriented interface is available for these more complicated situations, and for when you want more control over your figure. Rather than depending on some notion of an “active” figure or axes, in the object-oriented interface the plotting functions are methods of explicit `Figure` and `Axes` objects. To re-create the previous plot using this style of plotting, you might do the following (Figure 4-4):

For more simple plots, the choice of which style to use is largely a matter of preference, but the object-oriented approach can become a necessity as plots become more complicated. Throughout this chapter, we will switch between the MATLAB-style and object-oriented interfaces, depending on what is most convenient. In most cases, the difference is as small as switching `plt.plot()` to `ax.plot()`, but there are a few gotchas that we will highlight as they come up in the following sections.
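The same two-panel figure can be sketched with explicit objects. This assumes `plt.subplots`, which returns the figure together with an array of axes, so either panel can be addressed at any time.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# ax is an array holding two Axes objects, one per row.
fig, ax = plt.subplots(2)

# Each plotting call names its target axes explicitly.
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))

# Going back to the first panel needs no notion of a "current" axes.
ax[0].set_title("first panel")
```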

Perhaps the simplest of all plots is the visualization of a single function y = f(x). Here we will take a first look at creating a simple plot of this type. As with all the following sections, we’ll start by setting up the notebook for plotting and importing the functions we will use:

For all Matplotlib plots, we start by creating a figure and an axes. In their simplest form, a figure and axes can be created as follows (Figure 4-5):

In Matplotlib, the figure (an instance of the class `plt.Figure`) can be thought of as a single container that contains all the objects representing axes, graphics, text, and labels. The axes (an instance of the class `plt.Axes`) is what we see above: a bounding box with ticks and labels, which will eventually contain the plot elements that make up our visualization. Throughout this book, we’ll commonly use the variable name `fig` to refer to a figure instance, and `ax` to refer to an axes instance or group of axes instances.

Once we have created an axes, we can use the `ax.plot` function to plot some data. Let’s start with a simple sinusoid (Figure 4-6):

Alternatively, we can use the pylab interface and let the figure and axes be created for us in the background (Figure 4-7; see “Two Interfaces for the Price of One” for a discussion of these two interfaces):

If we want to create a single figure with multiple lines, we can simply call the `plot` function multiple times (Figure 4-8):
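The listings for these first plots are missing from this copy. As a sketch, assuming a sine and a cosine curve on a shared axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()   # the container
ax = plt.axes()      # a single axes inside the current figure

x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x))   # first line
ax.plot(x, np.cos(x))   # a second call adds a second line to the same axes
```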

That’s all there is to plotting simple functions in Matplotlib! We’ll now dive into some more details about how to control the appearance of the axes and lines.

### Adjusting the Plot: Line Colors and Styles

The first adjustment you might wish to make to a plot is to control the line colors and styles. The `plt.plot()` function takes additional arguments that can be used to specify these. To adjust the color, you can use the `color` keyword, which accepts a string argument representing virtually any imaginable color. The color can be specified in a variety of ways (Figure 4-9):

If no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple lines.

Similarly, you can adjust the line style using the `linestyle` keyword (Figure 4-10):

If you would like to be extremely terse, these `linestyle` and `color` codes can be combined into a single nonkeyword argument to the `plt.plot()` function (Figure 4-11):

These single-character color codes reflect the standard abbreviations in the RGB (Red/Green/Blue) and CMYK (Cyan/Magenta/Yellow/blacK) color systems, commonly used for digital color graphics.
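The options described above can be sketched as follows. The specific colors and phase offsets are illustrative assumptions, since the original listings are missing from this copy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()

# Several ways of specifying a color.
ax.plot(x, np.sin(x - 0), color="blue")           # color name
ax.plot(x, np.sin(x - 1), color="g")              # single-character code
ax.plot(x, np.sin(x - 2), color="0.75")           # grayscale between 0 and 1
ax.plot(x, np.sin(x - 3), color="#FFDD44")        # hex code
ax.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))  # RGB tuple, values 0 to 1

# Line style by keyword, then style and color combined in one format string.
(dashed,) = ax.plot(x, x / 10, linestyle="dashed")
(combined,) = ax.plot(x, x / 20, "-.r")           # dash-dot line in red
```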

There are many other keyword arguments that can be used to fine-tune the appearance of the plot; for more details, I’d suggest viewing the docstring of the `plt.plot()` function using IPython’s help tools (see “Help and Documentation in IPython”).

### Adjusting the Plot: Axes Limits

Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it’s nice to have finer control. The most basic way to adjust axis limits is to use the `plt.xlim()` and `plt.ylim()` methods (Figure 4-12):

If for some reason you’d like either axis to be displayed in reverse, you can simply reverse the order of the arguments (Figure 4-13):

A useful related method is `plt.axis()` (note here the potential confusion between axes with an e, and axis with an i). The `plt.axis()` method allows you to set the x and y limits with a single call, by passing a list that specifies `[xmin, xmax, ymin, ymax]` (Figure 4-14):

The `plt.axis()` method goes even beyond this, allowing you to do things like automatically tighten the bounds around the current plot (Figure 4-15):

It allows even higher-level specifications, such as ensuring an equal aspect ratio so that on your screen, one unit in x is equal to one unit in y (Figure 4-16):

For more information on axis limits and the other capabilities of the `plt.axis` method, refer to the `plt.axis` docstring.
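A compact sketch of the limit-setting calls discussed in this section. The particular limit values are assumptions chosen for a sine curve on the interval 0 to 10.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, np.sin(x))

# Set each axis separately, then read all four limits back at once.
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5)
limits = plt.axis()          # (xmin, xmax, ymin, ymax)

# Reversing the argument order reverses the axis direction.
plt.xlim(10, 0)
reversed_xlim = plt.xlim()

# One call with [xmin, xmax, ymin, ymax], then the string shortcuts.
plt.axis([-1, 11, -1.5, 1.5])
plt.axis("tight")            # tighten bounds around the data
plt.axis("equal")            # one unit in x equals one unit in y on screen
```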

### Labeling Plots

As the last piece of this section, we’ll briefly look at the labeling of plots: titles, axis labels, and simple legends.

Titles and axis labels are the simplest such labels—there are methods that can be used to quickly set them (Figure 4-17):

You can adjust the position, size, and style of these labels using optional arguments to the function. For more information, see the Matplotlib documentation and the docstrings of each of these functions.

When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type. Again, Matplotlib has a built-in way of quickly creating such a legend. It is done via the (you guessed it) `plt.legend()` method. Though there are several valid ways of using this, I find it easiest to specify the label of each line using the `label` keyword of the plot function (Figure 4-18):

As you can see, the `plt.legend()` function keeps track of the line style and color, and matches these with the correct label. More information on specifying and formatting plot legends can be found in the `plt.legend` docstring; additionally, we will cover some more advanced legend options in “Customizing Plot Legends”.
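Titles, axis labels, and a legend can be sketched together. The label strings below are illustrative assumptions; the original listings are missing from this copy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig = plt.figure()

# The label keyword attaches a legend entry to each line.
plt.plot(x, np.sin(x), "-g", label="sin(x)")
plt.plot(x, np.cos(x), ":b", label="cos(x)")

plt.title("Sine and Cosine")
plt.xlabel("x")
plt.ylabel("value")

# legend() collects the labeled lines along with their style and color.
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
```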

Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape. We’ll start by setting up the notebook for plotting and importing the functions we will use:

### Scatter Plots with plt.plot

In the previous section, we looked at `plt.plot`/`ax.plot` to produce line plots. It turns out that this same function can produce scatter plots as well (Figure 4-20):

The third argument in the function call is a character that represents the type of symbol used for the plotting. Just as you can specify options such as `'-'` and `'--'` to control the line style, the marker style has its own set of short string codes. The full list of available symbols can be seen in the documentation of `plt.plot`, or in Matplotlib’s online documentation. Most of the possibilities are fairly intuitive, and we’ll show a number of the more common ones here (Figure 4-21):

For even more possibilities, these character codes can be used together with line and color codes to plot points along with a line connecting them (Figure 4-22):

Additional keyword arguments to `plt.plot` specify a wide range of properties of the lines and markers (Figure 4-23):

This type of flexibility in the `plt.plot` function allows for a wide variety of possible visualization options. For a full description of the options available, refer to the `plt.plot` documentation.
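A sketch of marker-based plotting with `plot`, assuming a sampled sine curve. The marker codes and keyword values are illustrative choices, not the original listing.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)
fig, ax = plt.subplots()

# A marker code alone gives a scatter-style plot with no connecting line.
(markers_only,) = ax.plot(x, y, "o", color="black")

# Marker, line, and color codes combined: line (-), circle (o), black (k).
ax.plot(x, y + 1, "-ok")

# Keyword arguments fine-tune both the line and the markers.
(styled,) = ax.plot(x, y - 1, "-p", color="gray",
                    markersize=15, linewidth=4,
                    markerfacecolor="white",
                    markeredgecolor="gray",
                    markeredgewidth=2)
```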

### Scatter Plots with plt.scatter

A second, more powerful method of creating scatter plots is the `plt.scatter` function, which can be used very similarly to the `plt.plot` function (Figure 4-24):

The primary difference of `plt.scatter` from `plt.plot` is that it can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.

Let’s show this by creating a random scatter plot with points of many colors and sizes. In order to better see the overlapping results, we’ll also use the `alpha` keyword to adjust the transparency level (Figure 4-25):

Notice that the color argument is automatically mapped to a color scale (shown here by the `colorbar()` command), and the size argument is given in points squared, that is, as a marker area. In this way, the color and size of points can be used to convey information in the visualization, in order to illustrate multidimensional data.
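A sketch of per-point color and size with `scatter`. The random data, seed, and colormap are assumptions for illustration, not the original listing.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seed assumed, for repeatability
x = rng.standard_normal(100)
y = rng.standard_normal(100)
colors = rng.random(100)         # mapped through the colormap
sizes = 1000 * rng.random(100)   # marker areas in points squared

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap="viridis")
fig.colorbar(points, ax=ax)      # shows the color scale
```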

For example, we might use the Iris data from Scikit-Learn, where each sample is one of three types of flowers that has had the size of its petals and sepals carefully measured (Figure 4-26):

We can see that this scatter plot has given us the ability to simultaneously explore four different dimensions of the data: the (x, y) location of each point corresponds to the sepal length and width, the size of the point is related to the petal width, and the color is related to the particular species of flower. Multicolor and multifeature scatter plots like this can be useful for both exploration and presentation of data.

### plot Versus scatter: A Note on Efficiency

Aside from the different features available in `plt.plot` and `plt.scatter`, why might you choose to use one over the other? While it doesn’t matter as much for small amounts of data, as datasets get larger than a few thousand points, `plt.plot` can be noticeably more efficient than `plt.scatter`. The reason is that `plt.scatter` has the capability to render a different size and/or color for each point, so the renderer must do the extra work of constructing each point individually. In `plt.plot`, on the other hand, the points are always essentially clones of each other, so the work of determining the appearance of the points is done only once for the entire set of data. For large datasets, the difference between these two can lead to vastly different performance, and for this reason, `plt.plot` should be preferred over `plt.scatter` for large datasets.

For any scientific measurement, accurate accounting for errors is nearly as important, if not more important, than accurate reporting of the number itself. For example, imagine that I am using some astrophysical observations to estimate the Hubble Constant, the local measurement of the expansion rate of the universe. I know that the current literature suggests a value of around 71 (km/s)/Mpc, and I measure a value of 74 (km/s)/Mpc with my method. Are the values consistent? The only correct answer, given this information, is this: there is no way to know.

Suppose I augment this information with reported uncertainties: the current literature suggests a value of around 71 ± 2.5 (km/s)/Mpc, and my method has measured a value of 74 ± 5 (km/s)/Mpc. Now are the values consistent? That is a question that can be quantitatively answered.

In visualization of data and results, showing these errors effectively can make a plot convey much more complete information.

### Basic Errorbars

A basic errorbar can be created with a single Matplotlib function call (Figure 4-27):

Here the `fmt` is a format code controlling the appearance of lines and points, and has the same syntax as the shorthand used in `plt.plot`, outlined in “Simple Line Plots” and “Simple Scatter Plots”.

In addition to these basic options, the `errorbar` function has many options to fine-tune the outputs. Using these additional options you can easily customize the aesthetics of your errorbar plot. I often find it helpful, especially in crowded plots, to make the errorbars lighter than the points themselves (Figure 4-28):

In addition to these options, you can also specify horizontal errorbars (`xerr`), one-sided errorbars, and many other variants. For more information on the options available, refer to the docstring of `plt.errorbar`.
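A sketch of an errorbar plot with lighter bars than points. The noisy sine data and the styling values are assumptions; the original listings are missing from this copy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seed assumed, for repeatability
x = np.linspace(0, 10, 50)
dy = 0.8                         # the same uncertainty for every point
y = np.sin(x) + dy * rng.standard_normal(50)

fig, ax = plt.subplots()
# fmt styles the points; ecolor and elinewidth style the bars separately.
container = ax.errorbar(x, y, yerr=dy, fmt="o", color="black",
                        ecolor="lightgray", elinewidth=3, capsize=0)
```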

### Continuous Errors

In some situations it is desirable to show errorbars on continuous quantities. Though Matplotlib does not have a built-in convenience routine for this type of application, it’s relatively easy to combine primitives like `plt.plot` and `plt.fill_between` for a useful result.

Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-Learn API (see “Introducing Scikit-Learn” for details). This is a method of fitting a very flexible nonparametric function to data with a continuous measure of the uncertainty. We won’t delve into the details of Gaussian process regression at this point, but will focus instead on how you might visualize such a continuous error measurement:

We now have arrays of fit locations, fitted values, and their uncertainties, which sample the continuous fit to our data. We could pass these to the `plt.errorbar` function as above, but we don’t really want to plot 1,000 points with 1,000 errorbars. Instead, we can use the `plt.fill_between` function with a light color to visualize this continuous error (Figure 4-29):

Note what we’ve done here with the `fill_between` function: we pass an x value, then the lower y-bound, then the upper y-bound, and the result is that the area between these regions is filled.
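A sketch of the shaded-band technique. It does not run a Gaussian process regression: the curve and the uncertainty below are made-up stand-ins, and the names `xfit`, `yfit`, and `dyfit` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for a model fit and its uncertainty.
xfit = np.linspace(0, 10, 1000)
yfit = xfit * np.sin(xfit)
dyfit = 0.5 + 0.1 * xfit         # made-up uncertainty, growing with x

fig, ax = plt.subplots()
ax.plot(xfit, yfit, "-", color="gray")
# Arguments: x values, lower y-bound, upper y-bound.
band = ax.fill_between(xfit, yfit - dyfit, yfit + dyfit,
                       color="gray", alpha=0.2)
```

A real fit would replace the stand-in arrays with the model's predicted mean and its standard deviation.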

The resulting figure gives a very intuitive view into what the Gaussian process regression algorithm is doing: in regions near a measured data point, the model is strongly constrained and this is reflected in the small model errors. In regions far from a measured data point, the model is not strongly constrained, and the model errors increase.

For more information on the options available in `plt.fill_between()` (and the closely related `plt.fill()` function), see the function docstring or the Matplotlib documentation.

Finally, if this seems a bit too low level for your taste, refer to “Visualization with Seaborn”, where we discuss the Seaborn package, which has a more streamlined API for visualizing this type of continuous errorbar.

Sometimes it is useful to display three-dimensional data in two dimensions using contours or color-coded regions. There are three Matplotlib functions that can be helpful for this task: `plt.contour` for contour plots, `plt.contourf` for filled contour plots, and `plt.imshow` for showing images. This section looks at several examples of using these. We’ll start by setting up the notebook for plotting and importing the functions we will use:

### Visualizing a Three-Dimensional Function

We’ll start by demonstrating a contour plot using a function z = f(x, y), using the following particular choice for f (we’ve seen this before in “Computation on Arrays: Broadcasting”, when we used it as a motivating example for array broadcasting):

A contour plot can be created with the `plt.contour` function. It takes three arguments: a grid of x values, a grid of y values, and a grid of z values. The x and y values represent positions on the plot, and the z values will be represented by the contour levels. Perhaps the most straightforward way to prepare such data is to use the `np.meshgrid` function, which builds two-dimensional grids from one-dimensional arrays:
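A sketch of preparing gridded data with `np.meshgrid`. The function `f` below is an assumed example, since the definition is missing from this copy; any function of two arrays would do.

```python
import numpy as np

def f(x, y):
    # Assumed example function; the original definition is not shown here.
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)   # 50 sample locations along x
y = np.linspace(0, 5, 40)   # 40 sample locations along y

# X and Y both have shape (len(y), len(x)): rows follow y, columns follow x.
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
print(Z.shape)
```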

Now let’s look at this with a standard line-only contour plot (Figure 4-30):

Notice that by default when a single color is used, negative values are represented by dashed lines, and positive values by solid lines. Alternatively, you can color-code the lines by specifying a colormap with the `cmap` argument. Here, we’ll also specify that we want more lines to be drawn: 20 equally spaced intervals within the data range (Figure 4-31):

Here we chose the `RdGy` (short for Red-Gray) colormap, which is a good choice for centered data. Matplotlib has a wide range of colormaps available, which you can easily browse in IPython by doing a tab completion on the `plt.cm` module:

plt.cm.<TAB>

Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can change this by switching to a filled contour plot using the `plt.contourf()` function (notice the `f` at the end), which uses largely the same syntax as `plt.contour()`.

Additionally, we’ll add a `plt.colorbar()` command, which automatically creates an additional axis with labeled color information for the plot (Figure 4-32):

The colorbar makes it clear that the black regions are “peaks,” while the red regions are “valleys.”
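The line and filled contour plots can be sketched as below. The function `f` and the grid sizes are assumptions for illustration; the original listings are missing from this copy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    # Assumed example function, not taken from the text above.
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

X, Y = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 40))
Z = f(X, Y)

fig, (ax_lines, ax_filled) = plt.subplots(1, 2)

# Single-color line contours: negative levels are drawn dashed by default.
ax_lines.contour(X, Y, Z, colors="black")

# Filled contours with 20 intervals and a diverging colormap.
filled = ax_filled.contourf(X, Y, Z, 20, cmap="RdGy")
fig.colorbar(filled, ax=ax_filled)
```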

One potential issue with this plot is that it is a bit “splotchy.” That is, the color steps are discrete rather than continuous, which is not always what is desired. You could remedy this by setting the number of contours to a very high number, but this results in a rather inefficient plot: Matplotlib must render a new polygon for each step in the level. A better way to handle this is to use the `plt.imshow()` function, which interprets a two-dimensional grid of data as an image.

Figure 4-33 shows the result of the following code:

There are a few potential gotchas with plt.imshow, however:

• plt.imshow doesn’t accept an x and y grid, so you must manually specify the extent [xmin, xmax, ymin, ymax] of the image on the plot.

• plt.imshow by default follows the standard image array definition where the origin is in the upper left, not in the lower left as in most contour plots. This must be changed when showing gridded data.

• plt.imshow will automatically adjust the axis aspect ratio to match the input data; you can change this by setting, for example, plt.axis('image') to make x and y units match.
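The three points above can be sketched as follows. The data and the extent values are assumptions chosen to match a grid spanning 0 to 5 in both directions.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Illustrative surface only
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

X, Y = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 40))
Z = f(X, Y)

fig, ax = plt.subplots()
im = ax.imshow(
    Z,
    extent=[0, 5, 0, 5],  # imshow takes no x/y grid, so give the data limits
    origin='lower',       # put row 0 at the bottom, as in a contour plot
    cmap='RdGy',
)
fig.colorbar(im, ax=ax)
ax.axis('image')          # equal scaling with limits tight to the image
```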

Finally, it can sometimes be useful to combine contour plots and image plots. For example, to create the effect shown in Figure 4-34, we’ll use a partially transparent background image (with transparency set via the alpha parameter) and over-plot contours with labels on the contours themselves (using the plt.clabel function):

The combination of these three functions (plt.contour, plt.contourf, and plt.imshow) gives nearly limitless possibilities for displaying this sort of three-dimensional data within a two-dimensional plot. For more information on the options available in these functions, refer to their docstrings. If you are interested in three-dimensional visualizations of this type of data, see “Three-Dimensional Plotting in Matplotlib”.
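One way to sketch this combination is shown below. The sample function, the number of contour levels, and the font size are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Illustrative surface only
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

X, Y = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0, 5, 40))
Z = f(X, Y)

fig, ax = plt.subplots()

# A few black contour lines, labeled in place with their level values
contours = ax.contour(X, Y, Z, 3, colors='black')
ax.clabel(contours, inline=True, fontsize=8)

# Semi-transparent image of the same data behind the lines
im = ax.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy', alpha=0.5)
fig.colorbar(im, ax=ax)
```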

A simple histogram can be a great first step in understanding a dataset. Earlier, we saw a preview of Matplotlib’s histogram function (see “Comparisons, Masks, and Boolean Logic”), which creates a basic histogram in one line, once the normal boilerplate imports are done (Figure 4-35):

The hist() function has many options to tune both the calculation and the display; here’s an example of a more customized histogram (Figure 4-36):

The plt.hist docstring has more information on other customization options available. I find this combination of histtype='stepfilled' along with some transparency alpha to be very useful when comparing histograms of several distributions (Figure 4-37):

If you would like to simply compute the histogram (that is, count the number of points in a given bin) and not display it, the np.histogram() function is available:
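A hedged sketch of a customized histogram follows. The random data, bin count, and colors are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # illustrative sample

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(
    data,
    bins=30,                # number of equal-width bins
    density=True,           # normalize so the total area is 1
    alpha=0.5,              # partial transparency
    histtype='stepfilled',  # one filled outline instead of separate bars
    color='steelblue',
    edgecolor='none',
)
```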
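As a sketch of that comparison, the code below overlays three normalized histograms. The three distributions are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x1 = rng.normal(0, 0.8, 1000)
x2 = rng.normal(-2, 1, 1000)
x3 = rng.normal(3, 2, 1000)

# Shared styling: filled steps with transparency so overlaps stay visible
style = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)

fig, ax = plt.subplots()
for sample in (x1, x2, x3):
    ax.hist(sample, **style)
```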

[ 12 190 468 301 29]
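Counts like those above can be computed with a short sketch such as the following. The data here is an assumed random sample, so the exact numbers will differ from the output shown.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # illustrative sample

# Count points in 5 equal-width bins without drawing anything
counts, bin_edges = np.histogram(data, bins=5)
print(counts)
```

The returned bin_edges array has one more entry than counts, because it holds both ends of every bin.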

### Two-Dimensional Histograms and Binnings

Just as we create histograms in one dimension by dividing the number line into bins, we can also create histograms in two dimensions by dividing points among two-dimensional bins. We’ll take a brief look at several ways to do this here. We’ll start by defining some data, an x and y array drawn from a multivariate Gaussian distribution:

### plt.hist2d: Two-dimensional histogram

One straightforward way to plot a two-dimensional histogram is to use Matplotlib’s plt.hist2d function (Figure 4-38):

Just as with plt.hist, plt.hist2d has a number of extra options to fine-tune the plot and the binning, which are nicely outlined in the function docstring. Further, just as plt.hist has a counterpart in np.histogram, plt.hist2d has a counterpart in np.histogram2d, which can be used as follows:

For the generalization of this histogram binning in dimensions higher than two, see the np.histogramdd function.
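A minimal sketch, assuming a mean and covariance picked only for illustration, generates the data and bins it on a 30×30 grid:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
mean = [0, 0]
cov = [[1, 1], [1, 2]]  # assumed covariance, giving correlated x and y
x, y = rng.multivariate_normal(mean, cov, 10000).T

fig, ax = plt.subplots()
counts, xedges, yedges, im = ax.hist2d(x, y, bins=30, cmap='Blues')
cbar = fig.colorbar(im, ax=ax)
cbar.set_label('counts in bin')
```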
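For instance, a sketch using assumed sample data:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

# Same binning as a 2-D histogram plot, but returning arrays only
counts, xedges, yedges = np.histogram2d(x, y, bins=30)
print(counts.shape)
```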

### plt.hexbin: Hexagonal binnings

The two-dimensional histogram creates a tessellation of squares across the axes. Another natural shape for such a tessellation is the regular hexagon. For this purpose, Matplotlib provides the plt.hexbin routine, which represents a two-dimensional dataset binned within a grid of hexagons (Figure 4-39):

plt.hexbin has a number of interesting options, including the ability to specify weights for each point, and to change the output in each bin to any NumPy aggregate (mean of weights, standard deviation of weights, etc.).
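A hedged sketch of hexagonal binning on assumed sample data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

fig, ax = plt.subplots()
# gridsize sets the number of hexagons across the x direction
hb = ax.hexbin(x, y, gridsize=30, cmap='Blues')
cbar = fig.colorbar(hb, ax=ax, label='count in bin')
```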

### Kernel density estimation

Another common method of evaluating densities in multiple dimensions is kernel density estimation (KDE). This will be discussed more fully in “In-Depth: Kernel Density Estimation”, but for now we’ll simply mention that KDE can be thought of as a way to “smear out” the points in space and add up the result to obtain a smooth function. One extremely quick and simple KDE implementation exists in the scipy.stats package. Here is a quick example of using the KDE on this data (Figure 4-40):

KDE has a smoothing length that effectively slides the knob between detail and smoothness (one example of the ubiquitous bias–variance trade-off). The literature on choosing an appropriate smoothing length is vast: gaussian_kde uses a rule of thumb to attempt to find a nearly optimal smoothing length for the input data.

Other KDE implementations are available within the SciPy ecosystem, each with its own various strengths and weaknesses; see, for example, sklearn.neighbors.KernelDensity and statsmodels.nonparametric.kernel_density.KDEMultivariate. For visualizations based on KDE, using Matplotlib tends to be overly verbose. The Seaborn library, discussed in “Visualization with Seaborn”, provides a much more terse API for creating KDE-based visualizations.
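A minimal sketch with scipy.stats.gaussian_kde follows. The sample size, the evaluation grid, and the plotting extent are assumptions chosen to keep the example fast.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 2000).T

# gaussian_kde expects an array of shape (n_dimensions, n_points)
kde = gaussian_kde(np.vstack([x, y]))

# Evaluate the smoothed density on a regular grid
xgrid = np.linspace(-3.5, 3.5, 40)
ygrid = np.linspace(-6, 6, 40)
Xg, Yg = np.meshgrid(xgrid, ygrid)
density = kde.evaluate(np.vstack([Xg.ravel(), Yg.ravel()]))

fig, ax = plt.subplots()
im = ax.imshow(density.reshape(Xg.shape), origin='lower', aspect='auto',
               extent=[-3.5, 3.5, -6, 6], cmap='Blues')
fig.colorbar(im, ax=ax, label='density')
```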

Plot legends give meaning to a visualization, assigning labels to the various plot elements. We previously saw how to create a simple legend; here we’ll take a look at customizing the placement and aesthetics of the legend in Matplotlib.

The simplest legend can be created with the plt.legend() command, which automatically creates a legend for any labeled plot elements (Figure 4-41):

But there are many ways we might want to customize such a legend. For example, we can specify the location and turn off the frame (Figure 4-42):
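As a sketch, assuming two illustrative curves labeled Sine and Cosine:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')

# Place the legend in the upper-left corner and drop its frame
leg = ax.legend(loc='upper left', frameon=False)
```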

We can use the ncol argument to specify the number of columns in the legend (Figure 4-43):

We can use a rounded box (fancybox) or add a shadow, change the transparency (alpha value) of the frame, or change the padding around the text (Figure 4-44):
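These options can be combined in a single call. The sketch below uses assumed curves and arbitrary option values.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')

leg = ax.legend(
    loc='lower center',
    ncol=2,         # lay the entries out in two columns
    fancybox=True,  # rounded frame corners
    framealpha=1,   # fully opaque frame
    shadow=True,    # drop shadow behind the frame
    borderpad=1,    # padding between the text and the frame
)
```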

### Choosing Elements for the Legend

As we’ve already seen, the legend includes all labeled elements by default. If this is not what is desired, we can fine-tune which elements and labels appear in the legend by using the objects returned by plot commands. The plt.plot() command is able to create multiple lines at once, and returns a list of created line instances. Passing any of these to plt.legend() will tell it which to identify, along with the labels we’d like to specify (Figure 4-45):
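A minimal sketch, assuming four phase-shifted sine curves as the data:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
# Four columns, one per phase shift, so a single plot call draws four lines
y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))

fig, ax = plt.subplots()
lines = ax.plot(x, y)  # returns a list of Line2D objects

# Only the first two lines appear in the legend, with explicit labels
leg = ax.legend(lines[:2], ['first', 'second'])
```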

I generally find in practice that it is clearer to use the first method, applying labels to the plot elements you’d like to show on the legend (Figure 4-46):
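The same result can be sketched by labeling lines as they are drawn, again with assumed data:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))

fig, ax = plt.subplots()
ax.plot(x, y[:, 0], label='first')
ax.plot(x, y[:, 1], label='second')
ax.plot(x, y[:, 2:])  # unlabeled lines are left out of the legend
leg = ax.legend(framealpha=1, frameon=True)
```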

Notice that by default, the legend ignores all elements without a label attribute set.

### Legend for Size of Points

Sometimes the legend defaults are not sufficient for the given visualization. For example, perhaps you’re using the size of points to mark certain features of the data, and want to create a legend reflecting this. Here is an example where we’ll use the size of points to indicate populations of California cities. We’d like a legend that specifies the scale of the sizes of the points, and we’ll accomplish this by plotting some labeled data with no entries (Figure 4-47):
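The city dataset is not included here, so this sketch substitutes made-up coordinates and populations (all hypothetical values) to show the technique of plotting empty, labeled scatter sets:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data: random positions and populations
rng = np.random.default_rng(1)
lon = rng.uniform(-124, -114, 60)
lat = rng.uniform(32, 42, 60)
population = rng.integers(50_000, 2_000_000, 60)

fig, ax = plt.subplots()
ax.scatter(lon, lat, s=population / 10_000, c=np.log10(population),
           cmap='viridis', alpha=0.5, linewidth=0)

# Empty scatter calls draw nothing but register labeled legend entries
for pop in [100_000, 500_000, 1_000_000]:
    ax.scatter([], [], c='gray', alpha=0.3, s=pop / 10_000,
               label=f'{pop:,} people')

leg = ax.legend(scatterpoints=1, frameon=False, labelspacing=1,
                title='City population')
```

The marker size of each empty scatter uses the same scaling as the real points, which is what makes the legend a valid size key.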

The legend will always reference some object that is on the plot, so if we’d like to display a particular shape we need to plot it. In this case, the objects we want (gray circles) are not on the plot, so we fake them by plotting empty lists. Notice too that the legend only lists plot elements that have a label specified.

By plotting empty lists, we create labeled plot objects that are picked up by the legend, and now our legend tells us some useful information. This strategy can be useful for creating more sophisticated visualizations.

Finally, note that for geographic data like this, it would be clearer if we could show state boundaries or other map-specific elements. For this, an excellent choice of tool is Matplotlib’s Basemap add-on toolkit, which we’ll explore in “Geographic Data with Basemap”.

### Multiple Legends

Sometimes when designing a plot you’d like to add multiple legends to the same axes. Unfortunately, Matplotlib does not make this easy: via the standard legend interface, it is only possible to create a single legend for the entire plot. If you try to create a second legend using plt.legend() or ax.legend(), it will simply override the first one. We can work around this by creating a new legend artist from scratch, and then using the lower-level ax.add_artist() method to manually add the second artist to the plot (Figure 4-48):

This is a peek into the low-level artist objects that compose any Matplotlib plot. If you examine the source code of ax.legend() (recall that you can do this within the IPython notebook using ax.legend??) you’ll see that the function simply consists of some logic to create a suitable Legend artist, which is then saved in the legend_ attribute and added to the figure when the plot is drawn.
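A hedged sketch of this workaround, with assumed line data and labels:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.legend import Legend

x = np.linspace(0, 10, 1000)
styles = ['-', '--', '-.', ':']

fig, ax = plt.subplots()
lines = []
for i, style in enumerate(styles):
    lines += ax.plot(x, np.sin(x - i * np.pi / 2), style, color='black')
ax.axis('equal')

# First legend through the normal interface
ax.legend(lines[:2], ['line A', 'line B'], loc='upper right', frameon=False)

# Second legend built by hand and attached as an extra artist
second = Legend(ax, lines[2:], ['line C', 'line D'],
                loc='lower right', frameon=False)
ax.add_artist(second)
```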

Plot legends identify discrete labels of discrete points. For continuous labels based on the color of points, lines, or regions, a labeled colorbar can be a great tool. In Matplotlib, a colorbar is a separate axes that can provide a key for the meaning of colors in a plot. Because the book is printed in black and white, this section has an accompanying online appendix where you can view the figures in full color (https://github.com/jakevdp/PythonDataScienceHandbook). We’ll start by setting up the notebook for plotting and importing the functions we will use:

As we have seen several times throughout this section, the simplest colorbar can be created with the plt.colorbar function (Figure 4-49):
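For example, a minimal sketch with an assumed image built from sines and cosines:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])  # illustrative 2-D array

fig, ax = plt.subplots()
im = ax.imshow(I)
cbar = fig.colorbar(im, ax=ax)  # color key for the image
```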

We’ll now discuss a few ideas for customizing these colorbars and using them effectively in various situations.

### Customizing Colorbars

We can specify the colormap using the cmap argument to the plotting function that is creating the visualization (Figure 4-50):

All the available colormaps are in the plt.cm namespace; using IPython’s tab-completion feature will give you a full list of built-in possibilities:

plt.cm.<TAB>

But being able to choose a colormap is just the first step: more important is how to decide among the possibilities! The choice turns out to be much more subtle than you might initially expect.

### Choosing the colormap

A full treatment of color choice within visualization is beyond the scope of this book, but for entertaining reading on this subject and others, see the article “Ten Simple Rules for Better Figures”. Matplotlib’s online documentation also has an interesting discussion of colormap choice.

Broadly, you should be aware of three different categories of colormaps:

Sequential colormaps

These consist of one continuous sequence of colors (e.g., binary or viridis).

Divergent colormaps

These usually contain two distinct colors, which show positive and negative deviations from a mean (e.g., RdBu or PuOr).

Qualitative colormaps

These mix colors with no particular sequence (e.g., rainbow or jet).

The jet colormap, which was the default in Matplotlib prior to version 2.0, is an example of a qualitative colormap. Its status as the default was quite unfortunate, because qualitative maps are often a poor choice for representing quantitative data. Among the problems is the fact that qualitative maps usually do not display any uniform progression in brightness as the scale increases.

We can see this by converting the colorbar into black and white (Figure 4-51):
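One way to sketch this conversion is shown below. The helper names grayscale_cmap and show_with_grayscale are hypothetical, and the luminance weights are a common perceptual approximation rather than anything prescribed by Matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

def grayscale_cmap(name):
    """Return a grayscale version of the named colormap (hypothetical helper)."""
    cmap = plt.get_cmap(name)
    colors = cmap(np.arange(cmap.N))  # RGBA rows, one per colormap entry

    # Approximate perceived brightness from the R, G, B channels
    weights = [0.299, 0.587, 0.114]
    luminance = np.sqrt(np.dot(colors[:, :3] ** 2, weights))
    colors[:, :3] = luminance[:, np.newaxis]  # same value in R, G and B

    return LinearSegmentedColormap.from_list(name + '_gray', colors, cmap.N)

def show_with_grayscale(name):
    """Draw the colormap above its grayscale counterpart (hypothetical helper)."""
    gradient = np.linspace(0, 1, 256)[np.newaxis, :]
    fig, axes = plt.subplots(2, 1, figsize=(6, 2),
                             subplot_kw=dict(xticks=[], yticks=[]))
    axes[0].imshow(gradient, cmap=plt.get_cmap(name), aspect='auto')
    axes[1].imshow(gradient, cmap=grayscale_cmap(name), aspect='auto')
    return fig

fig = show_with_grayscale('jet')
```

Calling show_with_grayscale with a different colormap name allows a side-by-side comparison of how evenly each map's brightness progresses.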

Notice the bright stripes in the grayscale image. Even in full color, this uneven brightness means that the eye will be drawn to certain portions of the color range, which will potentially emphasize unimportant parts of the dataset. It’s better to use a colormap such as viridis (the default as of Matplotlib 2.0), which is specifically constructed to have an even brightness variation across the range. Thus, it not only plays well with our color perception, but also will translate well to grayscale printing (Figure 4-52):

If you favor rainbow schemes, another good option for continuous data is the cubehelix colormap (Figure 4-53):

For other situations, such as showing positive and negative deviations from some mean, dual-color colorbars such as RdBu (short for Red-Blue) can be useful. However, as you can see in Figure 4-54, it’s important to note that the positive-negative information will be lost upon translation to grayscale!

We’ll see examples of using some of these color maps as we continue.

There are a large number of colormaps available in Matplotlib; to see a list of them, you can use IPython to explore the plt.cm submodule. For a more principled approach to colors in Python, you can refer to the tools and documentation within the Seaborn library (see “Visualization with Seaborn”).

### Color limits and extensions

Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply an instance of plt.Axes, so all of the axes and tick formatting tricks we’ve learned are applicable. The colorbar has some interesting flexibility; for example, we can narrow the color limits and indicate the out-of-bounds values with a triangular arrow at the top and bottom by setting the extend property. This might come in handy, for example, if you’re displaying an image that is subject to noise (Figure 4-55):
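A hedged sketch of this comparison follows. The image, the 1% noise fraction, and the noise scale are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

# Replace about 1% of the pixels with large-amplitude noise
rng = np.random.default_rng(0)
noisy = I.copy()
speckles = rng.random(I.shape) < 0.01
noisy[speckles] = rng.normal(0, 3, np.count_nonzero(speckles))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3.5))

# Left: default limits stretch to cover the noise
im1 = ax1.imshow(noisy, cmap='RdBu')
fig.colorbar(im1, ax=ax1)

# Right: fixed limits, with arrows marking out-of-range values
im2 = ax2.imshow(noisy, cmap='RdBu', vmin=-1, vmax=1)
cbar = fig.colorbar(im2, ax=ax2, extend='both')
```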

Notice that in the left panel, the default color limits respond to the noisy pixels, and the range of the noise completely washes out the pattern we are interested in. In the right panel, we manually set the color limits, and add extensions to indicate values that are above or below those limits. The result is a much more useful visualization of our data.

### Discrete colorbars

Colormaps are by default continuous, but sometimes you’d like to represent discrete values. The easiest way to do this is to use the plt.cm.get_cmap() function, and pass the name of a suitable colormap along with the number of desired bins (Figure 4-56):
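A minimal sketch of a six-bin colormap follows. It calls plt.get_cmap, which takes the same name and bin-count arguments; this choice is an assumption made because recent Matplotlib releases no longer provide the get_cmap function in the cm module.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

# Resample the Blues colormap down to 6 discrete colors
discrete = plt.get_cmap('Blues', 6)

fig, ax = plt.subplots()
im = ax.imshow(I, cmap=discrete)
fig.colorbar(im, ax=ax)
im.set_clim(-1, 1)  # map the 6 colors evenly over [-1, 1]
```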

The discrete version of a colormap can be used just like any other colormap.

### Example: Handwritten Digits

For an example of where this might be useful, let’s look at an interesting visualization of some handwritten digits data. This data is included in Scikit-Learn, and consists of nearly 2,000 8×8 thumbnails showing various handwritten digits.

For now, let’s start by downloading the digits data and visualizing several of the example images with plt.imshow() (Figure 4-57):

Because each digit is defined by the hue of its 64 pixels, we can consider each digit to be a point lying in 64-dimensional space: each dimension represents the brightness of one pixel. But visualizing relationships in such high-dimensional spaces can be extremely difficult. One way to approach this is to use a dimensionality reduction technique such as manifold learning to reduce the dimensionality of the data while maintaining the relationships of interest. Dimensionality reduction is an example of unsupervised machine learning, and we will discuss it in more detail in “What Is Machine Learning?”.

Deferring the discussion of these details, let’s take a look at a two-dimensional manifold learning projection of this digits data (see “In-Depth: Manifold Learning” for details):

We’ll use our discrete colormap to view the results, setting the ticks and clim to improve the aesthetics of the resulting colorbar (Figure 4-58):
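The projection and plot can be sketched as follows. Restricting the data to the digits 0 through 5, using Isomap as the manifold learner, and the n_neighbors value are all assumptions made to keep the example small.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

# Assumption: use only digits 0-5 so six discrete colors suffice
digits = load_digits(n_class=6)

# Project the 64-dimensional pixel vectors down to two dimensions
iso = Isomap(n_components=2, n_neighbors=15)
projection = iso.fit_transform(digits.data)

fig, ax = plt.subplots()
sc = ax.scatter(projection[:, 0], projection[:, 1], c=digits.target,
                s=5, cmap=plt.get_cmap('cubehelix', 6))

# Center each of the six colors on an integer digit value
sc.set_clim(-0.5, 5.5)
cbar = fig.colorbar(sc, ax=ax, ticks=range(6), label='digit value')
```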

The projection also gives us some interesting insights on the relationships within the dataset: for example, the ranges of 5 and 3 nearly overlap in this projection, indicating that some handwritten fives and threes are difficult to distinguish, and therefore more likely to be confused by an automated classification algorithm. Other values, like 0 and 1, are more distantly separated, and therefore much less likely to be confused. This observation agrees with our intuition, because 5 and 3 look much more similar than do 0 and 1.

Source: https://www.oreilly.com/library/view/python-data-science/9781491912126/ch04.html