Common Charts and Graphs
Charts and graphs help people understand data. Through effective use of data visualization techniques, you can create a graphical representation of data that conveys complex concepts quickly and effectively. Graphs and charts help you show trends, spot patterns, find relationships and locate outliers, even in very large data sets. Here are some of the most popular charts and graphs used to visualize data.
Bar charts use vertical or horizontal bars drawn against a grid, with each bar representing a quantitative value. They are most commonly used to compare quantities across categories or groups. Each category gets one bar, and the length (for horizontal bars) or height (for vertical bars) of the bar represents the value.
Box plots (also called box-and-whisker plots) represent the distribution of data values by using a rectangular box and lines called whiskers. They display five statistics (the minimum, lower quartile, median, upper quartile and the maximum) that summarize the distribution of a set of data. The lower quartile (25th percentile) is represented by the lower edge of the box, and the upper quartile (75th percentile) is represented by the upper edge of the box. The median (50th percentile) is represented by a central line that divides the box into sections. The extreme values are represented by whiskers that extend out from the edges of the box.
The bottom and top edges of the box bound the interquartile range (IQR), which is the range of values between the first and third quartiles (the 25th and 75th percentiles). The marker inside the box indicates the mean value. The line inside the box indicates the median value.
You can optionally enable outliers, which are data points that lie more than 1.5 times the interquartile range beyond the nearest edge of the box (below the lower quartile or above the upper quartile). The whiskers (lines protruding from the box) indicate the range of values that fall outside the interquartile range. If you do not enable outliers, the whiskers extend to the minimum and maximum values in the plot. If you enable outliers, the whiskers extend only to the most extreme values that are close enough not to be considered outliers. If there are a large number of outlier points, the range of outlier values is represented by a bar, and the data tip for the bar displays additional information about the outlier points.
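The box plot statistics and the 1.5 x IQR outlier rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular charting tool: the function name box_plot_summary, the show_outliers flag and the sample values are hypothetical, and the quartile convention (the "inclusive" method of Python's statistics.quantiles) is an assumption, since different tools compute quartiles slightly differently.

```python
from statistics import mean, quantiles


def box_plot_summary(values, show_outliers=True):
    """Return the statistics a box plot draws (hypothetical helper)."""
    data = sorted(values)
    # Quartiles: the 25th, 50th and 75th percentiles.
    # Assumption: the "inclusive" convention; other conventions give
    # slightly different cut points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    if show_outliers:
        # Points more than 1.5 * IQR beyond the box edges are outliers.
        low_fence = q1 - 1.5 * iqr
        high_fence = q3 + 1.5 * iqr
        inside = [v for v in data if low_fence <= v <= high_fence]
        outliers = [v for v in data if v < low_fence or v > high_fence]
    else:
        # Without outliers, the whiskers reach the minimum and maximum.
        inside, outliers = data, []

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "mean": mean(data),
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
with_outliers = box_plot_summary(sample)
without_outliers = box_plot_summary(sample, show_outliers=False)
print(with_outliers["whisker_high"], with_outliers["outliers"])
print(without_outliers["whisker_high"], without_outliers["outliers"])
```

With outliers enabled, the upper whisker stops at the largest value inside the fence and the extreme value is reported separately. With outliers disabled, the same whisker stretches all the way to the maximum.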
Bubble plots are variations of a scatter plot in which the data markers are replaced with bubbles, with each bubble representing an observation (or group of observations). The values of two measures are represented by the position on the graph axes, and the value of the third measure is represented by the marker size. Bubble plots are useful for data sets with many values or when the values differ by orders of magnitude. Animated bubble plots are a good way to display data that changes over time.
Correlation matrices help you quickly identify which variables in a large data set are related, and how strong the relationship is between them. A correlation matrix displays the degree of correlation between pairs of measures as a matrix of rectangular cells. Each cell in the matrix represents the intersection of two measures, and the color of the cell indicates the degree of correlation between those two measures. A positive correlation value means that as one variable increases, the second variable tends to increase. A negative correlation value means that as one variable increases, the second variable tends to decrease.
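To make the cell values concrete, here is a minimal Python sketch that computes a correlation matrix for a handful of measures. It assumes the Pearson correlation coefficient, which is a common choice but is not specified above. The function names and the sample measures (ad_spend, revenue, returns) are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (a constant sequence has zero
    variance, which would make the denominator zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(measures):
    """Map each pair of measure names to its correlation coefficient."""
    names = list(measures)
    return {a: {b: pearson(measures[a], measures[b]) for b in names}
            for a in names}


# Made-up measures: revenue rises with ad spend, returns fall.
measures = {
    "ad_spend": [1, 2, 3, 4, 5],
    "revenue": [2, 4, 6, 8, 10],
    "returns": [5, 4, 3, 2, 1],
}
matrix = correlation_matrix(measures)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

A charting tool would then map each coefficient (between -1 and 1) to a cell color. Here the ad_spend/revenue cell is strongly positive and the ad_spend/returns cell is strongly negative.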
Cross-tabulation (or crosstab) tables show frequency distributions or other aggregate statistics for the intersections of two or more category data items. If the crosstab does not contain measures, then each cell contains the frequency of an intersection of category values. Crosstabs enable you to examine data for intersections of hierarchy nodes or category values. You can rearrange the rows and columns and apply sorting.
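The frequency case, where each cell counts how often a combination of category values occurs, can be sketched as follows. This is an illustration only: the crosstab function and the region/product rows are hypothetical.

```python
from collections import Counter


def crosstab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    # Counter returns 0 for combinations that never occur.
    table = {row: {col: counts[(row, col)] for col in col_labels}
             for row in row_labels}
    return row_labels, col_labels, table


# Made-up rows: (region, product) for each order.
orders = [
    ("East", "Chairs"), ("East", "Desks"), ("East", "Chairs"),
    ("West", "Desks"), ("West", "Desks"), ("West", "Chairs"),
]
row_labels, col_labels, table = crosstab(orders)

print("      " + "  ".join(col_labels))
for row in row_labels:
    print(row.ljust(6) + "  ".join(str(table[row][col]).rjust(len(col))
                                   for col in col_labels))
```

Swapping the two elements of each pair swaps the rows and columns of the table, which corresponds to the rearranging mentioned above.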
Geo maps display data as a bubble plot that is overlaid on a geographic map. Each bubble is located either at the center of a geographic region or at the coordinates of a location. To display a geo map, you must define one or more of your categories as a geography data item.
Heat maps display the distribution of values for two data items using a table with colored cells. Colors are used to communicate relationships between data values that would be harder to understand if presented in a spreadsheet.
Histograms are variations of a bar chart that use rectangles to show the frequency of data items in successive numerical intervals of equal size. The bar height can represent either the exact number of observations or the percentage of all observations in each interval. Histograms are often used to show, at a glance, the distribution of values in large data sets.
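The equal-size intervals and the two choices of bar height (counts or percentages) can be illustrated with a short Python sketch. The function name histogram, the choice of four bins and the sample values are assumptions made for the example. Real tools choose the number of bins in various ways.

```python
def histogram(values, bin_count):
    """Split the value range into equal-width bins and count each bin."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical; bin width would be zero")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for value in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        index = min(int((value - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts


# Made-up observations.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram(values, bin_count=4)
percentages = [100 * count / len(values) for count in counts]

for left, right, count, pct in zip(edges, edges[1:], counts, percentages):
    print(f"{left:>4.1f} to {right:>4.1f}: {'#' * count} ({count}, {pct:.0f}%)")
```

Either the counts list or the percentages list can drive the bar heights. The shape of the histogram is the same in both cases.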
Line charts show the relationship of one variable to another by using a line that connects the data values. They are most often used to track changes or trends over time. Stacked line charts are used to compare the trend or individual values for several variables.
Pareto charts are a specialized type of vertical bar chart in which the categories are plotted in decreasing order of frequency from left to right, typically with a line that shows the cumulative percentage. They can be used to quickly identify which issues need attention first. The taller bars on the chart (which represent frequency) show which categories contribute most to the cumulative effect.
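The ordering behind a Pareto chart is simple to compute: sort the categories by frequency, then accumulate a running percentage. The following Python sketch shows the idea. The function name pareto_table and the defect counts are hypothetical.

```python
def pareto_table(frequencies):
    """Return (category, count, cumulative percent) rows, largest first."""
    ordered = sorted(frequencies.items(), key=lambda item: item[1],
                     reverse=True)
    total = sum(frequencies.values())
    running = 0
    rows = []
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows


# Made-up defect counts from an inspection log.
defects = {"Scratches": 12, "Dents": 45, "Misalignment": 8,
           "Paint": 30, "Other": 5}
rows = pareto_table(defects)

for category, count, cumulative in rows:
    print(f"{category:<13}{count:>4}{cumulative:>7.0f}%")
```

In this made-up data the two leftmost categories account for 75 percent of all defects, which is the kind of concentration a Pareto chart is meant to expose.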
Scatter plots (or X-Y plots) are two-dimensional plots that show the joint variation of two (or three) variables from a group of table rows. The coordinates of each point in the plot correspond to the data values for a single table row (observation). When you assign more than two measures, the visualization is displayed as a scatter plot matrix, which is a series of scatter plots that compare each pairing of measures. Scatter plots are useful for examining the relationships, or correlations, between numeric data items. They can help you gain a sense of how spread out the data is and how closely related the data points are, and they can help you quickly identify patterns in the distribution of the data.
Tree maps are a variation of heat maps that use rectangles (called tiles) to represent data components. The largest rectangle represents the dominant division of the data and smaller rectangles represent subdivisions. The color of each rectangle can indicate the value of an additional measure. A tree map could be used to represent sales data where the tile sizes vary according to the number of orders invoiced and the tile colors are derived from a color gradient that represents low to high sales figures.