

news 2026/1/13 19:55:55

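Data aggregation, summarization, and visualization are the three pillars supporting data analysis. Data visualization has long been a powerful tool and is widely used across the industry, yet it is usually confined to two dimensions. In this article, the author explores effective strategies for visualizing multi-dimensional data, ranging from one to six dimensions.

I. Introduction to visualization

Descriptive analytics is one of the core components of any analysis life cycle, whether in a data science project or a specific piece of research. Data aggregation, summarization, and visualization are the main pillars supporting this area of data analysis. From the days of traditional business intelligence to the present era of artificial intelligence, data visualization has been a powerful tool, widely adopted by organizations because it extracts the right information effectively and makes results easy to understand and interpret. Multi-dimensional datasets, which typically have more than two attributes, start to cause problems, however, because our medium for data analysis and communication is usually restricted to two dimensions. In this article, we explore some effective strategies for visualizing multi-dimensional data, ranging from one to six dimensions.

II. Motivation

"A picture is worth a thousand words." This familiar and very popular English idiom can serve as the inspiration and motivation for using data visualization as an effective analysis tool. Always remember: "Effective data visualization is both an art and a science." Before we begin, I would also like to mention the following very relevant quote, which emphasizes why data visualization is necessary.

"The greatest value of a picture is when it forces us to notice what we never expected to see." (John Tukey)

III. A quick refresher on visualization

This article assumes that the reader knows the basic chart types used for plotting and visualizing data, so they are not explained again here, although most of them appear in the hands-on sections that follow. The well-known visualization pioneer and statistician Edward Tufte said that data visualization should build on the data to communicate patterns and insights clearly, precisely, and efficiently.

Structured data usually consists of data observations represented by rows, and of features or data attributes represented by columns. Each column can also be called a specific dimension of the dataset. The most common data types are continuous numeric data and discrete categorical data. Any data visualization therefore essentially depicts one or more data attributes in a simple, easy-to-read form such as a scatter plot, histogram, or box plot. This article covers both univariate (one-dimensional) and multivariate (multi-dimensional) visualization strategies. We use the Python machine learning ecosystem here, and we suggest first checking out the frameworks used for data analysis and visualization, including pandas, matplotlib, seaborn, plotly, and bokeh. Beyond that, if you are interested in crafting beautiful and meaningful visualizations with data, knowing D3.js (https://d3js.org/) is also a must. Interested readers can read Edward Tufte's "The Visual Display of Quantitative Information".

Enough theory and concepts; let's get to the visualizations and the code.

We use the Wine Quality Data Set from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). The data actually consists of two datasets describing various components of the red and white variants of the Portuguese "Vinho Verde" wine. All of the analysis in this article is available in my GitHub repository, and you can try it out with the code in the Jupyter Notebook.

We start by loading the dependencies needed for the analysis.

```python
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
import numpy as np
import seaborn as sns
%matplotlib inline  # Jupyter notebook magic; drop this line in a plain script
```

We mainly use matplotlib and seaborn as our visualization frameworks, but you are free to choose and try any other framework. First, some basic data pre-processing steps.

```python
white_wine = pd.read_csv('winequality-white.csv', sep=';')
red_wine = pd.read_csv('winequality-red.csv', sep=';')

# store wine type as an attribute
red_wine['wine_type'] = 'red'
white_wine['wine_type'] = 'white'

# bucket wine quality scores into qualitative quality labels
red_wine['quality_label'] = red_wine['quality'].apply(
    lambda value: 'low' if value <= 5 else 'medium' if value <= 7 else 'high')
red_wine['quality_label'] = pd.Categorical(
    red_wine['quality_label'], categories=['low', 'medium', 'high'])
white_wine['quality_label'] = white_wine['quality'].apply(
    lambda value: 'low' if value <= 5 else 'medium' if value <= 7 else 'high')
white_wine['quality_label'] = pd.Categorical(
    white_wine['quality_label'], categories=['low', 'medium', 'high'])

# merge red and white wine datasets
wines = pd.concat([red_wine, white_wine])

# re-shuffle records just to randomize data points
wines = wines.sample(frac=1, random_state=42).reset_index(drop=True)
```

We create a single wine data frame by merging the datasets for the red and white wine samples. We also create a new categorical variable, quality_label, based on the quality attribute of the wine samples. Now let's look at the first few rows of the data.

```python
wines.head()
```

Figure: The wine quality dataset

Clearly, we have several numeric and categorical attributes for the wine samples. Each observation belongs to a red or a white wine sample, and the attributes are specific properties measured and obtained from physicochemical tests. The variable names are largely self-explanatory, but you can check the Jupyter Notebook for a detailed explanation of each attribute. Let's compute some basic descriptive summary statistics for the attributes of interest.

```python
subset_attributes = ['residual sugar', 'total sulfur dioxide', 'sulphates',
                     'alcohol', 'volatile acidity', 'quality']
rs = round(red_wine[subset_attributes].describe(), 2)
ws = round(white_wine[subset_attributes].describe(), 2)
pd.concat([rs, ws], axis=1,
          keys=['Red Wine Statistics', 'White Wine Statistics'])
```

Figure: Basic descriptive statistics by wine type

It is quite easy to compare these statistical measures across the two types of wine samples. Notice the marked differences in some of the attributes. We highlight them in some of the visualizations later on.

1. Univariate analysis

Univariate analysis is essentially the simplest form of data analysis or visualization, because we are only concerned with analyzing and visualizing one data attribute or variable (1-D).

Visualizing data in one dimension (1-D)

One of the quickest and most effective ways to visualize all numeric data and their distributions is to plot histograms with pandas.

```python
wines.hist(bins=15, color='steelblue', edgecolor='black', linewidth=1.0,
           xlabelsize=8, ylabelsize=8, grid=False)
plt.tight_layout(rect=(0, 0, 1.2, 1.2))
```

Figure: Visualizing attributes as 1-D data

The plots above give a good idea of the basic data distribution of each attribute. Let's drill down and visualize one of the continuous numeric attributes. A histogram or a kernel density plot works well for understanding how the data for that attribute is distributed.

```python
# Histogram
fig = plt.figure(figsize=(6, 4))
title = fig.suptitle('Sulphates Content in Wine', fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Sulphates')
ax.set_ylabel('Frequency')
ax.text(1.2, 800, r'$\mu$=' + str(round(wines['sulphates'].mean(), 2)),
        fontsize=12)
freq, bins, patches = ax.hist(wines['sulphates'], color='steelblue', bins=15,
                              edgecolor='black', linewidth=1)

# Density plot
fig = plt.figure(figsize=(6, 4))
title = fig.suptitle('Sulphates Content in Wine', fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)
ax1 = fig.add_subplot(1, 1, 1)
ax1.set_xlabel('Sulphates')
ax1.set_ylabel('Density')
# fill=True replaces the deprecated shade=True
sns.kdeplot(x=wines['sulphates'], ax=ax1, fill=True, color='steelblue')
```

Figure: Visualizing 1-D continuous numeric data

The plots above show a clear right skew in the distribution of wine sulphates.

Visualizing a discrete categorical attribute is slightly different, and a bar plot is one of the most effective ways to do it. You can also use a pie chart, but in general try to avoid them, especially when there are more than three distinct categories.

Figure: Visualizing 1-D discrete categorical data

Now let's move on to data with more dimensions.

2. Multivariate analysis

Multivariate analysis is where the fun and the complexity begin. Here we analyze multiple data dimensions or attributes (two or more). Multivariate analysis involves checking not only distributions but also the potential relationships, patterns, and correlations among these attributes. Depending on the problem at hand, you can also use inferential statistics and hypothesis testing to check statistical significance for different attributes, groups, and so on.

Visualizing data in two dimensions (2-D)

One of the best ways to check potential relationships or correlations among data attributes is to build a pair-wise correlation matrix and depict it as a heatmap.

```python
# Correlation matrix heatmap
f, ax = plt.subplots(figsize=(10, 6))
# numeric_only=True skips the non-numeric wine_type and quality_label columns
corr = wines.corr(numeric_only=True)
hm = sns.heatmap(round(corr, 2), annot=True, ax=ax, cmap='coolwarm',
                 fmt='.2f', linewidths=.05)
f.subplots_adjust(top=0.93)
t = f.suptitle('Wine Attributes Correlation Heatmap', fontsize=14)
```

Figure: Visualizing 2-D data with a correlation heatmap

The gradients in the heatmap vary with the strength of the correlation, so you can easily spot attributes that are strongly correlated with each other. Another way to visualize the same thing is to use pair-wise scatter plots among the attributes of interest.

Figure: Visualizing 2-D data with pair-wise scatter plots

The plots show that scatter plots are also an effective way to observe potential 2-D relationships or patterns among data attributes. Another way to visualize multivariate data across multiple attributes together is to use parallel coordinates.

Figure: Visualizing multi-dimensional data with parallel coordinates

In this visualization, points are represented as connected line segments. Each vertical line represents one data attribute, and one complete set of connected line segments across all the attributes represents one data point. Points that tend to cluster therefore appear closer together. Just by looking at the plot, we can clearly see that density is slightly higher for red wines than for white wines. Residual sugar and total sulfur dioxide are higher for white wines than for red wines, and fixed acidity is higher for red wines than for white wines. Check the statistics table we derived earlier to see whether it supports these observations.

Let's now look at ways to visualize two continuous numeric attributes. Scatter plots and joint plots are particularly good for checking patterns and relationships as well as the individual distributions of the attributes.

```python
# Scatter plot
plt.scatter(wines['sulphates'], wines['alcohol'], alpha=0.4, edgecolors='w')
plt.xlabel('Sulphates')
plt.ylabel('Alcohol')
plt.title('Wine Sulphates - Alcohol Content', y=1.05)

# Joint plot (height replaces the older size argument)
jp = sns.jointplot(x='sulphates', y='alcohol', data=wines, kind='reg',
                   space=0, height=5, ratio=4)
```

Figure: Visualizing 2-D continuous numeric data with a scatter plot and a joint plot

The scatter plot is on the left and the joint plot on the right. As mentioned, the joint plot lets you check correlations, relationships, and the individual distributions.

How about visualizing two discrete categorical attributes? One way is to draw separate plots (subplots) or facets for one of the categorical dimensions.

```python
# Using subplots or facets along with bar plots
fig = plt.figure(figsize=(10, 4))
title = fig.suptitle('Wine Type - Quality', fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)

# red wine - wine quality
ax1 = fig.add_subplot(1, 2, 1)
ax1.set_title('Red Wine')
ax1.set_xlabel('Quality')
ax1.set_ylabel('Frequency')
rw_q = red_wine['quality'].value_counts()
rw_q = (list(rw_q.index), list(rw_q.values))
ax1.set_ylim([0, 2500])
ax1.tick_params(axis='both', which='major', labelsize=8.5)
bar1 = ax1.bar(rw_q[0], rw_q[1], color='red', edgecolor='black', linewidth=1)

# white wine - wine quality
ax2 = fig.add_subplot(1, 2, 2)
ax2.set_title('White Wine')
ax2.set_xlabel('Quality')
ax2.set_ylabel('Frequency')
ww_q = white_wine['quality'].value_counts()
ww_q = (list(ww_q.index), list(ww_q.values))
ax2.set_ylim([0, 2500])
ax2.tick_params(axis='both', which='major', labelsize=8.5)
bar2 = ax2.bar(ww_q[0], ww_q[1], color='white', edgecolor='black', linewidth=1)
```

Figure: Visualizing 2-D discrete categorical data with bar plots and subplots

While this is a good way to visualize categorical data, using matplotlib takes a lot of code, as you can see. Another good approach is to draw stacked bars or multiple bars for the different attributes in a single plot. seaborn makes this easy.

```python
# Multi-bar plot
cp = sns.countplot(x='quality', hue='wine_type', data=wines,
                   palette={'red': '#FF9999', 'white': '#FFE888'})
```

Figure: Visualizing 2-D discrete categorical data in a single bar plot

This looks cleaner, and you can also compare the different categories effectively from a single plot.

Let's look at visualizing mixed attributes in two dimensions (essentially numeric and categorical together). One way is to use facets or subplots along with histograms or density plots.

```python
# Facets with histograms
fig = plt.figure(figsize=(10, 4))
title = fig.suptitle('Sulphates Content in Wine', fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)

ax1 = fig.add_subplot(1, 2, 1)
ax1.set_title('Red Wine')
ax1.set_xlabel('Sulphates')
ax1.set_ylabel('Frequency')
ax1.set_ylim([0, 1200])
ax1.text(1.2, 800, r'$\mu$=' + str(round(red_wine['sulphates'].mean(), 2)),
         fontsize=12)
r_freq, r_bins, r_patches = ax1.hist(red_wine['sulphates'], color='red',
                                     bins=15, edgecolor='black', linewidth=1)

ax2 = fig.add_subplot(1, 2, 2)
ax2.set_title('White Wine')
ax2.set_xlabel('Sulphates')
ax2.set_ylabel('Frequency')
ax2.set_ylim([0, 1200])
ax2.text(0.8, 800, r'$\mu$=' + str(round(white_wine['sulphates'].mean(), 2)),
         fontsize=12)
w_freq, w_bins, w_patches = ax2.hist(white_wine['sulphates'], color='white',
                                     bins=15, edgecolor='black', linewidth=1)

# Facets with density plots
fig = plt.figure(figsize=(10, 4))
title = fig.suptitle('Sulphates Content in Wine', fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)

ax1 = fig.add_subplot(1, 2, 1)
ax1.set_title('Red Wine')
ax1.set_xlabel('Sulphates')
ax1.set_ylabel('Density')
sns.kdeplot(x=red_wine['sulphates'], ax=ax1, fill=True, color='r')

ax2 = fig.add_subplot(1, 2, 2)
ax2.set_title('White Wine')
ax2.set_xlabel('Sulphates')
ax2.set_ylabel('Density')
sns.kdeplot(x=white_wine['sulphates'], ax=ax2, fill=True, color='y')
```

Figure: Visualizing 2-D mixed attributes with facets and histograms or density plots

This works well, but once again we have written a lot of code. We can avoid that by using seaborn and drawing both distributions in a single chart.

```python
# Using multiple histograms
fig = plt.figure(figsize=(6, 4))
title = fig.suptitle('Sulphates Content in Wine', fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Sulphates')
ax.set_ylabel('Frequency')

# histplot with hue stands in for the deprecated FacetGrid + distplot idiom;
# note that both wine types now share the same 15 bins
sns.histplot(data=wines, x='sulphates', hue='wine_type', bins=15,
             palette={'red': 'r', 'white': 'y'}, ax=ax)
ax.get_legend().set_title('Wine Type')
```

Figure: Visualizing 2-D mixed attributes with multiple histograms

The resulting plot is clear and concise, and we can easily compare the distributions. Besides this, box plots are another way to depict groups of numeric data effectively according to the different values of a categorical attribute. Box plots are a good way to see the quartile values in the data as well as potential outliers.

```python
# Box plots
f, (ax) = plt.subplots(1, 1, figsize=(12, 4))
f.suptitle('Wine Quality - Alcohol Content', fontsize=14)
sns.boxplot(x='quality', y='alcohol', data=wines, ax=ax)
ax.set_xlabel('Wine Quality', size=12, alpha=0.8)
ax.set_ylabel('Wine Alcohol %', size=12, alpha=0.8)
```

Figure: Box plots, an effective way to visualize 2-D mixed attributes

A similar visualization is the violin plot, another effective way to show grouped numeric data using kernel density plots, which depict the probability density of the data at different values.

```python
# Violin plots
f, (ax) = plt.subplots(1, 1, figsize=(12, 4))
f.suptitle('Wine Quality - Sulphates Content', fontsize=14)
sns.violinplot(x='quality', y='sulphates', data=wines, ax=ax)
ax.set_xlabel('Wine Quality', size=12, alpha=0.8)
ax.set_ylabel('Wine Sulphates', size=12, alpha=0.8)
```

Figure: Violin plots, an effective way to visualize 2-D mixed attributes

The plot clearly shows the density of wine sulphates for each wine quality category.

Visualizing data in two dimensions is simple and direct, but the data becomes more complex as the number of dimensions (attributes) grows, because we are bound by the two dimensions of our display media and our environment. For three-dimensional data, we can introduce a notional depth coordinate by adding a z axis to the chart, or by using subplots and facets. For data with more than three dimensions, visual representation becomes harder still. The best way to go beyond three dimensions is to use plot facets, color, shape, size, depth, and so on. You can also use time as a dimension and build an animated plot for attributes that change over time (where time is itself a dimension in the data). Watch Hans Rosling's excellent talks to get an idea of this.

Visualizing data in three dimensions (3-D)

For data with three attributes or dimensions, we can start from pair-wise scatter plots and introduce color, or hue, to separate the values of a categorical dimension.

```python
# Scatter plot with hue for visualizing data in 3-D
cols = ['density', 'residual sugar', 'total sulfur dioxide',
        'fixed acidity', 'wine_type']
pp = sns.pairplot(wines[cols], hue='wine_type', height=1.8, aspect=1.8,
                  palette={'red': '#FF9999', 'white': '#FFE888'},
                  plot_kws=dict(edgecolor='black', linewidth=0.5))
fig = pp.figure
fig.subplots_adjust(top=0.93, wspace=0.3)
t = fig.suptitle('Wine Attributes Pairwise Plots', fontsize=14)
```

Figure: Visualizing 3-D data with scatter plots and hue (color)

The plot lets us check correlations and patterns and also compare the wine groups. For example, we can clearly see that total sulfur dioxide and residual sugar are higher for white wine than for red.

Let's look at strategies for visualizing three continuous numeric attributes. One way is to represent two of the dimensions as the regular length (x axis) and breadth (y axis), and to represent the third dimension through the notion of depth (z axis).

```python
# Visualizing 3-D numeric data with scatter plots
# length, breadth and depth
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

xs = wines['residual sugar']
ys = wines['fixed acidity']
zs = wines['alcohol']
ax.scatter(xs, ys, zs, s=50, alpha=0.6, edgecolors='w')

ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Fixed Acidity')
ax.set_zlabel('Alcohol')
```

Figure: Visualizing 3-D numeric data by introducing the notion of depth

We can also keep the regular 2-D axes and use size as the third dimension. This is essentially a bubble chart, in which the size of each point represents the quantity in the third dimension.

```python
# Visualizing 3-D numeric data with a bubble chart
# length, breadth and size
plt.scatter(wines['fixed acidity'], wines['alcohol'],
            s=wines['residual sugar'] * 25, alpha=0.4, edgecolors='w')
plt.xlabel('Fixed Acidity')
plt.ylabel('Alcohol')
plt.title('Wine Alcohol Content - Fixed Acidity - Residual Sugar', y=1.05)
```

Figure: Visualizing 3-D numeric data by introducing the notion of size

The chart above is therefore not a conventional scatter plot but a bubble chart whose point (bubble) sizes depend on residual sugar. Of course, the data will not always reveal a definite pattern as it does here, where the sizes also vary across the other two dimensions.

To visualize three discrete categorical attributes, we can use regular bar plots and bring in hue together with facets or subplots to represent the additional third dimension. The seaborn framework helps us keep the code to a minimum and plot efficiently.

```python
# Visualizing 3-D categorical data using bar plots
# leveraging the concepts of hue and facets
# (catplot is the current name of the former factorplot)
fc = sns.catplot(x='quality', hue='wine_type', col='quality_label',
                 data=wines, kind='count',
                 palette={'red': '#FF9999', 'white': '#FFE888'})
```

Figure: Visualizing 3-D categorical data by introducing hue and facets

The chart clearly shows the frequencies along each dimension, and you can see how easily and effectively it conveys them.

For three-dimensional mixed attributes, we can use hue to visualize one categorical attribute while using a conventional plot such as a scatter plot to visualize the two numeric dimensions.

```python
# Visualizing 3-D mix data using scatter plots
# leveraging the concepts of hue for categorical dimension
jp = sns.pairplot(wines, x_vars=['sulphates'], y_vars=['alcohol'], height=4.5,
                  hue='wine_type',
                  palette={'red': '#FF9999', 'white': '#FFE888'},
                  plot_kws=dict(edgecolor='k', linewidth=0.5))

# we can also view relationships/correlations as needed
lp = sns.lmplot(x='sulphates', y='alcohol', hue='wine_type',
                palette={'red': '#FF9999', 'white': '#FFE888'},
                data=wines, fit_reg=True, legend=True,
                scatter_kws=dict(edgecolor='k', linewidth=0.5))
```

Figure: Visualizing 3-D mixed attributes with scatter plots and hue

Hue thus acts as a good separator for categories or groups. Although the plots show no correlation, or only a very weak one, we can still see that sulphates are higher for red wines than for white wines. Instead of a scatter plot, you can also use a kernel density plot to understand the three-dimensional data.

```python
# Visualizing 3-D mix data using kernel density plots
# leveraging the concepts of hue for categorical dimension
# (fill/thresh replace the older shade/shade_lowest arguments)
ax = sns.kdeplot(data=white_wine, x='sulphates', y='alcohol',
                 cmap='YlOrBr', fill=True, thresh=0.05)
ax = sns.kdeplot(data=red_wine, x='sulphates', y='alcohol',
                 cmap='Reds', fill=True, thresh=0.05)
```

Figure: Visualizing 3-D mixed attributes with kernel density plots and hue

As expected, and quite evidently, the red wine samples have higher sulphate levels than the white wines. You can also read the density concentrations from the intensity of the hue.

When the three-dimensional data has more than one categorical attribute, we can use hue and one of the regular axes for the categorical attributes, and use a plot such as a box plot or violin plot to visualize the different groups of data.

```python
# Visualizing 3-D mix data using violin plots
# leveraging the concepts of hue and axes for more than one categorical dimension
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
f.suptitle('Wine Type - Quality - Acidity', fontsize=14)

sns.violinplot(x='quality', y='volatile acidity', data=wines,
               inner='quart', linewidth=1.3, ax=ax1)
ax1.set_xlabel('Wine Quality', size=12, alpha=0.8)
ax1.set_ylabel('Wine Volatile Acidity', size=12, alpha=0.8)

sns.violinplot(x='quality', y='volatile acidity', hue='wine_type',
               data=wines, split=True, inner='quart', linewidth=1.3,
               palette={'red': '#FF9999', 'white': 'white'}, ax=ax2)
ax2.set_xlabel('Wine Quality', size=12, alpha=0.8)
ax2.set_ylabel('Wine Volatile Acidity', size=12, alpha=0.8)
l = plt.legend(loc='upper right', title='Wine Type')
```

Figure: Visualizing 3-D mixed attributes with split violin plots and hue

In the 3-D visualization on the right, the x axis represents wine quality and hue represents wine_type. Some interesting insights stand out clearly, for example that volatile acidity is higher for red wines than for white wines.

You can also consider box plots for representing mixed attributes with more than one categorical variable.

```python
# Visualizing 3-D mix data using box plots
# leveraging the concepts of hue and axes for more than one categorical dimension
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
f.suptitle('Wine Type - Quality - Alcohol Content', fontsize=14)

sns.boxplot(x='quality', y='alcohol', hue='wine_type', data=wines,
            palette={'red': '#FF9999', 'white': 'white'}, ax=ax1)
ax1.set_xlabel('Wine Quality', size=12, alpha=0.8)
ax1.set_ylabel('Wine Alcohol %', size=12, alpha=0.8)

sns.boxplot(x='quality_label', y='alcohol', hue='wine_type', data=wines,
            palette={'red': '#FF9999', 'white': 'white'}, ax=ax2)
ax2.set_xlabel('Wine Quality Class', size=12, alpha=0.8)
ax2.set_ylabel('Wine Alcohol %', size=12, alpha=0.8)
l = plt.legend(loc='best', title='Wine Type')
```

Figure: Visualizing 3-D mixed attributes with box plots and hue

For both the quality and the quality_label attributes, alcohol content increases as quality improves. In addition, red wines tend to have a slightly higher median alcohol content than white wines of the same quality class. If we check the quality ratings, however, we see that for the lower-rated wines (3 and 4) the median alcohol content of white wine is greater than that of the red wine samples. Otherwise, red wines seem to have a slightly higher median alcohol content than white wines.

Visualizing data in four dimensions (4-D)

Following the discussion above, we use the various components of a chart to visualize multiple dimensions. One way to visualize four-dimensional data is to use depth and hue to represent specific data dimensions in a conventional plot such as a scatter plot.

```python
# Visualizing 4-D mix data using scatter plots
# leveraging the concepts of hue and depth
fig = plt.figure(figsize=(8, 6))
t = fig.suptitle('Wine Residual Sugar - Alcohol Content - Acidity - Type',
                 fontsize=14)
ax = fig.add_subplot(111, projection='3d')

colors = ['red' if wt == 'red' else 'yellow' for wt in wines['wine_type']]
# a single vectorized call instead of one scatter call per data point
ax.scatter(wines['residual sugar'], wines['alcohol'], wines['fixed acidity'],
           alpha=0.4, c=colors, edgecolors='none', s=30)

ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
```

Figure: Visualizing 4-D data with a scatter plot, hue, and depth

The wine_type attribute is represented by hue, which is quite evident in the plot. Interpreting these visualizations starts to get difficult because of the complexity of the plots, but we can still gather insights, for example that fixed acidity is higher for red wines and residual sugar is higher for white wines. Of course, if there were some association between alcohol and fixed acidity, we might see a gradually increasing or decreasing trend in the data points.

Another strategy is to keep a two-dimensional plot but use hue and data point size as data dimensions. Typically this looks like a bubble chart, similar to the one we visualized earlier.

```python
# Visualizing 4-D mix data using bubble plots
# leveraging the concepts of hue and size
size = wines['residual sugar'] * 25
fill_colors = ['#FF9999' if wt == 'red' else '#FFE888'
               for wt in wines['wine_type']]
edge_colors = ['red' if wt == 'red' else 'orange'
               for wt in wines['wine_type']]

plt.scatter(wines['fixed acidity'], wines['alcohol'], s=size, alpha=0.4,
            color=fill_colors, edgecolors=edge_colors)
plt.xlabel('Fixed Acidity')
plt.ylabel('Alcohol')
plt.title('Wine Alcohol Content - Fixed Acidity - Residual Sugar - Type',
          y=1.05)
```

Figure: Visualizing 4-D data with a bubble chart, hue, and size

We use hue to represent wine_type and data point size to represent residual sugar. We do see patterns similar to those in the previous chart: the bubbles are larger for white wine, which indicates higher residual sugar values for white wines.

If we have more than one categorical attribute to represent, we can use hue and facets for those attributes on top of a regular scatter plot that depicts the numeric data. Let's look at a couple of examples.

```python
# Visualizing 4-D mix data using scatter plots
# leveraging the concepts of hue and facets for more than one categorical attribute
g = sns.FacetGrid(wines, col='wine_type', hue='quality_label',
                  col_order=['red', 'white'],
                  hue_order=['low', 'medium', 'high'],
                  aspect=1.2, height=3.5,
                  palette=sns.light_palette('navy', 4)[1:])
g.map(plt.scatter, 'volatile acidity', 'alcohol', alpha=0.9,
      edgecolor='white', linewidth=0.5, s=100)
fig = g.figure
fig.subplots_adjust(top=0.8, wspace=0.3)
fig.suptitle('Wine Type - Alcohol - Quality - Acidity', fontsize=14)
l = g.add_legend(title='Wine Quality Class')
```

Figure: Visualizing 4-D data with scatter plots, hue, and facets

This visualization is effective because it lets us identify multiple patterns easily. Volatile acidity is lower for white wines, and high-quality wines also have lower acidity. Based on the white wine samples, high-quality wines have higher alcohol levels and low-quality wines have the lowest alcohol levels.

Let's take a similar example and build another 4-D visualization with some other attributes.

Figure: Visualizing 4-D data with scatter plots, hue, and facets

We clearly see that high-quality wines have lower total sulfur dioxide content, which is quite relevant and agrees with domain knowledge about wine composition. We also see that total sulfur dioxide is lower for red wine than for white wine. Volatile acidity is higher for red wines at several data points.

Visualizing data in five dimensions (5-D)

Following the same strategy as before, we use the various plotting components to visualize data in five dimensions. We use depth, hue, and size to represent three of the dimensions, and the regular axes for the other two. Since we are also using size, the result is a three-dimensional bubble chart.

```python
# Visualizing 5-D mix data using bubble charts
# leveraging the concepts of hue, size and depth
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
t = fig.suptitle('Wine Residual Sugar - Alcohol Content - Acidity - '
                 'Total Sulfur Dioxide - Type', fontsize=14)

colors = ['red' if wt == 'red' else 'yellow' for wt in wines['wine_type']]
# bubble size encodes total sulfur dioxide
ax.scatter(wines['residual sugar'], wines['alcohol'], wines['fixed acidity'],
           alpha=0.4, c=colors, edgecolors='none',
           s=wines['total sulfur dioxide'])

ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
```

Figure: Visualizing 5-D data with a bubble chart, hue, depth, and size

The chart depicts the same patterns and insights discussed in the previous section. In addition, from the point sizes that encode total sulfur dioxide, we see that white wines have higher total sulfur dioxide levels than red wines.

Instead of depth, we can also use facets along with hue to represent more than one categorical attribute among the five data dimensions. The attribute that is represented by size can be numeric or even categorical, although a categorical attribute would need a numeric encoding to set the data point sizes. We do not show that case here because we lack categorical attributes, but feel free to try it on your own datasets.

Figure: Visualizing 5-D data with a bubble chart, hue, facets, and size

This is essentially an alternative to the 5-D plot drawn earlier. The additional dimension of depth in that earlier plot may confuse many readers, whereas this version relies on facets, stays on a 2-D plane, and is therefore easier to draw and to interpret.

We can already see how complex it becomes to handle this many data dimensions. If you are wondering why we should not add even more, let's explore that briefly.

Visualizing data in six dimensions (6-D)

Now that we are having fun (I hope), let's add one more data dimension to the visualization. We use depth, hue, size, and shape, in addition to the two regular axes, to depict all six data dimensions in a scatter plot.

```python
# Visualizing 6-D mix data using scatter charts
# leveraging the concepts of hue, size, depth and shape
fig = plt.figure(figsize=(8, 6))
t = fig.suptitle('Wine Residual Sugar - Alcohol Content - Acidity - '
                 'Total Sulfur Dioxide - Type - Quality', fontsize=14)
ax = fig.add_subplot(111, projection='3d')

# one scatter call per quality class, because a call accepts a single marker
markers = {'high': ',', 'medium': 'x', 'low': 'o'}
for label, marker in markers.items():
    subset = wines[wines['quality_label'] == label]
    colors = ['red' if wt == 'red' else 'yellow' for wt in subset['wine_type']]
    ax.scatter(subset['residual sugar'], subset['alcohol'],
               subset['fixed acidity'], alpha=0.4, c=colors,
               edgecolors='none', s=subset['total sulfur dioxide'],
               marker=marker)

ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
```

Figure: Visualizing 6-D data with a scatter plot, hue, depth, shape, and size

That is six dimensions of data in a single plot. Shape represents the wine quality label: squares for high quality, x marks for medium quality, and circles for low quality. Hue represents the wine type, depth represents fixed acidity, and data point size represents total sulfur dioxide content.

Interpreting this may seem a bit taxing, but when you are trying to uncover hidden information in multi-dimensional data, it helps to combine several plotting components, as the following readings show:

- Combining shape and the y axis, high-quality and medium-quality wines have higher alcohol levels than low-quality wines.
- Combining hue and size, white wines have higher total sulfur dioxide content than red wines.
- Combining depth and hue, white wines have lower fixed acidity than red wines.
- Combining hue and the x axis, red wines have lower residual sugar than white wines.
- Combining hue and shape, white wines seem to include more high-quality wines than red wines, possibly because the white wine sample is larger.

We can also build a 6-D visualization by using facets in place of depth.

```python
# Visualizing 6-D mix data using scatter charts
# leveraging the concepts of hue, facets and size
def sized_scatter(x, y, s, **kwargs):
    # FacetGrid.map passes each facet's own rows, so sizes match the points
    plt.scatter(x, y, s=s * 2, **kwargs)

g = sns.FacetGrid(wines, row='wine_type', col='quality',
                  hue='quality_label', height=4)
g.map(sized_scatter, 'residual sugar', 'alcohol', 'total sulfur dioxide',
      alpha=0.5, edgecolor='k', linewidth=0.5)
fig = g.figure
fig.set_size_inches(18, 8)
fig.subplots_adjust(top=0.85, wspace=0.3)
fig.suptitle('Wine Type - Sulfur Dioxide - Residual Sugar - Alcohol - '
             'Quality Class - Quality Rating', fontsize=14)
l = g.add_legend(title='Wine Quality Class')
```

Figure: Visualizing 6-D data with scatter plots, hue, facets, and size

In this case, we use facets and hue to represent the three categorical attributes, and the two regular axes plus size to represent the three numeric attributes of the 6-D visualization.

IV. Conclusion

Data visualization is as much an art as it is a science. If you are reading this, I am glad you made it to the end of this long article. The aim is not to memorize anything, nor to lay down a fixed set of rules for visualizing data. The main objective is to understand and learn effective strategies for visualizing data, especially as the number of dimensions grows. I hope you can use this material to visualize your own datasets in the future.

Original article: https://towardsdatascience.com/the-art-of-effective-visualization-of-multi-dimensional-data-6c7202990c57

Reposted from 机器之心.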
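The parallel coordinates discussion in the 2-D section describes each vertical line as one attribute and each connected set of segments as one data point, but no matching code survives in this copy. The block below is only a minimal sketch of that idea, not the author's original code. It assumes pandas' built-in `pandas.plotting.parallel_coordinates` helper, a small made-up sample in place of the wine CSV files, and a hypothetical `standardize` helper that rescales each attribute so that attributes measured in different units can share one vertical scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates


def standardize(df, columns):
    """Hypothetical helper: scale the given columns to zero mean and unit variance."""
    scaled = df.copy()
    scaled[columns] = (df[columns] - df[columns].mean()) / df[columns].std(ddof=0)
    return scaled


# Tiny made-up sample standing in for the wine data (values are illustrative only).
sample = pd.DataFrame({
    "density": [0.9978, 0.9968, 0.9970, 0.9940, 0.9951, 0.9956],
    "residual sugar": [1.9, 2.6, 2.3, 20.7, 1.6, 6.9],
    "total sulfur dioxide": [34.0, 67.0, 54.0, 170.0, 132.0, 97.0],
    "fixed acidity": [7.4, 7.8, 11.2, 7.0, 6.3, 8.1],
    "wine_type": ["red", "red", "red", "white", "white", "white"],
})
numeric_cols = ["density", "residual sugar", "total sulfur dioxide", "fixed acidity"]

# Each row becomes one polyline; each numeric column becomes one vertical axis.
scaled = standardize(sample, numeric_cols)
fig, ax = plt.subplots(figsize=(8, 4))
parallel_coordinates(scaled, "wine_type", color=("#FF9999", "#FFE888"), ax=ax)
ax.set_title("Parallel coordinates (standardized attributes)")
```

To try this on the real data, the `sample` frame could be swapped for the merged `wines` frame restricted to the same columns. Without the scaling step, the attribute with the largest raw range (total sulfur dioxide here) would flatten all the others.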


http://www.yutouwan.com/news/375127/