UK alcohol and tobacco data, official page
http://lib.stat.cmu.edu/DASL/Stories/AlcoholandTobacco.html

Story Name: Alcohol and Tobacco
Image: Scatterplot of Alcohol vs. Tobacco, with Northern Ireland marked with a blue X.
Story Topics: Consumer, Health
Datafile Name: Alcohol and Tobacco
Methods: Correlation, Dummy variable, Outlier, Regression, Scatterplot
Abstract: Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and alcoholic beverages. A scatterplot of spending on alcohol vs. spending on tobacco in the 11 regions of Great Britain shows an overall positive linear relationship with Northern Ireland as an outlier. Northern Ireland's influence is illustrated by the fact that the correlation between alcohol and tobacco spending jumps from .224 to .784 when Northern Ireland is eliminated from the dataset.

This dataset may be used to illustrate the effect of a single influential observation on regression results. In a simple regression of alcohol spending on tobacco spending, tobacco spending does not appear to be a significant predictor of alcohol spending. However, including a dummy variable that takes the value 1 for Northern Ireland and 0 for all other regions results in significant coefficients for both tobacco spending and the dummy variable, and a high R-squared.

The two modules (presumably statsmodels and scikit-learn, both imported in the script below) give the same R-squared value.

python3.0
# -*- coding: utf-8 -*-
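'''
Alcohol and Tobacco: the relationship between alcohol and tobacco
http://lib.stat.cmu.edu/DASL/Stories/AlcoholandTobacco.html
Reading and writing data does not always involve a file; it can also be done in memory.
StringIO, as its name suggests, reads and writes str in memory.
To write a str to a StringIO, first create a StringIO object, then write to it just as you would to a file.
'''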
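The note on StringIO can be illustrated with a minimal sketch. It assumes Python 3, where the class lives in the standard `io` module; the column names and the two data rows (taken from the lists in the script) are only illustrative.

```python
from io import StringIO

# Create an in-memory text buffer and write to it as if it were a file.
buffer = StringIO()
buffer.write("alcohol,tobacco\n")
buffer.write("6.47,4.03\n")
buffer.write("6.13,3.76\n")

# getvalue() returns everything written so far as one str.
text = buffer.getvalue()

# A StringIO can also be created from existing text and read like a file.
reader = StringIO(text)
header = reader.readline().strip().split(",")
rows = [[float(value) for value in line.strip().split(",")] for line in reader]

print(header)
print(rows)
```

The same object can be handed to any function that expects a file-like text object, which avoids writing a temporary file.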
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm
from sklearn.linear_model import LinearRegression
from scipy import stats

list_alcohol = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08, 4.02]
list_tobacco = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56]
plt.plot(list_tobacco, list_alcohol, 'ro')
plt.xlabel('Tobacco')
plt.ylabel('Alcohol')
plt.title('Sales in Several UK Regions')
plt.show()

data = pd.DataFrame({'Alcohol': list_alcohol, 'Tobacco': list_tobacco})
# Fit on every row except the last one, the outlier (Northern Ireland in the DASL story)
result = sm.ols('Alcohol ~ Tobacco', data[:-1]).fit()
print(result.summary())

python2.7
# -*- coding: utf-8 -*-
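The regression script for the alcohol and tobacco data drops the outlier with `data[:-1]`. The DASL abstract also describes two other checks: the correlation with and without Northern Ireland, and a regression that keeps all rows but adds a dummy variable. The sketch below tries both with NumPy only. It assumes the last row of each list is Northern Ireland (the lists themselves carry no region labels), and it does not compute p-values, so it says nothing about significance.

```python
import numpy as np

alcohol = np.array([6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08, 4.02])
tobacco = np.array([4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56])

# Pearson correlation with and without the last row
# (assumed to be Northern Ireland).
r_all = np.corrcoef(tobacco, alcohol)[0, 1]
r_without = np.corrcoef(tobacco[:-1], alcohol[:-1])[0, 1]

# Dummy variable: 1 for the assumed Northern Ireland row, 0 elsewhere.
dummy = np.zeros_like(tobacco)
dummy[-1] = 1.0

# Ordinary least squares with columns: intercept, tobacco, dummy.
X = np.column_stack([np.ones_like(tobacco), tobacco, dummy])
coef, _, _, _ = np.linalg.lstsq(X, alcohol, rcond=None)
intercept, slope, ni_effect = coef

# R-squared of the dummy-variable model on all 11 rows.
fitted = X @ coef
ss_res = np.sum((alcohol - fitted) ** 2)
ss_tot = np.sum((alcohol - alcohol.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("r (all regions): %.3f" % r_all)
print("r (without last row): %.3f" % r_without)
print("intercept, slope, dummy: %.4f %.4f %.4f" % (intercept, slope, ni_effect))
print("R-squared with dummy: %.3f" % r_squared)
```

Because the dummy variable absorbs the single outlying row, the tobacco slope from this fit equals the slope obtained by fitting the other ten rows alone.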
# Spearman's rank correlation: Spearman's correlation coefficient for ranked data
import numpy as np
import scipy.stats as stats
from scipy.stats import f
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import lillifors
import normality_check

y = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08]
x = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51]
list_group = [x, y]
sample = len(x)

# Visualize the data
plt.plot(x, y, 'ro')

# Spearman's rank correlation, a nonparametric test
def Spearmanr(x, y):
    print("use spearmanr,Nonparametric tests")
    # Warn when the two samples differ in length
    if len(x) != len(y):
        print("warning,the samples are not equal!")
    r, p = stats.spearmanr(x, y)
    print("spearman r**2: %s" % (r ** 2))
    print("spearman p: %s" % p)
    if sample < 500 and p > 0.05:
        print("when sample < 500,p has no mean(>0.05)")
        print("when sample > 500,p has mean")

# Pearson, a parametric test
def Pearsonr(x, y):
    print("use Pearson,parametric tests")
    r, p = stats.pearsonr(x, y)
    print("pearson r**2: %s" % (r ** 2))
    print("pearson p: %s" % p)
    if sample < 30:
        print("when sample <30,pearson has no mean")

# Kendall's tau, a nonparametric test
def Kendalltau(x, y):
    print("use kendalltau,Nonparametric tests")
    r, p = stats.kendalltau(x, y)
    print("kendalltau r**2: %s" % (r ** 2))
    print("kendalltau p: %s" % p)

# Choose the model
def mode(x, y):
    # Normality test
    Normal_result = normality_check.NormalTest(list_group)
    print("normality result: %s" % Normal_result)
    if len(list_group) > 2:
        Kendalltau(x, y)
    if Normal_result == False:
        Spearmanr(x, y)
        Kendalltau(x, y)
    if Normal_result == True:
        Pearsonr(x, y)

mode(x, y)

'''
Output of a run on the following data (the labels show that this run printed r, not r**2):
x = [50, 60, 70, 80, 90, 95]
y = [500, 510, 530, 580, 560, 1000]
use shapiro:
data are normal distributed
use shapiro:
data are not normal distributed
normality result: False
use spearmanr,Nonparametric tests
spearman r: 0.942857142857
spearman p: 0.00480466472303
use kendalltau,Nonparametric tests
kendalltau r: 0.866666666667
kendalltau p: 0.0145950349193
'''

# Kendall coefficient test
x = [3, 5, 2, 4, 1]
y = [3, 5, 2, 4, 1]
z = [3, 4, 1, 5, 2]
h = [3, 5, 1, 4, 2]
k = [3, 5, 2, 4, 1]

python2.7
# -*- coding: utf-8 -*-
'''
Author: Toby
QQ: 231469242
All rights reserved, no commercial use
normality_check.py
Normality test script
'''
import scipy
from scipy.stats import f
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
# additional packages
from statsmodels.stats.diagnostic import lillifors  # newer statsmodels releases spell this name lilliefors

# Normality test
def check_normality(testData):
    # 20 < sample size < 50: test normality with normaltest
    if 20 < len(testData) < 50:
        p_value = stats.normaltest(testData)[1]
        if p_value < 0.05:
            print("use normaltest")
            print("data are not normal distributed")
            return False
        else:
            print("use normaltest")
            print("data are normal distributed")
            return True
    # Sample size below 50: test normality with Shapiro-Wilk
    # (reached only when the size is 20 or less, since the branch above returns)
    if len(testData) < 50:
        p_value = stats.shapiro(testData)[1]
        if p_value < 0.05:
            print("use shapiro:")
            print("data are not normal distributed")
            return False
        else:
            print("use shapiro:")
            print("data are normal distributed")
            return True
    # 50 <= sample size <= 300: Lilliefors test
    if 300 >= len(testData) >= 50:
        p_value = lillifors(testData)[1]
        if p_value < 0.05:
            print("use lillifors:")
            print("data are not normal distributed")
            return False
        else:
            print("use lillifors:")
            print("data are normal distributed")
            return True
    # Sample size above 300: Kolmogorov-Smirnov test
    if len(testData) > 300:
        # kstest with 'norm' compares against the standard normal N(0, 1)
        p_value = stats.kstest(testData, 'norm')[1]
        if p_value < 0.05:
            print("use kstest:")
            print("data are not normal distributed")
            return False
        else:
            print("use kstest:")
            print("data are normal distributed")
            return True

# Run the normality test on every sample group
def NormalTest(list_groups):
    for group in list_groups:
        # Normality test
        status = check_normality(group)
        if status == False:
            return False
    return True

group1 = [2, 3, 7, 2, 6]
group2 = [10, 8, 7, 5, 10]
group3 = [10, 13, 14, 13, 15]
list_groups = [group1, group2, group3]
list_total = group1 + group2 + group3
# Run the normality test on every sample group
NormalTest(list_groups)

Reposted from: https://www.cnblogs.com/webRobot/p/7140749.html
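For readers on Python 3, the selection logic of the two Python 2.7 scripts can be condensed into one SciPy-only sketch: check both samples for normality, use Pearson if both pass, otherwise fall back to Spearman and Kendall. The names `is_normal` and `correlate` are hypothetical helpers, not part of the original scripts, and the sketch only covers the small-sample Shapiro-Wilk branch.

```python
from scipy import stats


def is_normal(sample, alpha=0.05):
    """Shapiro-Wilk check, the branch the original script uses for small samples."""
    p_value = stats.shapiro(sample)[1]
    return p_value >= alpha


def correlate(x, y, alpha=0.05):
    """Pick a correlation test based on the normality of both samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if is_normal(x, alpha) and is_normal(y, alpha):
        # Both samples look normal: parametric test.
        r, p = stats.pearsonr(x, y)
        return {"pearson": (r, p)}
    # At least one sample fails the check: nonparametric tests.
    rho, p_rho = stats.spearmanr(x, y)
    tau, p_tau = stats.kendalltau(x, y)
    return {"spearman": (rho, p_rho), "kendall": (tau, p_tau)}


# Same data as the sample run recorded in the correlation script.
x = [50, 60, 70, 80, 90, 95]
y = [500, 510, 530, 580, 560, 1000]
result = correlate(x, y)
for name, (coefficient, p_value) in result.items():
    print("%s: coefficient=%.4f, p=%.4f" % (name, coefficient, p_value))
```

Returning the results in a dict instead of printing inside each function keeps the selection step testable. The p-values may differ slightly from the recorded run, since SciPy's defaults for these tests have changed across versions.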