Original title: Sentiment Analysis of the Harry Potter Novels with Python

Preparing the data

Each novel is stored as a single txt file. We want to analyze it chapter by chapter, using a list whose first element is the text of chapter 1, whose second element is the text of chapter 2, and so on. Getting there requires a regular expression to split the text.

Take 01-Harry Potter and the Sorcerers Stone.txt as an example. Opening the file and searching through it shows that every chapter heading follows the same pattern:

    [Chapter][space][integer][newline \n][English title, possibly containing spaces][newline \n]

To get familiar with regular expressions, we first turn this template into a pattern and extract the chapter headings:

    import re
    import nltk

    # Read the whole novel into a single string
    raw_text = open('data/01-Harry Potter and the Sorcerers Stone.txt').read()
    # "Chapter", a space, one or more digits, newline, a title of letters and spaces, newline
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    re.findall(pattern, raw_text)

Output:

    ['Chapter 1\nThe Boy Who Lived\n',
     'Chapter 2\nThe Vanishing Glass\n',
     'Chapter 3\nThe Letters From No One\n',
     'Chapter 4\nThe Keeper Of The Keys\n',
     'Chapter 5\nDiagon Alley\n',
     'Chapter 7\nThe Sorting Hat\n',
     'Chapter 8\nThe Potions Master\n',
     'Chapter 9\nThe Midnight Duel\n',
     'Chapter 10\nHalloween\n',
     'Chapter 11\nQuidditch\n',
     'Chapter 12\nThe Mirror Of Erised\n',
     'Chapter 13\nNicholas Flamel\n',
     'Chapter 14\nNorbert the Norwegian Ridgeback\n',
     'Chapter 15\nThe Forbidden Forest\n',
     'Chapter 16\nThrough the Trapdoor\n',
     'Chapter 17\nThe Man With Two Faces\n']

Note that Chapter 6 is missing from this output, presumably because its title contains a character (a hyphen) that the character class [a-zA-Z ] does not match. Chapter counts obtained with this pattern can therefore be slightly low.

Now that the regular expression is familiar, we want to be more precise. I prepared a test text whose chapter headings look like those in the real novel, but which is much shorter and easier to follow. The test data contains only 5 chapters, so the resulting list should have length 5, with the text of chapter 1 first, the text of chapter 2 second, and so on.

    import re

    test = (
        "Chapter 1\nThe Boy Who Lived\nMr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\nMr. Dursley was the director of a firm called Grunnings,\n"
        "Chapter 2\nThe Vanishing Glass\nFor a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.\n"
        "Chapter 3\nThe Letters From No One\nThe traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\nMr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.\n"
        "Chapter 4\nThe Keeper Of The Keys\nHe didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin.\n"
        "Chapter 5\nDiagon Alley\nIt was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak."
    )

    # Build the list of chapter texts (element 0 is chapter 1, element 1 is chapter 2, ...).
    # re.split yields an empty string before the first heading, so the "if c" filter
    # drops empty items and keeps the list length equal to the expected chapter count.
    chapter_contents = [c for c in re.split(r'Chapter \d+\n[a-zA-Z ]+\n', test) if c]
    chapter_contents

Output:

    ['Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.\nMr. Dursley was the director of a firm called Grunnings,\n',
     'For a second, Mr. Dursley didn’t realize what he had seen — then he jerked his head around to look again. There was a tabby cat standing on the corner of Privet Drive, but there wasn’t a map in sight. What could he have been thinking of? It must have been a trick of the light. Mr. Dursley blinked and stared at the cat.\n',
     'The traffic moved on and a few minutes later, Mr. Dursley arrived in the Grunnings parking lot, his mind back on drills.\nMr. Dursley always sat with his back to the window in his office on the ninth floor. If he hadn’t, he might have found it harder to concentrate on drills that morning.\n',
     'He didn’t know why, but they made him uneasy. This bunch were whispering excitedly, too, and he couldn’t see a single collecting tin.\n',
     'It was a few seconds before Mr. Dursley realized that the man was wearing a violet cloak.']

Once we can build the list of chapter texts for a Harry Potter novel, we can start the actual text analysis.

Data analysis

Comparing chapter counts

    import re
    import matplotlib.pyplot as plt

    colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']

    # x axis: book names (strip the common prefix and the ".txt" suffix)
    harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4]
                          for n in harry_potters]

    # y axis: number of chapters
    chapter_nums = []
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file) as f:
            raw_text = f.read()
        chapter_contents = [c for c in re.split(pattern, raw_text) if c]
        chapter_nums.append(len(chapter_contents))

    # Figure size
    plt.figure(figsize=(20, 10))
    # Title, font size, bold
    plt.title('Chapter Number of Harry Potter', fontsize=25, weight='bold')
    # Colored bar chart
    plt.bar(harry_potter_names, chapter_nums, color=colors)
    # Tick font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
    plt.ylabel('Chapter Number', rotation=25, fontsize=20, weight='bold')
    plt.show()

The chart shows that the last four books of the series have more chapters. (This analysis is not very useful in itself; it is mainly practice.)

Vocabulary richness

The measure used here is the total number of words divided by the number of distinct words. If a 100-word sentence never repeats a word, the value is 100/100 = 1. If a sentence of the same length uses only 20 distinct words, the value is 100/20 = 5. A higher value therefore means more repetition, that is, a less varied vocabulary.

    import matplotlib.pyplot as plt
    from nltk import word_tokenize
    from nltk.stem.snowball import SnowballStemmer

    plt.style.use('fivethirtyeight')

    colors = ['#78C850', '#A8A878', '#F08030', '#C03028', '#6890F0', '#A890F0', '#A040A0']
    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']

    # x axis: book names
    harry_potter_names = [n.replace('Harry Potter and the ', '')[:-4]
                          for n in harry_potters]

    # Vocabulary richness per book
    richness_of_words = []
    stemmer = SnowballStemmer('english')
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file) as f:
            raw_text = f.read()
        words = word_tokenize(raw_text)
        # Lowercase and stem so that inflected forms count as one word
        words = [stemmer.stem(w.lower()) for w in words]
        wordset = set(words)
        richness = len(words) / len(wordset)
        richness_of_words.append(richness)

    # Figure size
    plt.figure(figsize=(20, 10))
    # Title, font size, bold
    plt.title('The Richness of Word in Harry Potter', fontsize=25, weight='bold')
    # Colored bar chart
    plt.bar(harry_potter_names, richness_of_words, color=colors)
    # Tick font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Harry Potter Series', fontsize=20, weight='bold')
    plt.ylabel('Richness of Words', rotation=25, fontsize=20, weight='bold')
    plt.show()

Sentiment analysis

Sentiment trend across the Harry Potter series

We use VADER, which is available as the ready-made library vaderSentiment. Its polarity_scores function returns four values:

    neg: negative score
    neu: neutral score
    pos: positive score
    compound: overall sentiment score

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    test = 'i am so sorry'
    analyzer.polarity_scores(test)

Output:

    {'neg': 0.443, 'neu': 0.557, 'pos': 0.0, 'compound': -0.1513}

Next we score every chapter of every book by averaging the compound score of its sentences:

    import re
    import matplotlib.pyplot as plt
    from nltk.tokenize import sent_tokenize
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']

    # x axis: running chapter index across the whole series
    chapter_indexes = []
    # y axis: average sentiment score per chapter
    compounds = []
    analyzer = SentimentIntensityAnalyzer()
    chapter_index = 1
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file) as f:
            raw_text = f.read()
        chapters = [c for c in re.split(pattern, raw_text) if c]
        # Compute the sentiment score of each chapter
        for chapter in chapters:
            compound = 0
            sentences = sent_tokenize(chapter)
            for sentence in sentences:
                score = analyzer.polarity_scores(sentence)
                compound += score['compound']
            compounds.append(compound / len(sentences))
            chapter_indexes.append(chapter_index)
            chapter_index += 1

    # Figure size
    plt.figure(figsize=(20, 10))
    # Title, font size, bold
    plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
    # Line chart
    plt.plot(chapter_indexes, compounds, color='#A040A0')
    # Tick font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Chapter', fontsize=20, weight='bold')
    plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
    plt.show()

The curve is not smooth enough. To iron out the fluctuations, we define a moving-average function and plot the smoothed curve on top of the raw one:

    import re
    import numpy as np
    import matplotlib.pyplot as plt
    from nltk.tokenize import sent_tokenize
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # Curve-smoothing function: moving average over window_size points
    def movingaverage(value_series, window_size):
        window = np.ones(int(window_size)) / float(window_size)
        return np.convolve(value_series, window, 'same')

    harry_potters = ['Harry Potter and the Sorcerers Stone.txt',
                     'Harry Potter and the Chamber of Secrets.txt',
                     'Harry Potter and the Prisoner of Azkaban.txt',
                     'Harry Potter and the Goblet of Fire.txt',
                     'Harry Potter and the Order of the Phoenix.txt',
                     'Harry Potter and the Half-Blood Prince.txt',
                     'Harry Potter and the Deathly Hallows.txt']

    # x axis: running chapter index across the whole series
    chapter_indexes = []
    # y axis: average sentiment score per chapter
    compounds = []
    analyzer = SentimentIntensityAnalyzer()
    chapter_index = 1
    pattern = r'Chapter \d+\n[a-zA-Z ]+\n'
    for harry_potter in harry_potters:
        file = 'data/' + harry_potter
        with open(file) as f:
            raw_text = f.read()
        chapters = [c for c in re.split(pattern, raw_text) if c]
        # Compute the sentiment score of each chapter
        for chapter in chapters:
            compound = 0
            sentences = sent_tokenize(chapter)
            for sentence in sentences:
                score = analyzer.polarity_scores(sentence)
                compound += score['compound']
            compounds.append(compound / len(sentences))
            chapter_indexes.append(chapter_index)
            chapter_index += 1

    # Figure size
    plt.figure(figsize=(20, 10))
    # Title, font size, bold
    plt.title('Average Sentiment of the Harry Potter', fontsize=25, weight='bold')
    # Raw line plus the smoothed (dotted) line
    plt.plot(chapter_indexes, compounds, color='red')
    plt.plot(movingaverage(compounds, 10), color='black', linestyle=':')
    # Tick font size and rotation
    plt.xticks(rotation=25, fontsize=16, weight='bold')
    plt.yticks(fontsize=16, weight='bold')
    # Axis labels
    plt.xlabel('Chapter', fontsize=20, weight='bold')
    plt.ylabel('Average Sentiment', rotation=25, fontsize=20, weight='bold')
    plt.show()
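
One detail of the smoothing step is worth checking on a small example. The movingaverage function calls np.convolve in 'same' mode, which treats values beyond either end of the series as zero, so the smoothed curve is pulled toward zero near the first and last chapters. The sketch below is only an illustration under assumptions: the constant scores are made-up data, and the function names moving_average_same and moving_average_valid are hypothetical helpers, not part of the original article.

```python
import numpy as np

def moving_average_same(values, window_size):
    """Moving average as in the article: output has the same length as the input."""
    window = np.ones(int(window_size)) / float(window_size)
    return np.convolve(values, window, "same")

def moving_average_valid(values, window_size):
    """Variant that keeps only positions where the window fully overlaps the data."""
    window = np.ones(int(window_size)) / float(window_size)
    return np.convolve(values, window, "valid")

# Hypothetical per-chapter scores: a constant series makes the edge effect obvious
scores = [1.0] * 6

same = moving_average_same(scores, 3)
valid = moving_average_valid(scores, 3)

# In 'same' mode the first and last values are about 2/3 instead of 1,
# because one of the three window slots falls outside the data and counts as 0.
print(same)
# In 'valid' mode every value stays at about 1, but the output is shorter
# (len(scores) - window_size + 1 points).
print(valid)
```

If the drop at both ends of the dotted line is unwanted, the 'valid' variant is one option. Its output is shorter than the input, so it has to be plotted against a correspondingly trimmed slice of chapter_indexes.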
