

news 2026/1/9 20:35:31

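AI Frontline editor's note: With the rapid growth of the internet industry, data volumes have exploded. Big data holds enormous value, but it also brings the problem of "information overload". Recommender systems are widely used information filtering systems and have been very successful in many domains. In e-commerce (Amazon, eBay, Alibaba), recommender systems offer users personalized products and uncover their latent needs. The "Guess You Like" sections on e-commerce sites are recommender systems at work. Simply put, the goal of a recommender system is to find and recommend items a user is likely to be interested in, based on that user's preferences.

Recommender systems are among the most valuable applications of machine learning today. Amazon attributes 35% of its revenue to its recommender system. (Translator's note: for the 35% figure, see "The Amazon Recommendations Secret to Selling More Online", http://rejoiner.com/resources/amazon-recommendations-secret-selling-online/)

Evaluation is an essential part of researching and developing any recommender system. Depending on your business and the data available, there are many ways to evaluate one. In this article we try out several of them.

Rating prediction

My previous article, "Building and Testing Recommender Systems With Surprise, Step-By-Step" (https://towardsdatascience.com/building-and-testing-recommender-systems-with-surprise-step-by-step-d4ba702ef80b), used Surprise, which centers on machine learning algorithms that predict the rating a user would give an item (that is, rating prediction). This approach requires explicit feedback from users, for example asking them to rate a book from 0 to 10 stars after buying it. We then use that data to build a profile of each user's interests. The problem is that not everyone is willing to leave a rating, so the data tends to be sparse, as we saw earlier with the Book-Crossing dataset. (Translator's note: the Book-Crossing dataset is available at http://www2.informatik.uni-freiburg.de/~cziegler/BX/)

Most recommender systems try to predict what a user would have entered had they rated the book in question. If there are too many "NaN" values, the recommender does not have enough data to understand what the user actually likes.

Still, if you can persuade users to rate items, explicit ratings are valuable. So if you have plenty of data and user ratings, RMSE or MAE are the natural evaluation metrics. Here is an example on the Movielens dataset using the Surprise library.

```python
# ratings_prediction.py
import pandas as pd
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

movies = pd.read_csv('movielens_data/movies.csv')
ratings = pd.read_csv('movielens_data/ratings.csv')
df = pd.merge(movies, ratings, on='movieId', how='inner')

reader = Reader(rating_scale=(0.5, 5))
# Use movieId as the item key: the Top-N code below casts item IDs with int().
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)

# Hold out 25% of the ratings for testing
trainSet, testSet = train_test_split(data, test_size=.25, random_state=0)

algo = SVD(random_state=0)
algo.fit(trainSet)
predictions = algo.test(testSet)

def MAE(predictions):
    return accuracy.mae(predictions, verbose=False)

def RMSE(predictions):
    return accuracy.rmse(predictions, verbose=False)

print("RMSE: ", RMSE(predictions))
print("MAE: ", MAE(predictions))
```

Top-N

From online shopping sites to video portals, Top-N recommender systems are everywhere. They give users a ranked list of N items they may be interested in, to encourage browsing and purchases. (Translator's note: for an introduction to Top-N recommender systems, see this YouTube video: https://www.youtube.com/watch?v=EeXBdQYs0CQ)

One of Amazon's recommender systems is a "Top-N" system that presents each individual with a list of top results. For me, Amazon's "Top-N" recommendations span 9 pages, with 6 items on the first page. A good recommender system should be able to identify a set of N items that a given user will find interesting. Because I rarely buy books on Amazon, my "Top-N" is far off the mark. In other words, I would probably click on or read only one of the books in my "Top-N" list.

The script below generates the top 10 recommendations for every user in the test set.

```python
# top-N.py
from collections import defaultdict
from surprise.model_selection import LeaveOneOut

def GetTopN(predictions, n=10, minimumRating=4.0):
    topN = defaultdict(list)
    # Keep only items whose estimated rating reaches the minimum
    for userID, movieID, actualRating, estimatedRating, _ in predictions:
        if (estimatedRating >= minimumRating):
            topN[int(userID)].append((int(movieID), estimatedRating))

    # Sort each user's candidates by estimated rating and keep the best n
    for userID, ratings in topN.items():
        ratings.sort(key=lambda x: x[1], reverse=True)
        topN[int(userID)] = ratings[:n]

    return topN

LOOCV = LeaveOneOut(n_splits=1, random_state=1)

for trainSet, testSet in LOOCV.split(data):
    # Train model without left-out ratings
    algo.fit(trainSet)

    # Predicts ratings for left-out ratings only
    leftOutPredictions = algo.test(testSet)

    # Build predictions for all ratings not in the training set
    bigTestSet = trainSet.build_anti_testset()
    allPredictions = algo.test(bigTestSet)

    # Compute top 10 recs for each user
    topNPredicted = GetTopN(allPredictions, n=10)
```

The predicted top 10 for any user, for example userId 2 or userId 3, can then be read from topNPredicted[2] and topNPredicted[3].

Hit rate

Let's see how good the generated top 10 recommendations really are. To evaluate them we use the hit rate: if a user rated one of the top 10 items we recommended, we count that as a "hit".

The hit rate for a single user is computed as follows:

1. Find all items in this user's history in the training data.
2. Intentionally remove one of these items (leave-one-out, a cross-validation method).
3. Feed all the other items to the recommender and ask for its top 10 recommendations.
4. If the removed item appears in the top 10 recommendations, it is a hit. If not, it is not a hit.

```python
# HitRate.py
def HitRate(topNPredicted, leftOutPredictions):
    hits = 0
    total = 0

    # For each left-out rating
    for leftOut in leftOutPredictions:
        userID = leftOut[0]
        leftOutMovieID = leftOut[1]

        # Is it in the predicted top 10 for this user?
        hit = False
        for movieID, predictedRating in topNPredicted[int(userID)]:
            if (int(leftOutMovieID) == int(movieID)):
                hit = True
                break
        if (hit):
            hits += 1

        total += 1

    # Compute overall precision
    return hits/total

print("\nHit Rate: ", HitRate(topNPredicted, leftOutPredictions))
```

The overall hit rate of the system is the number of hits divided by the number of test users. It measures how often we manage to recommend the removed rating. Higher is better.

A very low hit rate often just means there is not enough data to work with. Amazon's hit rate for me would be very low, because it does not have enough data about my book purchases.

Hit rate by rating value

We can also break the hit rate down by rating value, here the actual rating of the left-out item. Ideally we want to predict the movies users like, so we care about high rating values rather than low ones.

```python
# RatingHitRate.py
def RatingHitRate(topNPredicted, leftOutPredictions):
    hits = defaultdict(float)
    total = defaultdict(float)

    # For each left-out rating
    for userID, leftOutMovieID, actualRating, estimatedRating, _ in leftOutPredictions:
        # Is it in the predicted top N for this user?
        hit = False
        for movieID, predictedRating in topNPredicted[int(userID)]:
            if (int(leftOutMovieID) == movieID):
                hit = True
                break
        if (hit):
            hits[actualRating] += 1

        total[actualRating] += 1

    # Compute overall precision per rating value (only values with at least one hit are printed)
    for rating in sorted(hits.keys()):
        print(rating, hits[rating] / total[rating])

print("Hit Rate by Rating value: ")
RatingHitRate(topNPredicted, leftOutPredictions)
```

The breakdown is what we hoped for: the hit rate for a rating value of 5 is much higher than for 4 or 3. Higher is better.

Cumulative hit rate

Because we care about higher ratings, we can ignore left-out ratings below 4 and compute the hit rate for ratings >= 4.

```python
# CumulativeHitRate.py
def CumulativeHitRate(topNPredicted, leftOutPredictions, ratingCutoff=0):
    hits = 0
    total = 0

    # For each left-out rating
    for userID, leftOutMovieID, actualRating, estimatedRating, _ in leftOutPredictions:
        # Only look at ability to recommend things the users actually liked...
        if (actualRating >= ratingCutoff):
            # Is it in the predicted top 10 for this user?
            hit = False
            for movieID, predictedRating in topNPredicted[int(userID)]:
                if (int(leftOutMovieID) == movieID):
                    hit = True
                    break
            if (hit):
                hits += 1

            total += 1

    # Compute overall precision
    return hits/total

print("Cumulative Hit Rate (rating >= 4): ", CumulativeHitRate(topNPredicted, leftOutPredictions, 4.0))
```

Higher is better.

Average Reciprocal Hit Ranking (ARHR)

ARHR is a commonly used metric for evaluating the ranking of Top-N recommender systems. It only takes into account where the first relevant result appears: we get more credit for recommending an item the user rated near the top of the list than near the bottom. Higher is better.

```python
# AverageReciprocalHitRank.py
def AverageReciprocalHitRank(topNPredicted, leftOutPredictions):
    summation = 0
    total = 0

    # For each left-out rating
    for userID, leftOutMovieID, actualRating, estimatedRating, _ in leftOutPredictions:
        # Is it in the predicted top N for this user?
        hitRank = 0
        rank = 0
        for movieID, predictedRating in topNPredicted[int(userID)]:
            rank = rank + 1
            if (int(leftOutMovieID) == movieID):
                hitRank = rank
                break
        # A hit at rank r contributes 1/r; a miss contributes nothing
        if (hitRank > 0):
            summation += 1.0 / hitRank

        total += 1

    return summation / total

print("Average Reciprocal Hit Rank: ", AverageReciprocalHitRank(topNPredicted, leftOutPredictions))
```

Your first real-life recommender system is likely to be of low quality, and even a mature system will perform just as poorly for new users. It is still far better than having no recommender system at all. One purpose of a recommender system is to learn the preferences of its users, including new ones, so that they can start receiving accurate, personalized recommendations.

However, if you are just getting started and your website is brand new, the recommender cannot give anyone personalized recommendations, because nobody has rated anything yet. This becomes a system bootstrapping problem. (Translator's note: on the bootstrapping problem, see "Learning Preferences of New Users in Recommender Systems: An Information Theoretic Approach", https://www.kdd.org/exploration_files/WebKDD08-Al-Rashid.pdf)

The Jupyter Notebook for this article is available on Github: https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Movielens Recommender Metrics.ipynb

Reference: Building Recommender Systems with Machine Learning and AI, https://learning.oreilly.com/videos/building-recommender-systems/9781789803273

Original article: https://towardsdatascience.com/evaluating-a-real-life-recommender-system-error-based-and-ranking-based-84708e3285b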
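To make the hit rate and ARHR definitions in this article concrete, here is a minimal sketch on hand-made toy data. It does not use Surprise, and the helper names (top_n, hit_rate, arhr), the scores, and the left-out pairs are all hypothetical, chosen only so the arithmetic can be checked by hand. It is an illustration of the two metrics under those assumptions, not the article's own implementation.

```python
from collections import defaultdict

def top_n(scores, n=2):
    """Keep the n highest-scored items per user, best first."""
    per_user = defaultdict(list)
    for user, item, score in scores:
        per_user[user].append((item, score))
    return {
        user: [item for item, _ in sorted(pairs, key=lambda p: p[1], reverse=True)[:n]]
        for user, pairs in per_user.items()
    }

def hit_rate(top, left_out):
    """Fraction of left-out (user, item) pairs that appear in that user's top-N list."""
    hits = sum(1 for user, item in left_out if item in top.get(user, []))
    return hits / len(left_out)

def arhr(top, left_out):
    """Average of 1/rank over left-out pairs; a miss contributes 0."""
    total = 0.0
    for user, item in left_out:
        recs = top.get(user, [])
        if item in recs:
            total += 1.0 / (recs.index(item) + 1)
    return total / len(left_out)

# Hypothetical estimated ratings: (user, item, estimated score)
scores = [
    (1, "A", 4.8), (1, "B", 4.5), (1, "C", 3.9),
    (2, "A", 4.1), (2, "C", 4.6), (2, "D", 4.4),
    (3, "B", 4.9), (3, "D", 4.2), (3, "A", 4.0),
]
# Hypothetical left-out item for each user
left_out = [(1, "B"), (2, "A"), (3, "B")]

top = top_n(scores, n=2)
print("Top-2 lists:", top)
print("Hit rate:", hit_rate(top, left_out))
print("ARHR:", arhr(top, left_out))
```

With this toy data, user 1's left-out item sits at rank 2, user 2's is missed, and user 3's sits at rank 1. Two of three users are hits, while ARHR sums 1/2 + 0 + 1 and divides by 3, which shows how ARHR rewards a hit near the top of the list more than one further down.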
