An autoencoder can, like PCA, reduce the dimensionality of the input features.
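Some MATLAB functions

bsxfun: C = bsxfun(fun, A, B) applies an element-wise binary operation to the arrays A and B. fun is a function handle, an m-file, or a built-in function. In practice fun can be plus, minus and so on, written with a leading '@' (for example @plus, @minus). Normally A and B have the same size. If they do not, then in every dimension where the sizes differ, one of the two arrays must have size 1 in that dimension. For example, in bsxfun(@minus, A, mean(A)) the sizes of A and mean(A) differ: mean(A) is first expanded to the size of A, and then each element of the expanded mean(A) is subtracted from the corresponding element of A.

rand: generates uniformly distributed pseudorandom numbers between 0 and 1. Main syntax: rand(m,n) returns an m-by-n matrix; rand(m,n,'double') returns values of the given precision (the argument can also be 'single'); rand(RandStream,m,n) draws from the given RandStream object (roughly, a seeded random number generator).

randn: generates pseudorandom numbers from the standard normal distribution (mean 0, variance 1). The main syntax is the same as for rand.

randi: generates uniformly distributed pseudorandom integers. Main syntax: randi(iMax) returns an integer in the closed interval [1, iMax]; randi(iMax,m,n) returns an m-by-n matrix of integers in [1, iMax]; r = randi([iMin,iMax],m,n) returns an m-by-n matrix of integers in the closed interval [iMin, iMax].

exist: tests whether something exists. For example, exist('opt_normalize', 'var') checks whether the variable opt_normalize exists; 'var' restricts the test to variables.

colormap: sets the colormap of the current figure.

floor: floor(A) returns the largest integer not greater than A.

ceil: ceil(A) returns the smallest integer not less than A.

imagesc: similar to image; used to display an array as an image. For example, imagesc(array,'EraseMode','none',[-1 1]) maps values of array in [-1,1] linearly onto the current colormap, so that [-1,1] spans the whole colormap. Setting the erase mode to 'none' means the background is not erased (the EraseMode property exists only in older MATLAB releases).

repmat: tiles a matrix. For example, B = repmat(A,m,n) creates a matrix B made of m*n copies of A, so the size of B is [size(A,1)*m, size(A,2)*n].
---------------------------------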
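The expansion rule of bsxfun and the range of randi described above can be checked on a small example. This is a minimal sketch; the matrix A and all variable names are made up for illustration.

```matlab
% Illustrative 3x2 matrix (values chosen arbitrarily).
A = [1 2; 3 4; 5 8];

% mean(A) is 1x2 (the column means); bsxfun expands it along the first dimension.
centered = bsxfun(@minus, A, mean(A));

% The same result obtained by tiling the mean explicitly with repmat.
centered_repmat = A - repmat(mean(A), size(A, 1), 1);

disp(centered);                            % each column has zero mean, up to rounding
disp(isequal(centered, centered_repmat));  % the two approaches give identical values

% randi(iMax, m, n) draws integers from the closed interval [1, iMax].
r = randi(5, 1, 1000);
disp([min(r) max(r)]);                     % both values lie between 1 and 5
```

The repmat version makes the expansion explicit at the cost of building the tiled matrix; bsxfun performs the same subtraction without that temporary copy.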
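Usage of @ in MATLAB
Q: What does f = @(x) acos(x) mean, and what does @ stand for?
A: f is a function handle, and @ is the operator that creates the handle. f = @(x) acos(x) is equivalent to creating a function file: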
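% f.m
function y = f(x)
y = acos(x);

Similarly, the statement
xsqual = @(x) 1/2.*(x==-1/2) + 1.*(x>-1/2 & x<1/2) + 1/2.*(x==1/2);
is equivalent to creating a function file:
% xsqual.m
function y = xsqual(x)
y = 1/2.*(x==-1/2) + 1.*(x>-1/2 & x<1/2) + 1/2.*(x==1/2);

Benefits of function handles
① Faster execution. Each time MATLAB calls a function by name it searches the whole path, and Set Path shows that the path can contain a great many folders. If a function is called often in your program, calling it through a function handle can therefore speed things up.
② A handle can be used as conveniently as a variable. For example, if I create a handle to a function in the current folder and then change to another folder, the handle can still be called directly, without copying the function file over, because the handle already contains the path.

Purpose of function handles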
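Without a function handle, every one of the repeated calls to a function triggers a full path search, which directly affects computation speed. A handle avoids this cost, because it points directly at the function. A function handle acts like a name for the function, somewhat like a reference in C++.

Review of key formulas
Formula 1 through Formula 11

Derivation of the residual of node i in layer l during backpropagation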
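In the tutorial's derivation of the backpropagation algorithm, Ng gives no derivation for step 3; the translator of the tutorial supplied one. I used a derivation different from the translator's.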
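For orientation, the residual of a hidden node can be obtained with the chain rule as sketched below. This is a standard derivation written in the UFLDL notation, which is assumed here: z_i^{(l)} is the weighted input of node i in layer l, a_i^{(l)} = f(z_i^{(l)}) its activation, s_l the number of nodes in layer l, and n_l the output layer. It is not claimed to be the author's own derivation.

```latex
\begin{aligned}
\delta_i^{(n_l)} &= \frac{\partial}{\partial z_i^{(n_l)}}\,\frac{1}{2}\bigl\lVert y - a^{(n_l)} \bigr\rVert^2
   = -\bigl(y_i - a_i^{(n_l)}\bigr)\, f'\!\bigl(z_i^{(n_l)}\bigr), \\[4pt]
\delta_i^{(l)} &= \frac{\partial J}{\partial z_i^{(l)}}
   = \sum_{j=1}^{s_{l+1}} \frac{\partial J}{\partial z_j^{(l+1)}}\,
     \frac{\partial z_j^{(l+1)}}{\partial z_i^{(l)}} \\
 &= \sum_{j=1}^{s_{l+1}} \delta_j^{(l+1)}\,
     \frac{\partial}{\partial z_i^{(l)}}
     \Bigl(\sum_{k=1}^{s_l} W_{jk}^{(l)} f\bigl(z_k^{(l)}\bigr) + b_j^{(l)}\Bigr) \\
 &= \Bigl(\sum_{j=1}^{s_{l+1}} W_{ji}^{(l)}\,\delta_j^{(l+1)}\Bigr)\, f'\!\bigl(z_i^{(l)}\bigr), \\[4pt]
\delta_i^{(2)} &= \Bigl(\sum_{j=1}^{s_3} W_{ji}^{(2)}\,\delta_j^{(3)}
   + \beta\Bigl(-\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i}\Bigr)\Bigr)\, f'\!\bigl(z_i^{(2)}\bigr)
   \quad \text{(with the sparsity penalty).}
\end{aligned}
```

For the sigmoid, f'(z) = a(1 - a), which is why the code in sparseAutoencoderCost.m multiplies by active_value .* (1 - active_value).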
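Tutorial review and the translator's derivation of step 3

Experiment basics
Implementing this exercise mainly requires computing the network's loss function and its partial derivatives.
1. Compute the input of every node in the network (z in the code) and its output (a in the code); a is the sigmoid of z.
2. Use the z and a values to compute the error term of every node (delta in the code).
3. Express the loss function and its partial derivatives in terms of the a, z and delta values computed above.
Step 1 runs forward, in the order input layer, hidden layer, output layer. Step 2 runs backward, in the order output layer, hidden layer, input layer, which is where the name backpropagation (BP) comes from.

Steps
1. Generate the training set. Randomly sample 10000 patches of 8*8 pixels from 10 images of 512*512 pixels, then normalize them to obtain the training set patches. See sampleIMAGES.m.
2. Compute the cost function Jsparse(W,b) and its gradient. See sparseAutoencoderCost.m.
3. Check the gradient. computeNumericalGradient.m computes an approximate gradient (EPSILON = 10^-4), and checkNumericalGradient.m verifies that this approximation code is itself correct. First, the analytical gradient of the function h(x1, x2) = x1^2 + 3*x1*x2 at the point [4, 10] is compared with the gradient computed by the method in computeNumericalGradient.m; a difference below 10^-9 indicates that computeNumericalGradient.m is correct. Then the same method is used to compute the gradient of Jsparse(W,b), which is compared with the gradient computed in sparseAutoencoderCost.m; a difference below 10^-9 indicates that sparseAutoencoderCost.m is correct.
4. Train the sparse autoencoder with L-BFGS. Note that this implementation may not be used commercially; for commercial use the fminlbfgs function is an option, which is slower than L-BFGS. See the folder minFunc. When the parameter vector θ (containing W(1), W(2), b(1), b(2)) is initialized, the initial values of W(1) and W(2) are drawn uniformly at random from [-sqrt(6/(nin+nout+1)), sqrt(6/(nin+nout+1))], where nin is the number of hidden units and nout is the number of output units. b(1) and b(2) are initialized to 0.
5. Visualize the result. Run train.m to execute the whole program, train the sparse autoencoder and obtain the visualization. Visualizing the learned weights shows which features the algorithm has learned from the images.

Code and comments

train.m
(1) Call sampleIMAGES to extract many image patches from the given images.
(2) Call display_network to show a random selection of the extracted patches in a grid.
(3) Gradient check. The purpose of this part is to test whether the cost function is correct; it can be implemented in a separate function checkSparseAutoencoderCost.
  ① Use sparseAutoencoderCost to compute the cost and the gradient of the network.
  ② Use computeNumericalGradient to compute the gradient numerically; checkNumericalGradient is used to verify that this numerical routine is correct.
  ③ Compare the gradients from ① and ② to decide whether sparseAutoencoderCost is written correctly.
  If sparseAutoencoderCost is correct, checkSparseAutoencoderCost does not need to be run during actual training.
(4) Train the network with L-BFGS to obtain the optimized weights and bias terms.
(5) Visualize the training result.

% http://blog.csdn.net/jiandanjinxin/article/details/72875977
%% CS294A/CS294W Programming Assignment Starter Code

% Instructions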
% ------------
%
% This file contains code that helps you get started on the
% programming assignment. You will need to complete the code in sampleIMAGES.m,
% sparseAutoencoderCost.m and computeNumericalGradient.m.
% For the purpose of completing the assignment, you do not need to
% change the code in this file.
%
%%
%% STEP 0: Here we provide the relevant parameters values that will
% allow your sparse autoencoder to get good filters; you do not need to
% change the parameters below.
visibleSize = 8*8;     % number of input units
hiddenSize = 25;       % number of hidden units
sparsityParam = 0.01;  % desired average activation of the hidden units.
                       % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                       %  in the lecture notes).
lambda = 0.0001;       % weight decay parameter
beta = 3;              % weight of sparsity penalty term

%%
%% STEP 1: Implement sampleIMAGES
%
% After implementing sampleIMAGES, the display_network command should
% display a random sample of 200 patches from the dataset
% (200 of the 10000 patches are picked at random for display).
patches = sampleIMAGES;
figure
display_network(patches(:,randi(size(patches,2),200,1)),8)
title('sampleIMAGES')
% randi(size(patches,2),200,1) returns a 200x1 column vector of random integers
% in 1..10000, i.e. the indices of 200 randomly chosen patches.

% Obtain random parameters theta (initialize the parameter vector)
theta = initializeParameters(hiddenSize, visibleSize);

%%
%% STEP 2: Implement sparseAutoencoderCost
% You can implement all of the components (squared error cost, weight decay term,
% sparsity penalty) in the cost function at once, but it may be easier to do
% it step-by-step and run gradient checking (see STEP 3) after each step. We
% suggest implementing the sparseAutoencoderCost function using the following steps:
%
% (a) Implement forward propagation in your neural network, and implement the
% squared error term of the cost function. Implement backpropagation to
% compute the derivatives. Then (using lambda=beta=0), run Gradient Checking
% to verify that the calculations corresponding to the squared error cost
% term are correct.
%
% (b) Add in the weight decay term (in both the cost function and the derivative
% calculations), then re-run Gradient Checking to verify correctness.
%
% (c) Add in the sparsity penalty term, then re-run Gradient Checking to
% verify correctness.
%
% Feel free to change the training settings when debugging your
% code. (For example, reducing the training set size or
% number of hidden units may make your code run faster; and setting beta
% and/or lambda to zero may be helpful for debugging.) However, in your
% final submission of the visualized weights, please use parameters we
% gave in Step 0 above.
% Compute the cost and the gradient
[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...
                                     sparsityParam, beta, patches);

%%
%% STEP 3: Gradient Checking
%
% Hint: If you are debugging your code, performing gradient checking on smaller models
% and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
% units) may speed things up.

% First, let's make sure your numerical gradient computation is correct for a
% simple function. After you have implemented computeNumericalGradient.m,
% run the following:
checkNumericalGradient();

% Now we can use it to check your cost function and derivative calculations
% for the sparse autoencoder.
% The numerical approximation calls the autoencoder cost function through a function handle.
numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...
                                                  hiddenSize, lambda, ...
                                                  sparsityParam, beta, ...
                                                  patches), theta);

% Use this to visually compare the gradients side by side
% (numerical approximation on the left, gradient from the cost function on the right)
disp('    numgrad      grad')
disp([numgrad grad]);

% Compare numerically computed gradients with the ones obtained from backpropagation
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff); % Should be small. In our implementation, these values are
            % usually less than 1e-9.

            % When you got this working, Congratulations!!!

%%
%% STEP 4: After verifying that your implementation of
% sparseAutoencoderCost is correct, You can start training your sparse
% autoencoder with minFunc (L-BFGS).

% Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

% opttheta is the vector of all weights and bias terms of the network
[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                   visibleSize, hiddenSize, ...
                                   lambda, sparsityParam, ...
                                   beta, patches), ...
                              theta, options);

%%
%% STEP 5: Visualization
% Weight matrix of the first layer
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
figure;
display_network(W1', 12)
title('Visualization of Weight1')
print -djpeg weights.jpg   % save the visualization to a file

sampleIMAGES.m
function patches = sampleIMAGES()
% sampleIMAGES
% Returns 10000 patches for training

load IMAGES;    % load images from disk
figure;
imshow3D(IMAGES)
patchsize = 8;  % we'll use 8x8 patches
numpatches = 10000;

% Initialize patches with zeros. Your code will fill in this matrix--one
% column per patch, 10000 columns.
patches = zeros(patchsize*patchsize, numpatches);

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Fill in the variable called patches using data
% from IMAGES.
%
% IMAGES is a 3D array containing 10 images
% For instance, IMAGES(:,:,6) is a 512x512 array containing the 6th image,
% and you can type imagesc(IMAGES(:,:,6)), colormap gray; to visualize
% it. (The contrast on these images look a bit off because they have
% been preprocessed using whitening. See the lecture notes for
% more details.) As a second example, IMAGES(21:30,21:30,1) is an image
% patch corresponding to the pixels in the block (21,21) to (30,30) of
% Image 1

tic
image_size = size(IMAGES);
% 1 x numpatches row vectors of random integers: the top-left corner of each
% patch, in 1..image_size(1)-patchsize+1 (rows) and 1..image_size(2)-patchsize+1
% (columns), and the index of the source image, in 1..image_size(3).
i = randi(image_size(1)-patchsize+1, 1, numpatches);
j = randi(image_size(2)-patchsize+1, 1, numpatches);
k = randi(image_size(3), 1, numpatches);
for num = 1:numpatches
    patches(:,num) = reshape(IMAGES(i(num):i(num)+patchsize-1, j(num):j(num)+patchsize-1, k(num)), 1, patchsize*patchsize);
end
toc

%% ---------------------------------------------------------------
% For the autoencoder to work well we need to normalize the data
% Specifically, since the output of the network is bounded between [0,1]
% (due to the sigmoid activation function), we have to make sure
% the range of pixel values is also bounded between [0,1]
patches = normalizeData(patches);

end

%% ---------------------------------------------------------------
function patches = normalizeData(patches)

% Squash data to [0.1, 0.9] since we use sigmoid as the activation
% function in the output layer

% Remove DC (mean of images): subtract from each column of patches its own mean.
patches = bsxfun(@minus, patches, mean(patches));

% Truncate to +/-3 standard deviations and scale to -1 to 1
pstd = 3 * std(patches(:));   % three times the standard deviation of all values
patches = max(min(patches, pstd), -pstd) / pstd;
% By the three-sigma rule, about 99.7% of normally distributed values lie within this range.
% After this step the data lie between -1 and 1.

% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;

end
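The effect of normalizeData can be checked on synthetic data. The sketch below repeats its three steps (mean removal, truncation at ±3 standard deviations, rescaling) inline on random numbers; the input matrix raw and the variable names are assumptions for illustration and are not part of the exercise.

```matlab
% Synthetic stand-in for the 64 x N patch matrix (random values, fixed seed).
rng(0);
raw = randn(64, 500) * 2 + 5;

% Step 1: remove the mean of each patch (each column).
p = bsxfun(@minus, raw, mean(raw));

% Step 2: clip to +/-3 standard deviations and scale to [-1, 1].
pstd = 3 * std(p(:));
p = max(min(p, pstd), -pstd) / pstd;

% Step 3: map [-1, 1] to [0.1, 0.9].
p = (p + 1) * 0.4 + 0.1;

% Both extremes should lie within [0.1, 0.9].
fprintf('min = %.4f, max = %.4f\n', min(p(:)), max(p(:)));
```

Keeping the targets away from 0 and 1 matters because the autoencoder tries to reproduce its input at a sigmoid output layer, and a sigmoid only approaches 0 and 1 asymptotically.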
sparseAutoencoderCost.m
function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                             lambda, sparsityParam, beta, data)
% Computes the cost function and the gradient of the network.
% visibleSize: the number of input units (probably 64)
% hiddenSize: the number of hidden units (probably 25)
% lambda: weight decay parameter
% sparsityParam: The desired average activation for the hidden units (denoted in the lecture
%                notes by the greek alphabet rho, which looks like a lower-case "p").
% beta: weight of sparsity penalty term
% data: Our 64x10000 matrix containing the training data. So, data(:,i) is the i-th training example.
% theta: parameter vector containing W1, W2, b1, b2

% The input theta is a vector (because minFunc expects the parameters to be a vector).
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.
% That is, the long vector is split into each layer's weight matrix and bias vector.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Cost and gradient variables (your code needs to compute these values).
% Here, we initialize them to zeros.
cost = 0;
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,
% and the corresponding gradients W1grad, W2grad, b1grad, b2grad.
%
% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.
% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions
% as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with
% respect to W1. I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b)
% with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term
% [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the last block of pseudo-code in Section 2.2
% of the lecture notes (and similarly for W2grad, b1grad, b2grad).
%
% Stated differently, if we were using batch gradient descent to optimize the parameters,
% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2.
%
%% 1. Forward propagation
disp('Formula 1')
data_size = size(data);
active_value2 = repmat(b1, 1, data_size(2));
active_value3 = repmat(b2, 1, data_size(2));
active_value2 = sigmoid(W1*data + active_value2);            % activations of layer 2
active_value3 = sigmoid(W2*active_value2 + active_value3);   % activations of layer 3
%% 2. Computing the error term and the cost
ave_square = sum(sum((active_value3-data).^2)./2)/data_size(2);  % squared-error term: mean cost over all samples
weight_decay = lambda/2*(sum(sum(W1.^2)) + sum(sum(W2.^2)));     % weight decay term: sum of squares of all weights
disp('Formula 2: p_real')
p_real = sum(active_value2,2)./data_size(2);    % row-wise mean of active_value2: average activation of each hidden unit (25x1)
p_para = repmat(sparsityParam, hiddenSize, 1);  % sparsity parameter (25x1)
sparsity = beta.*sum(p_para.*log(p_para./p_real) + (1-p_para).*log((1-p_para)./(1-p_real))); % penalty term: sum of the KL divergences of all hidden units (Formula 3 inside the sum)
disp('Formula 4, Formula 5: cost')
cost = ave_square + weight_decay + sparsity;    % cost function
disp('Formula 7: delta3')
delta3 = (active_value3-data).*(active_value3).*(1-active_value3);   % residual of layer 3
average_sparsity = repmat(sum(active_value2,2)./data_size(2), 1, data_size(2));  % average activation of each hidden unit (25x10000)
default_sparsity = repmat(sparsityParam, hiddenSize, data_size(2));               % sparsity parameter (25x10000)
disp('Formula 6: sparsity_penalty')
sparsity_penalty = beta.*(-(default_sparsity./average_sparsity) + ((1-default_sparsity)./(1-average_sparsity)));
disp('Formula 8: delta2')
delta2 = (W2'*delta3 + sparsity_penalty).*((active_value2).*(1-active_value2));  % residual of layer 2, including the sparsity term
%% 3. Backpropagation
% Gradients of the cost function with respect to each layer's weights and bias terms
W2grad = delta3*active_value2'./data_size(2) + lambda.*W2;   % gradient of W2
W1grad = delta2*data'./data_size(2) + lambda.*W1;            % gradient of W1
b2grad = sum(delta3,2)./data_size(2);                        % gradient of b2
b1grad = sum(delta2,2)./data_size(2);                        % gradient of b1

%-------------------------------------------------------------------
% After computing the cost and gradient, we will convert the gradients back
% to a vector format (suitable for minFunc). Specifically, we will unroll
% your gradient matrices into a vector.

grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%% Alternative implementation (commented out)
% %% Forward propagation
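% m = size(data,2);   % number of training examples
% a1 = data;
% z2 = bsxfun(@plus, W1*a1, b1);
% a2 = sigmoid(z2);
% z3 = bsxfun(@plus, W2*a2, b2);
% a3 = sigmoid(z3);
%
% %% Network error
% % Error term J1: mean of the cost over all samples
% y = data;                 % desired output of the network
% Ei = sum((a3-y).^2)/2;    % cost of each sample
% J1 = sum(Ei)/m;
% % Regularization term J2: sum of squares of all weights
% J2 = sum(W1(:).^2) + sum(W2(:).^2);
% % Sparsity term J3: sum of the KL divergences of all hidden units
% rho_hat = sum(a2,2)/m;
% KL = sum(sparsityParam*log(sparsityParam./rho_hat) + ...
%          (1-sparsityParam)*log((1-sparsityParam)./(1-rho_hat)));
% J3 = KL;
% % Cost function of the network
% cost = J1 + lambda*J2/2 + beta*J3;
%
%
% %% Backpropagation: sensitivity (delta) of each layer
% delta3 = -(data-a3).*dsigmoid(z3);
% spare_delta = beta*(-sparsityParam./rho_hat + (1-sparsityParam)./(1-rho_hat));
% delta2 = bsxfun(@plus, W2'*delta3, spare_delta).*dsigmoid(z2);   % the sparsity term is added here
%
% %% Gradients of the cost function with respect to each layer's weights and bias terms
% W1grad = delta2*a1'/m + lambda*W1;
% W2grad = delta3*a2'/m + lambda*W2;
% b1grad = sum(delta2,2)/m;
% b2grad = sum(delta3,2)/m;

%-------------------------------------------------------------------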
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients. This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

%% Derivative of the sigmoid function
% function dsigm = dsigmoid(x)
%     sigx = sigmoid(x);
%     dsigm = sigx.*(1-sigx);
% end
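The sparsity term in the cost and the expression used for sparsity_penalty are related by differentiation: the latter is beta times the derivative of the KL divergence with respect to each average activation. A small finite-difference check illustrates this. It is only a sketch; the values of rho_hat and the helper names kl, analytic and numeric are made up for illustration.

```matlab
rho     = 0.01;              % target sparsity (same value as sparsityParam in train.m)
rho_hat = [0.05; 0.2; 0.5];  % hypothetical average activations of three hidden units
beta    = 3;                 % weight of the sparsity penalty

% KL divergence between Bernoulli(rho) and Bernoulli(r), summed over the units.
kl = @(r) sum(rho .* log(rho ./ r) + (1 - rho) .* log((1 - rho) ./ (1 - r)));

% Analytical derivative of beta*KL with respect to each rho_hat(i).
analytic = beta * (-rho ./ rho_hat + (1 - rho) ./ (1 - rho_hat));

% Central-difference approximation, one component at a time.
h = 1e-6;
numeric = zeros(size(rho_hat));
for i = 1:numel(rho_hat)
    e = zeros(size(rho_hat));
    e(i) = h;
    numeric(i) = beta * (kl(rho_hat + e) - kl(rho_hat - e)) / (2 * h);
end

disp([analytic numeric]);   % the two columns should agree closely
```

Checking one term in isolation like this is a cheaper way to localize a mistake than running the full gradient check on the whole network.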
computeNumericalGradient.m
function numgrad = computeNumericalGradient(J, theta)
% numgrad = computeNumericalGradient(J, theta)
% theta: a vector of parameters (contains W1, W2, b1, b2)
% J: a function that outputs a real-number. Calling y = J(theta) will return the
% function value at theta.

% Initialize numgrad with zeros
numgrad = zeros(size(theta));

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions:
% Implement numerical gradient checking, and return the result in numgrad.
% (See Section 2.3 of the lecture notes.)
% You should write code so that numgrad(i) is (the numerical approximation to) the
% partial derivative of J with respect to the i-th input argument, evaluated at theta.
% I.e., numgrad(i) should be the (approximately) the partial derivative of J with
% respect to theta(i).
%
% Hint: You will probably want to compute the elements of numgrad one at a time.
EPSILON = 0.0001;
for i = 1:numel(theta)
    theta_plus = theta;
    theta_minu = theta;
    theta_plus(i) = theta_plus(i) + EPSILON;
    theta_minu(i) = theta_minu(i) - EPSILON;
    % central difference for the i-th partial derivative
    numgrad(i) = (J(theta_plus) - J(theta_minu)) / (2*EPSILON);
end
%% ---------------------------------------------------------------
end
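checkNumericalGradient.m
Gradient checking is an essential technique when writing machine learning algorithms: it verifies that the cost function you wrote is correct. The main job of the cost function is to compute the cost and the gradient of the cost with respect to the parameters. In a real program gradient checking is used together with the cost function, and this part can be placed in a separate test function checkCost():
① Provide a set of samples and initial parameter values.
② Use the cost function to compute grad.
③ Use computeNumericalGradient to compute the approximate gradient numGrad.
④ Compare grad and numGrad. If diff is below 1e-6 the cost function is correct; otherwise the cost function needs to be checked.
diff = norm(numGrad-grad)/norm(numGrad+grad);
disp(diff);
Once the cost function is known to be correct, disable the gradient-checking code; otherwise a lot of time is wasted.

function [] = checkNumericalGradient()
% The main purpose of this function is to check that computeNumericalGradient is correct.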
% This code can be used to check your numerical gradient implementation
% in computeNumericalGradient.m
% It analytically evaluates the gradient of a very simple function called
% simpleQuadraticFunction (see below) and compares the result with your numerical
% solution. Your numerical gradient implementation is incorrect if
% your numerical solution deviates too much from the analytical solution.

% Evaluate the function and gradient at x = [4; 10]; (Here, x is a 2d vector.)
x = [4; 10];
[value, grad] = simpleQuadraticFunction(x);

% Use your code to numerically compute the gradient of simpleQuadraticFunction at x.
% (The notation "@simpleQuadraticFunction" denotes a pointer to a function.)
numgrad = computeNumericalGradient(@simpleQuadraticFunction, x);

% Visually examine the two gradient computations. The two columns
% you get should be very similar.
disp([numgrad grad]);
fprintf('The above two columns you get should be very similar.\n(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n');

% Evaluate the norm of the difference between two solutions.
% If you have a correct implementation, and assuming you used EPSILON = 0.0001
% in computeNumericalGradient.m, then diff below should be 2.1452e-12
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff);
fprintf('Norm of the difference between numerical and analytical gradient (should be < 1e-9)\n\n');
end

function [value,grad] = simpleQuadraticFunction(x)
% this function accepts a 2D vector as input.
% Its outputs are:
% value: h(x1, x2) = x1^2 + 3*x1*x2
% grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2
% Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming
% that computeNumericalGradients will use only the first returned value of this function.

value = x(1)^2 + 3*x(1)*x(2);

grad = zeros(2, 1);
grad(1) = 2*x(1) + 3*x(2);
grad(2) = 3*x(1);

end
% %% some initialize
% numgrad = zeros(size(theta));   % Initialize numgrad with zeros
% n = size(theta,1);              % theta(1),...,theta(n)
% EPSILON = 1e-4;
%
% %% calculate the partial derivative of J with respect to theta(i)
% for i = 1:n
%     theta_add = zeros(n,1);
%     theta_add(i) = EPSILON;
%     numgrad(i) = (J(theta + theta_add) - J(theta - theta_add))./EPSILON/2;
% end

References
UFLDL Tutorial
Exercise:Sparse Autoencoder
Deep Learning 1: Deep learning UFLDL tutorial, Sparse Autoencoder exercise (Stanford deep learning tutorial)
Deep learning 9 (Sparse Autoencoder exercise)
UFLDL tutorial answers (1): Exercise:Sparse_Autoencoder
UFLDL tutorial, part 1: sparseae_exercise
Gradient checking
Andrew Ng's open course