**Neural Network Tutorials - Herong's Tutorial Examples** - 1.20, by Dr. Herong Yang

Linear Regression with TensorFlow

This section provides a tutorial example on how to create a linear regression learning model with the TensorFlow Python API. An introduction to the basic concepts of linear regression is provided. A Python script by Nikhil Kumar is used as the test example.

From previous tutorials, we have learned how to create a tensor flow graph to represent a tensor expression built from multiple connected tensor operations. Using a very simple tensor expression as an example, we have also learned how to create a TensorFlow session to evaluate the tensor flow graph and generate the output of the tensor expression.

In this tutorial, let's try to create a more realistic tensor flow graph to solve a machine learning problem using the linear regression model.

First, we need to refresh our memory on the linear regression model with the following concepts:

1. Task - Using a supervised learning technique to construct a learning model that approximates a real-world relation between a set of features and their related target. Some samples with feature sets and their known targets are provided to help train the model.

2. Features - Also called independent variables, predictor variables, or input values. The features of a single sample are usually represented as X = (x_{1}, x_{2}, ..., x_{n}).

3. Target - Also called dependent variable, or output variable. Target is usually represented as y.

4. Prediction - The output value generated from the learning model. Prediction can be represented as y'.

5. Linear regression model - A linear function that calculates the prediction y' from the features X. The linear regression model can be expressed as below, using the vector product operation:

Linear regression model: y' = b_{0} + B·X
Or: y' = b_{0} + b_{1}*x_{1} + b_{2}*x_{2} + ... + b_{n}*x_{n}

6. Intercept - Also called bias. A parameter in the linear regression model that moves the prediction value up or down. The intercept is the first parameter, b_{0}, in the above formula.

7. Coefficients - Also called weights or scale factors. The coefficients are B = (b_{1}, b_{2}, ..., b_{n}) in the above formula.

8. Error, or Residual - The distance between the target and the prediction of a given single sample:

e = y - y'

9. Loss function - A function that measures how far off the prediction y' is from the target y:

l = L(y, y')

10. Squared error - Half of the squared error (error to the power of 2) of a given single sample, (y - y')^{2}/2. Squared error is the commonly used loss function in linear regression models.

l = e^{2}/2 = (y - y')^{2}/2

11. Cost function - A function of the model's parameters (b_{0}, B) that measures how far off the model's predictions are on a given set of samples:

c = C(b_{0}, B) on samples (X_{1}, X_{2}, ..., X_{m})

12. MSE (Mean Squared Error) - The mean value of the squared errors on a given set of samples. MSE is the commonly used cost function in linear regression models.

MSE as cost function: c = (l_{1} + l_{2} + ... + l_{m})/m = (e_{1}^{2} + e_{2}^{2} + ... + e_{m}^{2})/(2m)

13. Cost optimization - A process to find the model's parameters that result in the lowest cost on a given sample set. For a linear regression model, there are two types of cost optimization processes:

- Direct solution - Since the lowest cost occurs at a critical point where the partial derivatives with respect to all parameters are zero, cost optimization can be converted into solving a set of equations in which those partial derivatives are set to zero. Fortunately, if you are using MSE as the cost function, those equations become linear equations, and a direct solution is available.
- Iterative solution - Start with some initial values for all parameters, then follow an iterative algorithm to update all parameters repeatedly. Each round of updates should reduce the cost, so that after many rounds, the cost is close enough to the lowest value.

14. Gradient Descent - An iterative solution algorithm for cost optimization. Gradient descent updates the model's parameters (b_{0}, B) in the steepest descending direction on the cost function surface. The descending distance is controlled by a factor called the learning rate.

Gradient descent is commonly used in linear regression models. The formula for calculating parameter updates can be found in any linear regression textbook.

15. Learning rate - A factor used to scale down the update quantities of the model's parameters in a gradient descent step. A smaller learning rate like 0.01 allows us to rerun the gradient descent step many times on the same training set, reaching the lowest cost gradually and avoiding the overshooting problem.

16. Initialization - Providing initial values for the model's parameters, the intercept and the coefficients. Random values are usually used for initialization.

17. Training - Using a set of samples to train the model, applying the gradient descent method to find the model's parameters that result in the lowest cost on the sample set.

18. Epoch - A cycle of training that uses each and every sample in the training set once. If a smaller learning rate is used, you need to run many epochs to reach the lowest cost.

19. Testing - Using a set of samples to test the model by evaluating the cost on the test set.
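The concepts above can be sketched in a few lines of NumPy. This is an illustration only, not the article's script: the tiny dataset, learning rate, and epoch count below are made up, and the data lies exactly on the line y = 2x + 1 so the trained parameters should approach W = 2 and b = 1.

```python
# A minimal NumPy sketch: train a one-feature linear regression model
# y' = W*x + b with gradient descent, using MSE as the cost function.
import numpy as np

np.random.seed(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # features
y = 2.0 * x + 1.0                           # targets on the line y = 2x + 1
m = x.shape[0]                              # number of samples

W, b = np.random.randn(), np.random.randn() # random initialization
learning_rate = 0.02

for epoch in range(5000):                   # one pass over all samples per epoch
    y_pred = W * x + b                      # predictions y'
    e = y_pred - y                          # errors (residuals)
    cost = np.sum(e ** 2) / (2 * m)         # MSE cost, matching l = e^2/2
    # Partial derivatives of the cost with respect to W and b
    dW = np.sum(e * x) / m
    db = np.sum(e) / m
    # Gradient descent update, scaled by the learning rate
    W -= learning_rate * dW
    b -= learning_rate * db

print(W, b)                                 # should approach W = 2, b = 1
```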

Okay, enough on linear regression concepts. Let's use TensorFlow to build a linear regression model by following the example provided in "Introduction to TensorFlow" by Nikhil Kumar at https://www.geeksforgeeks.org/introduction-to-tensorflow/

Here is the Python script provided by Nikhil Kumar, with some updates.

#- linear-regression.py
#- Source: https://www.geeksforgeeks.org/introduction-to-tensorflow/
#- Updates:
#-   Removed graphical plots
#-   Identified variables explicitly
#
import tensorflow as tf
import numpy as np

# Model Parameters
learning_rate = 0.01
training_epochs = 2000
display_step = 200

# Training Data
train_X = np.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                      7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_y = np.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                      2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# Test Data
test_X = np.asarray([6.83, 4.668, 8.9, 7.91, 5.7, 8.7, 3.1, 2.1])
test_y = np.asarray([1.84, 2.273, 3.2, 2.831, 2.92, 3.24, 1.35, 1.03])

# Set placeholders for feature and target vectors
X = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

# Set model weights and bias
W = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

# Construct a linear model
linear_model = W*X + b

# Mean squared error
cost = tf.reduce_sum(tf.square(linear_model - y)) / (2*n_samples)

# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
minimize = optimizer.minimize(cost, var_list=(W, b))

# Initializing the variables
W_init = tf.variables_initializer([W])
b_init = tf.variables_initializer([b])

# Launch the graph
with tf.Session() as sess:
    # Load initialized variables in current session
    sess.run(W_init)
    sess.run(b_init)

    # Fit all training data
    for epoch in range(training_epochs):
        # perform gradient descent step
        sess.run(minimize, feed_dict={X: train_X, y: train_y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, y: train_y})
            print("Epoch:{0:6} \t Cost:{1:10.4} \t W:{2:6.4} \t b:{3:6.4}".
                format(epoch+1, c, sess.run(W), sess.run(b)))

    # Print final parameter values
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, y: train_y})
    print("Final training cost:", training_cost, "W:", sess.run(W),
        "b:", sess.run(b), '\n')

    # Testing the model
    testing_cost = sess.run(tf.reduce_sum(
        tf.square(linear_model - y)) / (2 * test_X.shape[0]),
        feed_dict={X: test_X, y: test_y})
    print("Final testing cost:", testing_cost)

If you run the script, you should get something similar to the following:

herong$ python3 linear-regression.py
Epoch:   200     Cost:   0.08787     W:0.1923     b: 1.219
Epoch:   400     Cost:   0.08366     W:0.2051     b: 1.129
Epoch:   600     Cost:   0.08107     W:0.2151     b: 1.057
Epoch:   800     Cost:   0.07948     W:0.2230     b: 1.002
Epoch:  1000     Cost:   0.07850     W:0.2292     b:0.9579
Epoch:  1200     Cost:   0.07789     W:0.2340     b:0.9236
Epoch:  1400     Cost:   0.07752     W:0.2378     b:0.8967
Epoch:  1600     Cost:   0.07729     W:0.2408     b:0.8756
Epoch:  1800     Cost:   0.07715     W:0.2431     b: 0.859
Epoch:  2000     Cost:   0.07707     W:0.2450     b: 0.846
Optimization Finished!
Final training cost: 0.07706697 W: 0.24497162 b: 0.84604114
Final testing cost: 0.079794206
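As a sanity check (not part of the original script), we can recompute the final training cost directly from the printed parameter values, using the same cost formula the script uses, sum((y' - y)^2) / (2*n_samples):

```python
# Recompute the final training cost from the printed W and b values
import numpy as np

train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59,
                      2.167, 7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53,
                      1.221, 2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])

W, b = 0.24497162, 0.84604114   # final values printed by the script
cost = np.sum((W * train_X + b - train_y) ** 2) / (2 * train_X.shape[0])
print(cost)                     # close to the reported 0.07706697
```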

Notes on Nikhil Kumar's sample script:

- The script creates a generic linear regression model for multi-dimensional features, but the sample set has only 1 feature per sample.
- NumPy array (matrix) data structures are used to feed data into TensorFlow placeholders.
- tf.train.GradientDescentOptimizer() is used to create an optimizer that offers the minimize() method for creating a special tensor operation to automatically update the model's parameters (W, b) with the given learning rate.
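The direct solution mentioned in the concepts above can also be sketched in NumPy: with MSE as the cost function, the optimal W and b solve a linear system, so no iteration is needed. The tiny dataset below is made up for illustration; it lies exactly on the line y = 3x - 2, so the least-squares solver should recover W = 3 and b = -2.

```python
# Direct (least-squares) solution for one-feature linear regression
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x - 2.0                      # targets on the line y = 3x - 2

# Design matrix with a column of ones for the intercept b
A = np.stack([x, np.ones_like(x)], axis=1)
(W, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(W, b)                            # recovers W = 3, b = -2
```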

Table of Contents

Deep Playground for Classical Neural Networks

Building Neural Networks with Python

Simple Example of Neural Networks

►TensorFlow - Machine Learning Platform

"tensorflow" - TensorFlow Python Library

"tensorflow" Interactive Test Web Page

TensorFlow Session Class and run() Function

TensorFlow Variable Class and load() Function

►Linear Regression with TensorFlow

tensorflow.examples.tutorials.mnist Module

Simple TensorFlow Model on MNIST Database

Commonly Used TensorFlow Functions

PyTorch - Machine Learning Platform

CNN (Convolutional Neural Network)