Linear Regression with TensorFlow

Neural Network Tutorials - Herong's Tutorial Examples

This section provides a tutorial example on how to create a linear regression learning model with the TensorFlow Python API. An introduction to the basic concepts of linear regression is provided. The Python script by Nikhil Kumar is used as a test example.

From previous tutorials, we have learned how to create a tensor flow graph to represent a tensor expression of multiple connected tensor operations. Using a very simple tensor expression as an example, we have also learned how to create a TensorFlow session to evaluate the tensor flow graph and generate the output of the tensor expression.

In this tutorial, let's try to create a more realistic tensor flow graph to solve a machine learning problem using the linear regression model.

First, we need to refresh our memory of the linear regression model with the following concepts:

1. Task - Using a supervised learning technique to construct a learning model that approximates a real-world relation between a set of features and their related target. Some samples with feature sets and their known targets are provided to help train the model.

2. Features - Also called independent variables, predictor variables, or input values. The features of a single sample are usually represented as X = (x₁, x₂, ..., x_n).

3. Target - Also called dependent variable, or output variable. Target is usually represented as y.

4. Prediction - The output value generated from the learning model. Prediction can be represented as y'.

5. Linear regression model - A linear function to calculate the prediction y' from the features, X. The linear regression model can be expressed as below using the vector dot product operation:

Linear regression model:
  y' = b₀ + B·X

Or:
  y' = b₀ + b₁*x₁ + b₂*x₂ + ... + b_n*x_n
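
As a minimal sketch of the formula above, the prediction for one sample can be computed in plain Python. The function name predict and the sample numbers are made up for illustration; they are not part of the original tutorial.

```python
def predict(b0, B, X):
    """Return y' = b0 + b1*x1 + ... + bn*xn for a single sample."""
    if len(B) != len(X):
        raise ValueError("B and X must have the same length")
    return b0 + sum(b * x for b, x in zip(B, X))


# Hypothetical sample with two features
y_pred = predict(1.0, [2.0, 3.0], [4.0, 5.0])  # 1 + 2*4 + 3*5
print(y_pred)
```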

6. Intercept, also called bias - A constant parameter in the linear regression model that moves the prediction up or down by the same amount for all samples. The intercept is the first parameter, b₀, in the above formula.

7. Coefficients, also called weights or scale factors - Parameters that scale the contribution of each feature to the prediction. Coefficients are B = (b₁, b₂, ..., b_n) in the above formula.

9. Error, or Residual - The difference between the target and the prediction of a given single sample:

  e = y - y'

10. Loss function - A function that measures how far off the prediction y' is from the target y:

  l = L(y, y')

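11. Squared error - Half of the squared error (the error raised to the power of 2) of a given single sample, (y - y')²/2. Squared error is the commonly used loss function in linear regression models. The factor of 1/2 is a convenience that cancels the 2 produced when the square is differentiated.

  l = e²/2
  l = (y - y')²/2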

12. Cost function - A function of the model's parameters (b₀, B) that measures how far off the model's predictions are on a given set of samples.

  c = C(b₀, B) on (X₁, X₂, ..., X_m)

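13. MSE (Mean Squared Error) - The mean value of squared errors on a given set of samples. MSE is the commonly used cost function in linear regression models. Because the squared error above includes a factor of 1/2, the cost below is half of the conventional mean squared error; the constant factor does not change which parameters give the lowest cost.

MSE as cost function:
  c = (l₁ + l₂ + ... + l_m)/m
  c = (e₁² + e₂² + ... + e_m²)/(2m)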
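
The loss and cost definitions can be sketched in a few lines of plain Python, following l = e²/2 for one sample and c = (l₁ + l₂ + ... + l_m)/m for a set of samples. The function names and the numbers are illustrative assumptions only.

```python
def squared_error(y, y_pred):
    """Loss for one sample: l = (y - y')^2 / 2."""
    e = y - y_pred
    return e * e / 2


def cost(targets, predictions):
    """Cost over m samples: the mean of the per-sample losses."""
    losses = [squared_error(y, p) for y, p in zip(targets, predictions)]
    return sum(losses) / len(losses)


# Hypothetical targets and predictions for two samples
c = cost([1.0, 2.0], [1.0, 4.0])  # the two losses are 0.0 and 2.0
print(c)
```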

15. Cost optimization - A process to find the model's parameters that result in the lowest cost on a given sample set. For a linear regression model, there are two types of cost optimization processes:

Direct solution - Since the lowest cost should occur at a critical point where the partial derivatives with respect to all parameters are zero, the cost optimization can be converted into a set of equations, one per parameter. Fortunately, if you are using the MSE as the cost function, those equations are linear in the parameters and a direct solution is available.
Iterative solution - Start with some initial values for all parameters, then follow an iterative algorithm to update all parameters repeatedly. Each round of updating should reduce the cost to a lower value, so that after many rounds, the cost is close enough to the lowest value.
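
For the special case of a single feature, the direct solution has a well-known closed form: b₁ = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b₀ = ȳ - b₁*x̄, where x̄ and ȳ are the sample means. The sketch below assumes that one-feature case; the function name fit_direct and the sample data are made up for illustration.

```python
def fit_direct(xs, ys):
    """Closed-form least-squares fit of y' = b0 + b1*x for one feature."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical samples that lie exactly on y = 1 + 2*x
b0, b1 = fit_direct([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)
```

Because the samples lie exactly on a line, the fit should recover that line's intercept and slope.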

14. Gradient Descent - An iterative solution algorithm for cost optimization. Gradient descent updates the model's parameters (b₀, B) in the steepest descending direction on the cost function surface. The descending distance is controlled by a factor called the learning rate.

Gradient descent is commonly used in linear regression models. The formulas for calculating parameter updates can be found in any linear regression textbook.
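
As a rough sketch of one such update, take the cost as the mean of the per-sample losses l = e²/2. Its partial derivatives are then -(e₁ + ... + e_m)/m for b₀ and -(e₁*x₁ⱼ + ... + e_m*x_mⱼ)/m for bⱼ, and each parameter moves against its derivative, scaled by the learning rate. The function name gradient_step and the data are illustrative assumptions, not code from the original tutorial.

```python
def gradient_step(b0, B, samples, targets, learning_rate):
    """Run one gradient descent update on the cost c = sum(e_i^2) / (2*m)."""
    m = len(samples)
    # Errors e_i = y_i - y'_i under the current parameters
    errors = [y - (b0 + sum(b * x for b, x in zip(B, X)))
              for X, y in zip(samples, targets)]
    # Partial derivatives: dc/db0 = -sum(e_i)/m, dc/db_j = -sum(e_i * x_ij)/m
    grad_b0 = -sum(errors) / m
    grad_B = [-sum(e * X[j] for e, X in zip(errors, samples)) / m
              for j in range(len(B))]
    # Move against the gradient, scaled by the learning rate
    new_b0 = b0 - learning_rate * grad_b0
    new_B = [b - learning_rate * g for b, g in zip(B, grad_B)]
    return new_b0, new_B


# Hypothetical data: two samples with one feature each
b0, B = gradient_step(0.0, [0.0], [[1.0], [2.0]], [2.0, 4.0], 0.1)
print(b0, B)
```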

15. Learning rate - A factor used to scale down the update quantities on the model's parameters in a gradient descent step. A smaller learning rate like 0.01 makes each step small, so the gradient descent step can be rerun many times on the same training set to approach the lowest cost gradually and avoid the overshooting problem.
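
To see the overshooting problem in isolation, consider a toy cost c(w) = w², whose gradient is 2w and whose lowest value is at w = 0. The toy cost, the function name descend, and the two learning rates are assumptions chosen for illustration; they do not come from the original tutorial.

```python
def descend(w, learning_rate, steps):
    """Run gradient descent on the toy cost c(w) = w**2 (gradient 2*w)."""
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w


w_small = descend(1.0, 0.01, 10)  # each step multiplies w by 0.98
w_large = descend(1.0, 1.1, 10)   # each step multiplies w by -1.2
print(abs(w_small), abs(w_large))
```

With the small rate, w moves steadily toward 0. With the large rate, every step jumps past the minimum and lands farther away than it started, so the cost grows instead of shrinking.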

16. Initialization - Providing initial values for model's parameters, intercept and coefficients. Random values are usually used for initialization.

17. Training - Using a set of samples to train the model by using the gradient descent method to find the model's parameters that result in the lowest cost on the sample set.

18. Epoch - A cycle of training that uses each and every sample in the training set once. If a smaller learning rate is used, you need to run more epochs to reach the lowest cost.

19. Testing - Using a set of samples to test the model by evaluating the cost on the test set.
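
Initialization, training, epochs, and testing can be put together in a short plain-Python sketch for a one-feature model y' = w*x + b. The data, the seed, and all names here are hypothetical, and every epoch applies one gradient descent update computed from the whole training set.

```python
import random


def cost(w, b, xs, ys):
    """Half of the mean squared error of y' = w*x + b on the given samples."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


# Hypothetical one-feature samples scattered around y = 2*x + 1
train_x, train_y = [0.0, 1.0, 2.0, 3.0, 4.0], [1.1, 2.9, 5.2, 6.8, 9.1]
test_x, test_y = [0.5, 2.5], [2.0, 6.1]

# Initialization: random starting values (seeded so the run is repeatable)
random.seed(0)
w, b = random.gauss(0, 1), random.gauss(0, 1)
initial_cost = cost(w, b, train_x, train_y)

# Training: each epoch uses every training sample once in one update
learning_rate, m = 0.01, len(train_x)
for epoch in range(2000):
    errors = [y - (w * x + b) for x, y in zip(train_x, train_y)]
    w += learning_rate * sum(e * x for e, x in zip(errors, train_x)) / m
    b += learning_rate * sum(errors) / m

# Testing: evaluate the same cost on samples not used for training
train_cost = cost(w, b, train_x, train_y)
test_cost = cost(w, b, test_x, test_y)
print(train_cost, test_cost)
```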

Okay, enough on linear regression concepts. Let's use TensorFlow to build a linear regression model by following the example provided in "Introduction to TensorFlow" by Nikhil Kumar at https://www.geeksforgeeks.org/introduction-to-tensorflow/

Here is the Python script provided by Nikhil Kumar with some updates.

#- linear-regression.py
#- Source: https://www.geeksforgeeks.org/introduction-to-tensorflow/
#- Updates:
#-   Removed graphical plots
#-   Identified variables explicitly
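#
# Note: this script uses the TensorFlow 1.x graph API (tf.placeholder,
# tf.Session). On TensorFlow 2.x, replace the TensorFlow import below with:
#   import tensorflow.compat.v1 as tf
#   tf.disable_v2_behavior()
import tensorflow as tf
import numpy as np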

# Training hyperparameters
learning_rate = 0.01
training_epochs = 2000
display_step = 200

# Training Data
train_X = np.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                      7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_y = np.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                      2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# Test Data
test_X = np.asarray([6.83, 4.668, 8.9, 7.91, 5.7, 8.7, 3.1, 2.1])
test_y = np.asarray([1.84, 2.273, 3.2, 2.831, 2.92, 3.24, 1.35, 1.03])

# Set placeholders for feature and target vectors
X = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

# Set model weights and bias
W = tf.Variable(np.random.randn())
b = tf.Variable(np.random.randn())

# Construct a linear model
linear_model = W*X + b

# Cost: half of the mean squared error
cost = tf.reduce_sum(tf.square(linear_model - y)) / (2*n_samples)

# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
minimize = optimizer.minimize(cost, var_list=(W, b))

# Initializing the variables
W_init = tf.variables_initializer([W])
b_init = tf.variables_initializer([b])

# Launch the graph
with tf.Session() as sess:
    # Load initialized variables in current session
    sess.run(W_init)
    sess.run(b_init)

    # Fit all training data
    for epoch in range(training_epochs):

        # perform gradient descent step
        sess.run(minimize, feed_dict={X: train_X, y: train_y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, y: train_y})
            print("Epoch:{0:6} \t Cost:{1:10.4} \t W:{2:6.4} \t b:{3:6.4}".
                  format(epoch+1, c, sess.run(W), sess.run(b)))

    # Print final parameter values
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, y: train_y})
    print("Final training cost:", training_cost, "W:", sess.run(W), "b:",
          sess.run(b), '\n')

    # Testing the model
    testing_cost = sess.run(tf.reduce_sum(
                    tf.square(linear_model - y)) / (2 * test_X.shape[0]),
                    feed_dict={X: test_X, y: test_y})

    print("Final testing cost:", testing_cost)

If you run the script, you should get something similar to the following:

herong$ python3 linear-regression.py

Epoch:   200    Cost:   0.08787    W:0.1923    b: 1.219
Epoch:   400    Cost:   0.08366    W:0.2051    b: 1.129
Epoch:   600    Cost:   0.08107    W:0.2151    b: 1.057
Epoch:   800    Cost:   0.07948    W:0.2230    b: 1.002
Epoch:  1000    Cost:   0.07850    W:0.2292    b: 0.9579
Epoch:  1200    Cost:   0.07789    W:0.2340    b: 0.9236
Epoch:  1400    Cost:   0.07752    W:0.2378    b: 0.8967
Epoch:  1600    Cost:   0.07729    W:0.2408    b: 0.8756
Epoch:  1800    Cost:   0.07715    W:0.2431    b: 0.859
Epoch:  2000    Cost:   0.07707    W:0.2450    b: 0.846
Optimization Finished!
Final training cost: 0.07706697 W: 0.24497162 b: 0.84604114

Final testing cost: 0.079794206
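
In the log above, W is still rising and b is still falling at epoch 2000, which suggests that gradient descent has not fully converged yet. One way to check is to compute the direct solution on the same training data. The sketch below is an addition to the tutorial: it uses the closed-form fit for one feature and plain Python lists instead of NumPy arrays.

```python
# Training data copied from linear-regression.py
train_X = [3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
           7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1]
train_y = [1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
           2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3]
n = len(train_X)

# Closed-form least-squares fit for one feature
x_mean = sum(train_X) / n
y_mean = sum(train_y) / n
W = (sum((x - x_mean) * (y - y_mean) for x, y in zip(train_X, train_y))
     / sum((x - x_mean) ** 2 for x in train_X))
b = y_mean - W * x_mean

# Same cost expression as the script: sum of squared errors / (2 * n)
cost = sum((W * x + b - y) ** 2 for x, y in zip(train_X, train_y)) / (2 * n)
print("W:", W, "b:", b, "cost:", cost)
```

Since the direct solution minimizes this cost exactly, the printed cost should be no higher than the final training cost in the log, and the printed W and b show where further epochs of gradient descent would be heading.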

Notes on Nikhil Kumar's sample script:

The placeholders X and y are created without a fixed shape, so the same graph accepts the 17 training samples or the 8 test samples in a single feed. However, W and b are scalars, so the model handles only 1 feature per sample.
NumPy arrays are used to feed data into the TensorFlow placeholders.
tf.train.GradientDescentOptimizer() is used to create an optimizer that offers the minimize() method for creating a special tensor operation that automatically updates the model's parameters (W, b) with the given learning rate.