From the course: Artificial Intelligence Foundations: Neural Networks

Data preprocessing

- [Instructor] This slide shows how to split the dataset into the input features, which we call X, and the label, which we call Y. Shown here is the output of the normalization step. Use Keras preprocessing to scale the dataset so that all the X input features lie between zero and one, inclusive. This best practice is called normalization. Why? Essentially, if the inputs are not on similar scales, it becomes difficult to initialize a neural network, and you create potential problems for yourself if you feed the network values with widely different ranges. The network might be able to adapt automatically to these mixed data scales, but it would definitely make learning more difficult. Now we are down to our last step in processing the data, which is to split our dataset into a training set and a test set. We will use the function from scikit-learn called train_test_split, which, as the name suggests, splits our data into a training set and a test set. The code stores the split data in the first four variables on the left of the equals sign, as the variable names suggest. Note that a single call to this function only splits the dataset into two. For our lab, we'll keep it simple and use the two sets, but as a best practice, you want to split the dataset into three: training, validation, and test. As you can see, the training set has 719 data points, while the test set has 480 data points. The X variables have four input features, while the Y variable has only one feature to predict.
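The normalization step described above can be sketched with plain NumPy. This is a minimal min-max scaling example, not the course's actual code; the data here is hypothetical, and the course itself uses a Keras preprocessing utility to achieve the same effect.

```python
import numpy as np

# Hypothetical feature matrix (the course's real dataset has four input features).
X = np.array([[2.0, 100.0],
              [4.0, 200.0],
              [6.0, 400.0]])

# Min-max normalization: rescale each column (feature) into the [0, 1] range.
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)

print(X_scaled)
```

After this transform, every feature lies between zero and one inclusive, so no single feature dominates the network's weight updates simply because of its units.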
