Commit b7c93e4a authored by ashivani

Update AUTODRI_baseline.ipynb

parent 1c88ea9e
Pipeline #5237 failed
%% Cell type:markdown id: tags:
![AIcrowd-Logo](https://raw.githubusercontent.com/AIcrowd/AIcrowd/master/app/assets/images/misc/aicrowd-horizontal.png)
%% Cell type:markdown id: tags:
# Baseline for [AUTODRI Challenge](https://www.aicrowd.com/challenges/autodri) on AIcrowd
#### Author : Ayush Shivani
%% Cell type:markdown id: tags:
## Download Necessary Packages
%% Cell type:code id: tags:
``` python
import sys
!pip install numpy
!pip install pandas
!pip install scikit-learn
!pip install matplotlib tqdm
# Pillow provides the PIL module used below to load images
!pip install pillow
```
%% Cell type:markdown id: tags:
## Download data
The first step is to download the training, validation, and test data.
%% Cell type:code id: tags:
``` python
# Download the datasets
!rm -rf data/
!mkdir data/
!wget https://datasets.aicrowd.com/default/aicrowd-practice-challenges/public/autodri/v0.1/train.zip
!wget https://datasets.aicrowd.com/default/aicrowd-practice-challenges/public/autodri/v0.1/test.zip
!wget https://datasets.aicrowd.com/default/aicrowd-practice-challenges/public/autodri/v0.1/val.zip
!unzip train.zip
!unzip test.zip
!unzip val.zip
!mv train data/train
!mv test data/test
!mv val data/val
```
%% Cell type:code id: tags:
``` python
# The data is now available at the following locations:
TRAINING_IMAGES_FOLDER = "data/train/cameraFront"
TRAINING_LABELS_PATH = "data/train/train.csv"
TESTING_LABELS_PATH = "data/test/test.csv"
TESTING_IMAGES_FOLDER = "data/test/cameraFront"
# This baseline uses only the car's front camera, purely for demonstration.
# For a real submission, experiment to find the best combination of camera angles.
```
%% Cell type:markdown id: tags:
## Import packages
%% Cell type:code id: tags:
``` python
import os
import tqdm
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error
import matplotlib.pyplot as plt
%matplotlib inline
from PIL import Image
```
%% Cell type:markdown id: tags:
## Load Data
We use the PIL library to load our images. Each image becomes one training example: the input features are its mean RGB colours and the target is the `canSteering` value from the labels file.
%% Cell type:code id: tags:
``` python
training_labels_df = pd.read_csv(TRAINING_LABELS_PATH)


def pre_process_data_X(image):
    """
    This function takes a loaded image and returns a particular
    representation of the data point.

    NOTE: This current baseline implements a **very** silly approach
    of representing every image by its mean RGB values.
    You are encouraged to try alternative representations of the data,
    or figure out how to learn the best representation from the data ;)
    """
    im_array = np.array(image)  # height x width x channels
    mean_rgb = im_array.mean(axis=(0, 1))  # average over all pixels
    return mean_rgb


ALL_DATA = []
for _idx, row in tqdm.tqdm(training_labels_df.iterrows(), total=training_labels_df.shape[0]):
    filepath = os.path.join(
        TRAINING_IMAGES_FOLDER,
        row.filename
    )
    im = Image.open(filepath)
    data_X = pre_process_data_X(im)
    data_Y = [row.canSteering]
    ALL_DATA.append((data_X, data_Y))
```
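%% Cell type:markdown id: tags:
The docstring above encourages trying other representations. As one possible next step, the sketch below adds the per-channel standard deviation to the mean colours. The function name `mean_std_features` and the synthetic test frame are illustrative assumptions, not part of the baseline.

```python
import numpy as np
from PIL import Image


def mean_std_features(image):
    """Return per-channel mean and standard deviation as a 6-value vector."""
    # Force three channels so grayscale or RGBA inputs give the same shape.
    pixels = np.asarray(image.convert("RGB"), dtype=np.float64)
    channel_mean = pixels.mean(axis=(0, 1))
    channel_std = pixels.std(axis=(0, 1))
    return np.concatenate([channel_mean, channel_std])


# Synthetic stand-in for a camera frame: left half red, right half black.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, :2, 0] = 255
demo_image = Image.fromarray(frame)

features = mean_std_features(demo_image)
print(features.shape)  # six numbers: 3 means followed by 3 standard deviations
```

A function like this could be swapped in wherever `pre_process_data_X` is called, as long as the same function is used for the training and test images.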
%% Cell type:markdown id: tags:
## Exploratory Data Analysis
Let's look at a few images from the dataset, each titled with its steering label, to get a better idea of the data.
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(20, 20))
for i in range(16):
    # Access columns by name so extra columns in the CSV do not break the loop
    row = training_labels_df.iloc[i]
    filepath = os.path.join(
        TRAINING_IMAGES_FOLDER,
        row.filename
    )
    im = Image.open(filepath)
    plt.subplot(4, 4, i + 1)
    plt.axis('off')
    plt.title("canSteering: %.3f" % row.canSteering)
    plt.imshow(im)
```
%% Cell type:markdown id: tags:
## Split Data into Train and Validation
We split the dataset into training and validation sets to help us test the generalizability of our models, and to ensure that we are not overfitting on the training set.
%% Cell type:code id: tags:
``` python
training_set, validation_set = train_test_split(ALL_DATA, test_size=0.2, random_state=42)
```
%% Cell type:markdown id: tags:
Here we have set the size of the validation data to 20% of the total data. You can change it and see what effect it has on the validation errors. To learn more about the `train_test_split` function, [click here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
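%% Cell type:markdown id: tags:
As a rough illustration of how `test_size` changes the split, the sketch below runs `train_test_split` on 100 made-up samples. `demo_data` is a hypothetical stand-in for `ALL_DATA`; only the sizes of the resulting sets matter here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for ALL_DATA: 100 (features, label) pairs.
demo_data = [(np.full(3, i, dtype=float), [float(i)]) for i in range(100)]

split_sizes = {}
for test_size in (0.1, 0.2, 0.5):
    train_part, val_part = train_test_split(
        demo_data, test_size=test_size, random_state=42
    )
    split_sizes[test_size] = (len(train_part), len(val_part))
    print(test_size, split_sizes[test_size])
```

A larger validation set gives a steadier error estimate but leaves fewer samples for training, so the right value depends on how much data is available.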
%% Cell type:markdown id: tags:
Now that our data is split into training and validation sets, we need to separate the labels from the features.
%% Cell type:code id: tags:
``` python
X_train, y_train = zip(*training_set)
X_val, y_val = zip(*validation_set)
X_train = np.array(X_train)
# Flatten the targets to 1-D, the shape scikit-learn expects for a single output
y_train = np.array(y_train).ravel()
X_val = np.array(X_val)
y_val = np.array(y_val).ravel()
```
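%% Cell type:markdown id: tags:
The `zip(*...)` idiom used above can look cryptic at first. The toy example below, with three made-up pairs shaped like the entries of `ALL_DATA`, shows what it does; the values are purely illustrative.

```python
import numpy as np

# Three hypothetical (features, label) pairs shaped like the entries of ALL_DATA.
pairs = [
    (np.array([10.0, 20.0, 30.0]), [0.5]),
    (np.array([11.0, 21.0, 31.0]), [-0.2]),
    (np.array([12.0, 22.0, 32.0]), [0.0]),
]

# zip(*pairs) regroups the pairs into one tuple of features and one of labels.
features, labels = zip(*pairs)
X_demo = np.array(features)        # shape (3, 3): one row per image
y_demo = np.array(labels).ravel()  # shape (3,): one target per image

print(X_demo.shape, y_demo.shape)
```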
%% Cell type:markdown id: tags:
## Define the Regressor
Now we finally come to the juicy part.
Now that all the data is loaded and available, we can finally get to training the model. Here we use scikit-learn's [`MLPRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html) to train our network. We can tune the hyperparameters based on cross-validation scores.
%% Cell type:code id: tags:
``` python
model = MLPRegressor(hidden_layer_sizes=[10, 10], verbose=True)
# NOTE: This is again a silly choice of hyperparameters for this problem,
# and we encourage you to explore what works best for you.
```
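%% Cell type:markdown id: tags:
One way to tune hyperparameters with cross-validation is scikit-learn's `GridSearchCV`. The sketch below is a minimal example under stated assumptions: the data is synthetic (`X_demo`, `y_demo`), the grid values are arbitrary, and the `StandardScaler` step is an addition that the baseline itself does not use.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the mean-RGB features and steering targets.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 255, size=(120, 3))
y_demo = 0.01 * X_demo[:, 0] - 0.005 * X_demo[:, 2] + rng.normal(0, 0.05, size=120)

# MLPs are sensitive to feature scale, and raw pixel means span 0-255.
pipeline = make_pipeline(
    StandardScaler(),
    MLPRegressor(max_iter=500, random_state=0),
)

# A deliberately tiny grid; the candidate values are illustrative only.
param_grid = {
    "mlpregressor__hidden_layer_sizes": [(10,), (10, 10)],
    "mlpregressor__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_mean_absolute_error",  # MAE is one of the challenge metrics
    cv=3,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Cross-validated MAE:", -search.best_score_)
```

With the real data, `X_demo` and `y_demo` would be replaced by the training features and targets, and `search.best_estimator_` could then be used for prediction.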
%% Cell type:markdown id: tags:
## Train the Regressor
%% Cell type:code id: tags:
``` python
model.fit(X_train, y_train)
```
%% Cell type:markdown id: tags:
## Predict on Validation
Now we use our trained model to predict on the validation set, so that we can evaluate it.
%% Cell type:code id: tags:
``` python
y_pred = model.predict(X_val)
```
%% Cell type:markdown id: tags:
## Evaluate the Performance
We use the same metrics that will be used for the test set.
[MAE](https://en.wikipedia.org/wiki/Mean_absolute_error) and [RMSE](https://www.statisticshowto.com/rmse/) are the metrics for this challenge.
%% Cell type:code id: tags:
``` python
print('Mean Absolute Error:', mean_absolute_error(y_val, y_pred))
print('Mean Squared Error:', mean_squared_error(y_val, y_pred))
print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_val, y_pred)))
```
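%% Cell type:markdown id: tags:
To make the metrics concrete, here is a small worked example that computes them by hand with NumPy. The three values in `y_true` and `y_hat` are made up for illustration. MAE averages the absolute errors, while RMSE takes the square root of the averaged squared errors and so penalises large misses more heavily.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.0, 2.0, 5.0])

errors = y_hat - y_true        # [0, 0, 2]
mae = np.mean(np.abs(errors))  # (0 + 0 + 2) / 3
mse = np.mean(errors ** 2)     # (0 + 0 + 4) / 3
rmse = np.sqrt(mse)

print(f"MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}")
# MAE=0.6667  MSE=1.3333  RMSE=1.1547
```

The single error of 2 gives an RMSE noticeably larger than the MAE, which is the usual pattern when a few predictions are far off.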
%% Cell type:markdown id: tags:
## Load Test Set
Now we load the test data and compute the same features as for the training images.
%% Cell type:code id: tags:
``` python
testing_labels_df = pd.read_csv(TESTING_LABELS_PATH)
TEST_DATA = []
TEST_FILENAMES = []
for _idx, row in tqdm.tqdm(testing_labels_df.iterrows(), total=testing_labels_df.shape[0]):
    filepath = os.path.join(
        TESTING_IMAGES_FOLDER,
        row.filename
    )
    im = Image.open(filepath)
    data_X = pre_process_data_X(im)
    TEST_DATA.append(data_X)
    TEST_FILENAMES.append(row.filename)
```
%% Cell type:markdown id: tags:
## Make predictions on the test set
%% Cell type:code id: tags:
``` python
test_predictions = model.predict(TEST_DATA)
```
%% Cell type:code id: tags:
``` python
test_predictions.shape
```
%% Cell type:code id: tags:
``` python
test_df = pd.DataFrame(test_predictions, columns=['canSteering'])
test_df["filename"] = TEST_FILENAMES
```
%% Cell type:code id: tags:
``` python
test_df.shape
```
%% Cell type:markdown id: tags:
## Save the prediction to csv
%% Cell type:code id: tags:
``` python
test_df.to_csv('submission.csv', index=False)
```
%% Cell type:markdown id: tags:
**Note**: Do take a look at the submission format on the challenge page. The file written above has the header `canSteering,filename`; make sure the column names and order match what the challenge expects.
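%% Cell type:markdown id: tags:
Before uploading, it can help to read the CSV back and run a few sanity checks. The helper below is a hypothetical sketch, not part of the challenge tooling: the name `check_submission` and the expected column list passed to it are assumptions, so adjust them to whatever the challenge page specifies.

```python
import os
import tempfile

import pandas as pd


def check_submission(path, expected_columns):
    """Return a list of problems found in a submission CSV (empty if none)."""
    df = pd.read_csv(path)
    problems = []
    if set(df.columns) != set(expected_columns):
        problems.append(f"unexpected columns: {list(df.columns)}")
    if df.isnull().values.any():
        problems.append("file contains missing values")
    if "filename" in df.columns and df["filename"].duplicated().any():
        problems.append("duplicate filenames")
    return problems


# Demo on a throwaway file with made-up rows.
demo = pd.DataFrame({"canSteering": [0.1, -0.3], "filename": ["a.jpg", "b.jpg"]})
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "submission.csv")
    demo.to_csv(demo_path, index=False)
    problems = check_submission(demo_path, ["filename", "canSteering"])

print(problems or "no problems found")
```

The check compares column names as a set, so it does not enforce column order; add an ordered comparison if the grader turns out to be strict about it.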
%% Cell type:markdown id: tags:
## To download the generated CSV in Google Colab, run the command below
%% Cell type:code id: tags:
``` python
from google.colab import files
files.download('submission.csv')
```
%% Cell type:markdown id: tags:
### Well Done! 👍 You are all set to make a submission and see your name on the leaderboard. Let's navigate to the [challenge page](https://www.aicrowd.com/challenges/autodri) and make one.