Commit f448bafe authored by shubham_sharma

First commit

# To be filled in later with a consistent contribution guide
- Q: Who writes that?
# 🕵️ Introduction
![](https://storage.googleapis.com/kaggle-competitions/kaggle/4104/media/retina.jpg)
Test your ML skills on this vision problem: detecting diabetic retinopathy. Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world.
The task is to classify a patient's retina as diabetic or not diabetic using the features available in the dataset.
Understand with code! Here is some [getting started code](https://discourse.aicrowd.com/t/baseline-mnist/2757) for you. 😄
# 💾 Dataset
This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy. The dataset has `20` attributes in total. The first `19` are descriptive features extracted from the image set. The last attribute, `label`, is `1` if the image contains signs of diabetic retinopathy and `0` if it does not.
For details about the attributes, see [dataset_info.txt](https://gitlab.aicrowd.com/aicrowd/practice-challenges/aicrowd_DIBRD_challenge/blob/master/dataset_info.txt).
## 📁 Files
- `./data/train.csv` - (`920` samples) Use this file for training and validation.
- `./data/test.csv` - (`230` samples) This file is used for the actual evaluation that produces the leaderboard score.
# 🚀 Submission
- Prepare a CSV file named `submission.csv` with the header `label` and one predicted value, `0` or `1`, per row.
- Sample submission format available at `./data/sample_submission.csv`.
**Make your first submission [here](https://www.aicrowd.com/challenges/dibrd-predict-diabetic-retinopathy/submissions/new) 🚀 !!**
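
Before uploading, it can help to check the file against the format described above. The following is a minimal sketch, not the official validator: `check_submission` is a hypothetical helper, and the assumption that the file must have exactly `230` rows (one per sample in `./data/test.csv`) is inferred from the file list rather than stated by the evaluator.

```python
import io

import pandas as pd


def check_submission(csv_source, expected_rows=230):
    """Return a list of problems found in a submission file (empty if none).

    expected_rows=230 is an assumption based on the size of ./data/test.csv.
    """
    df = pd.read_csv(csv_source)
    problems = []
    if list(df.columns) != ["label"]:
        problems.append(f"expected a single 'label' column, got {list(df.columns)}")
    elif not df["label"].isin([0, 1]).all():
        problems.append("labels must be 0 or 1")
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(df)}")
    return problems


# Toy file with 3 rows, standing in for a real submission.csv
demo = io.StringIO("label\n0\n1\n1\n")
print(check_submission(demo, expected_rows=3))
```

For a real file, pass the path instead, for example `check_submission("submission.csv")`.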
# 🖊 Evaluation Criteria
During evaluation, the [F1 score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) and [Log Loss](http://wiki.fast.ai/index.php/Log_Loss) will be used to measure the performance of the model, where
<img src="https://latex.codecogs.com/gif.latex?%24%24F1%20%3D%20%7B2%20*%20precision%20*%20recall%20%5Cover%20precision%20&plus;%20recall%7D%24%24"/> <br/>
<img src="http://latex.codecogs.com/gif.latex?%24%24%20Log%20Loss%20%3D%20-log%20P%28yt%7Cyp%29%20%3D%20-%28yt%20log%28yp%29%20&plus;%20%281%20-%20yt%29%20log%281%20-%20yp%29%29%20%24%24"/>
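
To make the two metrics concrete, here is a small worked sketch in plain Python. It uses the standard definition F1 = 2 · precision · recall / (precision + recall) and the per-sample log loss written above. The helper names and the toy labels are made up for illustration, and averaging the log loss over samples is an assumption about how the evaluator aggregates it.

```python
import math


def f1_from_counts(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_log_loss(y_true, y_prob):
    """Mean of -(yt*log(yp) + (1-yt)*log(1-yp)) over all samples."""
    total = 0.0
    for yt, yp in zip(y_true, y_prob):
        total += -(yt * math.log(yp) + (1 - yt) * math.log(1 - yp))
    return total / len(y_true)


# Toy data: true labels, hard predictions, and predicted P(label == 1)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

print("F1:", f1_from_counts(tp, fp, fn))
print("Log loss:", binary_log_loss(y_true, y_prob))
```

Note that F1 is computed from hard `0`/`1` predictions, while log loss needs predicted probabilities.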
# 🔗 Links
* 💪 Challenge Page : https://www.aicrowd.com/challenges/dibrd-predict-diabetic-retinopathy
* 🗣️ Discussion Forum : https://www.aicrowd.com/challenges/dibrd-predict-diabetic-retinopathy/discussion
* 🏆 Leaderboard : https://www.aicrowd.com/challenges/dibrd-predict-diabetic-retinopathy/leaderboards
# 📱 Contact
- [Shubham Sharma](mailto:shubham@ext.aicrowd.com)
# 📚 References
- Dr. Balint Antal, Department of Computer Graphics and Image Processing, Faculty of Informatics, University of Debrecen, 4010, Debrecen, POB 12, Hungary, antal.balint@inf.unideb.hu
- Dr. Andras Hajdu, Department of Computer Graphics and Image Processing, Faculty of Informatics, University of Debrecen, 4010, Debrecen, POB 12, Hungary, hajdu.andras@inf.unideb.hu
- Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
- [Image source](https://www.kaggle.com/c/diabetic-retinopathy-detection)
---
challenge_name: aicrowd_DIBRD_challenge
evaluation_repo: git@gitlab.aicrowd.com:aicrowd/practice-challenges/aicrowd_DIBRD_challenge_evaluator.git
data_url: https://s3.wasabisys.com/aicrowd-practice-challenges/public/dibrd/v0.1/test_ground_truth.csv
official_baseline: DIBRD_baseline.ipynb
authors:
  - name: Shubham Sharma
    email: shubham@ext.aicrowd.com
version: '0.1'
%% Cell type:markdown id: tags:
# Baseline for DIBRD Practice Challenge on AIcrowd
#### Author : Shubham Sharma
%% Cell type:markdown id: tags:
## To open this notebook in Google Colab, click below!
%% Cell type:markdown id: tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ayushshivani/aicrowd_educational_baselines/blob/master/MNIST_baseline.ipynb)
%% Cell type:markdown id: tags:
## Install necessary packages
%% Cell type:code id: tags:
``` python
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install scikit-learn
```
%% Cell type:markdown id: tags:
## Download dataset
%% Cell type:code id: tags:
``` python
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_mnist/data/public/test.zip
!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_mnist/data/public/train.zip
!unzip train.zip
!unzip test.zip
```
%% Cell type:markdown id: tags:
## Import packages
%% Cell type:code id: tags:
``` python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import f1_score,precision_score,recall_score,accuracy_score
```
%% Cell type:markdown id: tags:
## Load the data
%% Cell type:code id: tags:
``` python
train_data_path = "train.csv" #path where data is stored
```
%% Cell type:code id: tags:
``` python
train_data = pd.read_csv(train_data_path,header=None) #load data in dataframe using pandas
```
%% Cell type:markdown id: tags:
## Visualise the Dataset
%% Cell type:code id: tags:
``` python
train_data.head()
```
%% Cell type:markdown id: tags:
You can see that the columns go from 0 to 19, where columns 0 to 18 hold the features extracted from the image set and the last column holds the label, i.e. 1 if signs of Diabetic Retinopathy are present, else 0.
%% Cell type:markdown id: tags:
## Split the data in train/test
%% Cell type:code id: tags:
``` python
X_train, X_test= train_test_split(train_data, test_size=0.2, random_state=42)
```
%% Cell type:markdown id: tags:
Here we have selected the size of the testing data to be 20% of the total data. You can change it and see what effect it has on the accuracies. To learn more about the train_test_split function [click here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
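
To see what the 20% setting means in numbers, here is a minimal sketch on synthetic data shaped like the training file (920 rows, 19 features). The random features, the alternating labels, and the `stratify` argument are illustrative assumptions and are not part of this baseline.
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv: 920 rows, 19 features plus a 0/1 label
rng = np.random.default_rng(0)
features = rng.normal(size=(920, 19))
labels = np.tile([0, 1], 460)

# stratify keeps the class ratio the same in both parts (optional)
X_tr, X_val, y_tr, y_val = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# 20% of 920 rows are held out for validation, the rest is used for training
print(X_tr.shape, X_val.shape)
```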
%% Cell type:markdown id: tags:
Now that the data is split into train and validation sets, we need to separate the label from the features.
%% Cell type:code id: tags:
``` python
# The label is the last column; all other columns are features
X_train,y_train = X_train.iloc[:,:-1],X_train.iloc[:,-1]
X_test,y_test = X_test.iloc[:,:-1],X_test.iloc[:,-1]
```
%% Cell type:markdown id: tags:
## Define the classifier
%% Cell type:code id: tags:
``` python
classifier = LogisticRegression(solver = 'lbfgs',multi_class='auto',max_iter=10)
```
%% Cell type:markdown id: tags:
We have used [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) as the classifier here and set a few of its parameters. You can set more parameters and try to improve the performance. For the full list of parameters, see [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
%% Cell type:markdown id: tags:
We can also use other classifiers. To read more about sklearn classifiers, see [here](https://scikit-learn.org/stable/supervised_learning.html). Try other classifiers to see how the performance of your model changes.
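
As a starting point for that experiment, the sketch below fits three scikit-learn classifiers on the same split and compares their F1 scores. It runs on synthetic data from `make_classification`, so the numbers say nothing about the real dataset; the choice of models and their settings is an assumption for illustration only.
%% Cell type:code id: tags:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary problem with the same shape as the training data
X, y = make_classification(n_samples=920, n_features=19, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=500),
    "svc": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training part and score it on the validation part
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = f1_score(y_val, model.predict(X_val))

print(scores)
```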
%% Cell type:markdown id: tags:
## Train the classifier
%% Cell type:code id: tags:
``` python
classifier.fit(X_train, y_train)
```
%% Cell type:markdown id: tags:
Got a warning? Don't worry, it's just because the number of iterations is very low (defined in the classifier in the cell above). Increase the number of iterations and see if the warning vanishes. Remember that increasing the iterations also increases the running time. (Hint: `max_iter=500`)
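
One way to check this programmatically is to record warnings while fitting and look at how many iterations the solver actually used. This is a sketch on synthetic data; `fit_and_report` is a hypothetical helper, and whether the warning fires for a given `max_iter` depends on the data.
%% Cell type:code id: tags:
```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression


def fit_and_report(X, y, max_iter):
    """Fit LogisticRegression; return (iterations used, whether a ConvergenceWarning fired)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf = LogisticRegression(solver="lbfgs", max_iter=max_iter).fit(X, y)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return int(clf.n_iter_[0]), warned


# Synthetic stand-in for the training data
X_demo, y_demo = make_classification(n_samples=920, n_features=19, random_state=0)

for max_iter in (10, 500):
    n_iter, warned = fit_and_report(X_demo, y_demo, max_iter)
    print(f"max_iter={max_iter}: used {n_iter} iterations, warning={warned}")
```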
%% Cell type:markdown id: tags:
## Predict on test set
%% Cell type:code id: tags:
``` python
y_pred = classifier.predict(X_test)
```
%% Cell type:markdown id: tags:
## Find the scores
%% Cell type:code id: tags:
``` python
precision = precision_score(y_test,y_pred,average='micro')
recall = recall_score(y_test,y_pred,average='micro')
accuracy = accuracy_score(y_test,y_pred)
f1 = f1_score(y_test,y_pred,average='macro')
```
%% Cell type:code id: tags:
``` python
print("Accuracy of the model is :" ,accuracy)
print("Recall of the model is :" ,recall)
print("Precision of the model is :" ,precision)
print("F1 score of the model is :" ,f1)
```
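%% Cell type:markdown id: tags:
A note on the `average` argument: when every sample has exactly one label, micro-averaged precision and recall are both equal to accuracy, so they add no new information. The sketch below compares them with `average='binary'`, which scores only the positive class (`1`) and matches the F1 formula on the challenge page. The toy labels are made up for illustration.
%% Cell type:code id: tags:
```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy labels: 3 true positives, 1 false positive, 1 false negative, 1 true negative
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

acc = accuracy_score(y_true, y_pred)
micro_p = precision_score(y_true, y_pred, average="micro")
micro_r = recall_score(y_true, y_pred, average="micro")

# Scores for the positive class only
binary_p = precision_score(y_true, y_pred, average="binary")
binary_r = recall_score(y_true, y_pred, average="binary")
binary_f1 = f1_score(y_true, y_pred, average="binary")

print("accuracy:", acc, "micro precision:", micro_p, "micro recall:", micro_r)
print("binary precision:", binary_p, "binary recall:", binary_r, "binary F1:", binary_f1)
```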
%% Cell type:markdown id: tags:
# Prediction on Evaluation Set
%% Cell type:markdown id: tags:
## Load the evaluation data
%% Cell type:code id: tags:
``` python
final_test_path = "test.csv"
final_test = pd.read_csv(final_test_path,header=None)
```
%% Cell type:markdown id: tags:
## Predict on evaluation set
%% Cell type:code id: tags:
``` python
submission = classifier.predict(final_test)
```
%% Cell type:markdown id: tags:
## Save the prediction to csv
%% Cell type:code id: tags:
``` python
submission = pd.DataFrame(submission)
submission.to_csv('/tmp/submission.csv',header=['label'],index=False)
```
%% Cell type:markdown id: tags:
Note: Do take a look at the submission format. The submission file should contain a header, which here is "label".
%% Cell type:markdown id: tags:
## To download the generated csv in colab run the below command
%% Cell type:code id: tags:
``` python
from google.colab import files
files.download('/tmp/submission.csv')
```
%% Cell type:markdown id: tags:
### Go to the [platform](https://www.aicrowd.com/challenges/dibrd-predict-diabetic-retinopathy). Participate in the challenge and submit the generated submission.csv.