DevRescue

Simple Python k-Nearest Neighbors Tutorial

by Khaleel O.
July 12, 2021
in Python

In this Python k-Nearest Neighbors tutorial, we will give you a simple introduction to Machine Learning with Python using the k-nearest neighbors (KNN or k-NN) algorithm. We will use this algorithm to solve a simple classification problem. We will be using Python 3.8.10.

KNN is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. In this tutorial, however, we will focus on classification.

Classification with KNN works by a simple majority vote mechanism. KNN predicts the label of any data point by looking at the K closest labeled data points and getting them to vote on what label the unlabeled point should have. The label assigned is the class of that data point.
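
To make the voting idea concrete, here is a minimal sketch of the majority vote written with only the Python standard library. The function name knn_predict and the toy points are hypothetical and chosen for illustration; this is not how scikit-learn implements KNN internally, and it assumes Euclidean distance.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every labeled point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest labeled points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Each neighbor votes with its label; the most common label wins
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per point, labels 0 and 1
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # 0
print(knn_predict(points, labels, (5.1, 5.0), k=3))  # 1
```

Each query point ends up with the label shared by the three labeled points closest to it.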

Let’s begin. 🙌🙌🙌


STEP 1

We will use the scikit-learn module to perform our classification. If you don’t have it installed you can install it with the following command at the console:

pip install scikit-learn

STEP 2

Next we import our classifier and our dataset.

We import the KNeighborsClassifier which is the classifier that actually implements the k-nearest neighbors vote.

from sklearn.neighbors import KNeighborsClassifier

For this tutorial we will be using the cardiac Single Photon Emission Computed Tomography (SPECT) images dataset which you can find HERE. This dataset merges the original separate train and test datasets, which you can find HERE.

In the cardiac SPECT dataset, each of 349 patients was classified into one of two categories (normal and abnormal) based on 44 continuous features. These features were extracted by processing the original SPECT images.

Let us import the dataset from a CSV file:

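import io

import pandas as pd
import requests
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

url = 'https://raw.githubusercontent.com/devrescue/python/main/datasets/SPECTF.csv'

# download the raw CSV bytes and stop with an error if the request failed
response = requests.get(url)
response.raise_for_status()
s = response.content

# decode the bytes and load them into a DataFrame
df = pd.read_csv(io.StringIO(s.decode('utf-8')))

# some information about our dataset (df.info() prints its report itself)
df.info()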

We did a separate tutorial on how to import data from CSV files that you might want to check out.

Doing a bit of Exploratory Data Analysis on our dataset, we can see that there are 45 columns and 349 rows.

pandas reads the first line of the CSV file as the column header and does not count it as a row, so all 349 rows are samples. Each row represents the data for one heart patient.
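
To see how pandas counts rows when a CSV file has a header line, here is a small sketch that reads a hypothetical three-line CSV from a string instead of downloading the real file. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-line CSV: one header line followed by two data lines
csv_text = "OVERALL_DIAGNOSIS,F1R,F1S\n1,59,52\n0,72,62\n"

toy_df = pd.read_csv(io.StringIO(csv_text))

# shape is (rows, columns); the header line becomes the column names, not a row
print(toy_df.shape)          # (2, 3)
print(list(toy_df.columns))  # ['OVERALL_DIAGNOSIS', 'F1R', 'F1S']
```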

The OVERALL_DIAGNOSIS column is our TARGET VARIABLE. As you can see, the possible values are 0 = normal and 1 = abnormal.

We will use KNN to try and predict the class of the target variable, i.e. whether a patient is normal or abnormal.

We will base this prediction on the 44 PREDICTOR VARIABLES or FEATURE VARIABLES. We say 44 because we excluded our target variable column.

Let’s do a quick count of the NORMAL vs ABNORMAL cases in our dataset:

print(len(df[df.OVERALL_DIAGNOSIS == 0])) #normal
print(len(df[df.OVERALL_DIAGNOSIS == 1])) #abnormal

#output
95 
254

Based on a quick count, we see that we have 95 normal cases and 254 abnormal cases. We can go further and produce a bar plot that summarizes the NORMAL vs ABNORMAL cases visually. First, we install the matplotlib module:

pip install matplotlib

Then we write our code to produce the plot.

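# one figure with one set of axes; subplots() leaves room for the tick labels
fig, ax = plt.subplots()
diagnoses = ['NORMAL', 'ABNORMAL']
# number of patients in each class
patients = [len(df[df.OVERALL_DIAGNOSIS == 0]), len(df[df.OVERALL_DIAGNOSIS == 1])]
ax.bar(diagnoses, patients)
ax.set_ylabel('Number of patients')
plt.show()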
Normal vs Abnormal Cases in SPECTF Dataset

STEP 3

Now we begin the actual classification:

y = df['OVERALL_DIAGNOSIS'].values 
X = df.drop('OVERALL_DIAGNOSIS', axis=1).values 

knn = KNeighborsClassifier(n_neighbors=6)

knn.fit(X,y)

As mentioned earlier, the target variable for classification is OVERALL_DIAGNOSIS. This is what the KNN algorithm will help us predict for new samples or patients.

y is the target variable. So the above statement selects only that column from our dataset. y is also the label for the data.

X is the training data without its labels, i.e. the 44 feature columns. We used df.drop() so that the label column isn’t included among the features.

KNeighborsClassifier() is the classifier that implements the k-nearest neighbors vote. The parameter n_neighbors is the number of neighbors that will vote for the label of our unlabeled data.

knn.fit(X,y) fits the k-nearest neighbors classifier from the training dataset X and target values y.
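
To show what n_neighbors does, here is a minimal sketch on a tiny hypothetical dataset (not the SPECTF data). The names X_toy, y_toy and toy_knn are made up for illustration; the calls used are KNeighborsClassifier's fit(), predict() and predict_proba() with the default uniform vote weights.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature per sample, labels 0 and 1
X_toy = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y_toy = [0, 0, 0, 1, 1, 1]

# n_neighbors=3 means the 3 closest training samples vote on each prediction
toy_knn = KNeighborsClassifier(n_neighbors=3)
toy_knn.fit(X_toy, y_toy)

# 1.5 sits among the label-0 samples, 9.0 is closest to the label-1 samples
print(toy_knn.predict([[1.5], [9.0]]))  # [0 1]

# For 6.8 the three nearest samples are 10.0, 11.0 and 2.0:
# two votes for class 1 and one vote for class 0
print(toy_knn.predict_proba([[6.8]]))
```

predict_proba() returns the share of neighbor votes each class received, which makes the majority vote behind predict() visible.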

STEP 4

Now that we have fitted our knn classifier, we can try to predict our target variable, also known as the label.

y_pred = knn.predict(X)

y_pred is a NumPy array of the predicted labels as determined by our KNN classifier. Recall that X is our training data without its labels, so here we are predicting labels for the same samples the classifier was fitted on.

NEXT STEPS

Now that we have trained our KNN classifier and used it to make predictions, the next step would be to measure the accuracy and performance of our classifier knn.
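
One common way to measure accuracy is to hold back part of the data for testing, so the classifier is scored on samples it has not seen. The sketch below is only an illustration under stated assumptions: it uses synthetic data from make_classification rather than the SPECTF dataset, and the 30% test size, the random_state values and the names X_demo, y_demo and demo_knn are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the SPECTF dataset): 200 samples, 10 features
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

# Hold out 30% of the samples; stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

demo_knn = KNeighborsClassifier(n_neighbors=6)
demo_knn.fit(X_train, y_train)

# score() returns the mean accuracy on the data it is given
print("train accuracy:", demo_knn.score(X_train, y_train))
print("test accuracy:", demo_knn.score(X_test, y_test))
```

The test accuracy is usually the more honest number, because predicting on the training samples lets each point find itself or its close neighbors among the training data.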

You can find the entire source code for this tutorial HERE.

Tags: k-nearest neighbors, KNN, machine learning

DevRescue © 2021 All Rights Reserved. Privacy Policy. Cookie Policy
