Social Items

Showing posts with label Recognition. Show all posts
Showing posts with label Recognition. Show all posts


Latest Update:

I have uploaded the complete code (Python and Jupyter notebook) on GitHub: https://github.com/javedsha/text-classification

Document/Text classification is one of the important and typical task in supervised machine learning (ML). Assigning categories to documents, which can be a web page, library book, media articles, gallery etc. has many applications like e.g. spam filtering, email routing, sentiment analysis etc. In this article, I would like to demonstrate how we can do text classification using python, scikit-learn and little bit of NLTK.

Disclaimer: I am new to machine learning and also to blogging (First). So, if there are any mistakes, please do let me know. All feedback appreciated.

Let’s divide the classification problem into below steps:

  1. Prerequisite and setting up the environment.

Step 1: Prerequisite and setting up the environment

The prerequisites to follow this example are python version 2.7.3 and jupyter notebook. You can just install anaconda and it will get everything for you. Also, little bit of python and ML basics including text classification is required. We will be using scikit-learn (python) libraries for our example.

Step 2: Loading the data set in jupyter.

The data set will be using for this example is the famous “20 Newsgoup” data set. About the data from the original website:

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. To the best of my knowledge, it was originally collected by Ken Lang, probably for his Newsweeder: Learning to filter netnews paper, though he does not explicitly mention this collection. The 20 newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.

This data set is in-built in scikit, so we don’t need to download it explicitly.

i. Open command prompt in windows and type ‘jupyter notebook’. This will open the notebook in browser and start a session for you.

ii. Select New > Python 2. You can give a name to the notebook - Text Classification Demo 1

iii. Loading the data set: (this might take few minutes, so patience)

from sklearn.datasets import fetch_20newsgroups
twenty_train = fetch_20newsgroups(subset='train', shuffle=True)

 Note: Above, we are only loading the training data. We will load the test data separately later in the example.

iv. You can check the target names (categories) and some data files by following commands.

twenty_train.target_names #prints all the categories
print("\n".join(twenty_train.data[0].split("\n")[:3])) #prints first line of the first data file

Step 3: Extracting features from text files.

Text files are actually series of words (ordered). In order to run machine learning algorithms we need to convert the text files into numerical feature vectors. We will be using bag of words model for our example. Briefly, we segment each text file into words (for English splitting by space), and count # of times each word occurs in each document and finally assign each word an integer id. Each unique word in our dictionary will correspond to a feature (descriptive feature).

Scikit-learn has a high level component which will create feature vectors for us ‘CountVectorizer’. More about it here.

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)

Here by doing ‘count_vect.fit_transform(twenty_train.data)’, we are learning the vocabulary dictionary and it returns a Document-Term matrix. [n_samples, n_features].

TF: Just counting the number of words in each document has 1 issue: it will give more weightage to longer documents than shorter documents. To avoid this, we can use frequency (TF - Term Frequencies) i.e. #count(word) / #Total words, in each document.

TF-IDF: Finally, we can even reduce the weightage of more common words like (the, is, an etc.) which occurs in all document. This is called as TF-IDF i.e Term Frequency times inverse document frequency.

We can achieve both using below line of code:

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

The last line will output the dimension of the Document-Term matrix -> (11314, 130107).

Step 4. Running ML algorithms.

There are various algorithms which can be used for text classification. We will start with the most simplest one ‘Naive Bayes (NB)’ (don’t think it is too Naive! 😃)

You can easily build a NBclassifier in scikit using below 2 lines of code: (note - there are many variants of NB, but discussion about them is out of scope)

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)

This will train the NB classifier on the training data we provided.

Building a pipeline: We can write less code and do all of the above, by building a pipeline as follows:

>>> from sklearn.pipeline import Pipeline
>>> text_clf = Pipeline([('vect', CountVectorizer()),
... ('tfidf', TfidfTransformer()),
... ('clf', MultinomialNB()),
... ])
text_clf = text_clf.fit(twenty_train.data, twenty_train.target)

The names ‘vect’ , ‘tfidf’ and ‘clf’ are arbitrary but will be used later.

Performance of NB Classifier: Now we will test the performance of the NB classifier on test set.

import numpy as np
twenty_test = fetch_20newsgroups(subset='test', shuffle=True)
predicted = text_clf.predict(twenty_test.data)
np.mean(predicted == twenty_test.target)

The accuracy we get is ~77.38%, which is not bad for start and for a naive classifier. Also, congrats!!! you have now written successfully a text classification algorithm 👍

Support Vector Machines (SVM): Let’s try using a different algorithm SVM, and see if we can get any better performance. More about it here.

>>> from sklearn.linear_model import SGDClassifier>>> text_clf_svm = Pipeline([('vect', CountVectorizer()),
... ('tfidf', TfidfTransformer()),
... ('clf-svm', SGDClassifier(loss='hinge', penalty='l2',
... alpha=1e-3, n_iter=5, random_state=42)),
... ])
>>> _ = text_clf_svm.fit(twenty_train.data, twenty_train.target)>>> predicted_svm = text_clf_svm.predict(twenty_test.data)
>>> np.mean(predicted_svm == twenty_test.target)

The accuracy we get is~82.38%. Yipee, a little better 👌

Step 5. Grid Search

Almost all the classifiers will have various parameters which can be tuned to obtain optimal performance. Scikit gives an extremely useful tool ‘GridSearchCV’.

>>> from sklearn.model_selection import GridSearchCV
>>> parameters = {'vect__ngram_range': [(1, 1), (1, 2)],
... 'tfidf__use_idf': (True, False),
... 'clf__alpha': (1e-2, 1e-3),
... }

Here, we are creating a list of parameters for which we would like to do performance tuning. All the parameters name start with the classifier name (remember the arbitrary name we gave). E.g. vect__ngram_range; here we are telling to use unigram and bigrams and choose the one which is optimal.

Next, we create an instance of the grid search by passing the classifier, parameters and n_jobs=-1 which tells to use multiple cores from user machine.

gs_clf = GridSearchCV(text_clf, parameters, n_jobs=-1)
gs_clf = gs_clf.fit(twenty_train.data, twenty_train.target)

This might take few minutes to run depending on the machine configuration.

Lastly, to see the best mean score and the params, run the following code:


The accuracy has now increased to ~90.6% for the NB classifier (not so naive anymore! 😄) and the corresponding parameters are {‘clf__alpha’: 0.01, ‘tfidf__use_idf’: True, ‘vect__ngram_range’: (1, 2)}.

Similarly, we get improved accuracy ~89.79% for SVM classifier with below code. Note: You can further optimize the SVM classifier by tuning other parameters. This is left up to you to explore more.

>>> from sklearn.model_selection import GridSearchCV
>>> parameters_svm = {'vect__ngram_range': [(1, 1), (1, 2)],
... 'tfidf__use_idf': (True, False),
... 'clf-svm__alpha': (1e-2, 1e-3),
... }
gs_clf_svm = GridSearchCV(text_clf_svm, parameters_svm, n_jobs=-1)
gs_clf_svm = gs_clf_svm.fit(twenty_train.data, twenty_train.target)

Step 6: Useful tips and a touch of NLTK.

  1. Removing stop words: (the, then etc) from the data. You should do this only when stop words are not useful for the underlying problem. In most of the text classification problems, this is indeed not useful. Let’s see if removing stop words increases the accuracy. Update the code for creating object of CountVectorizer as follows:
>>> from sklearn.pipeline import Pipeline
>>> text_clf = Pipeline([('vect', CountVectorizer(stop_words='english')),
... ('tfidf', TfidfTransformer()),
... ('clf', MultinomialNB()),
... ])

This is the pipeline we build for NB classifier. Run the remaining steps like before. This improves the accuracy from 77.38% to 81.69% (that is too good). You can try the same for SVM and also while doing grid search.

2. FitPrior=False: When set to false for MultinomialNB, a uniform prior will be used. This doesn’t helps that much, but increases the accuracy from 81.69% to 82.14% (not much gain). Try and see if this works for your data set.

3. Stemming: From Wikipedia, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form. E.g. A stemming algorithm reduces the words “fishing”, “fished”, and “fisher” to the root word, “fish”.

We need NLTK which can be installed from here. NLTK comes with various stemmers (details on how stemmers work are out of scope for this article) which can help reducing the words to their root form. Again use this, if it make sense for your problem.

Below I have used Snowball stemmer which works very well for English language.

import nltk
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english", ignore_stopwords=True)
class StemmedCountVectorizer(CountVectorizer):
def build_analyzer(self):
analyzer = super(StemmedCountVectorizer, self).build_analyzer()
return lambda doc: ([stemmer.stem(w) for w in analyzer(doc)])
stemmed_count_vect = StemmedCountVectorizer(stop_words='english')text_mnb_stemmed = Pipeline([('vect', stemmed_count_vect),
... ('tfidf', TfidfTransformer()),
... ('mnb', MultinomialNB(fit_prior=False)),
... ])
text_mnb_stemmed = text_mnb_stemmed.fit(twenty_train.data, twenty_train.target)predicted_mnb_stemmed = text_mnb_stemmed.predict(twenty_test.data)np.mean(predicted_mnb_stemmed == twenty_test.target)

The accuracy with stemming we get is ~81.67%. Marginal improvement in our case with NB classifier. You can also try out with SVM and other algorithms.

Conclusion: We have learned the classic problem in NLP, text classification. We learned about important concepts like bag of words, TF-IDF and 2 important algorithms NB and SVM. We saw that for our data set, both the algorithms were almost equally matched when optimized. Sometimes, if we have enough data set, choice of algorithm can make hardly any difference. We also saw, how to perform grid search for performance tuning and used NLTK stemming approach. You can use this code on your data set and see which algorithms works best for you.

Update: If anyone tries a different algorithm, please share the results in the comment section, it will be useful for everyone.

Please let me know if there were any mistakes and feedback is welcome ✌️

Recommend, comment, share if you liked this article.


http://scikit-learn.org/ (code)

http://qwone.com/~jason/20Newsgroups/ (data set)

Machine Learning, NLP: Text Classification using scikit-learn, python and NLTK


This is the second progress video about the port of openFrameworks to the Jetson.

The demonstration sketch shows that most of the OpenGL issues have been sorted out. Also, this is the first demo that includes a GLSL shader that is rendering the background. So some good progress is being made.

Two different openFrameworks add-ons are being demonstrated:

ofxTimeline and ofxUI .

ofxTimeline is being used to control the virtual camera movement. The timeline runs a 1-minute loop that controls the pan, zoom, and orbit of the camera. ofxUI provides the GUI element at the bottom left-hand corner which notates the camera position in a dynamic manner.

The depth information from the Kinect is being rendered in what is referred to as a point cloud. Basically, a 3D point mesh is constructed for each frame that is being displayed, and the color of each point is calculated from the color camera (RGB), then displayed. There is no CUDAof the process at this point, it’s just done on one of the ARM cores.

While there are still a few bugs hidden in the openFrameworks port, for the most part, everything is running smoothly.

Thanks & Cheers

For more stackexchanges-beta

Jetson TK1 Kinect Point Cloud in openFrameworks

 The Intel RealSense T265 Tracking Camera solves a fundamental problem in interfacing with the real world by helpfully answering “Where am I?” Looky here:


One of the most important tasks in interfacing with the real world from a computer is to calculate your position in relationship to a map of the surrounding environment. When you do this dynamically, this is known as Simultaneous Localization And Mapping, or SLAM.

If you’ve been around the mobile robotics world at all (rovers, drones, cars), you probably have heard of this term. There are other applications too, such as Augmented Reality (AR) where a computing system must place the user precisely in the surrounding environment. Suffice it to say, it’s a foundational problem.

SLAM is a computational problem. How does a device construct or update a map of an unknown environment while simultaneously keeping track of its own location within that environment? People do this naturally in small places such as a house. At a larger scale, people have been clever enough to use visual navigational aids, such as the stars, to help build their maps.

This  V-SLAM solution does something very similar. Two fisheye cameras combine with the information from an  Inertial  Measurement  Unit (IMU) to navigate using visual features to track its way around even unknown environments with accuracy. 

Let’s just say that this is a non-trivial problem. If you have tried to implement this yourself, you know that it can be expensive and time consuming. The Intel RealSense T265 Tracking Camera provides precise and robust tracking that has been extensively tested in a variety of conditions and environments.

The T265 is a self-contained tracking system that plugs into a USB port. Install the librealsense SDK, and you can start streaming pose data right away.

Tech Stuffs

Here’s some tech specs:


  • OV9282
  • Global Shutter, Fisheye Field of View = 163 degrees
  • Fixed Focus, Infrared Cut Filter
  • 848 x 800 resolution
  • 30 frames per second

Inertial Measurement Unit (IMU)

  • 6 Degrees of Freedom (6 DoF)
  • Accelerometer 
  • Gyroscope

Visual Processing Unit (VPU)

  • Movidius MA215x ASIC (Application Specific Integrated Circuit)

The Power Requirement is 300 mA at 5V (!!!). The package is 108mm Wide x 24.5mm High x 12.50mm Deep. The camera weighs 60 grams.


To interface with the camera,  Intel provides the open source library librealsense. On the JetsonHacksNano account on Github, there is a repository named installLibrealsense. The repository contains convenience scripts to install librealsense.

Note: Starting with L4T 32.2.1/JetPack 4.2.2 a swap file is now part of the default install. You do not need to create a swap file if you are using this release or later. Skip the following step if using 32.2.1 or above.

In order to use the install script, you will either need to create a swapfile to ease an out of memory issue, or modify the install script to run less jobs during the make process. In the video, we chose the swapfile route. To install the swapfile:

$ git clone https://github.com/jetsonhacksnano/installSwapfile
$ cd installSwapfile
$ ./installSwapfile.sh
$ cd ..

You’re now ready to install librealsense.

$ git clone https://github.com/jetsonhacksnano/installLibrealsense
$ cd installLibrealsense
$ ./installLibrealsense.sh

While the installLibrealsense.sh script has the option to compile the librealsense with CUDA support, we do not select that option. If you are using the T265 alone, there is no advantage in using CUDA, as the librealsense CUDA routines only convert images from the RealSense Depth cameras (D415, D435 and so on).

The location of librealsense SDK products:

  • The library is installed in /usr/local/lib
  • The header files are in /usr/local/include
  • The demos and tools are located in /usr/local/bin

Go to the demos and tools directory, and checkout the realsense-viewer application and all of the different demonstrations!


The Intel RealSense T265 is a powerful tool for use in robotics and augmented/virtual reality. Well worth checking out!


  • Tested on Jetson Nano L4T 32.1.0
  • If you have a mobile robot, you can send wheel odometry to the RealSense T265 through the librealsense SDK for better accuracy. The details are still being worked out.

Thanks, Cheers.

Jetson Nano – RealSense Tracking Camera

 Many people use Intel RealSense cameras with robots. Here we install the real sense-ros wrapper on the NVIDIA Jetson Nano developer kit. Looky here:


There are several members in the Intel RealSense camera family. This includes the Depth Cameras (D415, D435, D435i) and Tracking Camera (T265). There are also more recent introductions which are just becoming available.

The cameras all share the same Intel® RealSense™ SDK which is known as librealsense2. The SDK is open source and available on Github. We have articles for installing librealsense (D400x article and T265 article) here on the JetsonHacks site.

The size and weight of the cameras make them very good candidates for robotic applications. Computing hardware onboard the cameras provide depth and tracking information directly, which makes it a very attractive addition to a Jetson Nano. Plus the cameras have low power consumption. Because ROS is the most popular middleware application for robotics, here’s how you install realsense-ros on the Jetson Nano.

Install RealSense Wrapper for ROS

There are two prerequisites for installing realsense-ros on the Jetson Nano. The first is to install librealsense as linked above. The second prerequisite is a ROS installation. Checkout Install ROS on Jetson Nano for a how-to on installing ROS Melodic on the Nano.

With the two prerequisites out of the way, it’s time to install realsense-ros. There are convenience scripts to install the RealSense ROS Wrapper on the Github JetsonHacksNano account.

$ git clone https://github.com/JetsonHacksNano/installRealSenseROS
$ cd installRealSenseROS
$ ./installRealSenseROS <catkin workplace name>

Where catkin workplace name is the path to the catkin_workspace to place the RealSense ROS package. If no catkin workplace name is specified, the script defaults to ~/catkin_ws.

Note: Version are in the releases section. The master branch of the repository will usually match the most recent release release of L4T, but you may have to look through the releases for a suitable version. To checkout one of the releases, switch to the installRealSenseROS directory and then:

$ git checkout <version number>


$ git checkout vL4T32.2.1

The ROS launch file for the camera(s) will be in the src directory of the Catkin workspace realsense-ros/realsense2_camera/launch There are a variety of launch files to choose from. For example:

$ roslaunch realsense2_camera rs_camera.launch

You will need to make sure that your Catkin workspace is correctly sourced, and roscore is running.


There are dependencies between the versions of librealsense and realsense-ros. The install scripts are also dependent on the version of L4T. Check the releases on the Github accounts to match.

In the video:

  • Jetson Nano
  • L4T 32.2.1 / JetPack 4.2.2
  • librealsense 2.25.0
  • realsense-ros 2.28.0

realsense-ros does not “officially” support ROS Melodic. However, we haven’t encountered any issues as of the time of this writing.

Thanks Cheers

RealSense ROS Wrapper – Jetson Nano