Developing a deep learning edge detector to solve a “toy” problem

TLDR

In an attempt to solve a children's shape-puzzle game, a deep learning based edge detector was developed. It's based on the U-Net image segmentation architecture and trained on the BSDS500 dataset.

fig1. The children's shape puzzle we try to solve

I just want the code:
github.com/i008/deepedge

I just want to try it on my own images:

Go to the edge_notebook.ipynb

I just want to see how it works:
check results here

Preface

As someone obsessed with computer vision, I have this weird tendency to look at the world around me and have thoughts like: “How would a computer tackle this problem?” or “How can I machine-solve it?”

This time was no different: on one of my visits I saw this shape puzzle, which is probably targeted at children between the ages of 0 and 3 (check fig.1). My first thought was: well, a 1-year-old can solve it, so I'm sure that if I harness some basic computer vision techniques I can do it too. By that time I already had a basic idea and pipeline in my head for how I would approach the problem.
It was something along these lines:

  • Blur image
  • Detect edges
  • Detect contours
  • Filter contours
  • Calculate shape features with some rotation-invariance (for example Hu-Moments)
  • Bruteforce match contour features
  • Draw solution on the image

Well, simple enough, right? Sounds like a good OpenCV refresher project. As you can imagine, I heavily overestimated my ability to solve it.

First attempt (classic computer vision approach)

I started to follow my initial plan: I blurred the image with a (3,3) Gaussian kernel and applied a
Canny edge detector. Auto-Canny sets the upper and lower thresholds automatically based on the median pixel intensity.
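For illustration, here is a minimal sketch of the auto-Canny threshold rule described above (the `sigma=0.33` margin is a common default, not a value taken from the original code):

```python
import numpy as np

def auto_canny_thresholds(gray, sigma=0.33):
    """Derive Canny thresholds from the median pixel intensity."""
    v = np.median(gray)
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    return lower, upper

# With OpenCV available, the full pipeline would then be:
#   blurred = cv2.GaussianBlur(gray, (3, 3), 0)
#   lower, upper = auto_canny_thresholds(blurred)
#   edges = cv2.Canny(blurred, lower, upper)
```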

fig2. Canny detected edges

Looking at fig2. I realized I was in trouble: getting the edges right was not going to be easy, and it becomes clear why:

  • The image is lit in a tricky way
  • The distances between the contours we want to detect are quite small, which makes using morphological operations very hard
  • And if that's not enough, the “wooden” textures everywhere don't make it easier either.

At this point there are 2 options:

  • Handcraft a very specific preprocessing pipeline that might give us a better (edge) segmentation
  • Look for other options, and by “other” I obviously mean deep learning.

Since hand-crafting pipelines is not exactly what I consider fun, I chose the second option.

Deep Edge Detection

After the initial (failed) attempt I knew that what I needed was a “better” edge detector. To be clearer about what better means: it should be more sensitive and less noisy.
Deep Learning solutions have many amazing properties, but one of the most interesting ones is that they allow so-called transfer learning. This means architectures and models trained to solve one task can be reused to solve a different one.

But there is more to it: convolutional neural networks tend to learn more complex, high-level representations of images. Given this, and the fact that we will train the detector on a human-labeled dataset, we can hope to achieve a less noisy and more human-like edge detection.

To develop a deep learning based edge/contour detector we need two things:

  • A labeled dataset with original images and their respective contour masks
  • A reasonable neural network architecture

BSDS 500 contour dataset

details and raw data
loader tool

The BSDS500 dataset is one of the few easily accessible datasets with hand-labeled contours. It consists of 500 images, each hand-labeled by a number of annotators (usually around 5).
500 images, or around 2000 samples (if we count each annotation separately), is a relatively small dataset, especially for deep learning, but it should be enough to get started.
There are a few things worth mentioning. I decided to go with the full dataset and treat each annotation as an additional sample, but there are other (maybe better) options. For example, picking only the least granular annotation per image (for instance by sorting by foreground-to-background ratio).
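As a sketch of that alternative, picking the least granular annotation per image could look like this (`masks` is a hypothetical list of binary edge maps, one per annotator):

```python
import numpy as np

def least_granular(masks):
    """Return the annotation with the lowest foreground-to-background ratio.

    For binary edge maps the mean equals the fraction of edge pixels,
    so the smallest mean corresponds to the least granular annotation.
    """
    ratios = [float(np.mean(m)) for m in masks]
    return masks[int(np.argmin(ratios))]
```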

fig 3a. Original images from bsds500

fig 3b. Contour/Edge masks human annotated

Network Architecture for Edge Detection

I decided to base my detector on a recently developed architecture called U-Net. For details check out:
original U-Net arxiv paper
This method was applied with great success to different segmentation problems (not only biomedical ones) for example:
deepsense.io satelite image segmentation

conv architecture

fig4. U-Net convolutional architecture for image segmentation

Let's get technical

We will use Keras to implement the U-Net architecture, but before that let's talk for a moment about framing the problem, and specifically about the objective function. We will start with the simplest option, which is binary_crossentropy. This means that our network will output a probability map of edge “pixel candidates”. In most systems that need some kind of edge detection, what you really want is a binary image, not a probability map. This leads to some issues; specifically, you get another hyperparameter in your system, which is the threshold value for the maps. But more on that topic later.
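That thresholding step is a one-liner; here is a sketch (the 0.5 default is an arbitrary starting point, not a tuned value):

```python
import numpy as np

def binarize(prob_map, threshold=0.5):
    """Turn an edge-probability map into a binary edge image.

    `threshold` is the extra hyperparameter mentioned above.
    """
    return (prob_map > threshold).astype(np.uint8)
```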

For an input image of size (256, 256, 3) this architecture has almost 8 million trainable parameters. That's a lot; a GPU is probably a must-have to train it.
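To give a flavour of the architecture, here is a heavily simplified two-level U-Net in Keras (the real model is much deeper, which is where the ~8M parameters come from; all layer sizes here are illustrative):

```python
from tensorflow.keras import layers, Model

def build_unet(input_shape=(256, 256, 3), base_filters=32):
    """A minimal U-Net sketch: contract, bottleneck, expand with a skip connection."""
    inputs = layers.Input(shape=input_shape)

    # Contracting path
    c1 = layers.Conv2D(base_filters, 3, activation="relu", padding="same")(inputs)
    c1 = layers.Conv2D(base_filters, 3, activation="relu", padding="same")(c1)
    p1 = layers.MaxPooling2D(2)(c1)

    # Bottleneck
    c2 = layers.Conv2D(base_filters * 2, 3, activation="relu", padding="same")(p1)
    c2 = layers.Conv2D(base_filters * 2, 3, activation="relu", padding="same")(c2)

    # Expanding path with a skip connection back to c1
    u1 = layers.UpSampling2D(2)(c2)
    u1 = layers.concatenate([u1, c1])
    c3 = layers.Conv2D(base_filters, 3, activation="relu", padding="same")(u1)
    c3 = layers.Conv2D(base_filters, 3, activation="relu", padding="same")(c3)

    # One sigmoid unit per pixel -> edge-probability map
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c3)

    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```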

Training the Detector

Let's put things together and train the U-Net based edge detector:

As you can see, I concatenated the train and test sets. This might seem strange and not something you do without a very good reason. Fortunately, there are a few good reasons:

  • We are not doing this to compete in benchmarks
  • Data is small – every sample counts!
  • A very generous 50/50 train-test split originally made by the authors might be unnecessary in our case
  • We still have the validation set.
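The train/test concatenation described above is just a couple of lines (the arrays below are tiny random stand-ins for the real BSDS500 splits):

```python
import numpy as np

# Hypothetical stand-ins for the BSDS500 splits.
X_train = np.random.rand(4, 64, 64, 3)
y_train = np.random.randint(0, 2, size=(4, 64, 64, 1)).astype("float32")
X_test = np.random.rand(2, 64, 64, 3)
y_test = np.random.randint(0, 2, size=(2, 64, 64, 1)).astype("float32")

# Merge train and test: with a dataset this small, every sample counts.
X = np.concatenate([X_train, X_test])
y = np.concatenate([y_train, y_test])

# Training would then be along the lines of:
#   model.fit(X, y, validation_data=(X_val, y_val), epochs=..., batch_size=...)
```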

Let’s have a quick look at fig4. to see pretty common curves, we can notice that the network still improves, and it’s probably few dozens of epochs away from converging. Since that’s a long running task we will stop here and see what we actually learned.

loss, learning curve

fig5. learning curves

Results

Let’s have a quick peak at figures below. The detector seems to work quite fine, at least it performs the task it was supposed to learn. In comparison to edges detected with gaussian-canny, we certainly have less noise and the segmentation will probably perform better in most applications. But judge it yourself.

fig6. deep edge vs canny

edge detection

fig7. coins deep edge vs canny

Find more examples:
here

That’s all cool, but what about the toy, can we solve it now?

This might sound pretty disappointing, but the answer is: not really. As we can see in fig8. we get a decent segmentation using deep edges, but we are still far away from being able to solve the puzzle out of the box, without a complex post-edge-processing pipeline.

deep edge detection

fig8. deep edge detection on toy

Conclusion

We didn’t solve our initial task but let’s be honest in toy projects like that (pun intended) it’s rarely about the destination and more about the journey. All in all we developed something quite new and something that might come handy in various applications.

It’s important to note that in many places we chose the simplest route and not necessarily the best one, to end this post I will write down few things that could be done differently and/or better:

  • Figure out a smarter objective function, for example, one that rewards continuity of edges.
  • Preprocess the dataset, for example, mean center, smooth, etc.
  • Be smarter in choosing which edge maps to use, for example by pixel voting (so we exclude edges that were not selected by, say, 2 out of 5 annotators).
  • Our detector is limited to a fixed-size input image; it would be great to find a way to overcome this.
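The pixel-voting idea from the list above could be sketched like this (`min_votes=2` matching the 2-out-of-5 example):

```python
import numpy as np

def vote_edges(masks, min_votes=2):
    """Keep only edge pixels marked by at least `min_votes` annotators."""
    votes = np.sum([np.asarray(m) > 0 for m in masks], axis=0)
    return (votes >= min_votes).astype(np.uint8)
```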

Posted by jakub.cieslik
