Extending Keras ImageDataGenerator to handle multi-label classification tasks

I stumbled upon this problem recently while working on a Kaggle competition that featured a multi-label, heavily unbalanced satellite image dataset.

Let’s talk for a moment about a neat Keras feature, keras.preprocessing.image.ImageDataGenerator. As you can see from the documentation, its main purpose is to augment your dataset and generate new images from it. This is a common tactic for fighting small datasets and overfitting.
By default, ImageDataGenerator expects the data to be structured in a very specific way: each class has its own directory, and every image inside that directory belongs to the class named by the directory.
This is very limiting, and using the API directly will not work for multi-label problems, where a single image can belong to several classes at once.
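To make the mismatch concrete, here is a minimal sketch of the label format a multi-label task needs instead of one directory per class: a multi-hot row per image. The function name, file names and tags are made up for illustration and are not taken from the post.

```python
from typing import Dict, List, Sequence, Tuple


def build_label_matrix(
    tags_by_file: Dict[str, Sequence[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn {filename: [tag, ...]} into a multi-hot label matrix.

    Returns (filenames, classes, labels), where labels[i][j] is 1 when
    filenames[i] carries the tag classes[j], and 0 otherwise.
    """
    filenames = sorted(tags_by_file)
    classes = sorted({tag for tags in tags_by_file.values() for tag in tags})
    column = {name: j for j, name in enumerate(classes)}

    labels = []
    for filename in filenames:
        row = [0] * len(classes)
        for tag in tags_by_file[filename]:
            row[column[tag]] = 1
        labels.append(row)
    return filenames, classes, labels


# Hypothetical annotations: one image can carry several tags at once.
annotations = {
    "tile_0.jpg": ["clear", "primary"],
    "tile_1.jpg": ["cloudy"],
    "tile_2.jpg": ["clear", "primary", "water"],
}
filenames, classes, labels = build_label_matrix(annotations)
```

A directory-per-class layout cannot express a row such as the one for tile_2.jpg, which belongs to three classes. One possible route, assuming the images are loaded into an array in the same order as `filenames`, is to convert `labels` to a NumPy array and pass it as `y` to `ImageDataGenerator.flow(x, y)`.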


Posted by jakub.cieslik, 0 comments

Indexing images using h5py for machine learning purposes

Dealing with image datasets can get a little tricky: given their size, a dataset too big to fit into memory is a common sight. One way to make working with them more pleasant is to index them in an HDF5 file, which has a number of advantages over handling each file one by one. To name a few:

  • Reading from HDF5 is extremely fast.
  • We can treat the stored datasets much as we would treat a NumPy nd-array.
  • They are stored entirely on disk, which means you are not restricted by system memory.
  • Sharing, uploading and moving are easier, since the full dataset lives in a single file.
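As a rough illustration of the points above, the following sketch writes a handful of images into one HDF5 file and reads a slice back with NumPy-style indexing. The random arrays stand in for real image files, and the file name and the dataset name "images" are assumptions made for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in data: 10 random 32x32 RGB "images" instead of files read from disk.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

path = os.path.join(tempfile.mkdtemp(), "dataset.h5")

# Write the images one by one into a single dataset inside the file.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("images", shape=images.shape, dtype="uint8")
    for i, image in enumerate(images):
        dset[i] = image

# Read back a slice; only the requested rows are loaded into memory.
with h5py.File(path, "r") as f:
    batch = f["images"][2:5]
    stored_shape = f["images"].shape
```

With real data, the write loop would decode one image file per iteration, so the whole dataset never has to be held in memory at once.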
Posted by jakub.cieslik, 0 comments

Using boto3 and Keras to checkpoint deep learning models on AWS S3


For starters, let me explain why I’m writing this post. Although the boto3 library is extremely powerful, it’s also one of those packages I can’t wrap my head around: I keep googling how to do simple stuff ALL THE TIME.

I think one of the reasons is that boto3 offers multiple ways to perform basically the same task, which makes it relatively hard to use.

But enough lingering. Let’s write a simple wrapper around boto3 to make common S3 operations easier and learn to use it more efficiently.

To apply it in a real-world scenario, we will use the wrapper to create a custom Keras callback whose task is to upload model checkpoints to S3 every time the model improves.
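The "upload only when the model improves" decision can be sketched without either library. The class below is a framework-free illustration, not the wrapper or callback from the post: its name, the key layout and the injected `upload` callable are all assumptions, and it treats a lower monitored value as better.

```python
from typing import Callable, Dict, Optional


class BestCheckpointUploader:
    """Call `upload(local_path, key)` whenever the monitored metric improves.

    Hypothetical sketch: `upload` is any callable, so no S3 access is needed
    to run it. Lower metric values are assumed to be better.
    """

    def __init__(
        self,
        upload: Callable[[str, str], None],
        local_path: str,
        key_prefix: str,
        monitor: str = "val_loss",
    ) -> None:
        self.upload = upload
        self.local_path = local_path
        self.key_prefix = key_prefix
        self.monitor = monitor
        self.best: Optional[float] = None

    def on_epoch_end(self, epoch: int, logs: Optional[Dict[str, float]] = None) -> bool:
        """Return True if this epoch improved the metric and triggered an upload."""
        current = (logs or {}).get(self.monitor)
        if current is None:
            return False  # metric not reported this epoch
        if self.best is not None and current >= self.best:
            return False  # no improvement, nothing to upload
        self.best = current
        self.upload(self.local_path, f"{self.key_prefix}/epoch_{epoch:03d}.h5")
        return True


# Simulated training run with a fake uploader that only records the keys.
uploaded = []
uploader = BestCheckpointUploader(
    upload=lambda path, key: uploaded.append(key),
    local_path="model.h5",
    key_prefix="checkpoints",
)
for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.5]):
    uploader.on_epoch_end(epoch, {"val_loss": loss})
```

In a real setup, `upload` could wrap the `upload_file(Filename, Bucket, Key)` method of a boto3 S3 client, and the class would subclass `keras.callbacks.Callback` and save the model to `local_path` before uploading.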


Posted by jakub.cieslik, 0 comments

Developing a deep learning edge detector to solve a “toy” problem


In an attempt to solve a children’s shape puzzle game, a deep learning based edge detector was developed. It is based on the U-Net image segmentation architecture and trained on the BSDS 500 dataset.

Fig. 1: The children’s shape puzzle problem we are trying to solve.

I just want the code:

I just want to try it on my own images:

Go to the edge_notebook.ipynb

I just want to see how it works:
check results here


Posted by jakub.cieslik, 0 comments