Extending Keras ImageDataGenerator to handle multi-label classification tasks

I stumbled upon this problem recently while working on one of the Kaggle competitions, which featured a multi-label and highly imbalanced satellite image dataset.

Let’s talk for a moment about a neat Keras feature, keras.preprocessing.image.ImageDataGenerator. As you can see from the documentation, its main purpose is to augment your dataset by generating new, randomly transformed versions of your images (rotations, shifts, flips, zooms and so on). This is a common tactic to fight small datasets and overfitting.
When loading images from disk with flow_from_directory, ImageDataGenerator expects our data to be structured in a very specific way: each class must have its own directory, and every image inside that directory belongs to the class named by that directory.
This layout is quite limiting: because every image sits in exactly one class directory, it can carry only one label, so using this API directly will not work for multi-label problems.
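To make the convention concrete, here is a simplified sketch of how labels can be inferred from a `root/<class_name>/<image>` layout. Like flow_from_directory, it treats each subdirectory as a class and assigns label indices in alphanumeric order of the directory names; the function name, the extension list and the non-recursive walk are assumptions for illustration, not Keras's actual implementation.

```python
from pathlib import Path

# Assumed set of image extensions; Keras keeps its own whitelist.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def labels_from_class_dirs(root):
    """Infer one class index per image from a root/<class_name>/<image> layout."""
    root = Path(root)
    # Class names come from subdirectory names, sorted alphanumerically.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    labels = {}
    for class_index, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                # The parent directory is the only source of label information,
                # so every image ends up with exactly one class index.
                labels[path.relative_to(root).as_posix()] = class_index
    return class_names, labels
```

An image tagged both "clear" and "water" would have to be copied into both directories, and each copy would still carry only one label.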
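A minimal sketch of the general workaround, written in plain NumPy rather than by extending ImageDataGenerator itself: keep the images in memory, encode each image's tags as a multi-hot row, and use a small generator that augments the images while keeping their label rows aligned. The function names, the tag names and the single random horizontal flip (standing in for ImageDataGenerator's richer transformations) are assumptions for illustration.

```python
import numpy as np


def to_multi_hot(label_sets, classes):
    """Encode each image's set of tags as a 0/1 row over `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    y = np.zeros((len(label_sets), len(classes)), dtype=np.float32)
    for row, tags in enumerate(label_sets):
        for tag in tags:
            y[row, index[tag]] = 1.0
    return y


def multilabel_batches(x, y, batch_size, rng=None):
    """Yield shuffled (images, multi-hot labels) batches forever.

    x has shape (N, H, W, C); y has shape (N, num_classes).
    Each image is flipped horizontally with probability 0.5, and its
    label row is taken with the same indices, so labels stay aligned.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = x[idx].copy()
            flip = rng.random(len(idx)) < 0.5
            # Reverse the width axis of the selected images only.
            batch[flip] = batch[flip, :, ::-1]
            yield batch, y[idx]
```

Because the labels are a plain array indexed alongside the images, one image can carry any number of tags, and the model can be trained with a sigmoid output layer and binary cross-entropy.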


Posted by jakub.cieslik, 0 comments

Using boto3 and Keras to checkpoint deep learning models on AWS S3


For starters, let me explain why I’m writing this post. Although the boto3 library is extremely powerful, it’s also one of those packages I can’t wrap my head around; I keep googling how to do simple stuff ALL THE TIME.

I think one of the reasons is that boto3 offers multiple ways to perform basically the same task (for example, through the low-level client API or the higher-level resource API), which makes it relatively hard to learn.

But enough lingering. Let’s write a simple wrapper around boto3 that makes common S3 operations easier, and learn to use the library more efficiently along the way.
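A minimal sketch of such a wrapper, assuming we stick to the low-level client API (`boto3.client("s3")`) throughout so there is only one way of doing each thing. The class name `S3Storage` and its method names are hypothetical; the underlying calls (`upload_file`, `download_file`, `delete_object` and the `list_objects_v2` paginator) are standard boto3 S3 client methods. The client can be passed in, which keeps the wrapper easy to test with a fake.

```python
class S3Storage:
    """Hypothetical thin wrapper over a boto3 S3 client, bound to one bucket."""

    def __init__(self, bucket, client=None):
        if client is None:
            import boto3  # imported lazily so a fake client can be injected
            client = boto3.client("s3")
        self.bucket = bucket
        self.client = client

    def upload(self, local_path, key):
        """Upload a local file to s3://<bucket>/<key>."""
        self.client.upload_file(local_path, self.bucket, key)

    def download(self, key, local_path):
        """Download s3://<bucket>/<key> to a local file."""
        self.client.download_file(self.bucket, key, local_path)

    def delete(self, key):
        """Delete a single object."""
        self.client.delete_object(Bucket=self.bucket, Key=key)

    def list_keys(self, prefix=""):
        """Return all keys under `prefix`, following pagination.

        list_objects_v2 returns at most 1000 keys per call, so the
        paginator is used to walk every page.
        """
        paginator = self.client.get_paginator("list_objects_v2")
        keys = []
        for page in paginator.paginate(Bucket=self.bucket, Prefix=prefix):
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        return keys
```

With credentials configured the usual way (environment variables, `~/.aws/credentials` or an instance role), usage could look like `S3Storage("my-bucket").upload("model.keras", "checkpoints/model.keras")`, where the bucket and key names are placeholders.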

To apply it in a real-world scenario, we will use the wrapper to create a custom Keras callback (a subclass of keras.callbacks.Callback) that uploads a model checkpoint to S3 every time the model improves.
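A minimal sketch of such a callback, assuming a generic `uploader` object with an `upload(local_path, key)` method (for example, a thin boto3 wrapper). It follows the same "save only when the monitored metric improves" idea as Keras's built-in `ModelCheckpoint(save_best_only=True)`, then pushes the saved file to S3. The class name, the key layout and the `.keras` file name are assumptions (recent Keras versions expect a `.keras` or `.h5` suffix in `model.save`); the fallback base class exists only so the logic can be exercised without Keras installed.

```python
import math

try:
    import keras
    CallbackBase = keras.callbacks.Callback
except ImportError:
    # Minimal stand-in so the logic can be exercised without Keras installed.
    class CallbackBase:
        def set_model(self, model):
            self.model = model


class S3CheckpointCallback(CallbackBase):
    """Save the model locally and upload it whenever `monitor` improves."""

    def __init__(self, uploader, key_prefix, monitor="val_loss", mode="min",
                 local_path="checkpoint.keras"):
        super().__init__()
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.uploader = uploader  # any object with upload(local_path, key)
        self.key_prefix = key_prefix.rstrip("/")
        self.monitor = monitor
        self.mode = mode
        self.local_path = local_path
        self.best = math.inf if mode == "min" else -math.inf

    def _improved(self, value):
        return value < self.best if self.mode == "min" else value > self.best

    def on_epoch_end(self, epoch, logs=None):
        value = (logs or {}).get(self.monitor)
        if value is None or not self._improved(value):
            return  # metric missing or not better: skip save and upload
        self.best = value
        self.model.save(self.local_path)
        # One object per improving epoch, e.g. <prefix>/epoch_003.keras
        key = f"{self.key_prefix}/epoch_{epoch + 1:03d}.keras"
        self.uploader.upload(self.local_path, key)
```

It plugs into training like any other callback, e.g. `model.fit(x, y, validation_data=(x_val, y_val), callbacks=[S3CheckpointCallback(uploader, "experiments/run-1")])`. Keeping one S3 object per improving epoch, rather than overwriting a single key, is a design choice that makes it easy to roll back to an earlier checkpoint.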


Posted by jakub.cieslik, 0 comments