# PageNet

PageNet is a deep learning system that takes an image containing a document and returns a quadrilateral marking the main page region. We trained PageNet using the [Caffe](http://caffe.berkeleyvision.org) library. For details, see our [paper](https://arxiv.org/abs/1709.01618).

## Usage

There are three scripts in this repo: one for training networks, one for making predictions with pretrained networks, and one for rendering the predicted quadrilateral regions.

### Testing Pretrained Models

We have provided two pretrained models from our paper. One model is trained on the CBAD dataset and the other on a private collection of Ohio Death Records provided by [Family Search](https://www.familysearch.org/).

`test_pretrained.py` has the following usage:

```
usage: test_pretrained.py [-h] [--out-dir OUT_DIR] [--gpu GPU]
                          [--print-count PRINT_COUNT]
                          image_dir manifest model out_file

Outputs binary predictions

positional arguments:
  image_dir             The directory where images are stored
  manifest              txt file listing images relative to image_dir
  model                 [cbad|ohio]
  out_file              Output file

optional arguments:
  -h, --help            show this help message and exit
  --out-dir OUT_DIR
  --gpu GPU             GPU to use for running the network
  --print-count PRINT_COUNT
                        Print interval
```

`image_dir` is the directory containing the images to predict on. The file paths listed in `manifest` are relative to `image_dir`, one per line. `model` should be either `cbad` or `ohio` and selects which trained model to use. `out_file` will list the coordinates of the quadrilateral predicted by PageNet for each input image.

`--gpu` takes the device ID of the GPU to use; if it is negative, CPU mode is used. Specifying `--out-dir` dumps both the raw and post-processed predictions as images.


### Training

`train.py` has the following usage:

```
usage: train.py [-h] [--gpu GPU] [-m MEAN] [-s SCALE] [-b BATCH_SIZE] [-c]
                [--image-size IMAGE_SIZE] [--gt-interval GT_INTERVAL]
                [--min-interval MIN_INTERVAL] [--debug-dir DEBUG_DIR]
                [--print-count PRINT_COUNT]
                solver_file dataset_dir train_manifest val_manifest

Outputs binary predictions

positional arguments:
  solver_file           The solver.prototxt
  dataset_dir           The dataset to be evaluated
  train_manifest        txt file listing images to train on
  val_manifest          txt file listing images for validation

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU             GPU to use for running the network
  -m MEAN, --mean MEAN  Mean value for data preprocessing
  -s SCALE, --scale SCALE
                        Optional pixel scale factor
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        Training batch size
  -c, --color           Training batch size
  --image-size IMAGE_SIZE
                        Size of images for input to training/prediction
  --gt-interval GT_INTERVAL
                        Interval for Debug
  --min-interval MIN_INTERVAL
                        Minimum iteration for Debug
  --debug-dir DEBUG_DIR
                        Dump images for debugging
  --print-count PRINT_COUNT
                        How often to print progress
```

`solver_file` points to a Caffe solver.prototxt file; one is included in this repo. The training script expects the network used for training to begin and end like the included `train_val.prototxt` file, but the middle layers can be changed.
`dataset_dir` is the directory containing the training and validation images. The file paths listed in `train_manifest` and `val_manifest` are relative to `dataset_dir`, one per line.

`--gpu` takes the device ID of the GPU to use; if it is negative, CPU mode is used.

The other optional arguments have reasonable defaults. If you're curious about their exact meaning, I suggest you look at the code.

### Rendering Masks

The usage for `render_quads.py` is

```
python render_quads.py manifest dataset_dir out_dir
```

`manifest` lists the image file paths and quadrilateral coordinates; it should be the `out_file` produced by `test_pretrained.py`. The file paths in `manifest` are relative to `dataset_dir`. `out_dir` is the output directory where the quadrilateral region images are written.
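Putting the prediction and rendering steps together, a typical end-to-end run might look like the following sketch. The directory and file names are illustrative; only the script arguments correspond to the usage described above.

```
# predict a page quadrilateral for every image listed in the manifest
python test_pretrained.py /path/to/images manifest.txt cbad quads.txt --gpu 0

# render the predicted quadrilaterals as region images
python render_quads.py quads.txt /path/to/images rendered_quads
```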

## Dependencies

The Python scripts depend on OpenCV 3.2, Matplotlib, NumPy, and Caffe.

## Docker

For those who don't want to install the dependencies, I have created a Docker image to run this code. You must have the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) plugin installed to use it, though you can still run our models on the CPU (not recommended).

The usage for the Docker container is

```
nvidia-docker run -v $HOST_WORK_DIRECTORY:/data tensmeyerc/icdar2017:pagenet python $SCRIPT $ARGS
```

`$HOST_WORK_DIRECTORY` is a directory on your machine that is mounted at `/data` inside the Docker container (using `-v`). This is the only way to expose files to the container.
`$SCRIPT` is one of the scripts described above, and `$ARGS` are the normal arguments you would pass to that script. Note that any file paths passed as arguments must begin with `/data` to be visible inside the container.
There is no need to download the image ahead of time. If you have Docker and nvidia-docker installed, running the above command will pull the Docker image (~2GB) if it has not been pulled previously.

## Citation

If you find this code useful for your research, please cite our paper:

```
@article{tensmeyer2017_pagenet,
  title={PageNet: Page Boundary Extraction in Historical Handwritten Documents},
  author={Tensmeyer, Chris and Davis, Brian and Wigington, Curtis and Lee, Iain and Barrett, Bill},
  journal={arXiv preprint arXiv:1709.01618},
  year={2017},
}
```
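As a worked Docker example, the prediction step shown earlier might be run as follows, assuming the images and manifest live in a host directory that is mounted at `/data` (the host path and file names are illustrative):

```
# /home/me/pagenet_work is a hypothetical host directory containing images/ and manifest.txt
nvidia-docker run -v /home/me/pagenet_work:/data tensmeyerc/icdar2017:pagenet \
    python test_pretrained.py /data/images /data/manifest.txt cbad /data/quads.txt --gpu 0
```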