| # PageNet | |||
PageNet is a deep learning system that takes an image containing a document and returns a quadrilateral outlining the main page region. We trained PageNet with the [Caffe](http://caffe.berkeleyvision.org) library. For details, see our [paper](https://arxiv.org/abs/1709.01618).
| ## Usage | |||
There are three scripts in this repo: one for training networks, one for making predictions with pre-trained networks, and one for rendering predicted quadrilateral regions as mask images.
| ### Testing Pretrained Models | |||
We have provided two pretrained models from our paper. One model is trained on the cBAD dataset and the other on a private collection of Ohio Death Records provided by [FamilySearch](https://www.familysearch.org/).
`test_pretrained.py` has the following usage:
| ``` | |||
| usage: test_pretrained.py [-h] [--out-dir OUT_DIR] [--gpu GPU] | |||
| [--print-count PRINT_COUNT] | |||
| image_dir manifest model out_file | |||
| Outputs binary predictions | |||
| positional arguments: | |||
| image_dir The directory where images are stored | |||
| manifest txt file listing images relative to image_dir | |||
| model [cbad|ohio] | |||
| out_file Output file | |||
| optional arguments: | |||
| -h, --help show this help message and exit | |||
| --out-dir OUT_DIR | |||
| --gpu GPU GPU to use for running the network | |||
| --print-count PRINT_COUNT | |||
| Print interval | |||
| ``` | |||
| `image_dir` is the directory containing images to predict. The file paths listed in `manifest` are relative to `image_dir` and are listed one per line. `model` should be either `cbad` or `ohio` to select which trained model to use. `out_file` will list the coordinates of the quadrilaterals predicted by PageNet for each of the input images. | |||
`--gpu` is for passing the device ID of the GPU to use. If it is negative, CPU mode is used. Specifying `--out-dir` will dump both the raw and post-processed predictions as images.
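For example, a hypothetical invocation (the directory and file names below are placeholders, not files shipped with the repo) might look like this, where `manifest.txt` lists one image path per line relative to `./images`:

```
# predict page quadrilaterals with the cBAD model on GPU 0,
# writing coordinates to quads.txt and debug images to ./debug
python test_pretrained.py --gpu 0 --out-dir ./debug ./images manifest.txt cbad quads.txt
```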
| ### Training | |||
`train.py` has the following usage:
| ``` | |||
| usage: train.py [-h] [--gpu GPU] [-m MEAN] [-s SCALE] [-b BATCH_SIZE] [-c] | |||
| [--image-size IMAGE_SIZE] [--gt-interval GT_INTERVAL] | |||
| [--min-interval MIN_INTERVAL] [--debug-dir DEBUG_DIR] | |||
| [--print-count PRINT_COUNT] | |||
| solver_file dataset_dir train_manifest val_manifest | |||
| Outputs binary predictions | |||
| positional arguments: | |||
| solver_file The solver.prototxt | |||
| dataset_dir The dataset to be evaluated | |||
| train_manifest txt file listing images to train on | |||
| val_manifest txt file listing images for validation | |||
| optional arguments: | |||
| -h, --help show this help message and exit | |||
| --gpu GPU GPU to use for running the network | |||
| -m MEAN, --mean MEAN Mean value for data preprocessing | |||
| -s SCALE, --scale SCALE | |||
| Optional pixel scale factor | |||
| -b BATCH_SIZE, --batch-size BATCH_SIZE | |||
| Training batch size | |||
-c, --color Train on color (3-channel) images
| --image-size IMAGE_SIZE | |||
| Size of images for input to training/prediction | |||
| --gt-interval GT_INTERVAL | |||
| Interval for Debug | |||
| --min-interval MIN_INTERVAL | |||
Minimum iteration for Debug
| --debug-dir DEBUG_DIR | |||
| Dump images for debugging | |||
| --print-count PRINT_COUNT | |||
| How often to print progress | |||
| ``` | |||
`solver_file` points to a Caffe solver.prototxt file; such a file is included in the repo. The training script expects the network used for training to begin and end like the included `train_val.prototxt` file, but the middle layers can be changed.
| `dataset_dir` is the directory containing the training and validation images. The file paths listed in `train_manifest` and `val_manifest` are relative to `dataset_dir` and are listed one per line. | |||
| `--gpu` is for passing the device ID of the GPU to use. If it is negative, CPU mode is used. | |||
| The optional arguments have reasonable defaults. If you're curious about their exact meaning, I suggest you look at the code. | |||
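As a sketch (the solver, dataset directory, and manifest names below are hypothetical), a training run on GPU 0 might be launched like this:

```
# train with batch size 2 on GPU 0, dumping debug images to ./train_debug
python train.py --gpu 0 -b 2 --debug-dir ./train_debug \
    solver.prototxt ./dataset train.txt val.txt
```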
| ### Rendering Masks | |||
The usage for `render_quads.py` is:
| ``` | |||
| python render_quads.py manifest dataset_dir out_dir | |||
| ``` | |||
`manifest` lists the image file paths and quadrilateral coordinates; it should be the `out_file` produced by `test_pretrained.py`. The file paths in `manifest` are relative to `dataset_dir`. `out_dir` is an output directory where the quadrilateral region images are written.
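Continuing the hypothetical example from the testing section, the quadrilaterals written to `quads.txt` could be rendered like so:

```
# render each predicted quadrilateral over images found in ./images, writing masks to ./masks
python render_quads.py quads.txt ./images ./masks
```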
| ## Dependencies | |||
The Python scripts depend on OpenCV 3.2, Matplotlib, NumPy, and Caffe.
| ## Docker | |||
For those who don't want to install the dependencies, I have created a docker image to run this code. You must have the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) plugin installed to use it, though you can still run our models on the CPU (not recommended).
The usage for the docker container is:
| ``` | |||
| nvidia-docker run -v $HOST_WORK_DIRECTORY:/data tensmeyerc/icdar2017:pagenet python $SCRIPT $ARGS | |||
| ``` | |||
| `$HOST_WORK_DIRECTORY` is a directory on your machine that is mounted on `/data` inside of the docker container (using -v). It's the only way to expose files to the docker container. | |||
| `$SCRIPT` is one of the scripts described above. `$ARGS` are the normal arguments you pass to the python script. Note that any file paths passed as arguments must begin with `/data` to be visible to the docker container. | |||
There is no need to download the container ahead of time. If you have docker and nvidia-docker installed, running the above command will pull the docker image (~2GB) if it has not been previously pulled.
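For instance, assuming the hypothetical files from the earlier examples live under `$HOST_WORK_DIRECTORY`, the pretrained cBAD model could be run inside the container like this (as in the usage line above, the script is invoked directly from the container's working directory):

```
# paths passed to the script must start with /data, the mount point inside the container
nvidia-docker run -v $HOST_WORK_DIRECTORY:/data tensmeyerc/icdar2017:pagenet \
    python test_pretrained.py --gpu 0 /data/images /data/manifest.txt cbad /data/quads.txt
```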
| ## Citation | |||
| If you find this code useful to your research, please cite our paper: | |||
| ``` | |||
| @article{tensmeyer2017_pagenet, | |||
| title={PageNet: Page Boundary Extraction in Historical Handwritten Documents}, | |||
| author={Tensmeyer, Chris and Davis, Brian and Wigington, Curtis and Lee, Iain and Barrett, Bill}, | |||
| journal={arXiv preprint arXiv:1709.01618}, | |||
| year={2017}, | |||
| } | |||
| ``` | |||