# PageNet

PageNet is a deep learning system that takes an image containing a document and returns a quadrilateral representing the main page region. We trained PageNet using the [Caffe](http://caffe.berkeleyvision.org) library. For details, see our [paper](https://arxiv.org/abs/1709.01618).
## Usage

There are three scripts in this repo: one for training networks, one for making predictions with pretrained networks, and one for rendering quadrilateral regions.
### Testing Pretrained Models

We have provided two pretrained models from our paper. One model is trained on the CBAD dataset and the other is trained on a private collection of Ohio Death Records provided by [Family Search](https://www.familysearch.org/).

`test_pretrained.py` has the following usage:
```
usage: test_pretrained.py [-h] [--out-dir OUT_DIR] [--gpu GPU]
                          [--print-count PRINT_COUNT]
                          image_dir manifest model out_file

Outputs binary predictions

positional arguments:
  image_dir             The directory where images are stored
  manifest              txt file listing images relative to image_dir
  model                 [cbad|ohio]
  out_file              Output file

optional arguments:
  -h, --help            show this help message and exit
  --out-dir OUT_DIR
  --gpu GPU             GPU to use for running the network
  --print-count PRINT_COUNT
                        Print interval
```
`image_dir` is the directory containing the images to run predictions on. The file paths in `manifest` are relative to `image_dir`, one per line. `model` must be either `cbad` or `ohio` and selects which pretrained model to use. `out_file` will list the coordinates of the quadrilateral predicted by PageNet for each input image.
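
A manifest in the format described above (one path per line, relative to `image_dir`) can be generated with a few lines of Python. This is a minimal sketch, not part of the repo: the helper name `write_manifest` and the list of image extensions are assumptions made for illustration.

```python
import os

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def write_manifest(image_dir, manifest_path, extensions=IMAGE_EXTENSIONS):
    """Write one image path per line, relative to image_dir.

    Hypothetical helper: walks image_dir recursively, keeps files whose
    extension is in `extensions`, and returns the sorted relative paths.
    """
    rel_paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if name.lower().endswith(extensions):
                full_path = os.path.join(root, name)
                rel_paths.append(os.path.relpath(full_path, image_dir))
    rel_paths.sort()
    with open(manifest_path, "w") as f:
        for rel_path in rel_paths:
            f.write(rel_path + "\n")
    return rel_paths
```

For example, `write_manifest("images", "manifest.txt")` would produce a file that can be passed as the `manifest` argument together with `images` as `image_dir`.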
`--gpu` takes the device ID of the GPU to use; if it is negative, CPU mode is used. Specifying `--out-dir` dumps both the raw and post-processed predictions as images.
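
When running predictions from another Python program, the command line can be assembled from the arguments documented above. The sketch below only builds the argument list; `build_test_command` and its `python_exe` parameter are hypothetical names, and the script is assumed to sit in the current working directory.

```python
def build_test_command(image_dir, manifest, model, out_file,
                       gpu=-1, out_dir=None, python_exe="python"):
    """Build the argument list for test_pretrained.py (hypothetical helper).

    gpu < 0 selects CPU mode, as described in the usage notes.
    """
    if model not in ("cbad", "ohio"):
        raise ValueError("model must be 'cbad' or 'ohio', got %r" % (model,))
    cmd = [python_exe, "test_pretrained.py"]
    if out_dir is not None:
        # Optional: dump raw and post-processed predictions as images.
        cmd += ["--out-dir", out_dir]
    cmd += ["--gpu", str(gpu)]
    cmd += [image_dir, manifest, model, out_file]
    return cmd


# To actually run it (requires the repo and its dependencies):
#   import subprocess
#   subprocess.check_call(build_test_command("images", "manifest.txt",
#                                            "cbad", "quads.txt"))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.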
### Training

`train.py` has the following usage:
```
usage: train.py [-h] [--gpu GPU] [-m MEAN] [-s SCALE] [-b BATCH_SIZE] [-c]
                [--image-size IMAGE_SIZE] [--gt-interval GT_INTERVAL]
                [--min-interval MIN_INTERVAL] [--debug-dir DEBUG_DIR]
                [--print-count PRINT_COUNT]
                solver_file dataset_dir train_manifest val_manifest

Outputs binary predictions

positional arguments:
  solver_file           The solver.prototxt
  dataset_dir           The dataset to be evaluated
  train_manifest        txt file listing images to train on
  val_manifest          txt file listing images for validation

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU             GPU to use for running the network
  -m MEAN, --mean MEAN  Mean value for data preprocessing
  -s SCALE, --scale SCALE
                        Optional pixel scale factor
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        Training batch size
  -c, --color           Training batch size
  --image-size IMAGE_SIZE
                        Size of images for input to training/prediction
  --gt-interval GT_INTERVAL
                        Interval for Debug
  --min-interval MIN_INTERVAL
                        Miniumum iteration for Debug
  --debug-dir DEBUG_DIR
                        Dump images for debugging
  --print-count PRINT_COUNT
                        How often to print progress
```
`solver_file` points to a Caffe `solver.prototxt` file; one is included in the repo. The training script expects the network used for training to begin and end like the included `train_val.prototxt` file, but the middle layers can be changed.

`dataset_dir` is the directory containing the training and validation images. The file paths in `train_manifest` and `val_manifest` are relative to `dataset_dir`, one per line.
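
If you start from a single list of images, the two manifests can be produced by a random split. This is a minimal sketch under assumptions: the helper names, the 10% validation fraction, and the fixed seed are illustrative choices, not something the repo prescribes.

```python
import random


def split_manifest(paths, val_fraction=0.1, seed=0):
    """Split relative image paths into (train, val) lists.

    Hypothetical helper: shuffles a sorted copy with a fixed seed so the
    split is reproducible, then takes the first val_fraction as validation.
    """
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_lines(path, lines):
    """Write one entry per line, the layout the manifests use."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")
```

The resulting lists can be written with `write_lines("train.txt", train)` and `write_lines("val.txt", val)` and passed as `train_manifest` and `val_manifest`.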
`--gpu` takes the device ID of the GPU to use; if it is negative, CPU mode is used.
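
The convention above maps naturally onto pycaffe's device calls (`caffe.set_mode_cpu`, `caffe.set_mode_gpu`, `caffe.set_device`). The sketch below shows one way such a flag could be handled; it is an assumption about the general pattern, not a copy of what the scripts do, and `select_device` is a hypothetical name. The Caffe module is passed in as a parameter so the function can be read and exercised without Caffe installed.

```python
def select_device(caffe_module, gpu):
    """Select CPU or GPU mode from a --gpu style device ID.

    Hypothetical helper: a negative ID means CPU mode, anything else is
    treated as the GPU device ID.
    """
    if gpu < 0:
        caffe_module.set_mode_cpu()
        return "cpu"
    caffe_module.set_device(gpu)
    caffe_module.set_mode_gpu()
    return "gpu:%d" % gpu


# Typical use, assuming pycaffe is importable:
#   import caffe
#   select_device(caffe, -1)   # CPU mode
#   select_device(caffe, 0)    # first GPU
```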
The other optional arguments have reasonable defaults. If you're curious about their exact meaning, look at the code in `train.py`.
### Rendering Masks

The usage for `render_quads.py` is:
```
python render_quads.py manifest dataset_dir out_dir
```
`manifest` lists the image file paths and quadrilateral coordinates; it should be the `out_file` of `test_pretrained.py`. The file paths in `manifest` are relative to `dataset_dir`. `out_dir` is the output directory where the quadrilateral region images are written.
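
To illustrate what a rendered quadrilateral region is, the sketch below fills a binary mask from four corner points in pure Python. It is not how `render_quads.py` is implemented (with OpenCV one would more likely use `cv2.fillPoly`), and it assumes the quadrilateral is convex with its corners listed in order around the boundary. The function names are hypothetical.

```python
def point_in_convex_quad(x, y, quad):
    """Return True if (x, y) is inside or on the edge of a convex quad.

    quad is a list of four (x, y) corners in boundary order. The point is
    inside when it lies on the same side of all four edges, i.e. all the
    edge cross products share a sign.
    """
    crosses = []
    for i in range(4):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % 4]
        crosses.append((x2 - x1) * (y - y1) - (y2 - y1) * (x - x1))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)


def render_quad_mask(width, height, quad):
    """Return a height x width list of 0/1 values, 1 inside the quad."""
    return [
        [1 if point_in_convex_quad(x, y, quad) else 0 for x in range(width)]
        for y in range(height)
    ]
```

Testing every pixel is slow for full-size scans; it is used here only because it keeps the example free of dependencies.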
## Dependencies

The Python scripts depend on OpenCV 3.2, Matplotlib,