Tabula is a tool for liberating data tables locked inside PDF files.

View the project on GitHub: tabulapdf/tabula

Current Version: 1.1.0

Other Versions: pre-releases & archives

Need help? Open an issue on GitHub.

Donate: Help support this project by backing us on OpenCollective.

We'd love to hear from you! Say hi on Twitter at @TabulaPDF

Latest Version: Tabula 1.1.0

We're proud to announce the first official release of Tabula 1.1!

Tabula 1.1 includes a rewrite of our processing backend that should provide a significant performance increase, with detection and extraction up to 7x faster! The backend rewrite also improves support for right-to-left (RTL) languages and fixes many other bugs. (You can read about all the changes in the release notes.)

Download Tabula 1.1 below, or on the release notes page.

Special thanks to our OpenCollective backers for supporting our work on Tabula; if you find Tabula useful in your work, please consider a one-time or monthly donation.

How Can Tabula Help Me?

If you've ever tried to do anything with data provided to you in PDFs, you know how painful it is: there's no easy way to copy and paste rows of data out of PDF files. Tabula lets you extract that data into a CSV file or a Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux.

Who Uses Tabula?

Tabula is used to power investigative reporting at news organizations of all sizes, including ProPublica, The Times of London, Foreign Policy, La Nación (Argentina), The New York Times and the St. Paul (MN) Pioneer Press.

Grassroots organizations rely on Tabula to turn clunky documents into human-friendly public resources.

And researchers of all kinds use Tabula to turn PDF reports into Excel spreadsheets, CSVs, and JSON files for use in analysis and database applications.

Download & Install Tabula

Windows & Linux users will need a copy of Java installed. You can download Java here. (Java is included in the Mac version.)
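If you are not sure whether Java is already installed, a quick check along these lines may help. This is only a sketch: it assumes Python is available and that the Java launcher is named `java` and is on your PATH, and the helper name `java_version` is made up for this example. It does not check whether the installed version is one Tabula supports.

```python
import shutil
import subprocess


def java_version(command="java"):
    """Return the first line printed by `<command> -version`,
    or None if the command cannot be found on PATH."""
    path = shutil.which(command)
    if path is None:
        return None
    # `java -version` writes its report to stderr, so capture both streams.
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, timeout=15
    )
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = java_version()
    if version is None:
        print("Java was not found on PATH; install it before running Tabula.")
    else:
        print("Found Java:", version)
```

If the script reports that Java was not found, install Java first and then continue with the steps below.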

  1. Download the version of Tabula for your operating system.
  2. Extract the zip file. (Instructions: Windows, Mac)
  3. Go into the folder you just extracted. Run the "Tabula" program inside.
  4. A web browser will open. If it doesn't, open your web browser, and go to http://localhost:8080. There's Tabula!
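If the browser does not open by itself, it can be useful to confirm that something is answering at the address from step 4 before troubleshooting further. The sketch below assumes Python is available; the function name `tabula_is_running` is hypothetical, and the check only shows that some HTTP server responds at that address, not that the server is Tabula.

```python
import urllib.error
import urllib.request

TABULA_URL = "http://localhost:8080"  # address given in step 4 above


def tabula_is_running(url=TABULA_URL, timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # The address is local, so skip any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is still running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not reachable.
        return False


if __name__ == "__main__":
    if tabula_is_running():
        print("A server is answering at", TABULA_URL)
    else:
        print("Nothing is answering at", TABULA_URL, "- is Tabula started?")
```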

How to Use Tabula

  1. Upload a PDF file containing a data table.
  2. Browse to the page you want, then select the table by clicking and dragging to draw a box around the table.
  3. Click "Preview & Export Extracted Data". Tabula will try to extract the data and display a preview. Inspect the data to make sure it looks correct. If data is missing, you can go back to adjust your selection.
  4. Click the "Export" button.
  5. Now you can work with your data as a text file or a spreadsheet rather than a PDF! (You can open the downloaded file in Microsoft Excel or the free LibreOffice Calc.)

Note: Tabula only works on text-based PDFs, not scanned documents.
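Once a table has been exported as CSV (step 5 above), it can also be loaded into a script rather than a spreadsheet. The following is a minimal sketch using Python's standard `csv` module. The file name `tabula-export.csv` and the helper `load_exported_table` are hypothetical, and the UTF-8 encoding is an assumption about the exported file.

```python
import csv
from pathlib import Path


def load_exported_table(path):
    """Read a CSV file into a list of rows (each row is a list of strings).

    Rows whose cells are all empty are skipped.
    """
    # "utf-8-sig" is an assumption: it reads plain UTF-8 and also tolerates
    # a byte-order mark at the start of the file.
    with Path(path).open(newline="", encoding="utf-8-sig") as f:
        return [row for row in csv.reader(f) if any(cell.strip() for cell in row)]


if __name__ == "__main__":
    export = Path("tabula-export.csv")  # hypothetical name of the downloaded file
    if export.exists():
        rows = load_exported_table(export)
        print("Loaded", len(rows), "rows; first row:", rows[0] if rows else None)
    else:
        print("No file named", export, "in the current folder.")
```

Every cell comes back as text, so numeric columns still need converting (for example with `int()` or `float()`) before analysis.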

Authors and Contributors

Tabula was created by Manuel Aristarán, Mike Tigas and Jeremy B. Merrill with the support of ProPublica, La Nación DATA, Knight-Mozilla OpenNews, The New York Times and The Knight Foundation. Tabula was designed by Jason Das.

Want to contribute? Fork it on GitHub and check out the to-do list for ideas. You can also support our continued work on Tabula with a one-time or monthly donation on OpenCollective.

Learn more about this project on Source: "Introducing Tabula"

Tabula was created by journalists for journalists and anyone else working with data locked away in PDFs. Tabula will always be free and open source.