#TextAnalytics

Turning PDFs from nightmares into opportunities.

We usually stay away from PDFs as data sources because they are so inconvenient to handle. For my own research needs, and to help out a colleague, I have developed two functions that open up PDFs as new research opportunities. You can do:

- complex search queries on batches of PDFs

- text extraction on zones *you define* in batches of PDFs

Both are free, fully point-and-click, require no registration or installation, and are respectful of your data.

Check it out here: nocodefunctions.com

Use case: we often have access to great data sources such as news articles, academic papers, forms from public administrations and private organizations, press releases, and database extractions of all kinds.

If these documents are formatted as PDFs, the text they contain can't be accessed cleanly. The two functions change that by making it easy to search and extract text from many PDFs at once, precisely and with advanced controls.
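For readers who prefer code, here is a minimal Python sketch of the two ideas: a boolean search over a batch of documents and text extraction from a rectangular zone. It is not how nocodefunctions.com works internally. It assumes the page text and word positions have already been pulled out of the PDFs with a PDF library of your choice, and the names `page_matches`, `search_batch`, and `extract_zone` are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def page_matches(text: str, all_of=(), any_of=(), none_of=()) -> bool:
    """Check one page against a boolean query (case-insensitive, whole words)."""
    def has(term: str) -> bool:
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None

    return (
        all(has(t) for t in all_of)                      # every required term
        and (not any_of or any(has(t) for t in any_of))  # at least one optional term
        and not any(has(t) for t in none_of)             # no excluded term
    )


def search_batch(docs: Dict[str, List[str]], **query) -> List[Tuple[str, int]]:
    """Return (file name, 1-based page number) for every page matching the query."""
    return [
        (name, page_no)
        for name, pages in docs.items()
        for page_no, text in enumerate(pages, start=1)
        if page_matches(text, **query)
    ]


def extract_zone(words: List[Tuple[float, float, str]],
                 x0: float, y0: float, x1: float, y1: float) -> str:
    """Join the words whose (x, y) position falls inside the given rectangle."""
    return " ".join(w for x, y, w in words if x0 <= x <= x1 and y0 <= y <= y1)


# Assumed input: page texts already extracted from two PDFs (made-up sample data).
docs = {
    "report.pdf": ["Annual budget for public health.", "Appendix: draft budget tables."],
    "press.pdf": ["Press release on the health budget."],
}
hits = search_batch(docs, all_of=["budget"], any_of=["health", "education"],
                    none_of=["draft"])

# Assumed input: (x, y, word) positions for one page (made-up sample data).
words = [(50, 40, "Invoice"), (120, 40, "2023-104"), (50, 300, "Total"), (120, 300, "99.00")]
header = extract_zone(words, 0, 0, 200, 100)
```

The same zone rectangle can be reused for every file in a batch when the documents share a layout, for example the same form filled in many times.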

Google Cloud's BigQuery now offers access to Vertex AI's generative text-bison model via SQL. The model can be used for tasks such as sentiment analysis, classification, and entity extraction, and can be applied to text data stored in BigQuery tables. The feature is still in preview and requires enrolment, and it is not yet clear whether fine-tuned models will be... medium.com/google-cloud/bigque
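A rough idea of what such a query can look like, built as a string in Python. This is a sketch under assumptions: it uses BigQuery's `ML.GENERATE_TEXT` table function and assumes a remote model over text-bison has already been created in your project. The model, table, and column names are placeholders, and `build_sentiment_query` is a hypothetical helper.

```python
def build_sentiment_query(model: str, table: str, text_column: str) -> str:
    """Build a BigQuery SQL statement asking the model to label sentiment per row."""
    return f"""
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `{model}`,
  (
    SELECT
      CONCAT('Classify the sentiment of this text as positive, negative, or neutral: ',
             {text_column}) AS prompt,
      {text_column}
    FROM `{table}`
  ),
  STRUCT(0.2 AS temperature, 50 AS max_output_tokens)
)
""".strip()


# Placeholder names: replace with your own project, dataset, model, and table.
sql = build_sentiment_query(
    model="my_project.my_dataset.text_bison_model",
    table="my_project.my_dataset.reviews",
    text_column="review_text",
)

# To run it you would need the google-cloud-bigquery package and credentials, e.g.:
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(sql).result()
```

The input query has to expose a column named `prompt`, which is why the text is wrapped with `CONCAT(...) AS prompt`.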