Machine learning: spaCy 3.2 gets more performance out of Apple Silicon

The Berlin-based company Explosion AI has released version 3.2 of spaCy, its Python library for natural language processing (NLP). The update promises developers better performance, especially on Nvidia GPU hardware and on Macs with Apple's M1 chip. The spaCy team has also improved usability and added new features related to fastText.

The new release of the NLP library is said to run up to 8 times faster on M1 Macs than on comparable systems. The prerequisite, however, is the thinc-apple-ops package, which lets spaCy call Apple's native Accelerate library for matrix multiplication.
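Whether a given machine meets this prerequisite can be checked with a few lines of standard-library Python. The following is only a minimal sketch: the helper name apple_ops_status is hypothetical, and it assumes that the package is importable as thinc_apple_ops and that a native Python build on Apple Silicon reports "arm64" as its machine type (a Python running under Rosetta would report "x86_64" instead).

```python
import importlib.util
import platform


def apple_ops_status():
    """Report whether this is an Apple Silicon Mac and whether thinc-apple-ops is installed.

    Hypothetical helper for illustration; not part of spaCy or Thinc.
    """
    # Assumption: native Python on an M1 Mac reports Darwin / arm64.
    on_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    # find_spec returns None if the top-level module cannot be found.
    installed = importlib.util.find_spec("thinc_apple_ops") is not None
    return {"apple_silicon": on_apple_silicon, "thinc_apple_ops_installed": installed}


if __name__ == "__main__":
    status = apple_ops_status()
    print(status)
    if status["apple_silicon"] and not status["thinc_apple_ops_installed"]:
        print("thinc-apple-ops is missing; it can be installed with: pip install thinc-apple-ops")
```

On an M1 Mac without the package, the script only prints a hint; it does not change how spaCy itself is configured.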

With floret, Explosion AI recently presented an extended version of fastText. floret, which can be easily integrated via a Python wrapper, combines the subwords known from fastText with Bloom embeddings to produce compact vectors with full coverage. Because of the subwords, there are no out-of-vocabulary (OOV) words, and thanks to the Bloom embeddings, the vector table can be kept small, with fewer than 100,000 entries. Agglutinative languages such as Finnish and Korean benefit most, as two examples in the floret demo project pipelines/floret_vectors_demo on GitHub show. Bloom embeddings, in the form of HashEmbed in tok2vec, have already made for compact spaCy models.
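How subwords and Bloom embeddings interact can be illustrated with a toy example in plain Python. This is a sketch of the general idea only, not floret's implementation: the function names, the n-gram range of 3 to 5, the two BLAKE2 hashes, the random table values, and the tiny table size are all assumptions made for illustration.

```python
import hashlib
import random


def char_ngrams(word, min_n=3, max_n=5):
    """Return the bracketed word plus its character n-grams (fastText-style subwords)."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def bloom_rows(key, n_rows, n_hashes=2):
    """Map one string to n_hashes table rows using differently salted hashes."""
    rows = []
    for seed in range(n_hashes):
        digest = hashlib.blake2b(key.encode("utf8"), digest_size=8, salt=bytes([seed])).digest()
        rows.append(int.from_bytes(digest, "little") % n_rows)
    return rows


def make_table(n_rows, dim, seed=0):
    """Create a small table of pseudo-random values standing in for trained vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_rows)]


def word_vector(word, table, n_hashes=2):
    """Average the hashed rows of all subwords; any string receives a vector."""
    n_rows, dim = len(table), len(table[0])
    vec = [0.0] * dim
    grams = char_ngrams(word)
    for gram in grams:
        for row in bloom_rows(gram, n_rows, n_hashes):
            for j in range(dim):
                vec[j] += table[row][j]
    return [value / len(grams) for value in vec]


if __name__ == "__main__":
    table = make_table(n_rows=1000, dim=8)  # far smaller than a real vector table
    # A long Finnish word form gets a vector without a table entry of its own.
    print(word_vector("talossanikin", table))
```

Because every subword is hashed into the same fixed-size table, the number of rows does not grow with the vocabulary, and a word form that never appeared in training still receives a vector from its n-grams.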

Also new in spaCy 3.2 is support for Doc objects as input to nlp and nlp.pipe. If these containers for linguistic annotations are passed to the pipeline instead of a string, the tokenizer is skipped. As the following listing shows, this makes it easier to set user-defined extensions before processing or to create a Doc with custom tokenization.

Process a Doc object

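from spacy.tokens import Doc
# Register the custom attribute once before setting it (nlp is an already loaded pipeline)
Doc.set_extension("text_id", default=None)
doc = nlp.make_doc("This is text 500.")
doc._.text_id = 500
doc = nlp(doc)  # the tokenizer is skipped because the input is already a Doc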
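The same mechanism covers custom tokenization and nlp.pipe. The following is a minimal sketch assuming spaCy 3.2 or later is installed; the blank English pipeline, the sentencizer component, and the example word lists are chosen purely for illustration and do not come from the release notes.

```python
import spacy
from spacy.tokens import Doc

# Assumption: a blank English pipeline with one simple component is enough to demonstrate this.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Custom tokenization: build Doc objects directly from pre-split word lists.
word_lists = [
    ["New-York", "is", "big", "."],
    ["spaCy", "3.2", "is", "out", "."],
]
docs_in = [Doc(nlp.vocab, words=words) for words in word_lists]

# nlp.pipe accepts the Doc objects and skips the tokenizer for them.
docs_out = list(nlp.pipe(docs_in))

tokens = [[token.text for token in doc] for doc in docs_out]
sentence_counts = [len(list(doc.sents)) for doc in docs_out]
print(tokens)
print(sentence_counts)
```

The tokens stay exactly as they were supplied, for example "New-York" as a single token, while the remaining pipeline components run as usual.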

A summary of the most important new features in spaCy 3.2 can be found on the Explosion AI blog. The release notes on GitHub provide a complete overview of all details.

