Releases: isi-nlp/nlcodec
0.5 - Add `byte` scheme
Add `byte` scheme
v0.4.0 -- add support for class scheme
`class` scheme supported now
v0.3.2 - Shrink vocabulary support
This version is used in our many-English paper https://arxiv.org/abs/2104.00290
nlcodec CLI bug fix. Add nlcodec-learn CLI for Spark based learn
v0.3.1 version 0.3.1
Db, Multipartdb, Batch, and more; perf improv with __slots__
- add `nlcodec-freqs` CLI to setup.py
- log time and memory usage for `learn` task
- log BPE merge operations once every 2s instead of all operations
- using `__slots__`: ~25% faster, 30% less memory for BPE with 3M word types
- `nlcodec.db.core` with `Db` and `MultipartDb`
- `nlcodec.db.batch` with `Batch` and `BatchIterable`
- CLI `nlcodec.learn` for learning BPE using pyspark
- CLI `nlcodec.bitextdb` to build a database from parallel text
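The `__slots__` note above refers to a standard Python feature rather than anything specific to this library. The sketch below shows the general mechanism only; `PlainType` and `SlottedType` are hypothetical names made up for illustration and are not classes from nlcodec. Declaring `__slots__` removes the per-instance `__dict__`, which is where the memory saving typically comes from.

```python
class PlainType:
    """Regular class: every instance carries its own __dict__."""

    def __init__(self, name, freq):
        self.name = name
        self.freq = freq


class SlottedType:
    """Same fields declared via __slots__, so instances have no __dict__."""

    __slots__ = ("name", "freq")

    def __init__(self, name, freq):
        self.name = name
        self.freq = freq


plain = PlainType("hello", 3)
slotted = SlottedType("hello", 3)

# The plain instance stores attributes in a dict; the slotted one does not.
plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")

# Slotted instances reject attributes that were not declared up front.
try:
    slotted.extra = 1
    rejected_extra = False
except AttributeError:
    rejected_extra = True
```

The trade-off is that slotted instances cannot take ad hoc attributes, which is usually acceptable for small record-like objects created in the millions.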
fix issue with name property
option to run on a spark session given by caller
- spark session can be specified by user
- docs published
PySpark and term-frequencies support for large datasets
- Option to accept term frequencies as input
- PySpark backend to compute word and char frequencies
- `--min-co-ev` of BPE is a CLI arg
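As a rough illustration of what "word and char frequencies" means in the items above, here is a minimal single-machine sketch using only the standard library. It is a stand-in for the idea, not the PySpark backend and not nlcodec's API; the function name `term_frequencies` and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def term_frequencies(lines):
    """Count whitespace-separated words and their characters (hypothetical helper)."""
    word_freqs = Counter()
    char_freqs = Counter()
    for line in lines:
        for word in line.split():
            word_freqs[word] += 1
            # Counter.update on a string counts each character once per occurrence.
            char_freqs.update(word)
    return word_freqs, char_freqs


corpus = ["the cat sat", "the cat ran"]
word_freqs, char_freqs = term_frequencies(corpus)
```

Counts like these can be computed once and reused, which is presumably why accepting precomputed term frequencies as input helps with large datasets.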
Fix find_packages() issue; select all nested packages
v0.2.1 update release docs
public release v0.2.0
- uploaded to PyPI: `pip install nlcodec`
- public repository under the Apache License 2.0