ingest
A mechanism to ingest CSV files into a database.
In morphological profiling experiments, a CellProfiler pipeline is often run in parallel across multiple images and produces a set of CSV files. For example, imaging a 384-well plate with 9 sites per well produces 384 * 9 = 3,456 images. A CellProfiler process may be run on each image, resulting in 384 * 9 output directories. Each directory typically contains one CSV file per compartment (e.g. Cells.csv, Cytoplasm.csv, Nuclei.csv) and one CSV file for per-image measurements (e.g. Image.csv).
cytominer_database.ingest.seed can be used to read all of these CSV files into a database backend. SQLite is the recommended engine, but ingest will likely also work with PostgreSQL and MySQL.
cytominer_database.ingest.seed assumes that source is a directory whose subdirectories contain the CSV files.
Example:
import cytominer_database.ingest
cytominer_database.ingest.seed(source, target, config)
cytominer_database.ingest.checksum(pathname, buffer_size=65536)
Generate a 32-bit unique identifier for a file.
Parameters:
- pathname – input file
- buffer_size – buffer size
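To illustrate how a 32-bit identifier can be computed while reading a file in fixed-size chunks, here is a minimal sketch. It assumes CRC-32 (via the standard-library zlib module) as the checksum, which this page does not confirm, and crc32_of_file is a hypothetical name, not part of cytominer_database.

```python
import zlib


def crc32_of_file(pathname, buffer_size=65536):
    """Return the CRC-32 of a file as an unsigned 32-bit integer.

    Hypothetical helper: assumes CRC-32 is an acceptable 32-bit identifier.
    """
    result = 0  # zlib.crc32 uses 0 as its default starting value
    with open(pathname, "rb") as stream:
        while True:
            # Read at most buffer_size bytes so large files never sit fully in memory
            chunk = stream.read(buffer_size)
            if not chunk:
                break
            # Feed the running checksum back in to continue where we left off
            result = zlib.crc32(chunk, result)
    # Mask to guarantee an unsigned 32-bit value
    return result & 0xFFFFFFFF
```

Reading in chunks keeps memory use bounded by buffer_size regardless of file size. A 32-bit checksum can collide, so two different files may in rare cases receive the same identifier.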
cytominer_database.ingest.into(input, output, name, identifier, skip_table_prefix=False)
Ingest a CSV file into a table in a database.
Parameters:
- input – Input CSV file.
- output – Connection string for the database.
- name – Table in database into which the CSV file will be ingested.
- identifier – Unique identifier for input.
- skip_table_prefix – True if the prefix of the table name should be excluded from the names of columns.
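The following sketch shows one way a single CSV file could be loaded into a database table with optionally prefixed column names. It is not the library's implementation: ingest_csv, quote, and the file_id column are hypothetical, the target is a SQLite file path rather than a connection string, and it assumes that prefixing means prepending the table name and an underscore to each column name.

```python
import csv
import sqlite3


def quote(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ingest_csv(csv_path, db_path, table, identifier, skip_table_prefix=False):
    """Hypothetical helper: append the rows of one CSV file to a SQLite table."""
    with open(csv_path, newline="") as stream:
        reader = csv.reader(stream)
        header = next(reader)
        # Tag every row with the identifier of the file it came from
        rows = [[identifier] + row for row in reader]

    if not skip_table_prefix:
        # Assumed convention: "Area" in table "Cells" becomes "Cells_Area"
        header = [f"{table}_{column}" for column in header]
    columns = ["file_id"] + header  # "file_id" is an illustrative column name

    column_list = ", ".join(quote(column) for column in columns)
    placeholders = ", ".join("?" for _ in columns)

    connection = sqlite3.connect(db_path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                f"CREATE TABLE IF NOT EXISTS {quote(table)} ({column_list})"
            )
            connection.executemany(
                f"INSERT INTO {quote(table)} ({column_list}) VALUES ({placeholders})",
                rows,
            )
    finally:
        connection.close()
```

The columns are created without declared types, so values are stored as the text read from the CSV file; a fuller version would infer or declare column types.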
cytominer_database.ingest.seed(source, target, config_file, skip_image_prefix=True)
Read CSV files into a database backend.
Parameters:
- source – Directory containing subdirectories that contain CSV files.
- target – Connection string for the database.
- config_file – Configuration file.
- skip_image_prefix – True if the prefix of the image table name should be excluded from the column names of the per-image table.
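As a rough illustration of the layout that source is expected to have (a directory of subdirectories, each holding CSV files), the sketch below lists the CSV files found in each immediate subdirectory. find_csv_files is a hypothetical helper, not part of cytominer_database, and it makes no claim about how seed itself traverses the directory.

```python
from pathlib import Path


def find_csv_files(source):
    """Map each immediate subdirectory of source to its sorted CSV file names.

    Hypothetical helper; subdirectories without CSV files are omitted.
    """
    found = {}
    for directory in sorted(Path(source).iterdir()):
        if not directory.is_dir():
            continue  # ignore stray files at the top level
        names = sorted(path.name for path in directory.glob("*.csv"))
        if names:
            found[directory.name] = names
    return found
```

Running a check like this before ingestion can help spot subdirectories that are missing an expected file such as Image.csv.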