
Peter A Noble - Interdisciplinary Scientist

Software and Tutorials

ANOVA and SNK test analysis

Overview [Readme]

C++ source code and input/output files: [Source code][input file][output file]
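The page does not show how the C++ program computes the ANOVA. Here is a minimal sketch that assumes a one-way design. The F statistic is the between-group mean square divided by the within-group mean square. The function name `oneWayAnovaF` is hypothetical, and the SNK post hoc comparisons are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One-way ANOVA (hypothetical helper): returns the F statistic for k groups.
// F = (SS_between / (k - 1)) / (SS_within / (N - k)).
double oneWayAnovaF(const std::vector<std::vector<double>>& groups) {
    assert(groups.size() >= 2);
    std::size_t n = 0;
    double grandSum = 0.0;
    for (const auto& g : groups) {
        assert(!g.empty());
        for (double x : g) { grandSum += x; ++n; }
    }
    const std::size_t k = groups.size();
    assert(n > k);
    const double grandMean = grandSum / n;

    double ssBetween = 0.0, ssWithin = 0.0;
    for (const auto& g : groups) {
        double sum = 0.0;
        for (double x : g) sum += x;
        const double mean = sum / g.size();
        // Group size times squared deviation of the group mean from the grand mean.
        ssBetween += g.size() * (mean - grandMean) * (mean - grandMean);
        // Squared deviations of observations from their own group mean.
        for (double x : g) ssWithin += (x - mean) * (x - mean);
    }
    const double msBetween = ssBetween / (k - 1);
    const double msWithin = ssWithin / (n - k);
    return msBetween / msWithin;
}
```

For example, the groups {1, 2, 3}, {4, 5, 6} and {7, 8, 9} give F = 27 on (2, 6) degrees of freedom.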

Six tutorials in PyTorch machine learning and R

Overview [Readme]

A simple artificial neural network [Pytorch_1.pptx] [Pytorch_script][train_example.csv]

Model performance [Pytorch_2.pptx] [Pytorch_script2][determine_regress_pytorch.cpp][predict_y_v3.cpp][train_example.csv][test_example.csv]

Bells and whistles [Pytorch_3.pptx] [Pytorch_script3]

Moving from CPU to GPU [Pytorch_4.pptx] [Pytorch_script4]

A GPU classifier for 7 outputs [Pytorch_script5][class_train.csv][class_test.csv]

Assessing Model Performance [Readme][R_AUC_ROC_script][actual.txt][predicted.txt][plot.pdf]
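The R script computes the area under the ROC curve. As a language-neutral cross-check, the AUC equals the fraction of (positive, negative) pairs in which the positive case gets the higher score, with ties counting as half. The sketch below assumes 0/1 actual labels and numeric predicted scores, as in files like actual.txt and predicted.txt. `rocAuc` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Area under the ROC curve via the Mann-Whitney formulation:
// fraction of (positive, negative) pairs in which the positive case
// receives the higher score; tied scores count as half a win.
double rocAuc(const std::vector<int>& actual, const std::vector<double>& score) {
    assert(actual.size() == score.size());
    double wins = 0.0;
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i] != 1) continue;           // i must be a positive case
        for (std::size_t j = 0; j < actual.size(); ++j) {
            if (actual[j] != 0) continue;       // j must be a negative case
            ++pairs;
            if (score[i] > score[j]) wins += 1.0;
            else if (score[i] == score[j]) wins += 0.5;
        }
    }
    assert(pairs > 0);  // needs at least one positive and one negative
    return wins / pairs;
}
```

This pairwise loop is O(P x N), which is fine for small test sets. A rank-based version scales better for large ones.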

Chaos Gene Representation

Overview [Readme][Proof][Benchmarking paper][Application paper]

Convert a gene sequence into x- and y-coordinates [Readme][Source code][test fasta file][output file you should get by running software]
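A minimal sketch of the chaos game representation (CGR) update follows. It starts at the center of the unit square, and each base moves the current point halfway toward that base's corner. The corner layout used here (A = (0,0), C = (0,1), G = (1,1), T = (1,0)) is an assumption, because conventions differ between tools. The function name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// CGR coordinates of a DNA sequence (hypothetical helper).
// Assumed corners: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
// Starting at (0.5, 0.5), each base moves the point halfway to its corner.
// Characters other than A, C, G, T (either case) are skipped.
std::vector<std::pair<double, double>> cgrCoordinates(const std::string& seq) {
    std::vector<std::pair<double, double>> points;
    double x = 0.5, y = 0.5;
    for (char base : seq) {
        double cx, cy;
        switch (base) {
            case 'A': case 'a': cx = 0.0; cy = 0.0; break;
            case 'C': case 'c': cx = 0.0; cy = 1.0; break;
            case 'G': case 'g': cx = 1.0; cy = 1.0; break;
            case 'T': case 't': cx = 1.0; cy = 0.0; break;
            default: continue;  // skip N, gaps, etc.
        }
        x = (x + cx) / 2.0;
        y = (y + cy) / 2.0;
        points.emplace_back(x, y);
    }
    return points;
}
```

For "AC", the points are (0.25, 0.25) and then (0.125, 0.625).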

Count sequences within CGR boxes using x- and y-coordinates [Readme][Source code][test fasta file][output file you should get by running software]
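Box counting can be sketched by laying an n x n grid over the unit square and tallying the points that fall in each cell. When n = 2^k, each cell collects the positions whose preceding k bases are the same k-mer, which is a known property of CGR. The grid layout and function name below are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Count CGR points in each box of an n x n grid over the unit square
// (hypothetical helper). counts[row][col] uses col = floor(x*n), row = floor(y*n).
// Points landing exactly on the upper edge (possible after floating-point
// rounding) are clamped into the last row/column.
std::vector<std::vector<int>> countCgrBoxes(
        const std::vector<std::pair<double, double>>& points, int n) {
    assert(n > 0);
    std::vector<std::vector<int>> counts(n, std::vector<int>(n, 0));
    for (const auto& [x, y] : points) {
        const int col = std::min(static_cast<int>(x * n), n - 1);
        const int row = std::min(static_cast<int>(y * n), n - 1);
        ++counts[row][col];
    }
    return counts;
}
```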

Count the number of targets (e.g., AAACCA) in a specific gene sequence using the CGR coordinates [Readme][Source code][input file][output file you should get by running software]
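A target k-mer such as AAACCA corresponds to one CGR box of side 2^-k. Every position whose last k bases equal the target lands strictly inside that box, so counting the points in the box counts the target. The sketch below assumes the corner layout A=(0,0), C=(0,1), G=(1,1), T=(1,0) and an uppercase ACGT-only sequence. Both function names are hypothetical. It skips the first k - 1 points, which lack a full k-base history and can otherwise sit on a box corner.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Corner of a base under the assumed layout; false for non-ACGT input.
bool cgrCorner(char base, double& cx, double& cy) {
    switch (base) {
        case 'A': cx = 0.0; cy = 0.0; return true;
        case 'C': cx = 0.0; cy = 1.0; return true;
        case 'G': cx = 1.0; cy = 1.0; return true;
        case 'T': cx = 1.0; cy = 0.0; return true;
        default: return false;
    }
}

// Count (overlapping) occurrences of `target` in `seq` with CGR coordinates.
// The target's box [lx, lx + 2^-k) x [ly, ly + 2^-k) has its lower-left
// corner at the CGR point of the target alone, started from (0, 0).
// Double precision suffices for short targets, but homopolymer runs longer
// than roughly 50 bases can round a point onto a box edge.
int countTargetCgr(const std::string& seq, const std::string& target) {
    const std::size_t k = target.size();
    assert(k > 0 && k < 64);
    double lx = 0.0, ly = 0.0, cx = 0.0, cy = 0.0;
    for (char b : target) {
        const bool ok = cgrCorner(b, cx, cy);
        assert(ok);
        (void)ok;
        lx = (lx + cx) / 2.0;
        ly = (ly + cy) / 2.0;
    }
    const double side = 1.0 / static_cast<double>(1ULL << k);

    int count = 0;
    double x = 0.5, y = 0.5;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        const bool ok = cgrCorner(seq[i], cx, cy);
        assert(ok);
        (void)ok;
        x = (x + cx) / 2.0;
        y = (y + cy) / 2.0;
        // Only points with at least k bases of history are tested.
        if (i + 1 >= k && x >= lx && x < lx + side && y >= ly && y < ly + side)
            ++count;
    }
    return count;
}
```

For example, AAACCA occurs twice in AAACCAAAACCA, and this function returns 2.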

Neuroet: a user-friendly Machine Learning tool for scientists

Overview [Readme][Benchmarking paper][Application paper]

Download the Neuroet app and follow the instructions [Neuroet] [Instructions]

C++ programs: [Sensitivity analysis] [Predict Y's from model]

MS Excel files: [Extract model] [Analyze sensitivity results]

Example input files: [train_x.txt] [train_y.txt] [test_x.txt] [combined_x.txt] [combined_y.txt]

Natural Language Processing (NLP) to distinguish smokers from non-smokers in unstructured electronic health records (EHRs)

Overview [Readme][pptx]

Determine the number of unique target words and their counts in a document. [Source code][Input File][Output file]
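Counting unique words and their frequencies can be sketched with a `std::map`. The tokenization rule here is an assumption: a word is a maximal run of letters, lowercased, with digits and punctuation acting as separators. The original program may split text differently. `countWords` is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Word frequencies in a document (hypothetical helper).
// Assumed tokenization: maximal runs of letters, lowercased.
// The number of unique words is the size of the returned map.
std::map<std::string, int> countWords(const std::string& text) {
    std::map<std::string, int> counts;
    std::string word;
    for (char ch : text) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            ++counts[word];
            word.clear();
        }
    }
    if (!word.empty()) ++counts[word];  // flush the final word
    return counts;
}
```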

Extract words to the left and right of a target word. [Source code][Input File #1][Input File #2][Output file][Excel file]
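Context extraction can be sketched as a fixed window around each occurrence of the target token. The window size is an illustrative parameter and is not taken from the original tool. `contextWindows` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// For every occurrence of `target`, collect up to `window` tokens on each
// side (fewer near the start or end of the document), excluding the target.
std::vector<std::vector<std::string>> contextWindows(
        const std::vector<std::string>& tokens, const std::string& target,
        std::size_t window) {
    std::vector<std::vector<std::string>> contexts;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i] != target) continue;
        const std::size_t start = i >= window ? i - window : 0;
        const std::size_t end = std::min(tokens.size(), i + window + 1);
        std::vector<std::string> ctx;
        for (std::size_t j = start; j < end; ++j)
            if (j != i) ctx.push_back(tokens[j]);
        contexts.push_back(ctx);
    }
    return contexts;
}
```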

Vectorize the words. [Source code][Input File #1][Input File #2][Output file #1][Output file #2]
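The page does not say which vectorization scheme the program uses. One common choice is a bag-of-words count vector over a fixed vocabulary, sketched below as an assumption rather than as the author's method. `bagOfWords` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Bag-of-words vector (assumed scheme): position i counts how often
// vocabulary[i] appears in `words`; out-of-vocabulary words are ignored.
std::vector<int> bagOfWords(const std::vector<std::string>& words,
                            const std::vector<std::string>& vocabulary) {
    std::map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < vocabulary.size(); ++i) index[vocabulary[i]] = i;
    std::vector<int> vec(vocabulary.size(), 0);
    for (const auto& w : words) {
        const auto it = index.find(w);
        if (it != index.end()) ++vec[it->second];
    }
    return vec;
}
```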

Make training and testing data files and run Neuroet. [Results from Neuroet using 100 vectors with a hidden layer of 3 neurons][Excel file of validation and confusion matrix results]
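A binary confusion matrix can be tabulated from the actual classes and the network outputs. This sketch assumes smoker = 1 and calls an output positive when it reaches a threshold of 0.5. Both conventions are assumptions, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Confusion {
    int tp = 0, fp = 0, tn = 0, fn = 0;
    double accuracy() const {
        const int total = tp + fp + tn + fn;
        return total ? static_cast<double>(tp + tn) / total : 0.0;
    }
};

// Tabulate a binary confusion matrix; outputs >= threshold are called
// positive (assumed: 1 = smoker, threshold 0.5).
Confusion confusionMatrix(const std::vector<int>& actual,
                          const std::vector<double>& output,
                          double threshold = 0.5) {
    assert(actual.size() == output.size());
    Confusion m;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const bool predicted = output[i] >= threshold;
        if (predicted && actual[i] == 1) ++m.tp;
        else if (predicted) ++m.fp;
        else if (actual[i] == 1) ++m.fn;
        else ++m.tn;
    }
    return m;
}
```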

Determine the response curve of the model in R. [R script][Actual results][Predicted results][AUC plot]

Analytics to predict future ICD codes based on Bayesian probabilities and network analyses

Overview [Readme][ppt]

Build a Bayesian probability dataset. [C++ source file][Non-redundant patient report file][Non-redundant codes file]

Generate network edges and nodes based on Bayesian probabilities of ICD codes. [C++ source file][R file to make pretty network diagrams]
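The two steps above can be sketched under one assumption: the "Bayesian probability" of code B given code A is estimated from co-occurrence counts as P(B | A) = (patients with both A and B) / (patients with A). Directed edges A -> B are then kept when that probability reaches a cut-off. The cut-off, struct, and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Edge {
    std::string from, to;
    double probability;  // estimated P(to | from)
};

// Estimate P(B | A) from per-patient sets of ICD codes and keep directed
// edges A -> B with P(B | A) >= minProbability (an assumed cut-off).
std::vector<Edge> icdNetwork(const std::vector<std::set<std::string>>& patients,
                             double minProbability) {
    std::map<std::string, int> single;                         // patients with A
    std::map<std::pair<std::string, std::string>, int> joint;  // patients with A and B
    for (const auto& codes : patients) {
        for (const auto& a : codes) {
            ++single[a];
            for (const auto& b : codes)
                if (a != b) ++joint[{a, b}];
        }
    }
    std::vector<Edge> edges;
    for (const auto& [key, both] : joint) {
        const double p = static_cast<double>(both) / single[key.first];
        if (p >= minProbability) edges.push_back({key.first, key.second, p});
    }
    return edges;
}
```

The resulting edge list (from, to, probability) is the kind of table an R network package can draw as a diagram.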

Determine the coefficients of an equation using matrix algebra

Overview [Readme][Benchmarking paper][Application paper 1][Application paper 2]

How to compile the source code, with example test and output files [Readme][Source code][matrix_h][Input file #1][Input file #2][output file you should get by running software]
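A common matrix-algebra route to equation coefficients is ordinary least squares through the normal equations, (X^T X) b = X^T y. The sketch below solves that system by Gaussian elimination with partial pivoting. It is a sketch under that assumption and does not reproduce the original matrix_h code. `leastSquares` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Least-squares coefficients b for y ~ X b via the normal equations
// (X^T X) b = X^T y. Include a column of 1s in X to fit an intercept.
std::vector<double> leastSquares(const Matrix& X, const std::vector<double>& y) {
    const std::size_t n = X.size(), p = X.at(0).size();
    assert(y.size() == n && n >= p);
    // Augmented system [X^T X | X^T y].
    Matrix a(p, std::vector<double>(p + 1, 0.0));
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t r = 0; r < n; ++r) a[i][j] += X[r][i] * X[r][j];
        for (std::size_t r = 0; r < n; ++r) a[i][p] += X[r][i] * y[r];
    }
    // Forward elimination with partial pivoting.
    for (std::size_t col = 0; col < p; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < p; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        std::swap(a[col], a[pivot]);
        assert(std::fabs(a[col][col]) > 1e-12);  // X^T X would be singular
        for (std::size_t r = col + 1; r < p; ++r) {
            const double f = a[r][col] / a[col][col];
            for (std::size_t c = col; c <= p; ++c) a[r][c] -= f * a[col][c];
        }
    }
    // Back substitution.
    std::vector<double> b(p);
    for (std::size_t i = p; i-- > 0;) {
        double s = a[i][p];
        for (std::size_t j = i + 1; j < p; ++j) s -= a[i][j] * b[j];
        b[i] = s / a[i][i];
    }
    return b;
}
```

With X rows (1, x) for x = 0..3 and y = 1 + 2x, this returns coefficients close to (1, 2). The normal equations square the condition number of X, so a QR decomposition is the more stable choice for ill-conditioned data.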

Calibrate a DNA microarray and test the calibration with real data

Fast-track Overview [short version][long version][Benchmarking paper][Application paper 1][Application paper 2]

Average probe signal intensities [Readme][Source code][input_1 file][input_2 file][input_3 file][input_4 file][input_5 file][input_6 file][output file you should get by running software]

Calibrate probe signal intensities [Readme][Source code][input_1 file][output file you should get by running software]

Calculate concentrations from intensities and calibrations [Readme][Source code][input_1 file][expt_test file][output file you should get by running software]
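The page does not describe the calibration model. The sketch below assumes a per-probe linear relationship between log10 intensity and log10 concentration, fitted by ordinary least squares and then inverted to turn a measured intensity into a concentration. The original software may use a different model. Averaging replicate intensities is a plain mean and is not shown. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Calibration { double slope, intercept; };

// Fit log10(intensity) = slope * log10(concentration) + intercept for one
// probe (assumed log-log linear model), by ordinary least squares.
Calibration fitCalibration(const std::vector<double>& conc,
                           const std::vector<double>& intensity) {
    assert(conc.size() == intensity.size() && conc.size() >= 2);
    const std::size_t n = conc.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = std::log10(conc[i]), y = std::log10(intensity[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    const double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    return {slope, (sy - slope * sx) / n};
}

// Invert the calibration to estimate a concentration from an intensity.
double concentrationFromIntensity(const Calibration& c, double intensity) {
    return std::pow(10.0, (std::log10(intensity) - c.intercept) / c.slope);
}
```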

Create a BLAST database and query it with fasta files

How to make an NCBI BLAST database [Readme][test file]

How to query an NCBI BLAST database with a fasta file [Readme][test file][output file you should get by running software][output file with header]

Purge BLAST output by percent similarity and minimum alignment length [Readme][Source code][test file][output file you should get by running software]
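A filtering pass over BLAST tabular output can be sketched as follows. It assumes the input uses the default `-outfmt 6` columns (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore). It also assumes that "percent similarity" means the pident (percent identity) column. `purgeBlastHits` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Keep BLAST tabular lines whose percent identity (column 3) and alignment
// length (column 4) meet the cut-offs. Comment lines ('#') and malformed
// lines are dropped.
std::vector<std::string> purgeBlastHits(std::istream& in, double minPercentIdentity,
                                        int minAlignmentLength) {
    std::vector<std::string> kept;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        std::string query, subject;
        double pident = 0.0;
        int length = 0;
        if (!(fields >> query >> subject >> pident >> length)) continue;
        if (pident >= minPercentIdentity && length >= minAlignmentLength)
            kept.push_back(line);
    }
    return kept;
}
```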

Manipulate fasta files

Convert fasta files to one-line files and remove verbose text [Readme][Source code][test fasta file][output file you should get by running software]
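Reading a fasta file into one-line records can be sketched by joining each record's sequence lines. Here, "removing verbose text" is assumed to mean keeping only the first word of the header line. `readFastaOneLine` is a hypothetical name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Read fasta records as (id, sequence) pairs, each sequence on one line.
// The id is the first word of the header (assumed meaning of "remove
// verbose text"); Windows '\r' line endings are stripped.
std::vector<std::pair<std::string, std::string>> readFastaOneLine(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (line[0] == '>') {
            const std::string header = line.substr(1);
            records.emplace_back(header.substr(0, header.find_first_of(" \t")), "");
        } else if (!records.empty()) {
            records.back().second += line;  // append wrapped sequence lines
        }
    }
    return records;
}
```

Writing each pair as `id`, a tab, and the sequence then produces a one-line-per-sequence file.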

Insert sequence length and convert one-line sequence files to fasta format [Readme][Source code][test file][output file you should get by running software]
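The reverse step can be sketched as follows. It assumes each one-line record is "id", a tab, then the sequence, and that the length is written into the header as `length=N`. Both the layout and the tag are assumptions. `oneLineToFasta` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Convert one "id<TAB>sequence" line to a fasta record with the sequence
// length in the header, wrapping the sequence at `width` characters.
// Returns an empty string if the line has no tab.
std::string oneLineToFasta(const std::string& line, std::size_t width = 60) {
    assert(width > 0);
    const std::size_t tab = line.find('\t');
    if (tab == std::string::npos) return "";
    const std::string id = line.substr(0, tab);
    const std::string seq = line.substr(tab + 1);
    std::ostringstream out;
    out << '>' << id << " length=" << seq.size() << '\n';
    for (std::size_t i = 0; i < seq.size(); i += width)
        out << seq.substr(i, width) << '\n';
    return out.str();
}
```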

Determine the GC content of a DNA sequencing run [Readme][Source code][test file][output file you should get by running software]

Determine the GC content of each sequence in a DNA sequencing run [Readme][Source code][test file][output file you should get by running software]
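Both GC measures can be sketched from the same base counts. The per-sequence value is (G + C) / (A + C + G + T) with N and other symbols ignored. The run-level value here pools all bases, so long reads weigh more than short ones. That pooling choice is an assumption, and a mean of per-sequence values is the obvious alternative. The names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct GcCount { std::size_t gc = 0, acgt = 0; };

// Count G/C bases and all A/C/G/T bases, ignoring N and other symbols.
GcCount countGc(const std::string& seq) {
    GcCount c;
    for (char b : seq) {
        switch (b) {
            case 'G': case 'g': case 'C': case 'c': ++c.gc; ++c.acgt; break;
            case 'A': case 'a': case 'T': case 't': ++c.acgt; break;
            default: break;
        }
    }
    return c;
}

// GC fraction of each sequence; runGc receives the pooled fraction
// (total G+C over total A+C+G+T) for the whole run.
std::vector<double> gcPerSequence(const std::vector<std::string>& seqs, double& runGc) {
    std::vector<double> perSeq;
    std::size_t gc = 0, acgt = 0;
    for (const auto& s : seqs) {
        const GcCount c = countGc(s);
        perSeq.push_back(c.acgt ? static_cast<double>(c.gc) / c.acgt : 0.0);
        gc += c.gc;
        acgt += c.acgt;
    }
    runGc = acgt ? static_cast<double>(gc) / acgt : 0.0;
    return perSeq;
}
```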

Make a GC histogram of a sequencing run [Readme][Source code][test file][output file you should get by running software]
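A GC histogram can be sketched by placing per-sequence GC fractions into equal-width bins over [0, 1]. The default of 10 bins is an illustrative choice. `gcHistogram` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Histogram of GC fractions in `bins` equal-width bins over [0, 1];
// a fraction of exactly 1.0 goes into the last bin.
std::vector<int> gcHistogram(const std::vector<double>& gcFractions, int bins = 10) {
    assert(bins > 0);
    std::vector<int> counts(bins, 0);
    for (double f : gcFractions) {
        assert(f >= 0.0 && f <= 1.0);
        const int bin = std::min(static_cast<int>(f * bins), bins - 1);
        ++counts[bin];
    }
    return counts;
}
```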

Calculate hexanucleotide frequencies and determine co-occurring sequences in a metagenomic sample

Determine hexanucleotide frequencies for a metagenomic sample [Readme][Source code][test file][hexamer file][output file you should get by running software]
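Hexanucleotide frequencies can be sketched by sliding a 6-base window along every read, counting each overlapping 6-mer, and dividing by the total number of windows in the sample. Windows containing anything other than uppercase A, C, G or T are skipped. That skipping rule is an assumption. `kmerFrequencies` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Relative frequencies of overlapping k-mers (k = 6 for hexanucleotides),
// pooled over all sequences. Windows with non-ACGT characters are skipped.
std::map<std::string, double> kmerFrequencies(const std::vector<std::string>& seqs,
                                              std::size_t k = 6) {
    assert(k > 0);
    std::map<std::string, long long> counts;
    long long total = 0;
    for (const auto& s : seqs) {
        std::size_t validRun = 0;  // length of the current run of ACGT bases
        for (std::size_t i = 0; i < s.size(); ++i) {
            const char b = s[i];
            validRun = (b == 'A' || b == 'C' || b == 'G' || b == 'T') ? validRun + 1 : 0;
            if (validRun >= k) {
                ++counts[s.substr(i + 1 - k, k)];  // k-mer ending at position i
                ++total;
            }
        }
    }
    std::map<std::string, double> freqs;
    for (const auto& [kmer, n] : counts)
        freqs[kmer] = static_cast<double>(n) / total;
    return freqs;
}
```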

Determine the co-occurrence of sequences in a 454 sample [Readme][Source code][test file][output file you should get by running software]