Title: | Useful Functions for Handfully Manipulating and Analyzing Data with Data.frame Format |
---|---|
Description: | Some useful functions for simply manipulating and analyzing data in data.frame format. It mainly includes the following sections: ReformatDataframe (reformat a data.frame with the modifiers), InteractDataframe, and Post-VCF (downstream analysis of data generated by vcftools (Danecek et al., 2011) <http://dx.doi.org/10.1093/bioinformatics/btr330> or plink (Chang et al., 2015) <doi:10.1186/s13742-015-0047-8>). |
Authors: | Hongfei Liu |
Maintainer: | Hongfei Liu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-11-26 05:22:49 UTC |
Source: | https://github.com/luffylouis/handyfunctions |
Check that the given columns are valid and return their indices in rawDataFrame
checkCols(rawDataFrame, cols)
rawDataFrame | the raw data.frame |
cols | the columns to check, given by name or index |
FALSE if any of the given columns is invalid; otherwise, the indices of the columns
library(handyFunctions)
data(people)
checkCols(people, c("..name", "..sex"))
# OR
checkCols(people, c(1, 2))
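A minimal base-R sketch of the validation described above, assuming that columns may be given either by name or by position. `check_cols_sketch()` is a hypothetical helper, not the package's implementation, and the toy data.frame only stands in for `people`.

```r
# Hypothetical helper (not handyFunctions code): return the integer indices
# of the requested columns, or FALSE if any name or position is invalid.
check_cols_sketch <- function(df, cols) {
  if (is.character(cols)) {
    idx <- match(cols, colnames(df))          # NA for unknown names
  } else if (is.numeric(cols)) {
    idx <- as.integer(cols)
    idx[idx < 1 | idx > ncol(df)] <- NA       # flag out-of-range positions
  } else {
    return(FALSE)
  }
  if (anyNA(idx)) FALSE else idx
}

toy_people <- data.frame(..name = c("Ming Li", "Bang Wei"),
                         ..sex = c("M", "F"), check.names = FALSE)
check_cols_sketch(toy_people, c("..name", "..sex"))  # 1 2
check_cols_sketch(toy_people, c(1, 3))               # FALSE
```

Returning FALSE rather than raising an error mirrors the documented return value, so callers can test the result with `isFALSE()` before indexing.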
Return the suggested dtype of a vector
checkDtype(vector)
vector | a vector or list |
The suggested dtype of the vector
library(handyFunctions)
vector <- c(1, 2, 3, "", NA, " ", "four", "NA", 5)
checkDtype(vector)
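One plausible heuristic for suggesting a dtype, sketched under assumptions: blank strings and the literal "NA" are treated as missing, and "numeric" is suggested only if every remaining value parses as a number. `suggest_dtype_sketch()` is hypothetical; the package's own rules may differ.

```r
# Hypothetical heuristic (an assumption, not the package's algorithm).
suggest_dtype_sketch <- function(v) {
  v <- trimws(as.character(unlist(v)))        # lists are flattened first
  v <- v[!is.na(v) & v != "" & v != "NA"]     # drop missing-like entries
  if (length(v) == 0) return(NA_character_)   # nothing left to inspect
  num <- suppressWarnings(as.numeric(v))      # non-numbers become NA
  if (all(!is.na(num))) "numeric" else "character"
}

suggest_dtype_sketch(c(1, 2, 3, "", NA, " ", "four", "NA", 5))  # "character"
suggest_dtype_sketch(c("1", " 2", "", "NA", "3.5"))             # "numeric"
```

Note that `c(1, 2, 3, "", ...)` is already a character vector in R, which is why a type-suggestion step is needed at all.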
A dataset containing grade information (Chinese, math, English, physics, biology, chemistry) for fictional people.
grade
A data frame with 6 rows and 7 variables:
name of the person (Chinese or foreign), character
grade in Chinese, numeric
grade in math, numeric
grade in English, numeric
grade in physics, numeric
grade in biology, numeric
grade in chemistry, numeric
...
"simulated dataset"
Return the indices of elements in the source vector that match the query vector
matchIndex(SourceInfo, queryInfo, queryType = TRUE)
SourceInfo | the source vector |
queryInfo | the query vector |
queryType | logical; if TRUE, use exact matching (default: TRUE) |
The indices of the source vector that match the query vector
library(handyFunctions)
data(grade)
matchIndex(grade[, "name"], c("Ming Li", "Bang Wei"))
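The difference between exact and non-exact matching can be sketched in base R. This assumes, as a hypothesis only, that a non-exact match means "the query occurs as a substring"; `match_index_sketch()` and the toy names are illustrative, not part of handyFunctions.

```r
# Hypothetical sketch of exact vs. non-exact matching.
match_index_sketch <- function(source, query, exact = TRUE) {
  if (exact) {
    return(which(source %in% query))          # exact, element-wise match
  }
  # assumption: non-exact means "query occurs as a substring of source"
  hits <- Reduce(`|`,
                 lapply(query, function(q) grepl(q, source, fixed = TRUE)),
                 logical(length(source)))     # start from all FALSE
  which(hits)
}

nm <- c("Ming Li", "Bang Wei", "Hua Zhang", "Ming Wang")
match_index_sketch(nm, c("Ming Li", "Bang Wei"))  # 1 2
match_index_sketch(nm, "Ming", exact = FALSE)     # 1 4
```

Both branches return indices in source order, not query order, and report every matching element of the source.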
Merge two data.frames on xcol and ycol
mergeCustom(x, y, xcol, ycol)
x | the first data.frame |
y | the second data.frame |
xcol | column name(s) to merge on in the first data.frame |
ycol | column name(s) to merge on in the second data.frame |
The merged data.frame
library(handyFunctions)
data(people)
data(grade)
mergeCustom(people, grade, "..name", "name")
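Merging on key columns with different names can be expressed with base R's `merge()` and its `by.x`/`by.y` arguments. This is a sketch of the idea, not a claim about how mergeCustom() is implemented; the toy data.frames are hypothetical.

```r
# Base-R merge on differently named key columns.
people_toy <- data.frame(..name = c("Ming Li", "Bang Wei"), ..age = c(20, 31),
                         check.names = FALSE)
grade_toy <- data.frame(name = c("Bang Wei", "Ming Li"), math = c(88, 95))

# by.x / by.y name the key column in each data.frame; the result keeps
# the key under the by.x name and is sorted by key (sort = TRUE default).
merged <- merge(people_toy, grade_toy, by.x = "..name", by.y = "name")
merged
```

By default `merge()` performs an inner join; `all = TRUE` (or `all.x`/`all.y`) keeps unmatched rows.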
Return a reformatted data.frame with standardized column names
modifyColNames(rawDataFrame, cols = TRUE, rawSep = "..", sep = "_")
rawDataFrame | the raw data.frame |
cols | the column names or indices to reformat (default: TRUE, all columns) |
rawSep | the raw separator in the column names of the raw data.frame. Note: it is interpreted as a regular expression, so "." matches any character and the default ".." matches any two characters; to match a literal dot, use "[.]" (e.g. "[.][.]" for a double dot). |
sep | the separator used in the column names of the modified data.frame |
A modified data.frame whose column names use the given separator
library(handyFunctions)
data(people)
modified_people <- modifyColNames(people, rawSep = "[.][.]")
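A minimal sketch of column-name reformatting, assuming the separator replacement works like `gsub()` (consistent with the regular-expression note on rawSep). `modify_col_names_sketch()` is hypothetical and defaults to "[.][.]" on purpose, to avoid the any-two-characters behavior of "..".

```r
# Hypothetical sketch: replace rawSep with sep in selected column names.
modify_col_names_sketch <- function(df, cols = TRUE,
                                    rawSep = "[.][.]", sep = "_") {
  nm <- colnames(df)
  if (is.character(cols)) cols <- match(cols, nm)  # names -> positions
  nm[cols] <- gsub(rawSep, sep, nm[cols])          # regex replacement
  colnames(df) <- nm
  df
}

toy <- data.frame(first..name = "Ming Li", death..age = 80, check.names = FALSE)
colnames(modify_col_names_sketch(toy))                      # "first_name" "death_age"
colnames(modify_col_names_sketch(toy, cols = "death..age")) # "first..name" "death_age"
```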
Return rawDataFrame with each column converted to a suggested appropriate dtype
modifyColTypes(rawDataFrame, cols = TRUE, dtype = FALSE, custom = FALSE)
rawDataFrame | the raw data.frame |
cols | the columns whose dtypes you want to change when custom is FALSE (default: TRUE, all columns) |
dtype | the dtypes, matched to cols by index, that you want to apply when custom is FALSE (default: FALSE, update automatically) |
custom | whether to convert dtypes automatically or use custom dtypes; when TRUE, you can specify your own dtypes for the given cols (default: FALSE, automatic) |
A new data.frame with the suggested dtype applied to each column
library(handyFunctions)
data(people)
modifyColTypes(people)
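Automatic dtype conversion can be approximated with `utils::type.convert()`, which picks logical, integer, numeric, or character per column. This is a swapped-in base-R technique shown for illustration, not necessarily what modifyColTypes() uses; `modify_col_types_sketch()` is hypothetical.

```r
# Hypothetical sketch: let type.convert() choose a dtype for each column.
modify_col_types_sketch <- function(df, cols = TRUE) {
  if (isTRUE(cols)) cols <- seq_along(df)          # TRUE means all columns
  df[cols] <- lapply(df[cols], function(col)
    type.convert(as.character(col), as.is = TRUE,  # keep strings as character
                 na.strings = c("NA", "")))        # treat "" as missing
  df
}

toy <- data.frame(age = c("20", "31", ""), alive = c("TRUE", "FALSE", "TRUE"),
                  name = c("Ming Li", "Bang Wei", "Hua Zhang"))
converted <- modify_col_types_sketch(toy)
sapply(converted, class)  # age: "integer", alive: "logical", name: "character"
```

`as.is = TRUE` keeps text columns as character instead of converting them to factors.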
Return a reformatted data.frame with standardized row names
modifyRowNames(rawDataFrame, rows = TRUE, rawSep = "..", sep = "_")
rawDataFrame | the raw data.frame |
rows | the row names or indices to reformat (default: TRUE, all rows) |
rawSep | the raw separator in the row names of the raw data.frame. Note: it is interpreted as a regular expression, so "." matches any character and the default ".." matches any two characters; to match a literal dot, use "[.]" (e.g. "[.][.]" for a double dot). |
sep | the separator used in the row names of the modified data.frame |
A modified data.frame whose row names use the given separator
library(handyFunctions)
data(people)
modifyRowNames(people)
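The regular-expression note on rawSep matters in practice. Assuming the separator is applied with regex replacement in the style of `gsub()`, the snippet below shows what ".." does as a pattern compared with "[.][.]" or a fixed match, and applies the safe pattern to row names of a hypothetical toy data.frame.

```r
x <- c("first..name", "age")
gsub("..", "_", x)                # "_____e" "_e": any two characters match
gsub("[.][.]", "_", x)            # "first_name" "age": literal double dot
gsub("..", "_", x, fixed = TRUE)  # "first_name" "age": no regex at all

# Same replacement applied to row names
toy <- data.frame(value = c(1, 2), row.names = c("sample..1", "sample..2"))
rownames(toy) <- gsub("[.][.]", "_", rownames(toy))
rownames(toy)                     # "sample_1" "sample_2"
```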
A dataset containing basic personal information (name, sex, age, and death_age) for fictional people.
people
A data frame with 6 rows and 4 variables:
name of the person (Chinese or foreign), character
sex of the person, character
age at the final record, numeric
age at death, numeric
...
"simulated dataset"
Return the values of queryCol in SourceData for rows whose sourceCol matches the given vector/list (exactly or not, depending on queryType)
queryingInfo(SourceData, sourceCol, queryCol, queryInfo, queryType = TRUE)
SourceData | the source data.frame to query |
sourceCol | the column name or index of the query field in the source data.frame |
queryCol | the column name or index of the return field in the source data.frame |
queryInfo | a vector or list of values to query |
queryType | logical; if TRUE, use exact matching (default: TRUE) |
A vector of values from the return field for the rows that match the query info in the source data
library(handyFunctions)
data(grade)
queryingInfo(grade, "name", "chinese", c("Ming Li", "Bang Wei"))
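The exact-match case of this lookup can be sketched with base R's `match()`. `query_info_sketch()` and its toy data are hypothetical; this version returns results in query order and gives NA for values with no match, which may differ from queryingInfo().

```r
# Hypothetical exact-match lookup: for each query value, find its row in
# sourceCol and return the value of queryCol from that row.
query_info_sketch <- function(df, sourceCol, queryCol, queryInfo) {
  rows <- match(unlist(queryInfo), df[[sourceCol]])  # NA where no match
  df[[queryCol]][rows]
}

toy_grade <- data.frame(name = c("Ming Li", "Bang Wei", "Hua Zhang"),
                        chinese = c(90, 85, 78))
query_info_sketch(toy_grade, "name", "chinese",
                  c("Bang Wei", "Ming Li", "Nobody"))  # 85 90 NA
```

`match()` returns only the first matching row per query value; use `which(df[[sourceCol]] %in% queryInfo)` instead if duplicates in the source should all be returned.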
Plot SNP density at the chromosome level
ShowSNPDensityPlot(densityData, binSize, densityColorBar = c("grey", "darkgreen", "yellow", "red"), chromSet = c(1:22), withchr = FALSE)
densityData | the raw density data generated by vcftools |
binSize | the bin size used when generating the density data |
densityColorBar | vector; the colors of the density color bar (generally four colors) |
chromSet | vector; the chromosomes to plot (values must match the CHROM column of densityData) |
withchr | logical; whether to prefix the chromosome labels of the density plot with "chr". Note: this does not work when chromSet contains uncommon chromosome names (e.g. NC0*) |
A ggplot2 object for SNP density plot
library(handyFunctions)
data(SNV_1MB_density_data)
ShowSNPDensityPlot(SNV_1MB_density_data, binSize = 1e6, chromSet = c(38:1))
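To see how binSize relates to per-kb density values, here is a base-R sketch that bins SNP positions into fixed-size windows and converts counts to variants per kb (count / (binSize / 1000)). `snp_density_sketch()`, its 1-based bin starts, and the toy positions are assumptions for illustration; this does not reproduce vcftools' exact output format.

```r
# Hypothetical sketch: count SNPs per fixed-size bin on each chromosome.
snp_density_sketch <- function(chrom, pos, binSize) {
  bin_start <- (pos - 1) %/% binSize * binSize + 1   # 1-based bin starts
  counts <- aggregate(list(count = pos),
                      by = list(chrom = chrom, bin_start = bin_start),
                      FUN = length)                  # SNPs per bin
  counts$variants_per_kb <- counts$count / (binSize / 1000)
  counts[order(counts$chrom, counts$bin_start), ]
}

dens <- snp_density_sketch(chrom = c(1, 1, 1, 2), pos = c(10, 500, 1500, 20),
                           binSize = 1000)
dens
```

`aggregate()` only reports bins that contain at least one SNP; a genome-wide plot would typically also need the empty bins filled in with zero.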
A dataset of SNV counts in 1 Mb bins, called from transcriptome data of wild wolves and domestic dogs.
SNV_1MB_density_data
A data frame with 2544 rows and 4 variables:
chromosome ID in the CanFam3.1 reference genome, numeric or character
start coordinate of the bin on its chromosome, numeric
end coordinate of the bin on its chromosome, numeric
number of SNVs (variants) per kb within the bin, numeric
...
"real dataset"
Split one column of a data.frame (or a vector) by the given separator and return the piece at the given index
splitCol(data, col = FALSE, sep, index, fixed = TRUE)
data | a vector or data.frame |
col | the column name or index, if data is a data.frame |
sep | the separator |
index | the index of the piece to return after splitting |
fixed | logical; if TRUE, match sep exactly; otherwise, treat it as a regular expression (see strsplit for details). |
A vector or factor containing the piece at the given index for each element
library(handyFunctions)
data(people)
splitCol(people, col = 1, sep = " ", index = 2)
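Splitting and picking the n-th piece maps directly onto base R's `strsplit()`, whose `fixed` argument has the same meaning as the one documented above. `split_col_sketch()` is a hypothetical helper that takes a vector; for a data.frame, pass a column such as `df[[1]]`. Returning NA when an element has fewer pieces is a design choice of this sketch.

```r
# Hypothetical sketch: split each element and keep the piece at `index`.
split_col_sketch <- function(x, sep, index, fixed = TRUE) {
  parts <- strsplit(as.character(x), sep, fixed = fixed)
  vapply(parts,
         function(p) if (length(p) >= index) p[index] else NA_character_,
         character(1))                         # one string per element
}

split_col_sketch(c("Ming Li", "Bang Wei", "Madonna"), sep = " ", index = 2)
# "Li" "Wei" NA
split_col_sketch("a1b22c", sep = "[0-9]+", index = 2, fixed = FALSE)  # "b"
```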
Reformat a data.frame with all modifiers at once (column names, row names, and dtypes)
unifyDataframe(rawDataFrame, rawRowSep = "..", rowSep = "_", rawColSep = "..", colSep = "_", changeDtype = TRUE)
rawDataFrame | the raw data.frame |
rawRowSep | the raw separator in the row names of the raw data.frame |
rowSep | the new separator for row names |
rawColSep | the raw separator in the column names of the raw data.frame |
colSep | the new separator for column names |
changeDtype | whether to change the dtypes of the columns |
A modified data.frame with all of the above modifiers applied
library(handyFunctions)
data(people)
unifyDataframe(people, rawColSep = "[.][.]")
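Combining the three modifiers can be sketched as one base-R pipeline: regex replacement on row names, the same on column names, then `type.convert()` per column. `unify_sketch()` is hypothetical, and it defaults to "[.][.]" rather than the package's ".." so that only literal double dots are replaced.

```r
# Hypothetical sketch: row names, column names, then dtypes.
unify_sketch <- function(df, rawRowSep = "[.][.]", rowSep = "_",
                         rawColSep = "[.][.]", colSep = "_",
                         changeDtype = TRUE) {
  rownames(df) <- gsub(rawRowSep, rowSep, rownames(df))
  colnames(df) <- gsub(rawColSep, colSep, colnames(df))
  if (changeDtype) {
    df[] <- lapply(df, function(col)           # df[] keeps the row names
      type.convert(as.character(col), as.is = TRUE))
  }
  df
}

toy <- data.frame(first..name = c("Ming Li", "Bang Wei"),
                  age..years = c("20", "31"),
                  row.names = c("id..1", "id..2"), check.names = FALSE)
out <- unify_sketch(toy)
str(out)
```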