Conjugate Gradient in Python - full implementation and example
Conjugate gradient methods - optimization
The Conjugate Gradient Method for Solving Linear Systems
Conjugate gradient method - formulasearchengine
OOP VS Optimization VS Decision Making. Which course should I choose?
Hi, I'm currently enrolled in a M.Sc. course in Data Science & Engineering, and I have to decide which optional course to take among the following (I can choose only 1), please take a brief look at the syllabus. 1) OOP syllabus:
Basic features (1 credit) • Object-oriented programming, java, eclipse • Classes, attributes, methods and constructors, objects • Packages and visibility rules • Strings, wrapper classes • Arrays
Linear programming: modeling techniques, basic concepts of the Simplex Method, and duality (10% of the course).
2. Computational complexity: problem classes P, NP, NP-complete, and CoNP-complete (10% of the course).
3. Exact optimization methods: Branch and Bound, Cutting Planes, and Dynamic Programming (20% of the course).
4. Heuristic optimization methods: greedy algorithms, GRASP, Beam Search, meta-heuristics (Tabu Search, Simulated Annealing, Genetic Algorithms, ACO, VNS, RBS), and math-heuristics (30% of the course).
5. Decision making under uncertainty: Stochastic Programming with recourse, Measures for Stochastic Programming, Progressive Hedging method (10% of the course).
6. Nonlinear Programming: theoretical conditions for unconstrained and constrained optimization, algorithms for unconstrained and constrained optimization (20%).
Of course, the best answer is "it depends on what you have already done, and what you would like to do", so I'll try to give a brief introduction. I come from a B.Sc. in Electronic Engineering, and this is the only reason why I'm considering taking OOP. I don't have much trouble with programming, but I feel like I'm missing some skills because my B.Sc. was not in CS. Regarding the other 2 courses, they are not the only math courses in my degree; I have many others (such as ML&DL, Math for ML, Statistics for Data Science, Network Dynamics & Learning, Computational Linear Algebra), but still, they might be interesting. I don't want to work as a software developer, I'm more interested in research, but do you think I should take OOP anyway to fill some gaps? Can you give me some examples where Decision Making and Numerical/Stochastic Optimization could be useful? (As I said, the most important topics of both courses are also covered partially in other courses.)
Don't put money on this as I'm still debugging (I bet you half a bitcoin I have mistaken a few indices in the H_2 norm)... Here is the discretisation formula I used, to copy-paste on latexbase:
VASP- tetrahedron method for surface relaxations, cause of error?
Hi comp chemists, I am once again asking for your help. I am running a series of surface relaxations for iron and chromium oxide surfaces. An example INCAR I am using:
convergence and job settings:
ISMEAR=-5 !Gaussian smearing
SIGMA=0.03 ! smearing for insulators
ICHARGE=2 !Conjugate gradient algorithm relaxes ions
IBRION=2 !ions and electronic degrees of freedom changed
PREC=A !Precision Accurate
POTIM = 0.8 !Scaling constant for step widths
ISPIN=2 !Spin polarized
NSW=100 !Number of ionic steps at 200
EDIFF=10E-6 !Break condition for the electronic SC-loop
ISIF=2
IVDW=11 !DFT-D3 Convergence
LDIPOL=.TRUE
IDIPOL=3
memory:
LPLANE = .TRUE.
NCORE = 8
LSCALU = .FALSE.
NSIM = 4
functional and dispersion:
GGA=B3
VDW_S6 = 1.0
VDW_S8 = 1.703
VDW_SR = 1.261
VDW RADIUS = 50.2
VDW CNRADIUS = 20.0
ENMAX=800
However, a lot of the jobs I submit end up yielding the errors:
WARNING: DENTET: can't reach specified precision
and
very serious problems the old and the new charge density differ
which have been attributed to using the tetrahedron method for large systems. Could it also be due to the fact that I am using only 1 k-point in the Z direction? My K-point setup is:
Automatic mesh
0
Gamma
11 11 1
0 0 0
When switching to Gaussian smearing the first electronic optimization fails, with the generic error:
Lupine Publishers | The Creation of C13H20BeLi2SeSi. The Proposal of a Bio- Inorganic Molecule, Using Ab Initio Methods for The Genesis of a Nano Membrane
i.e. 20 glucose molecules linked together would have 19 bonds
Molecular formula
# of molecules * molecular formula - number of bonds * H2O (released by dehydration synthesis)
i.e. when you bond 5 glucose molecules together you have to subtract 4H2O
pH/pOH
-log[H+] = pH
-log[OH-] = pOH
pH + pOH = 14
Leaf surface area
i.e. using graph paper to find surface area
Transpiration rate
Amount of water used / surface area / time
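The calculations above can be sketched in Python. This is only a rough sketch: the function names and the sample numbers (5 glucose molecules, [H+] = 0.001 M, 2 mL of water over 10 cm^2 in 4 minutes) are made up for illustration and are not from the notes.

```python
import math

def polymer_formula(monomer, n):
    """Atom counts for n monomers joined by n - 1 bonds, each bond releasing one H2O."""
    bonds = n - 1
    counts = {element: count * n for element, count in monomer.items()}
    counts["H"] -= 2 * bonds  # each H2O removes 2 hydrogens
    counts["O"] -= bonds      # and 1 oxygen
    return counts

def ph_from_h(h_conc):
    """pH = -log[H+], with [H+] in mol/L."""
    return -math.log10(h_conc)

def poh_from_ph(ph):
    """pH + pOH = 14."""
    return 14 - ph

def transpiration_rate(water_used, surface_area, time):
    """Amount of water used / surface area / time (units follow the inputs)."""
    return water_used / surface_area / time

glucose = {"C": 6, "H": 12, "O": 6}
print(polymer_formula(glucose, 5))  # {'C': 30, 'H': 52, 'O': 26}
print(ph_from_h(1e-3), poh_from_ph(ph_from_h(1e-3)))
print(transpiration_rate(2.0, 10.0, 4.0))
```

Five glucose molecules joined by 4 bonds lose 4 H2O, which is why the result is C30H52O26 rather than C30H60O30.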
Labs
Transpiration Lab
Basically you take this potometer which measures the amount of water that gets sucked up by a plant that you have and you expose the plant to different environmental conditions (light, humidity, temperature) and see how fast the water gets transpired
Random stuff to know:
It’s hard to get it to work properly
A tight seal of vaseline keeps everything tidy and prevents water from evaporating straight from the tube, also allows for plant to suck properly
Water travels from high water potential to low water potential
2) Cell Structure & Function
Content
Cellular Components
Many membrane-bound organelles evolved from once free prokaryotes via endosymbiosis, such as mitochondria (individual DNA)
Compartmentalization allows for better SA:V ratio and helps regulate cellular processes
Cytoplasm: thick solution in each cell containing water, salts, proteins, etc; everything inside the membrane except the nucleus
Cytoplasmic streaming: moving all the organelles around to give them nutrients, speeds up reactions
Cytosol: liquid of the cytoplasm (mostly water)
Plasma Membrane: separates inside of cell from extracellular space, controls what passes through amphipathic area (selectively permeable)
Aquaporin: hole in membrane that allows water through
Cell Wall: rigid polysaccharide layer outside of plasma membrane in plants/fungi/bacteria
Bacteria have peptidoglycan, fungi have chitin, and plants have cellulose and lignin
Turgor pressure pushes the membrane against the wall
Nucleus: contains genetic information
Has a double membrane called the nuclear envelope with pores
Nucleolus: in nucleus, produces ribosomes
Chromosomes: contain DNA
Centrioles: tubulin structures that make up the centrosome (not the centromere, which is the middle of a chromosome)
Smooth Endoplasmic Reticulum: synthesis of lipids, detoxification, calcium storage
Rough Endoplasmic Reticulum: synthesizes and packages proteins
Chloroplasts: photosynthetic, sunlight transferred into chemical energy and sugars
More on this in photosynthesis
Vacuoles: storage, waste breakdown, hydrolysis of macromolecules, plant growth
Plasmodesmata: channels through cell walls that connect adjacent cells
Golgi Apparatus: extracellular transport
Lysosome: degradation and waste management
Mutations in the lysosome cause the cell to swell with unwanted molecules and the cell will slow down or kill itself
Mitochondria: powerhouse of the cell
Mutations in the mitochondria cause a deficiency of energy in the cell leading to an inhibition of cell growth
Vesicles: transport of intracellular materials
Microtubules: tubulin, stiff, mitosis, cell transport, motor proteins
Microfilaments: actin, flexible, cell movement
Flagella: one big swim time
Cilia: many small swim time
Peroxisomes: bunch of enzymes in a package that degrade H2O2 with catalase
Ribosomes: protein synthesis
Microvilli: projections that increase cell surface area like tiny feetsies
In the intestine, for example, microvilli allow more SA to absorb nutrients
Cytoskeleton: hold cell shape
Cellular Transport
Passive transport: diffusion
Cell membranes selectively permeable (large and charged repelled)
Tonicity: osmotic (water) pressure gradient
Cells are small to optimize surface area to volume ratio, improving diffusion
Primary active transport: ATP directly utilized to transport
Secondary active transport: something is transported using energy captured from movement of other substance flowing down the concentration gradient
Endocytosis: large particles enter a cell by membrane engulfment
Phagocytosis: “cell eating”, uses pseudopodia around solids and packages it within a membrane
Pinocytosis: “cell drinking”, consumes droplets of extracellular fluid
Receptor-mediated endocytosis: type of pinocytosis for bulk quantities of specific substances
Exocytosis: internal vesicles fuse with the plasma membrane and secrete large molecules out of the cell
Ion channels and the sodium potassium pump
Ion channel: facilitated diffusion channel that allows specific molecules through
Sodium potassium pump: uses ATP to move 3 sodium ions out of the cell and 2 potassium ions in
Membrane potential: voltage across a membrane
Electrogenic pump: transport protein that generates voltage across a membrane
Proton pump: transports protons out of the cell (plants/fungi/bacteria)
Cotransport: single ATP-powered pump transports a specific solute that can drive the active transport of several other solutes
Bulk flow: one-way movement of fluids brought about by pressure
Dialysis: diffusion of solutes across a selective membrane
Cellular Components Expanded: The Endomembrane System
Nucleus + Rough ER + Golgi Bodies
Membrane and secretory proteins are synthesized in the rough endoplasmic reticulum; vesicles carrying the integral protein fuse with the cis face of the Golgi apparatus; the protein is modified in the Golgi and exits as an integral membrane protein of the vesicles that bud from the Golgi’s trans face; when a vesicle fuses with the cell membrane, the protein becomes an integral part of that membrane
Calculations
Surface area to volume ratio of a shape (usually a cube)
U-Shaped Tube (where is the water traveling)
Solution in u-shaped tube separated by semi-permeable membrane
find average of solute (that is able to move across semi permeable membrane)
add up total molar concentration on both sides
water travels to the side where total solute concentration is higher
Water Potential = Pressure Potential + Solute Potential
Solute Potential = -iCRT
i = # of particles the molecule will make in water
C = molar concentration
R = pressure constant (0.0831 liter bars/mole K)
T = temperature in kelvin
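A minimal sketch of the water potential formulas above, assuming potentials in bars, temperature already in kelvin, and made-up sample values (0.1 M sucrose at 295 K in an open container). The function names are just for illustration.

```python
R = 0.0831  # pressure constant, liter bars / (mole K)

def solute_potential(i, molar_conc, temp_k):
    """Solute potential = -iCRT, in bars."""
    return -i * molar_conc * R * temp_k

def water_potential(pressure_potential, solute_pot):
    """Water potential = pressure potential + solute potential."""
    return pressure_potential + solute_pot

# Sucrose does not ionize in water, so i = 1.
psi_s = solute_potential(i=1, molar_conc=0.1, temp_k=295)
# An open container has a pressure potential of 0.
psi = water_potential(0.0, psi_s)
print(round(psi_s, 3), round(psi, 3))
```

For NaCl, which splits into two ions, i = 2, so the same concentration gives twice as negative a solute potential. Water moves toward the side with the lower (more negative) water potential.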
Labs
Diffusion and Osmosis
Testing the concentration of a solution with known solutions
Dialysis bag
Semipermeable bag that allows the water to pass through but not the solute
Potato core
Has a bunch of solutes inside
Relevant Experiments
Lynn Margulis: endosymbiotic theory (mitochondria lady)
Chargaff: measured A/G/T/C in everything (used UV chromatography)
Franklin + Watson and Crick: discovered structure of DNA; Franklin helped with X-ray crystallography
3) Cellular Energetics
Content
Reactions and Thermodynamics
Baseline: used to establish standard for chemical reaction
Catalyst: speeds up a reaction (enzymes are biological catalysts)
Exergonic: energy is released
Endergonic: energy is consumed
Coupled reactions: energy lost/released from exergonic reaction is used in endergonic one
Laws of Thermodynamics:
First Law: energy cannot be created nor destroyed, and the sum of energy in the universe is constant
Second Law: energy transfer leads to less organization (greater entropy)
Third Law: the disorder (entropy) approaches a constant value as the temperature approaches absolute zero (0 K)
Cellular processes that release energy may be coupled with other cellular processes
Loss of energy flow means death
Energy related pathways in biological systems are sequential to allow for a more controlled/efficient transfer of energy (product of one metabolic pathway is reactant for another)
Bioenergetics: study of how energy is transferred between living things
Fuel + O2 → CO2 + H2O
Combustion, Photosynthesis, Cellular Respiration (with slight differences in energy)
Enzymes
Speed up chemical processes by lowering activation energy
Structure determines function
Active sites are selective
Enzymes are typically tertiary- or quaternary-level proteins
Catabolic: break down molecules (e.g. proteases) and are exergonic
Anabolic: build up and are endergonic
Enzymes do not change energy levels
Substrate: targeted molecules in enzymatic reactions
Many enzymes named by ending substrate in “-ase”
Enzymes form temporary substrate-enzyme complexes
Enzymes remain unaffected by the reaction they catalyze
Enzymes can’t change a reaction or make other reactions occur
Induced fit: enzyme has to change its shape slightly to accommodate the substrate
Cofactor: non-protein helper that lets enzymes catalyze reactions (organic or inorganic)
Enzyme activity also depends on conditions: temp, pH, relative ratio of enzyme and substrate
Organic cofactors are called coenzymes
Denaturation: enzymes damaged by heat or pH
Regulation: protein’s function at one site is affected by the binding of regulatory molecule to a separate site
Enzymes enable cells to achieve dynamic metabolism - undergo multiple metabolic processes at once
Cannot make an endergonic reaction exergonic
Steps to substrates becoming products
Substrate enters active site, enzyme changes shape
Substrates held in active site by weak interactions (i.e. hydrogen bonds)
Substrates converted to product
Product released
Active site available for more substrate
Rate of enzymatic reaction increases with temperature but too hot means denaturation
Inhibitors bind to enzymes and reduce their activity
Some are permanent, some are temporary
Competitive: block substrates from their active sites
Non competitive (allosteric): bind to different part of enzyme, changing the shape of the active site
Allosteric regulation: regulatory molecules interact with enzymes to stimulate or inhibit activity
Enzyme denaturation can be reversible
Cellular Respiration
Steps
Glycolysis
Acetyl co-A reactions
Krebs / citric acid cycle
Oxidative phosphorylation
Brown fat: cells use less efficient energy production method to make heat
Photosynthesis
Absorption vs action spectrum (action spectrum is broader and cumulative, shows overall rate of photosynthesis)
Components
Chloroplast
Mesophyll: interior leaf tissue that contains chloroplasts
Pigment: substance that absorbs light
Steps
Light-Dependent Reaction
Light-Independent (Dark) Reaction (Calvin Cycle)
Anaerobic Respiration (Fermentation)
Glycolysis yields 2 ATP + 2 NADH + 2 Pyruvate
2 NADH + 2 Pyruvate yields ethanol (alcohol fermentation) or lactate (lactic acid fermentation)
Regenerates NAD+
Calculations
Calculate products of photosynthesis & cellular respiration
Labs
Enzyme Lab
Peroxidase breaks down peroxides which yields oxygen gas, quantity measured with a dye
Changing variables (i.e. temperature) yields different amounts of oxygen
Photosynthesis Lab
Vacuum in a syringe pulls the gas out of leaf disks, so they sink in bicarbonate solution. The bicarbonate gives the disks a carbon source for photosynthesis, which occurs at different rates under different conditions and produces oxygen that makes the disks buoyant again
Cellular Respiration Lab
Use a respirometer to measure the consumption of oxygen (submerge it in water)
You put cricket/animal in the box that will perform cellular respiration
You put KOH in the box with cricket to absorb the carbon dioxide (product of cellular respiration)-- it will form a solid and not impact your results
Relevant Experiments
Engelmann
Absorption spectra dude with aerobic bacteria
4) Cell Communication & Cell Cycle
Content
Cell Signalling
Quorum sensing: chemical signaling between bacteria
See Bonnie Bassler video
Taxis/Kinesis: movement of an organism in response to a stimulus (chemotaxis is response to chemical)
Ligand: signalling molecule
Receptor: ligands bind to elicit a response
Hydrophobic: cholesterol and other such molecules can diffuse across the plasma membrane
Hydrophilic: ligand-gated ion channels, catalytic receptors, G-protein receptor
Signal Transduction
Process by which an extracellular signal is transmitted to inside of cell
Pathway components
Signal/Ligand
Receptor protein
Relay molecules: second messengers and the phosphorylation cascade
DNA response
Proteins in signal transduction can cause cancer if activated too much (tumor)
RAS: second messenger for growth factor-- suppressed by p53 gene (p53 is protein made by gene) if it gets too much
Response types
Gene expression changes
Cell function
Alter phenotype
Apoptosis- programmed cell death
Cell growth
Secretion of various molecules
Mutations in proteins can cause effects downstream
Pathways are similar and many bacteria emit the same chemical within pathways, evolution!
Feedback
Positive feedback amplifies responses
Onset of childbirth, lactation, fruit ripening
Negative feedback regulates response
Blood sugar (insulin goes down when glucagon goes up), body temperature
Cell cycle
Functions: reproduction, growth, and tissue renewal
Checkpoint: control point that triggers/coordinates events in cell cycle
Mitotic spindle: microtubules and associated proteins
Cytoskeleton partially disassembles to provide the material to make the spindle
Elongates with tubulin
Shortens by dropping subunits
Aster: radial array of short microtubules
Kinetochores on centromere help microtubules to attach to chromosomes
Relevant Experiments
Sutherland: broke apart liver cells and realized the significance of the signal transduction pathway, as the membrane and the cytoplasm can’t activate glycogen phosphorylase by themselves
5) Heredity
Content
Types of reproduction
Sexual: two parents, mitosis/meiosis, genetic variation/diversity (and thus higher likelihood of survival in a changing environment)
Asexual: doesn’t require mate, rapid, offspring almost genetically identical (differences only from mutations)
Binary fission (bacteria)
Budding (yeast cells)
Fragmentation (plants and sponges)
Regeneration (starfish, newts, etc.)
Meiosis
One diploid parent cell undergoes two rounds of cell division to produce up to four haploid genetically varied cells
n = 23 in humans, where n is the number of unique chromosomes
Meiosis I
Prophase: synapsis (two chromosome sets come together to form tetrad), chromosomes line up with homologs, crossing over
Metaphase: tetrads line up at metaphase plate, random alignment
Anaphase: tetrad separation, formation at opposite poles, homologs separate with their centromeres intact
Telophase: nuclear membrane forms, two haploid daughter cells form
Meiosis II
Prophase: chromosomes condense
Metaphase: chromosomes line up single file, not pairs, on the metaphase plate
Anaphase: sister chromatids split apart at centromere
Telophase: nuclear membrane forms and 4 total haploid cells are produced
Genetic variation
Crossing over: homologous chromosomes swap genetic material
Independent assortment: homologous chromosomes line up randomly
Random fertilization: random sperm and random egg interact
Gametogenesis
Spermatogenesis: sperm production
Oogenesis: egg cell production (only 1 of the 4 cells becomes an egg; the rest become polar bodies and degenerate)
Fundamentals of Heredity
Traits: expressed characteristics
Gene: “chunk” of DNA that codes for a specific trait
Homologous chromosomes: matching pair of chromosomes, one from each parent, giving two copies of each gene
Alleles: different versions of the same gene; the two copies may differ
Homozygous/Heterozygous: identical/different
Phenotype: physical representation of genotype
Generations
Parent or P1
Filial or F1
F2
Law of dominance: one trait masks the other one
Complete: one trait completely covers the other one
Incomplete: traits blend into an intermediate phenotype
Codominance: traits are both fully expressed
Law of segregation (Mendel): each gamete gets one copy of a gene
Law of independent assortment (Mendel): traits segregate independently from one another
Locus: location of gene on chromosome
Linked genes: located on the same chromosome, loci less than 50 cM apart
Gene maps and linkage maps
Nondisjunction: inability of chromosomes to separate (ex down syndrome)
Polygenic: many genes influence one phenotype
Pleiotropic: one gene influences many phenotypes
Epistasis: one gene affects another gene
Mitochondrial and chloroplast DNA is inherited maternally
Diseases/Disorders
Genetic:
Tay-Sachs: can’t break down specific lipid in brain
Sickle cell anemia: misshapen RBCs
Color blindness
Hemophilia: lack of clotting factors
Chromosomal:
Turner: only one X chromosome
Klinefelter: XXY chromosomes
Down syndrome (trisomy 21): nondisjunction
Crosses
Sex-linked stuff
Blood type
Barr bodies: in females (two X chromosomes), one X is inactivated in each cell; different X chromosomes are expressed in different parts of the body, thus creating two different phenotype expressions in different places
Calculations
Pedigree/Punnett Square
Recombination stuff
Recombination rate = # of recombinant offspring / total offspring (times 100); units: map units
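As a quick sketch of the recombination formula above (the offspring counts are hypothetical, picked only to show the arithmetic, and the function name is made up):

```python
def recombination_frequency(recombinants, total_offspring):
    """Recombinant offspring / total offspring * 100, in map units (cM)."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return recombinants / total_offspring * 100

# Hypothetical test cross: 83 recombinant offspring out of 1000 total.
rf = recombination_frequency(83, 1000)
print(round(rf, 1), "map units")

# Per the notes, linked genes sit less than 50 map units apart.
print("linked" if rf < 50 else "unlinked or far apart")
```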
Relevant Experiments
Mendel
6) Gene Expression and Regulation
Content
DNA and RNA Structure
Prokaryotic organisms typically have circular chromosomes
Plasmids = extrachromosomal circular DNA molecules
Purines (G, A) are double-ringed while pyrimidines (C, T, U) have single ring
snRNA - small nuclear RNA (bound to snRNPs - small nuclear ribonucleoproteins)
miRNA - microRNA (regulatory)
DNA Replication
Steps:
Helicase opens up the DNA at the replication fork.
Single-strand binding proteins coat the DNA around the replication fork to prevent rewinding of the DNA.
Topoisomerase works at the region ahead of the replication fork to prevent supercoiling.
Primase synthesizes RNA primers complementary to the DNA strand.
DNA polymerase III extends the primers, adding on to the 3' end, to make the bulk of the new DNA.
RNA primers are removed and replaced with DNA by DNA polymerase I.
The gaps between DNA fragments are sealed by DNA ligase.
Protein Synthesis
61 codons code for amino acids, 3 code as STOP - UAA, UAG, UGA - 64 total
Transcription Steps:
RNA polymerase binds to promoter (before gene) and separates the DNA strands
RNA polymerase fashions a complementary RNA strand from a DNA strand
Coding strand is same as RNA being made, template strand is complementary
Terminator on gene releases the RNA polymerase
RNA Processing Steps (Eukaryotes):
5’ cap (added by guanylyl transferase) and 3’ poly-A tail (added by poly-A polymerase) are added to the strand
Splicing of the RNA occurs in which introns are removed and exons are joined together by the spliceosome
Cap/tail adds stability, splicing removes the “gibberish” (introns) to make the correct sequence
Translation Steps:
Initiation complex is the set up of a ribosome around the beginning of an mRNA fragment
tRNA binds to codon, amino acid is linked to other amino acid
mRNA is shifted over one codon (5’ to 3’)
Stop codon triggers release of the finished polypeptide and the mRNA
Gene Expression
Translation of mRNA to a polypeptide occurs on ribosomes in the cytoplasm as well as rough ER
Translation of the mRNA occurs during transcription in prokaryotes
Genetic info in retroviruses is an exception to normal laws: RNA to DNA is possible with reverse transcriptase, which allows the virus to integrate into the host’s DNA
Regulatory sequences = stretches of DNA that interact with regulatory proteins to control transcription
Epigenetic changes can affect expression via mods of DNA or histones
Observable cell differentiation results from the expression of genes for tissue-specific proteins
Induction of transcription factors during dev results in gene expression
Prokaryotes: operons transcribed in a single mRNA molecule, inducible system
Eukaryotes: groups of genes may be influenced by the same transcription factors to coordinate expression
Promoters = DNA sequences that RNA polymerase can latch onto to initiate
Negative regulators inhibit gene expression by binding to DNA and blocking transcription
Acetylation of histones (add acetyl groups) - more loosely wound/less tightly coiled, more transcription
Methylation of DNA (add methyl groups) - less transcription- more tightly wound
Mutation and Genetic Variation
Disruptions in genes (mutations) change phenotypes
Mutations can be +/-/neutral based on their effects that are conferred by the protein formed - environmental context
Errors in DNA replication or repair as well as external factors such as radiation or chemical exposure cause them
Mutations are the primary source of genetic variation
Horizontal acquisition in prokaryotes - transformation (uptake of naked DNA), transduction (viral DNA transmission), conjugation (cell-cell DNA transfer), and transposition (DNA moved within/between molecules) - increase variation
Related viruses can (re)combine genetic material in the same host cell
Types of mutations: frameshift, deletion, insertion
Genetic Engineering
Electrophoresis separates molecules by size and charge
PCR amplifies DNA fragments
Bacterial transformation introduces DNA into bacterial cells
Operons
Almost always prokaryotic
Promoter region has operator in it
Structural genes follow promoter
Terminator ends operon
Regulatory protein is active repressor
Active repressor can be inactivated
Enhancer: remote DNA sequence that requires activators
RNAi: RNA interference, miRNA blocks or degrades target mRNA
Anabolic pathways are normally on and catabolic pathways are normally off
Calculations
Transformation efficiency (colonies/DNA)
Numbers of base pairs (fragment lengths)
Cutting enzymes in a plasmid or something (finding the lengths of each section)
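A rough sketch of two of these calculations, assuming transformation efficiency is reported as colonies per microgram of plasmid DNA spread on the plate and that cut sites are given as base-pair positions on a circular plasmid. All numbers and function names are hypothetical.

```python
def transformation_efficiency(colonies, dna_ug):
    """Colonies per microgram of DNA spread on the plate."""
    return colonies / dna_ug

def circular_fragments(plasmid_len, cut_sites):
    """Fragment lengths after cutting a circular plasmid at the given positions."""
    sites = sorted(cut_sites)
    if not sites:
        return [plasmid_len]  # uncut plasmid stays one circle
    # Distances between neighbouring cut sites...
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    # ...plus the piece that wraps around from the last site to the first.
    fragments.append(plasmid_len - sites[-1] + sites[0])
    return fragments

print(round(transformation_efficiency(120, 0.05)))      # 2400 colonies per microgram
print(circular_fragments(5000, [500, 2000, 4500]))      # [1500, 2500, 1000]
```

On a circular plasmid the number of fragments equals the number of cuts, and the lengths always add up to the plasmid size, which is a handy check on gel questions.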
Labs
Gel Electrophoresis Lab
Phosphates in DNA make it negative (even though it’s an acid!), so it moves to positive terminal on the board
Smaller DNA moves faster, compare it to a standard to calculate approx. lengths
Bacterial Transformation Lab
Purpose of sugar: arabinose is an inducer that turns on the promoter controlling GFP in transformed cells, so they glow green under UV
Purpose of flipping upside down: condensation forms but doesn’t drip down
Purpose of heat shock: increases bacterial uptake of foreign DNA
Plasmids have GFP (green fluorescent protein) and ampicillin resistance genes
Calcium solution puts holes in bacteria to allow for uptake of plasmids
PCR Lab
DNA + primers + nucleotides + DNA polymerase in a specialized PCR tube in a thermal cycler
Primers bind to the separated DNA strands before they can reanneal, DNA polymerase binds to the primers and begins replication
After 30 cycles, there are billions of target sequences
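The "billions after 30 cycles" figure follows from doubling each cycle. A small sketch, assuming an idealized reaction where every target copy is duplicated in every cycle (real reactions are less efficient) and a hypothetical function name:

```python
def pcr_copies(starting_copies, cycles):
    """Idealized PCR: the number of target copies doubles every cycle."""
    return starting_copies * 2 ** cycles

print(pcr_copies(1, 30))  # 1073741824, about a billion from one template
```

With more than one starting template the total scales up accordingly, which is how a typical sample reaches billions of copies.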
Relevant Experiments
Griffith: harmful + harmless bacteria in mice, found a “transforming principle” passed between bacteria
Avery: followed up on Griffith’s experiment, tested protein vs DNA and showed DNA is the transforming material
Hershey and Chase: radioactively labeled DNA and protein
Meselson and Stahl: isotopic nitrogen in bacteria, looked for cons/semi/dispersive DNA
Beadle and Tatum: changed medium’s amino acid components to find that a metabolic pathway was responsible for turning specific proteins into other proteins, “one gene one enzyme”
Nirenberg: discovered codon table
7) Natural Selection
Scientific Theory: no refuting evidence (observation + experimentation), time, explain a broad/extensive range of phenomena
Theory of Natural Selection
Definition
Not all offspring (in a population) will survive
Variation among individuals in a population
Some variations were more favourable than others in a particular environment
Those with more favourable variations were more likely to survive and reproduce.
These favourable variations were passed on and increased in frequency over time.
Types of Selection:
Directional selection: one phenotype favored at one of the extremes of the normal distribution
”Weeds out” one phenotype
Only can happen if a favored allele is already present
Stabilizing Selection: Organisms within a population are eliminated with extreme traits
Favors “average” or medium traits
Ex. big head causes a difficult delivery; small head causes health deficits
Disruptive Selection: favors both extremes and selects against common traits
Ex. sexual selection (seems like directional but it’s not because it only affects one sex, if graph is only males then directional)
Competition for limited resources results in differential survival, favourable phenotypes are more likely to survive and produce more offspring, thus passing traits to subsequent generations.
Biotic and abiotic environments can be more or less stable/fluctuating, and this affects the rate and direction of evolution
Convergent evolution occurs when similar selective pressures result in similar phenotypic adaptations in different populations or species.
Divergent evolution: groups from common ancestor evolve, homology
Different genetic variations can be selected in each generation.
Environments change and apply selective pressures to populations.
Evolutionary fitness is measured by reproductive success.
Natural selection acts on phenotypic variations in populations.
Some phenotypic variations significantly increase or decrease the fitness of the organism in particular environments.
Through artificial selection, humans affect variation in other species.
Humans choose to cause artificial selection with specific traits, accidental selection caused by humans is not artificial
Random occurrences
Mutation
Genetic drift - change in existing allele frequency
Migration
Reduction of genetic variation within a given population can increase the differences between populations of the same species.
Conditions for a population or an allele to be in Hardy-Weinberg equilibrium are
Large population size
Absence of migration
No net mutations
Random mating
Absence of selection
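When these conditions hold, allele and genotype frequencies follow the standard Hardy-Weinberg relations p + q = 1 and p^2 + 2pq + q^2 = 1 (not spelled out in the notes above, but the usual calculation that goes with them). A minimal sketch, assuming a single gene with two alleles and a made-up recessive phenotype frequency of 0.16:

```python
import math

def hardy_weinberg(recessive_freq):
    """Allele and genotype frequencies from the homozygous recessive frequency (q^2)."""
    q = math.sqrt(recessive_freq)  # recessive allele frequency
    p = 1 - q                      # dominant allele frequency
    return {"p": p, "q": q, "AA": p * p, "Aa": 2 * p * q, "aa": q * q}

freqs = hardy_weinberg(0.16)
for name, value in freqs.items():
    print(name, round(value, 2))
```

If observed genotype frequencies in a real population differ from these expected values, at least one of the conditions is being violated, which is evidence that the population is evolving.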
Changes in allele frequencies provide evidence for the occurrence of evolution in a population.
Small populations are more susceptible to random environmental impact than large populations.
Gene flow: transference of genes/alleles between populations
Speciation: one species splits off into multiple species
Sympatric (living together, i.e. disruptive selection)
Allopatric (physically separate, i.e. founder effect)
Parapatric (habitats overlapping)
Polyploidy (autopolyploidy), sexual selection
Species: group of populations whose members can interbreed and produce healthy, fertile offspring but can’t breed with other species (ex. a horse and donkey can produce a mule but a mule is sterile, so it doesn’t qualify)
Morphological definition: body shape and structural characteristics define a species
Ecological species definition: way populations interact with their environments define a species
Phylogenetic species definition: smallest group that shares a common ancestor is a species
Prezygotic barriers: barriers to reproduction before zygote is formed
Geographical (habitat) isolation: two organisms are in different areas
Behavioural isolation (i.e. mating rituals aren’t the same)
Mechanical isolation: “the pieces don’t fit together”
Temporal isolation (i.e. one organism comes out at night while the other comes out in the day)
Gametic isolation: sperm and egg meet but can’t fuse
Postzygotic barriers: barriers to reproduction after zygote is formed
Reduced hybrid viability: hybrid offspring have developmental errors and don’t survive
Reduced hybrid fertility: hybrid offspring are sterile
Hybrid breakdown: first-generation hybrids are fine, but offspring over later generations aren’t healthy
Hybrid zone: region in which members of different species meet and mate
Reinforcement: hybrids are less fit than parents and die off, strengthening prezygotic barriers
Fusion: two species may merge into one population
Stability: stable hybrid zones mean hybrids are more fit than parents, thus creating a stable population, but can be selected against in hybrid zones as well
Punctuated equilibria: long periods of little or no evolutionary change punctuated by short periods of large change; gradualism is just slow, steady evolution
Evidence of evolution
Paleontology (Fossils)
Comparative Anatomy
Embryology: embryos look the same as they grow
Biogeography: distribution of flora and fauna in the environment (pangea!)
Biochemical: DNA and proteins and stuff, also glycolysis
Phylogenetic trees
Monophyletic: common ancestor and all descendants
Polyphyletic: descendants with different ancestors
Paraphyletic: common ancestor and some, but not all, descendants (leaving species out of the group)
Out group: basal taxon, doesn’t have traits others do
Cline: graded variation within species (i.e. different stem heights based on altitude)
Anagenesis: one species turning into another species
Cladogenesis: one species turning into multiple species
Taxon: classification/grouping
Clade: group of species with common ancestor
Horizontal gene transfer: genes thrown between bacteria
Shared derived characters: unique to specific group
Shared primitive/ancestral characters: not unique to a specific group but is shared within group
Origins of life
Stages
Inorganic formation of organic monomers (miller-urey experiment)
Inorganic formation of organic polymers (catalytic surfaces like hot rock or sand)
Protobionts and compartmentalization (liposomes, micelles)
DNA evolution (RNA functions as enzyme)
Shared evolutionary characteristics across all domains
Membranes
Cell comm.
Gene to protein
DNA
Proteins
Extant = not extinct
Highly conserved genes = low rates of mutation in history due to criticalness (like electron transport chain)
Molecular clock: dating evolution using DNA evidence
Extinction causes niches for species to fill
Eukaryotes all have common ancestor (shown by membrane-bound organelles, linear chromosomes, and introns)
Calculations
Hardy-Weinberg
p + q = 1
p^2 + 2pq +q^2 = 1
Chi Squared
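A quick sketch of how the two calculations above fit together: estimate p and q from genotype counts, get the expected counts from p^2 + 2pq + q^2 = 1, then compare observed to expected with chi-squared. The function names and the genotype counts here are made up for illustration, not taken from any lab data.

```python
def hardy_weinberg_expected(n_AA, n_Aa, n_aa):
    """Return allele frequencies (p, q) and expected genotype counts under Hardy-Weinberg."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p                         # p + q = 1
    # p^2 + 2pq + q^2 = 1, scaled up to the population size
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return p, q, expected


def chi_squared(observed, expected):
    """Chi-squared statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of AA, Aa, aa individuals (illustration only)
observed = (50, 30, 20)
p, q, expected = hardy_weinberg_expected(*observed)
stat = chi_squared(observed, expected)
print(f"p = {p:.2f}, q = {q:.2f}, chi-squared = {stat:.2f}")
```

A large statistic compared with the critical value from a chi-squared table (3.84 at the 0.05 level for one degree of freedom) suggests the population is not in Hardy-Weinberg equilibrium.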
Labs
Artificial Selection Lab
Trichome trait (hairs)
Anthocyanin for second trait (purple stems)
Function of the purple pigment?
Function of trichome hairs?
BLAST Lab
Putting nucleotides into a database outputs similar genes
Relevant Experiments
Darwin
Lamarck
Miller-Urey
Slapped some water, methane, ammonia, and hydrogen in some flasks and simulated early earth with heat and stuff and it made some amino acids.
[Nunerical Analysis] Need tips for spotting error in algorythm
I am implementing the conjugate gradient method applied to the normal equations in MATLAB. It works well for general matrices, but the convergence goes completely off track when the matrix is ill-conditioned. I have some reference examples and data, and my algorithm behaves much worse than expected. So I was wondering whether there are general things to look for in my code when something like this happens. If it is relevant, I only get this kind of behaviour for ill-conditioned complex matrices. Edit: typo in the title, it was meant to be Numerical
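Two general things worth checking in this situation, sketched in Python rather than MATLAB (this is not the poster's code, and the function name `cgnr` and the test matrix are assumptions for illustration). First, for complex matrices the normal equations need the conjugate transpose and conjugating inner products; in MATLAB that is `A'` rather than `A.'`. Second, forming the normal equations squares the condition number, so slow or erratic convergence on ill-conditioned problems is partly expected.

```python
import numpy as np


def cgnr(A, b, tol=1e-10, max_iter=1000):
    """Sketch of CG applied to the normal equations A^H A x = A^H b (CGNR)."""
    AH = A.conj().T                         # conjugate transpose, not plain transpose
    x = np.zeros(A.shape[1], dtype=np.result_type(A, b, np.float64))
    r = b - A @ x                           # residual of the original system
    z = AH @ r                              # residual of the normal equations
    p = z.copy()
    zz = np.vdot(z, z).real                 # vdot conjugates its first argument
    z0 = np.sqrt(zz)
    for k in range(max_iter):
        if np.sqrt(zz) <= tol * z0:
            return x, k
        w = A @ p                           # A^H A is never formed explicitly
        alpha = zz / np.vdot(w, w).real
        x += alpha * p
        r -= alpha * w
        z = AH @ r
        zz_new = np.vdot(z, z).real
        p = z + (zz_new / zz) * p
        zz = zz_new
    return x, max_iter


# Small random complex test problem (illustration only)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
b = rng.standard_normal(6) + 1j * rng.standard_normal(6)
x, iters = cgnr(A, b)

# The normal equations square the 2-norm condition number
cond_A = np.linalg.cond(A)
cond_normal = np.linalg.cond(A.conj().T @ A)
print(iters, cond_A, cond_normal)
```

Comparing the result against a direct least-squares solve on a small well-conditioned complex matrix is a cheap way to separate a coding error from ordinary ill-conditioning.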
[in-depth] How it's made: the science behind cultured/clean/cell-based meat, part 4a: the components of cell culture medium and fetal bovine serum
The Futurology subreddit frequently features highly upvoted posts on cell-based meat, reflecting the media attention and public interest that has followed the industry. There are many introductory resources to how cell-based meat is produced and what its benefits may be, however, there are no comprehensive resources that fully inform those interested in learning more. Below you’ll find the 5th post in a multi-part series that walks through the science driving the innovative technology of cell-based meat. These posts are intended to be educational but lengthy and best understood by those with science backgrounds. Please check out the previous posts linked below. Each post is also formatted for easier reading here. Series I: Cell Lines Series II: Bioprocessing Series III: Bioengineering 1 and 2 Series IV: Cell culture media 1, 2, and 3 Series V: Final products Series VI: Impact (environment, human health, food security, animal welfare) Introduction Growing cells ex vivo requires the same fundamental inputs as required in vivo: a mixture of a carbon-based energy source, amino acids, salts, vitamins, water, and other components to support cell viability and vitality. This mixture, known as the cell culture medium, is the most important factor in cell culture technology. Although cell culture is routinely performed in academic labs and industrial bioprocesses, creating the biomass required for cell-based meat to achieve mass-market penetration at competitive prices will demand significant reductions in costs, innovations for serum removal, and optimization across a diverse set of species and cell types. An overview of cell culture medium composition and the factors at play to achieve price parity with conventional meat are discussed below. Common Components of Cell Culture Medium The first instance of culturing tissues outside of the body came from Sydney Ringer in 1882. 
By creating a balanced salt solution with similar pH, osmolarity, and salt concentration to that of an animal’s body, Ringer was able to keep various animal tissues alive outside of the body for several days. Subsequent work in the following decades first demonstrated that culturing cells in the presence of blood plasma (i.e. serum) or embryonic extracts assisted in cellular proliferation and viability, allowing tissues to survive for longer periods of time. Over time, researchers identified the importance of glucose, amino acids, glutathione, insulin, and vitamins in the sera being used.1 Once this was known, scientists aimed at uncovering the additional unknown essential components of serum and other extracts that permitted cell proliferation and viability. In the 1940s and 50s, working with the first immortalized cell lines such as L cells2 and HeLa (discussed in Series I), scientists used iterative approaches to discover that low molecular weight dialyzed fractions of serum containing amino acids were necessary for cell survival. In 1955, Harry Eagle developed a Minimum Essential Medium by testing the amino acid requirements on several different cell lines, discovering that thirteen were indispensable. Eagle’s minimum essential medium additionally consists of glucose, six inorganic salts, eight water-soluble vitamins, and dialyzed serum. Variations on this medium were then derived using a variety of different cell lines as well as trial and error approaches that aimed at replacing serum with chemically defined components. These variations, including Dulbecco’s Modified Eagle Medium (DMEM), Iscove’s Modified DMEM, Ham’s F12, Medium 199, RPMI 1640, Leibovitz’s L-15, and others, still make up the majority of what are referred to as basal cell culture media in use for culturing the variety of cell types used today.3,4 What makes these formulations essential? 
Although formulations have been varied and optimized over time, the principal components of basal cell culture media have remained largely unchanged. Importantly, these variations may be cell-type specific, including for the cell types used in cell-based meat (described in Series I). Therefore, rather than discussing optimal conditions for a specific cell line or species, only the general roles of each component of common basal media including glucose, amino acids, inorganic salts, vitamins, and buffers are briefly discussed below. Glucose Glucose (specifically D-glucose) is the most common energy input used in cell culture, although some media formulations use galactose or a combination of glucose and its metabolite, pyruvate. Industrially, it is produced enzymatically using amylase enzymes to breakdown starches from maize, potato, wheat, and other crops into constituent sugars used in various downstream products such as industrialized food, fermentation processes, or in this case, culturing of cells. Glucose enters the cell via transporter proteins on the cell surface, using either passive transport down its concentration gradient (more common) or ATP-dependent active transport. Once inside the cell, it serves as a reducing agent against oxidative stress in the form of NADPH generation via the pentose phosphate pathway, as well as a primary source of energy in the form of ATP generation via glycolysis. In cell culture, glucose is used at concentrations between 5.5 and 55 mM, where the lower end is more common and similar to fasting blood glucose levels in humans. Different cell types will require different amounts of glucose. 
During periods of rapid cell proliferation and growth, as typically maintained during bioprocessing, glucose metabolism is high and can yield lactic acid even in the presence of sufficient oxygen, leading to pH changes.5 Thus, glucose and lactic acid levels are commonly measured and tightly controlled throughout a bioprocess (discussed in Series II). Amino Acids Amino acids are necessary to create proteins and other low molecular weight compounds such as nucleotides and small peptides. Amino acids can be split into two groups: essential and non-essential. Non-essential amino acids (NEAAs) can be synthesized de novo by an animal, whereas essential amino acids (EAAs) must be obtained through the diet. Generally speaking, pathways for the de novo synthesis of NEAAs are conserved in vertebrate species.6 In humans and many other animals, the EAAs include histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. NEAAs include alanine, arginine, asparagine, aspartate, cysteine, glutamate, glutamine, glycine, proline, serine, taurine, and tyrosine. However, EAA requirements can vary between species. For instance, dogs, cows, and pigs have the same EAA requirements as humans plus arginine, whereas cats and chickens require the same EAA as the former plus taurine and glycine, respectively. Importantly, what is considered to be “essential” in cell culture is different than what is considered “essential” to a whole organism, as the diversity of cell types that may synthesize certain amino acids in vivo are not present in vitro. For instance, Eagle’s Minimum Essential Medium formulation lists 13 (L-enantiomer) amino acids as being essential across multiple cell lines in vitro: arginine, cysteine, glutamine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, tyrosine, and valine. 
As an example, arginine is essential in vitro as its biosynthesis in vivo primarily occurs between epithelial cells in the gut and proximal tubule cells of the kidney. Thus, arginine must be supplied in the absence of these cell types. Media that are particularly nutrient-rich (eg. DMEM/F12 or Medium 199) may contain all amino acids. Alternatively, NEAAs can be supplemented independently. Industrialized production of amino acids can be obtained through bulk extraction from protein hydrolysates (discussed later), chemical synthesis, or microbial fermentation and purification, with the latter being the most common.7 Amino acids enter the cell through a variety of transporter proteins on the cell surface, at rates influenced by the cell’s state and consumption rates due to protein production levels, cell cycle state, and other parameters. Once inside the cell, amino acids serve as substrates for many biosynthetic pathways and optimal concentrations are important for maintaining metabolic equilibrium. The majority of carbon mass in proliferative cells is derived from bulk amino acids rather than glucose or L-glutamine, which are the most rapidly metabolized.8 Ultimately, the levels of amino acids required for cell culture are determined not only by their utilization by the growing cells, but also by individual amino acid solubility, stability, and interaction with other medium components such as metal cations, all of which can change once in a complex mixture.9 Consideration for all of these variables is highly complex and a full understanding of amino acid behavior, utilization, and optimization in a bioprocess has yet to be accomplished. Given the variety of biosynthetic pathways that involve amino acids, it is likely that amino acid content, concentration, and perfusion rate (when applicable) will need to be optimized for a particular bioprocess across species and cell types for parameters such as growth rates or protein content in the final product. 
Computational approaches to model specific utilization rates of amino acids and other basal media components are an active area of research10 (discussed later). L-glutamine L-glutamine deserves special consideration as one of the most important amino acids included in cell culture media, as it is readily transported into cells and becomes a major contributor to protein biomass. It is a notable precursor of carbon and nitrogen-containing biomolecules such as the intermediate molecules used in the synthesis of other amino acids and nucleotides11 and it can be added at concentrations 3-40x higher than other amino acids in the medium.12 During times of high cellular growth and proliferation, the demand for glutamine outpaces its supply, making it de facto an essential amino acid that can be readily metabolized as a replenishing alternative energy source (i.e. anaplerosis). At physiological pH in a cell culture medium solution, L-glutamine is unstable, resulting in its decomposition into pyroglutamate and ammonia, the latter of which is toxic to cells. Ammonia, therefore, is a tightly monitored and regulated metabolite in large scale bioprocesses that involve high densities of cells undergoing rapid growth (discussed in Series II). In order to avoid some of these disadvantages of L-glutamine, glutamate — which is more stable in solution — can be substituted in when working with cells expressing high levels of glutamine synthetase, an enzyme which enables intracellular conversion of glutamate to glutamine while consuming ammonia in the process. A more common practice involves supplementation with L-glutamine as a stable dipeptide in the form of alanyl-glutamine (i.e. GlutaMAX) or glycyl-glutamine, which enable cells to endogenously cleave the dipeptide for more controlled usage of the amino acids in the dipeptide. There is still much to learn about amino acid metabolism in cell culture. 
For instance, recent discoveries suggest L-glutamine is entirely dispensable for the culture of pluripotent stem cells.13 Inorganic Salts The inclusion of inorganic salts is important in establishing and maintaining the osmolarity of the cell with its surrounding cell culture medium solution as well as serving as enzymatic cofactors and important components of receptor and extracellular matrix proteins. These inorganic salts are composed of cations and anions that fully dissociate in solution. The original minimal essential medium solution contained six inorganic salts (calcium chloride, potassium chloride, magnesium sulfate, sodium chloride, sodium phosphate, and sodium bicarbonate), which are based on Earle’s salt solution. Other formulations include additional inorganic salts containing zinc, copper, and iron, which have particular importance for a variety of cellular functions (discussed later). Although all cells maintain a resting membrane potential, excitable cells such as neurons and skeletal muscle cells are particularly sensitive to changes in ionic concentrations that can readily affect their functionality and viability. Several basal medium formulations have thus been optimized for salt concentrations for neuronal14 and skeletal muscle cell culture that more accurately recapitulate the interstitial fluids surrounding these cell types. The osmolality or measurement of osmotic pressure within the medium is typically between 260 and 320 mOsm/kg (milliosmoles per kg of solvent), although this can vary with cell lines that are particularly robust in varying solute concentrations such as insect cells.15 Changes in the salt concentration, either abruptly due to medium changing or slowly due to water evaporation, can lead to osmotic shock. Thus, maintenance of osmolarity is an important component of cell culture. Vitamins Vitamins are classes of organic compounds that serve as a critical component for the maintenance and growth of cells. 
Most vitamins are essential in that they need to be obtained directly from the diet or cell culture medium with few exceptions (e.g. vitamin D synthesized by fibroblasts and keratinocytes of the skin or some B vitamins produced in low levels by intestinal microbiota). Vitamins are classified as either fat-soluble or water-soluble and can serve broadly as enzymatic cofactors, antioxidants, and hormones. Vitamins are processed in a variety of ways in vivo following ingestion, often in a complex sequence that ends in absorption into intestinal cells via membrane surface transporters. This complex sequence involved in absorption can be largely avoided in vitro, as hostile environments (e.g. stomach acid) or barriers (e.g. the blood-brain-barrier) are absent.16 Thus, vitamins are typically included in a medium formulation as a single chemical compound that can be processed and absorbed directly by cells in vitro. Vitamins can also effectively function as a group of compounds (i.e. vitamers) where each compound can serve the vitamin’s functional role, albeit with varying properties. The natural production of vitamins in microbes and plants has made industrial production of vitamins via microbial fermentation possible; however, improvements in metabolic engineering strategies are needed to increase yields and sustainability in the industry. For these reasons, some vitamins are produced more efficiently via chemical synthesis.17 Water-soluble vitamins including riboflavin (vitamin B2), nicotinamide (vitamin B3), pantothenic acid (vitamin B5), pyridoxine and pyridoxal (vitamin B6), biotin (vitamin B7), i-inositol (vitamin B8), folic acid (vitamin B9), cyanocobalamin (vitamin B12), and choline are typically added to and “essential” in cell culture media, sometimes in various modified forms in order to provide stability. Fat-soluble vitamins A, D, E, and K are excluded in basal medium formulations but can be added if necessary when dissolved in an organic solvent. 
Similar to the different in vivo versus in vitro requirements of amino acids, fat-soluble vitamins play specific roles for certain cell types or bodily functions and are thus only “essential” when culturing a relevant cell type. For instance, a metabolite of vitamin A, retinoic acid, is an important developmental morphogen (discussed in detail later) and may be included as an additive in media to derive spinal motor neuron cells from pluripotent stem cells.18 Special consideration for stability must be taken when using serum-free medium formulations (discussed later) as the lack of stabilizing serum proteins can lead to rapid degradation via light, heat, oxidation, or pH fluctuations.19 These properties make it advisable to reconstitute powdered B vitamins immediately before use (discussed later). Buffering Systems Buffers are essential to cell culture systems as they serve to maintain pH at a constant level (for mammalian cells, generally 7.4 ± 0.4) despite changes in the composition of acids or bases that would otherwise alter the pH of the cell culture medium. Buffers are mixtures of a weak acid and its conjugate base or a weak base and its conjugate acid, where each mixture serves as a sponge to soak up free protons or hydroxide ions in solution, minimizing their effect on overall pH. Buffer systems in cell culture typically consist of either CO2-bicarbonate systems or buffering agents such as HEPES. As discussed in Series II, a CO2-bicarbonate system can be achieved by exogenous addition of 5-10% gaseous CO2 (often delivered in bioreactor systems via sparging), which reaches equilibrium in solution with bicarbonate ions, forming a natural buffer system. pH slowly changes over time due to the respiration of cells and the release of additional CO2, which forms carbonic acid in solution, in addition to the metabolism of glucose and the formation of lactic acid. 
The resultant decreasing pH changes are counteracted by the inclusion of sodium bicarbonate in the basal medium itself. Importantly, added sodium bicarbonate should be proportional to the atmospheric CO2 being used to maintain equilibrium. For instance, for media containing 1.5 to 2.2 g/L sodium bicarbonate, 5% CO2 is recommended, whereas 10% CO2 is recommended for media containing 3.7 g/L sodium bicarbonate. HEPES is a zwitterionic buffer that can be used in cell culture systems as a supplemental buffer, especially in the absence of CO2 exposure. As one of Good’s buffers, its high solubility, low toxicity, and membrane impermeability have made it attractive for use in cell culture applications. In the scale-up of highly proliferative stem cell populations, dissolved CO2 due to high metabolism can reach levels that are deleterious for cell growth and nutrient utilization.21 Attempts have thus been made to limit dissolved CO2 by culturing cells in the presence of atmospheric CO2 levels with added buffering capacity from HEPES or other Good’s buffers.22 This strategy may be useful for future scale-up efforts in cell-based meat. Consideration for the cost of the buffer must also be weighed, as it may constitute the most expensive component of a basal media formulation at scale. Preparation Out of convenience, most academic and lab-scale cell culture is performed using commercially available premade liquid media. However, large volumes necessitate on-site preparation of liquid cell culture media from reconstituted powdered medium ingredients. Powdered medium is more efficiently transported and stored, resulting in cost savings and reduced degradation of fragile ingredients (e.g. B vitamins). Ideally, a powdered medium contains all of the components to be utilized and is created through a process known as micronization, where the average size of crystallized particles in the mix is reduced in order to increase solubility and homogeneity. 
When ready to use, the powder is typically reconstituted in a dedicated tank using high-quality water prepared by reverse osmosis, deionization, and filtration. The reconstituted medium is then itself sterilized by filtration (e.g. through a 0.22 µm filter), irradiation, or other methods discussed in Series II (e.g. pulsed electric fields). The use of sterilization involving high heat is precluded by some heat-labile ingredients that may be part of the formulation. Other preparation methods for additional ingredients are discussed throughout. Serum As previously mentioned, a basal medium formulation is often sufficient to keep cells alive for short periods of time, but in order for them to proliferate efficiently over extended periods of time, a variety of animal sera (e.g. fetal bovine serum, horse serum, and others) and extracts (e.g. chick embryo extract) have historically been used (notably, on a volumetric basis, serum-free formulations are now more dominant in their usage although FBS is still often included in routine cell culture in academic settings). Serum is a high protein-containing mixture that contains growth and attachment factors, hormones, antioxidants, lipids, and other components (all described later) that mimic a proliferative, fetal-like state. Indeed, most sera used in cell culture are derived from fetal animals, which are rich in the necessary components and contain low immunoglobulin and complement content due to developmentally immature immune systems. As fetal bovine serum (FBS) is the most common serum used in cell culture, it will be used as a reference example throughout this section. Originally employed in the late 1950s,24 FBS has become a mainstay in biomedical research because it can supplement the growth of virtually all common human, animal, and even insect cell lines. 
As an added supplement for many cell culture applications in amounts typically 5-20% of total medium volume, FBS — when used — is often the most expensive part of performing cell culture. FBS is harvested from a fetal calf any time during the last two-thirds of gestation following the discovery of pregnant cows due for slaughter. It has been estimated that up to 8% of cows in the slaughter line may be pregnant, making FBS a byproduct of the meat processing industry.25 It is prepared by the sterile collection of fetal blood followed by coagulation at low temperatures and centrifugation to remove clotting factors and blood cells. The serum supernatant is then filtered and assessed for a variety of quality controls including residual microbial or viral contamination, endotoxin, immunoglobulin content, and total protein, before being bottled and sold commercially, at prices exceeding $1000 USD per liter (at time of writing, July 2019) depending on quality control parameters (some described later), which vary by industry and use-case. Despite its long history of use, FBS has several well-described issues that have made its replacement a priority in recent years. First, FBS contains hundreds or even thousands of different components and the true composition and amounts of these components are unknown, making it a chemically undefined product. The composition also varies by geographic region where a cow’s diet can vary, by batch within the same geographic region, by seasonality of collection, by the quantity and identity of antibiotics or hormones received by the mother, and by the gestational age of the fetus. 
Variability can also stem from a single bottled product originating from fetuses of different sexes.26 This variability has led to a growing concern over serum’s contribution to irreproducibility of in vitro experiments within and between labs around the world.27 Rigorous quality control involving testing of serum batches across multiple cell lines or experiments prior to purchasing a specific, well-performing large batch is often performed in industry but can remain burdensome from a labor and economic perspective for smaller academic labs. Thus, the inherent variability and undefined nature of FBS use leads to compounding external costs in quality control testing, experimental irreproducibility or conflicting results, and follow-up research to dissect irreproducible signals. Second, FBS is a potential source of contamination from multiple organisms, including Mycoplasma, viruses, and bovine spongiform encephalopathy. Mycoplasma are a class of parasitic bacteria that lead to metabolic and gene expression variations for infected cell lines. Mycoplasma are likely the most common cell line contaminant, with recent estimates showing 11% of cell lines being infected, and rates as high as 70% in geographical regions where testing is not routine.28 Although presently FBS is routinely filtered using 0.1 micron systems that should theoretically capture Mycoplasma, suppliers cannot make this guarantee. The common cell line contaminants M. arginini and A. 
laidlawii, in particular, have been linked in origin to FBS, and ongoing cross-contamination of cell lines has likely propagated this contamination in laboratories since the 1960s and 1970s when FBS batches were routinely positive for these bacteria.29 Additional methods to decontaminate serum from Mycoplasma include gamma irradiation, however, this can also damage growth factors and other proteins in the serum.30 Thus, the use of FBS is responsible for a non-trivial amount of bacterial contamination in cell lines today, leading to compounding problems concerning reproducibility and potential unknown variability stemming from some decontamination practices. In addition to bacterial contamination, the threat of adventitious viral agents in FBS also persists. Regulations under USDA and the EU mandate the testing and/or treatment (via heat or irradiation) of eight viruses known to be present in FBS from all geographical regions of origin.31 Although modern production methods make the risk of contamination in a validated batch low, viral contamination is often still detectable in batches that manufacturer screens claim to be negative.32 Similarly, the threat of FBS containing the causative prion proteins involved in bovine spongiform encephalopathy (i.e. Mad Cow Disease, which manifests in humans as variant Creutzfeldt-Jakob Disease) is persistent and requires additional testing as well as documented traceability for the FBS origin. 
For instance, countries such as the USA, New Zealand, and Australia have no documented cases of bovine spongiform encephalopathy; thus FBS originating from these countries may be considered ‘safer,’ often commanding significantly higher prices and collectively comprising up to 90% of the serum supply for commercial therapeutics.33 This fact has also incentivized fraudulent activity in the field, where manufacturers may opt for fake labels from New Zealand in order to solicit higher prices.34 Industry associations have formed in an attempt to mitigate these concerns. Nevertheless, the inherent risk of contamination from FBS poses threats to experimental and bioprocess reproducibility, drives price fluctuations, and can even incentivize bad actors that value profit over safety. Contamination will be discussed further from a food safety perspective in Series V. Third, there is a limited global supply of FBS and there exists competition for it from profitable, mature industries. For instance, while the vaccine and biologics industries have begun to move to serum-free formulations (discussed later), the rise of cell therapies and stem cell research more generally has ushered in an impending demand that exceeds current availability. Because FBS is a byproduct of a more lucrative product per animal (i.e. meat and dairy) and profits are retained by slaughterhouses rather than farmers, farmers have little incentive to increase cattle herds to meet a future FBS demand.35 It has thus been hypothesized that “peak serum” has been met, with serum availability relatively stagnant and serum demand increasing dramatically as cell therapies begin to be approved.36 The replacement of serum thus may be driven first by limited total availability followed by cost concerns that will spur replacement innovation in the field as non-pharmaceutical players are priced out. 
In the case of cell-based meat, this cost concern is already prohibitive, making FBS an economic nonstarter as meat products cannot be justified at prices that rival a cell-based therapeutic (currently at a cost of goods of approximately $50,000 and selling price of hundreds of thousands of dollars). Lastly, the use of FBS carries ethical concerns, making its use inherently misaligned with one of the fundamental benefits of cell-based meat: animal welfare (discussed in Series VI). A single liter of serum requires 1-3 fetuses, with roughly 2 million fetal calves used in serum collection annually, totaling approximately 800,000 liters of FBS produced per year. The collection process involves removal of the fetus from the mother’s womb and aseptic collection of blood by a syringe placed directly into the beating heart as this contains unclotted blood, raising concerns that the fetus could consciously experience the event as painful.37 Thus, the search for serum-free formulations (discussed later) is in alignment with the cell-based meat industry and general animal welfare concerns, manifested by replacement, reduction, or refinement of animal experiments or animal-based products in science. The next series on cell culture medium will explore the components of serum that have made it a near-universal cell culture supplement and approaches for replacing serum in a cost-effective manner. About / Disclosure Elliot Swartz, Ph.D. (e_swartz) is the author and is employed by The Good Food Institute, a 501(c)3 nonprofit using markets and innovation to accelerate the plant-based and cell-based meat sectors. Feel free to ask anything about the science discussed or how to get more involved in the future of food. Many questions will additionally be addressed in upcoming discussion topic series!
[Professional numerical analysis] Numerical technique to solve the differential equation [w(x) + A]f(x) = g(x), where A is a shift invariant (highpass) linear operator, for f(x)?
I've come across several examples of this equation in my work in medical imaging. I'm having a lot of trouble solving this equation accurately.
[w(x) + A]f(x) = g(x)
where w(x) is a positive function, A is a positive definite symmetric shift invariant linear operator, f(x) is the function I'm trying to solve for, and g(x) is a given function. If w=0 I could solve the equation in one shot by taking a Fourier transform. If A=0 I could solve the equation in one shot by just dividing. My go-to numerical method for this case would be conjugate gradients, since I'm dealing with a positive definite symmetric operator. But it seems to be numerically unstable. Is there a good way of solving this equation that can take advantage of the structure of this problem?
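One way the structure described in this question could be exploited is to keep conjugate gradients but precondition it with the part of the operator that is cheap to invert in Fourier space. The sketch below is only an illustration under several assumptions that are not in the post: a 1D periodic grid, A given by a real, even, positive Fourier symbol `a_hat` (here a shifted discrete Laplacian), and a preconditioner that replaces w(x) by its mean. The names `make_operator`, `make_preconditioner`, and `pcg` are hypothetical helpers, and nothing here claims to reproduce or fix the instability the poster observed.

```python
import numpy as np

def make_operator(w, a_hat):
    """Return f -> (w + A) f, where A multiplies by a_hat in Fourier space."""
    def apply(f):
        return w * f + np.fft.ifft(a_hat * np.fft.fft(f)).real
    return apply

def make_preconditioner(w, a_hat):
    """Return r -> (mean(w) + A)^{-1} r, which is a single FFT division."""
    symbol = w.mean() + a_hat  # strictly positive by assumption
    def apply(r):
        return np.fft.ifft(np.fft.fft(r) / symbol).real
    return apply

def pcg(apply_op, apply_prec, g, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for a symmetric positive definite operator."""
    f = np.zeros_like(g)
    r = g - apply_op(f)
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    g_norm = np.linalg.norm(g)
    for k in range(max_iter):
        Ap = apply_op(p)
        alpha = rz / (p @ Ap)
        f += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * g_norm:
            return f, k + 1
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return f, max_iter

# Hypothetical test problem (all values are assumptions for illustration).
n = 256
x = np.arange(n) / n
w = 1.0 + 0.5 * np.sin(2 * np.pi * x)          # positive weight function
k = np.fft.fftfreq(n)                          # frequencies in cycles per sample
a_hat = 0.01 + 4.0 * np.sin(np.pi * k) ** 2    # real, even, positive highpass symbol
g = np.random.default_rng(0).standard_normal(n)

op = make_operator(w, a_hat)
f, iters = pcg(op, make_preconditioner(w, a_hat), g)
relative_residual = np.linalg.norm(op(f) - g) / np.linalg.norm(g)
```

The design choice is that the preconditioned operator has its spectrum bounded by the ratio max(w)/min(w) rather than by the spread of A, so the iteration count should depend mainly on how strongly w(x) varies.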
If you had to recommend ONE book for mathematical background?
TL;DR Somewhat experienced practitioner looking for graduate level treatment of topics in math/stats used in ML. Hi, I have a CS background, so sometimes I find myself lacking the math/stats knowledge to understand certain topics. I'm looking for a book to learn things at the graduate/PhD level in these topics:
Statistics
Calculus
Optimization
Linear algebra
Is there ONE book that is along the lines of "necessary mathematical background for ML" kind of thing? I build complex NN's in the ML research department of one of the big-4 tech companies. I give my background to clarify that I am not looking for a book to teach me what Gaussian is, how to apply Bayes rule, or how backpropagation works. To give a few examples, my hope is to gain enough background knowledge to comfortably understand topics such as:
Statistical learning theory
VC dimensions and SVMs
Proof of expressiveness of neural networks.
Optimization procedures such as conjugate gradient methods
Graduate Numerical Linear Algebra, How Best to Get Started?
I'm an Aerospace Engineering student taking Numerical Linear Algebra this semester. Though I haven't taken a sole Linear Algebra course in undergrad or at the graduate level I have gained familiarity with the topic through my coursework in Vibrations, Robotic Systems, Math Methods in Physics, and Graduate Math Methods in Engineering. As an engineer most of my math courses have been very much based in example and while it made things easier and quicker to pick up, my formal knowledge of mathematics is lacking. I have just about every textbook my professor recommended for the course:
Matrix Computations. Golub and Van Loan (Johns Hopkins)
Accuracy and stability of numerical algorithms. Higham
Analysis of numerical methods. Isaacson and Keller
Introduction to numerical analysis. Stoer and Bulirsch
Linear algebra and its applications. Strang
Numerical linear algebra. Trefethen and Bau
From the outset I want a good resource that can help me pick up on mathematics notation to review as well as fundamental concepts in linear algebra/numerical analysis. Also if anyone can give me advice on what books to start reading now or where to start that would be much appreciated. As you can probably tell I'm nervous about the course, I really want to learn the material and get an A so I'm trying to get started as early as possible learning what I need to know. The topics we will be covering include:
Linear Vector Space; Schur decomposition theorem
Gershgorin Theorem
Min-Max Theorem (Courant-Fischer)
Relations Between Spectral Radius and Norms
LU Decomposition and Partial Pivoting
Cholesky and QR decomposition
Idempotent and Projection Matrices
Singular Value Decomposition
Classical Linear Iterations (Jacobi, Gauss-Seidel, SOR)
Krylov subspaces and steepest descent method
Convergence and Conjugate Gradient method and preconditioning
MINRES and GMRES methods
Conditioning of eigen-problems
Power method
Hessenberg reduction and QR algorithms
Krylov space methods for eigenvalue problems
P.S. I apologize if this is inappropriate to post here, I really wasn't certain.
I've been hitting my head against this problem for a while now, and I'm about to give up and just use the method which I have that performs. However, I think I also have evidence that something about my implementation is broken. I'm asking for help here because my options for soliciting feedback/advice are pretty limited, so apologies for the multiple posts on the same subject matter. Here's an album of some simple experimental results based on building a 25 hidden layer unit autoencoder, and training it with 8x8 grayscale images from Bruno Olshausen's whitened natural images dataset: http://imgur.com/a/zuzJO Ideally, such an autoencoder should resolve 25 edge detectors in this configuration. The first image shows this, and it's the result of training the network with "stochastic gradient descent", i.e. simple fixed-step gradient descent wherein the batch size is low (100 training examples), and only one step is taken per batch. The second figure shows the objective function versus the training iteration, and you can see the random walk downwards over 24,000 batch iterations. This took a little over 2 minutes to run. The last picture is a typical example of the results I get from running any of three algorithms in a more typical fashion (i.e. with a batch size equal to the training set size, with multiple steps taken on the batch). Both L-BFGS and Conjugate Gradient Descent manage to quickly (within 50 iterations) find a minimum on the order of 0.5 (equivalent to the finishing value of stochastic gradient descent), but the result looks like the third figure. Standard gradient descent with a large batch also does this. L-BFGS in particular (I'm using the implementation from the RISO project) will iterate a few times and then fail when it has a nonzero gradient but ends up taking a step of length 0. My gradient calculation has been tested and I have high confidence that it is working properly. 
My objective function calculation seems to be the only thing separating CGD and L-BFGS from fixed-step gradient descent, but I've been staring at it for many hours now and it just isn't complex enough to convince me that there's a bug hidden in there. I would blame the data, but this exact experiment is solved using L-BFGS in Andrew Ng's tutorial here. I'm about to use this code on some much larger experiments and I don't want to start off with a buggy implementation, but I can't nail down where my method might be diverging from Ng's example. Any thoughts or suggestions would be appreciated.
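For readers who want to see what a gradient test of the kind mentioned in this post can look like, here is a minimal central-difference check. It is a generic sketch, not the poster's test: `numerical_gradient`, `relative_error`, and the toy objective are hypothetical names chosen for illustration. Because the check calls the objective function itself, a large mismatch can point at either the gradient code or the objective code.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at a 1-D point x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

def relative_error(a, b):
    """Scale-free difference between two gradient vectors."""
    return np.linalg.norm(a - b) / max(np.linalg.norm(a) + np.linalg.norm(b), 1e-12)

# Toy objective and its hand-derived gradient (assumptions for illustration).
def objective(x):
    return np.sum(x ** 2) + np.sin(x[0])

def analytic_gradient(x):
    g = 2.0 * x
    g[0] += np.cos(x[0])
    return g

x0 = np.array([0.3, -1.2, 2.0])
err = relative_error(numerical_gradient(objective, x0), analytic_gradient(x0))
```

A small `err` only says that the gradient and the objective agree with each other at `x0`; it does not rule out both being wrong in the same way.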
Question about the behavior of conjugate gradient descent optimization
So I'm playing around with sparse autoencoders, and I'm trying to train a simple example with conjugate gradient descent. I just witnessed some behavior I can't explain and I'm hoping someone here can help me understand what's going on. The neural network I'm training is small, and meant to solve the XOR problem. It has two inputs plus a bias on the input layer, two hidden units (plus a bias), and a single output. This creates 3*2 + 3 = 9 total weights to be trained. I have confidence that my gradient calculations are correct, because they pass the gradient estimation check described here, and are used to generate edge detectors for natural images with the backpropagation algorithm as described here. It should be a short couple of steps to train this network to solve XOR with conjugate gradient descent using my already-coded gradient calculation plus an erf() function that calculates overall network error. I'm using the Polak-Ribiere method to generate the Beta coefficient. My erf() function is more or less exactly as described at the UFLDL site. Finally, the problem: My CGD algorithm seems to be sensitive to the magnitude of the weights that I initialize the network with. When I initialize the weights with uniform random numbers in the range of [-0.1 0.1], the algorithm reliably converges on a bad local minimum (all inputs result in an output of 0.5). If I change the weight initialization to uniform random numbers of the range [-0.3 0.3], then the network converges to a state that solves XOR. What's the principle at work here? Is this kind of weight sensitivity something specific to CGD? Thanks!
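For reference, the Polak-Ribiere update mentioned in this post can be sketched as follows. This is not the poster's code: the function names, the secant line search, the `max(0, beta)` reset (often called PR+), and the quadratic test problem are all assumptions made for illustration. The line search is deliberately simple and is exact only for quadratics, so it should be read as a toy, not as a robust optimizer for neural networks.

```python
import numpy as np

def secant_line_search(grad, x, d, sigma=1e-3, steps=5):
    """Approximate the minimizer of f(x + t d) by secant steps on d . grad."""
    t_prev, t = 0.0, sigma
    dphi_prev = grad(x) @ d
    for _ in range(steps):
        dphi = grad(x + t * d) @ d
        denom = dphi - dphi_prev
        if denom == 0.0:
            break
        t_next = t - dphi * (t - t_prev) / denom
        t_prev, dphi_prev, t = t, dphi, t_next
        if abs(t - t_prev) <= 1e-12 * max(1.0, abs(t)):
            break
    return t

def nonlinear_cg(grad, x0, tol=1e-10, max_iter=100):
    """Nonlinear conjugate gradients with the Polak-Ribiere beta (PR+ variant)."""
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, k
        if g @ d >= 0.0:
            d = -g  # not a descent direction: restart with steepest descent
        t = secant_line_search(grad, x, d)
        if not np.isfinite(t) or t <= 0.0:
            t = 1e-3  # fallback step, an arbitrary assumption
        x = x + t * d
        g_new = grad(x)
        # Polak-Ribiere coefficient, clipped at zero.
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        g = g_new
    return x, max_iter

# Hypothetical quadratic test: f(x) = 0.5 x^T A x - b^T x, gradient A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min, iters = nonlinear_cg(lambda x: A @ x - b, np.zeros(2))
```

On a quadratic with exact line minimization the Polak-Ribiere formula coincides with linear conjugate gradients, which is why this test problem is a convenient sanity check. On a non-convex network objective the result depends on the starting point and on the line search, so this sketch says nothing definite about the initialization sensitivity described in the post.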
position theta velocity angular-velocity force-applied-to-cart value
...Where "value" is a simple objective function whose inputs are all taken from the first four columns (x, theta, v, w). All inputs are scaled such that their mean is 0 and they range more or less within [-3, 3]. The output is scaled such that the mean is 0.5, and all values fall within the interval [0.2, 0.8]. A 5-25-1 feedforward network tasked with learning the value function and trained in Matlab can converge on an almost perfect solution with Levenberg-Marquardt or Scaled Conjugate Gradient descent very quickly. However, using very similar network architecture in my own code (one difference being that my output neuron is a sigmoid, while Matlab's output neuron is linear) SGD and RMSprop fail to converge to a good answer. I've tried minibatches with SGD, using the entire dataset per epoch, and lots of different learning rates and learning rate decay values. I've spent a similar amount of time tweaking hyperparameters with RMSProp. RISO's LBFGS implementation also fails with this dataset, although I haven't put as much time into playing with it. I see three possibilities: 1.) A bug in my code. This is of course the thing I've been most suspicious of, but I'm beginning to doubt this is the cause. My code passes this test, and successfully extracts Gabor shapes from natural images when used for autoencoding. It also passes simpler tests, like learning XOR* . 2.) Something about this dataset is particularly difficult for stochastic methods. This seems unlikely; if you put together a scatter plot of value-vs-theta-vs-omega, you can see that it's a rather simple structure. 3.) SGD and RMSProp are incredibly sensitive to hyperparameter values, or perhaps weight initialization, and I've just been setting them wrong. Right now I'm initializing weights with a uniform random variable -1 I'm hoping someone can give me some insight into why my SGD and RMSProp are failing here. This should be an easy problem, but I can't find anything to point to that's demonstrably wrong. 
*: In regards to XOR, my code also seems to be very sensitive to hyperparameters when solving this. A 2-3-1 network needs over 10000 iterations to converge, and won't do so if the batch size is anything other than 1. Starting from a configuration that converges, and reducing the learning rate by a decade and also increasing the number of training iterations by a decade does not result in a network that also converges. This seems wrong.
The conjugate gradient method finds the solution of a linear system of equations by stepping to the solution in conjugate directions. The theory, derivations to the fast implementation and an interactive example are found here.
Biconjugate Gradient Method. The conjugate gradient method is not suitable for nonsymmetric systems because the residual vectors cannot be made orthogonal with short recurrences, as proved in Voevodin (1983) and Faber and Manteuffel (1984). The generalized minimal residual method retains orthogonality of the residuals by using long recurrences, at the cost of a larger storage demand.
The conjugate gradient method vs. the locally optimal steepest descent method. In both the original and the preconditioned conjugate gradient methods one only needs to set $\beta_k := 0$ in order to make them locally optimal, using the line search, steepest descent methods. With this substitution, vectors $p$ are always the same as vectors $z$, so there is no need to store vectors $p$.
Exact method and iterative method. Orthogonality of the residuals implies that $x_m$ is equal to the solution $x$ of $Ax = b$ for some $m \le n$. For if $x_k \ne x$ for all $k = 0, 1, \dots, n-1$, then $r_k \ne 0$ for $k = 0, 1, \dots, n-1$, and the residuals $r_0, \dots, r_{n-1}$ form an orthogonal basis for $\mathbb{R}^n$. But then $r_n \in \mathbb{R}^n$ is orthogonal to all vectors in $\mathbb{R}^n$, so $r_n = 0$ and hence $x_n = x$. So the conjugate gradient method finds the exact solution in at most ...
The conjugate gradient method is a mathematical technique that can be useful for the optimization of both linear and non-linear systems. This technique is generally used as an iterative algorithm; however, it can be used as a direct method, and it will produce a numerical solution.
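The snippets above describe the linear conjugate gradient iteration, its finite termination, and the steepest descent variant obtained by setting the beta coefficient to zero. A minimal NumPy sketch of all three ideas is given here. The function name `conjugate_gradient`, its `steepest_descent` flag, the default iteration cap, and the 2-by-2 test system are illustrative assumptions, not code from any of the quoted pages.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None, steepest_descent=False):
    """Solve A x = b for symmetric positive definite A.

    With steepest_descent=True the coefficient beta is forced to zero, so each
    search direction is just the current residual.
    """
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rs / (p @ Ap)      # exact line minimization along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        beta = 0.0 if steepest_descent else rs_new / rs
        p = r + beta * p
        rs = rs_new
    return x, max_iter

# Small symmetric positive definite test system (an assumption for illustration).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_cg, it_cg = conjugate_gradient(A, b)
x_sd, it_sd = conjugate_gradient(A, b, max_iter=200, steepest_descent=True)
```

Comparing `it_cg` with `it_sd` on this system illustrates the difference between the two methods: conjugate directions let the iteration finish in about as many steps as there are unknowns, while steepest descent keeps zigzagging toward the same solution.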
Generally this method is used for very large systems where it is ...
Conjugate Gradient Method (EE364b, Stanford University)
• direct and indirect methods
• positive definite linear systems
• Krylov sequence
• spectral analysis of Krylov sequence
• preconditioning
Solution of $Ax = b$. Key property: $A^{-1}b \in \mathcal{K}_n$; this holds even when $\mathcal{K}_n \ne \mathbb{R}^n$. From the Cayley–Hamilton theorem,
$$p(A) = A^n + a_1 A^{n-1} + \cdots + a_n I = 0,$$
where
$$p(\lambda) = \det(\lambda I - A) = \lambda^n + a_1 \lambda^{n-1} + \cdots + a_{n-1}\lambda + a_n.$$
Multiplying on the right with $A^{-1}b$ shows
$$A^{-1}b = -\frac{1}{a_n}\left(A^{n-1}b + a_1 A^{n-2}b + \cdots + a_{n-1}b\right).$$
The conjugate gradient method can be applied to an arbitrary n-by-m matrix by applying it to the normal equations, with matrix $A^T A$ and right-hand side vector $A^T b$, since $A^T A$ is a symmetric positive-semidefinite matrix for any $A$. The result is conjugate gradient on the normal equations (CGNR): $A^T A x = A^T b$.
The conjugate gradient method converges quickly on well-conditioned systems: its error bound shrinks by a factor of roughly $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$ per iteration, where $\kappa$ is the condition number of $A$. If someone is interested in the theory of conjugate gradient and also in the implementation details I would like to refer you to the amazing paper written by Jonathan Richard Shewchuk called An Introduction to the Conjugate Gradient Method Without the ...
The Conjugate Gradient Method is an iterative technique for solving large sparse systems of linear equations. For the following example for linearizing the one-dimensional heat equation, the Forward Difference Method is utilized. Note that this process will work for all linear PDEs.
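The normal-equations variant (CGNR) mentioned above can be sketched without ever forming the product of the transpose with the matrix explicitly. The code below is a minimal illustration under assumptions of my own: the function name `cgnr`, the stopping rule on the normal-equations residual, and the small overdetermined test matrix are hypothetical. One known caveat is that the condition number of the normal-equations matrix is the square of the condition number of the original matrix, so convergence can be slow on ill-conditioned problems.

```python
import numpy as np

def cgnr(A, b, tol=1e-12, max_iter=None):
    """Conjugate gradients applied to A^T A x = A^T b, using only products with A and A^T."""
    m, n = A.shape
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x            # residual of the original system
    z = A.T @ r              # residual of the normal equations
    p = z.copy()
    zz = z @ z
    z0_norm = np.sqrt(zz)
    for k in range(max_iter):
        if np.sqrt(zz) <= tol * z0_norm:
            return x, k
        w = A @ p
        alpha = zz / (w @ w)  # p^T (A^T A) p equals ||A p||^2
        x += alpha * p
        r -= alpha * w
        z = A.T @ r
        zz_new = z @ z
        p = z + (zz_new / zz) * p
        zz = zz_new
    return x, max_iter

# Hypothetical overdetermined system with full column rank.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

x_ls, iters = cgnr(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

For a full-column-rank matrix the result should agree with the least-squares solution, which is what the comparison with `np.linalg.lstsq` is meant to show.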
In this tutorial I explain the method of Conjugate Gradients for solving a particular system of linear equations Ax=b, with a positive semi-definite and symm...
Video lecture on the Conjugate Gradient Method
This is a brief introduction to the optimization algorithm called conjugate gradient.
Conjugate gradient method. In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric ...
A brief overview of steepest descent and how it leads to an optimization technique called the Conjugate Gradient Method. Also shows a simple Matlab example ...