Random Thoughts of Mine (Unpolished, but feel free to browse)

How do you operationalize a measurement anyways?
byX on Saturday, January 29, 2011 at 2:15pm
First of all, you have to define your output. It can be a scalar, a vector, or a tensor.

Of course, you can simply determine which criteria are relevant and output them as a vector. That preserves information, and is the most useful form for people who are attuned to comparing the specific facets of these criteria.

Or you can apply a metric (a distance metric such as the 2-norm – but it could also be a p-norm or a Hamming distance or whatever) and output a score as a scalar value.
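
To make this concrete, here's a minimal sketch in Python (with made-up criterion vectors) of collapsing two criterion vectors into a scalar score via a p-norm or a Hamming distance:

```python
import math

def p_norm(u, v, p=2):
    """Distance between two criterion vectors under the p-norm."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def hamming(u, v):
    """Number of positions where the two vectors disagree."""
    return sum(a != b for a, b in zip(u, v))

a = [1.0, 0.5, 0.0]
b = [0.0, 0.5, 1.0]
print(p_norm(a, b))        # 2-norm: sqrt(2)
print(p_norm(a, b, p=1))   # 1-norm: 2.0
print(hamming(a, b))       # 2
```

Different metrics encode different judgments: the 2-norm penalizes large disagreements more than the 1-norm, and the Hamming distance throws away magnitude entirely.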

And you can assign a different weight to each criterion (there are many different ways to weight, not all as simple as the case where the coefficients of the weights all sum up to 1). This, of course, presupposes that the criteria take the form of scalar quantities. Sometimes, the relevant criteria are more complex than that.
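
A weighted version is a small extension. This sketch (with invented numbers) handles the simple case mentioned above, normalizing the weights so the coefficients sum to 1:

```python
def weighted_score(criteria, weights):
    """Collapse a vector of scalar criterion scores into one scalar.
    Weights are normalized so they sum to 1 (the simple case)."""
    total = sum(weights)
    return sum(c * w / total for c, w in zip(criteria, weights))

# One criterion scored 3.0 with weight 1, another scored 5.0 with weight 3:
print(weighted_score([3.0, 5.0], [1.0, 3.0]))  # 4.5
```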

And like what the NRC did with its grad program rankings, you can produce multiple outputs (the R and S rankings). Actually, the NRC did something more complicated than that. For the R ranking, it asked professors which schools are most prestigious. Then it looked at the criteria characteristic of the more prestigious schools, and ranked schools according to which ones had the “highest amount” of the criteria most prevalent in those schools. For the S ranking, it simply asked professors which criteria were most important, and ranked schools according to how much of the desirable criteria they contained. And of course, users can rank schools in any order they want simply by assigning weights to each value (although I think this is the simple case where all coefficients sum up to 1). In terms of validity, I’d trust the S metric over the R metric since it is less susceptible to cognitive biases. [1]

This, of course, is what’s already done and well-known. There are other ways of operationalizing output too. What’s often neglected are the effects of 2nd-order interactions. Maybe there’s a way to use 2nd-order interactions to operationalize output (and maybe this is already done in some places).

Of course, even 2nd-order interactions are not enough. 2nd-order interactions tend to be commutative (in other words, order does not matter). However, there are situations where the order of the 2nd (and higher) order interactions does matter.

And then, of course, you also have geometric relationships to consider. Geometric relationships may be as simple as taking the “distance” between two criteria (in other words, a scalar value that’s the output of some metric). Or they could be more complex.

And another relationship too: probabilities. Every measurement is uncertain and these uncertainties may also have to be included in our operationalization.

Also, weights are not simply scalars. Weights can also be functions – functions of time, or of the particular person, or whatever. These multidimensional weights must still sum to 1, so when the weight of one criterion goes up, the weights of the others go down.
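
One way to picture weights-as-functions: let each weight be a function of time, then renormalize at every instant so the weights always sum to 1 – when one rises, the others necessarily fall. A sketch with two made-up weight functions:

```python
def normalized_weights(raw_weights, t):
    """raw_weights: a list of functions of time. Renormalize so the
    instantaneous weights always sum to 1."""
    vals = [w(t) for w in raw_weights]
    s = sum(vals)
    return [v / s for v in vals]

w1 = lambda t: 1.0   # a criterion of constant importance
w2 = lambda t: t     # a criterion whose importance grows with time

print(normalized_weights([w1, w2], 1.0))  # [0.5, 0.5]
print(normalized_weights([w1, w2], 3.0))  # [0.25, 0.75]
```

As w2 grows, w1's normalized share shrinks even though w1 itself never changed – exactly the coupling described above.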

Also, even scalar outputs are not necessarily single numbers. Rather, they can be draws from probability distributions (again, I was quite impressed with the uncertainty-range/confidence-interval outputs of the NRC rankings).

Maybe studying ways to *operationalize* things is already a domain of intense interdisciplinary interest.


Anyways, some examples:

The DSM-IV is fundamentally flawed. Of course, every operationalization is flawed – some are still good enough to do what they do despite being flawed. So why single out the DSM-IV? Because in order to get diagnosed with some condition, you need only fulfill at least X of Y different criteria. There’s absolutely nothing about the magnitude of each symptom, or the environmental dependence of each symptom, or the interactions the symptoms can have with each other.
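
The X-of-Y structure is easy to caricature in code. This hypothetical sketch (symptom lists and threshold invented for illustration) shows how a threshold rule discards magnitude and interaction information – two very differently afflicted patients collapse onto the same diagnosis:

```python
def dsm_style_diagnosis(symptoms_present, threshold):
    """X-of-Y rule: diagnose if at least `threshold` of the listed
    criteria are met. Magnitude, environment-dependence, and
    symptom interactions are all discarded."""
    return sum(symptoms_present) >= threshold

# Hypothetical 9-criterion checklist, threshold of 5:
mild = [True] * 5 + [False] * 4   # barely clears the bar
severe = [True] * 9               # meets every criterion

print(dsm_style_diagnosis(mild, 5))    # True
print(dsm_style_diagnosis(severe, 5))  # True – same output, very different patients
```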

Furthermore, each operationalization (or definition, really) should take in parameters to specify potential environmental-dependence (the operationalization may have VERY LITTLE change when you vary the parameters, OR it could change SIGNIFICANTLY when you vary the parameters). I believe that people are currently systematically underestimating the environmental-dependence of many of their definitions/operationalizations. You can also call these parameters “relevance parameters”, since they depend on the person who’s processing them.

[1] This dichotomy is actually VERY useful for other fields too. For example – rating which forumers are the funniest in a forum.


Skepticism regarding coarse mechanisms
by X on Saturday, January 29, 2011 at 1:46pm
I like trying to elucidate neurobiological/cognitive/behavioral mechanisms at a finer structure. Coarse mechanisms are often less robust (we don’t know how they really interact, and in the presence of a DIFFERENT environment, the preconditions of these mechanisms may uncouple – we may not even know the preconditions of these mechanisms). The reason is this: necessary conditions are inclusive of BOTH the preconditions and the GEOMETRIC+combinatorial arrangements of these preconditions – where the preconditions sit with respect to each other. This is a good reason to be skeptical of the utility of measuring things that we only measure coarsely, such as IQ, since we still don’t know the GEOMETRIC+combinatorial arrangements of the preconditions of high IQ (that said, I still believe most of the correlations in this particular environment).

In other words, having all the preconditions is not sufficient to explain the mechanism. You have to have the preconditions *arranged* in the *right* way – both geometrically and combinatorially (inclusive of order effects, where the preconditions have to be temporally/spatially arranged in the right way – not just with each other, but also with the rest of the environment).


If the mechanism preserves itself even *after* we have updated our information about its finer structure, then yes, we can raise our posterior probability of the mechanism applying.


Now, how is research in the finer mechanisms done? Is it more likely than others to be Kuhnian normal science or Kuhnian revolutionary science? Studying combinatorial/geometric interactions can be *very* analytically (and computationally) intensive.

What’s also interesting: finer mechanisms tend to be more general, even though finer means smaller in scale. But it often takes large numbers of measurements before someone has enough data to elucidate a finer mechanism.

Is there a way to quantify the relationship between Person1's Map and Person2's Map?

Okay, so maybe you could say this.

Suppose you have an index I: a list of items in belief-space (a person's map). So I could contain items like (believes in evolution, believes in free will, believes that he will get energy from eating food, etc.). Of course, in order to make this argument more rigorous, we would have to make the beliefs finer-grained.

For now, we can assume away a priori knowledge – in other words, facts a person may not explicitly know, but could deduce simply by using the knowledge they already have.

Now, maybe Person1 has a map in j-space with values of (0,0,0.2,0.5,0,1,...), corresponding to the degree of his belief in items in index I. So the first value of 0 corresponds to his total disbelief in evolution, the second corresponds to total disbelief in free will, and so on.

Person2 has a map in k-space with values of (0,0,0.2,0.5,0,0.8, NaN, 0, 1, ...), corresponding to the degree of his belief in everything in the world. Now, I include a value of NaN in his map, because the NaN could correspond to an item in index I that he has never encountered. Maybe there's a way to quantify NaN, which might make it possible for Person1 and Person2 to both have maps in the same n-space (which might make it more possible to compare their mutual information using traditional math methods).

Furthermore, Person1's map is a function of time, as is Person2's. Their maps evolve over time as they learn new information, change their beliefs, and forget information. Person1's map can expand from j-space to (j+n)-space as he forms new beliefs on new items. Once you apply a distance metric to their beliefs, you might be able to map them on a grid to compare their beliefs with each other. A distance metric with a scalar output, for example, would map their beliefs to a 1D axis (this is what political tests often do). A metric could also output a vector value in j-space (much like what an MBTI personality test does). If you simply took the difference between the two maps, you could also output a vector that maps to a space whose dimension equals the dimension of the original map (assuming the two maps have the same dimension, of course).

Anyways, here is my question: Is there a better way to quantify this? Has anyone else thought of this? Of course, we could use a distance metric to compare their maps with respect to each other (a Euclidean metric could be used if they have maps in the same n-space).
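
One crude answer, sketched in Python: treat NaN as "never encountered" and compute a Euclidean distance over only the coordinates both people share (the maps, values, and NaN-skipping policy here are all invented for illustration – they are one possibility, not the "right" quantification):

```python
import math

def belief_distance(map1, map2):
    """Euclidean distance over the items both people have encountered.
    NaN marks an item a person has never considered; those coordinates
    are skipped rather than guessed. zip() also truncates to the
    shorter map, so a j-space and a k-space map can still be compared
    on their overlap."""
    shared = [(a, b) for a, b in zip(map1, map2)
              if not (math.isnan(a) or math.isnan(b))]
    return math.sqrt(sum((a - b) ** 2 for a, b in shared))

p1 = [0.0, 0.0, 0.2, 0.5, 0.0, 1.0]
p2 = [0.0, 0.0, 0.2, 0.5, 0.0, 0.8, math.nan]
print(belief_distance(p1, p2))  # ≈ 0.2 (the maps differ only on item 6)
```

Note that skipping NaN coordinates silently treats "never considered" as "no disagreement", which is itself a modeling choice worth questioning.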


As an alternative question, are there metrics that could compare the distance between a map in j-space and a map in k-space (even if j is not equal to k)? I know that there are p-norms that produce an absolute scalar value when applied to a matrix, but this is a somewhat different thing. And could mutual information be considered a metric?

The "map" and "territory" analogy as it pertains to potentially novel territories that people may not anticipate

So in terms of the "map" and "territory" analogy, the goal of rationality is to make our map correspond more closely with the territory. This comes in two forms – (a) area and (b) accuracy. Person A could have a larger map than person B, even if A’s map is less accurate than B’s. There are ways to increase the area of your map – often by testing things at the boundary conditions of the territory.

I often like asking boundary-value/possibility-space questions like "well, what might happen to the atmosphere of a rogue planet as time approaches infinity?", since I feel they might give us additional insight about the robustness of planetary atmosphere models across different environments (and also, the possibility that I might be wrong makes me more motivated to spend additional effort testing/calibrating my model than I otherwise would). My intense curiosity about these highly theoretical questions often puzzles the experts in the field, though, since they feel these questions aren’t empirically verifiable (so they are considered less "interesting"). I also like to study other things that many academics aren’t necessarily comfortable studying (perhaps since it is harder to be empirically rigorous), such as the possible social outcomes that could spring out of a radical social experiment.

When you’re concerned with maintaining the accuracy of your map, it may come at the sacrifice of dA/dt, where A is area (so your area increases more slowly with time).

I also feel that social breaching experiments are another interesting way of increasing the area of my "map", since they help me test the robustness of my social models in situations that people are unaccustomed to. Hackers often perform these sorts of experiments to test the robustness of security systems (in fact, a low level of potentially embarrassing hacking is probably optimal for ensuring that a security system remains robust – although it’s entirely possible that even then, people may pay too much attention to certain models of hacking, causing potentially malicious hackers to dream up new models of hacking).

With possibility space, you could code up the conditions of the environment in a k-dimensional space such as (1,0,0,1,0,…), where 1 indicates the existence of some variable in a particular environment, and 0 indicates its absence. We can then use Huffman coding to encode each combination of conditions according to its frequency across the environments we most often encounter (so less probable environments get longer Huffman codes, i.e. higher values of entropy/information).
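
A sketch of this idea in Python, using an ordinary Huffman construction over made-up environment frequencies – rare condition-combinations end up with longer codes:

```python
import heapq
import itertools

def huffman_code_lengths(freqs):
    """Map each symbol to its Huffman code length. Rarer symbols get
    longer codes (more bits = more surprise/information)."""
    tiebreak = itertools.count()  # keeps the heap from comparing dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Environments as condition tuples, with invented encounter frequencies:
envs = {(1, 0, 0): 0.6, (0, 1, 0): 0.25, (0, 0, 1): 0.1, (1, 1, 1): 0.05}
lengths = huffman_code_lengths(envs)
print(lengths[(1, 0, 0)])  # 1 – common environment, short code
print(lengths[(1, 1, 1)])  # 3 – rare "long tail" environment, long code
```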

As we know from Taleb’s book "The Black Swan", many people frequently underestimate the prevalence of "long tail" events (which are often part of the unrealized portion of possibility space, and have longer Huffman codes). This causes them to over-rely on Gaussian distributions even in situations where the Gaussian distributions may be inappropriate, and it is often said that this was one of the factors behind the recent financial crisis.

Now, what does this investigation of possibility space allow us to do? It allows us to re-examine the robustness of our formal system – how sensitive or flexible our system is with respect to continuing its duties in the face of perturbations in the environment we believe it’s applicable for. We often have a tendency to overestimate the consistency of the environment. But if we consistently try to test the boundary conditions, we might be able to better estimate the "map" that corresponds to the "territory" of different (or potentially novel) environments that exist in possibility space, but not yet in realized possibility space.

The thing is, though, that many people have a habitual tendency to avoid exploring boundary conditions. The fact is that the space of realized events is always far smaller than the entirety of possibility space, and it is usually impractical to explore all of possibility space. Since our time is limited, and the payoffs of exploring the unrealized portions of possibility space are uncertain (and often time-delayed, and also subject to hyperbolic time-discounting, especially when the payoffs may come only after a single person’s lifetime), people often don’t explore these portions of possibility space (although life extension, combined with various creative approaches to decrease people’s time preference, might change the incentives). Furthermore, we cannot empirically verify unrealized portions of possibility space using the traditional scientific method. Bayesian methods may be more appropriate, but even then, people may be susceptible to plugging the wrong values into the Bayesian formula (again, perhaps due to over-assuming continuity in environmental conditions). As in my earlier example about hacking, it is way too easy for the designers of security systems to use the wrong Bayesian priors when they are being observed by potential hackers, who may have ideas about ways to take advantage of those priors.

Deep Ecology and Type-b Theory of Time: (note, I still care about the environment more than others – this is merely an argument against deep ecology)

Note: this post is high on philosophical jargon. There are nice Wikipedia (and Stanford Encyclopedia of Philosophy) entries on b-theory, deep ecology, total utility, and utilitarianism.

Hypothesis: Assume b-theory of time *and* assume deep ecology *and* assume utilitarianism (as applied to deep ecology)

If you subscribe to the b-theory of time (which most philosophers of science seem to do), then the “deep ecology” conception of environmentalism is flawed.

The reason is that life will inevitably be extinguished. But despite that, the Earth has still enjoyed several billion years of life (life without human intervention). Sure, many habitats are currently being destroyed. But the environment will only be destroyed for a very small portion of time compared to the total amount of time that life has flourished without human intervention (and according to the b-theory of time, the present is no more “significant” than the past). In fact, this will be true even if life continues on Earth for 1 billion more years (and the fact is, life as we know it cannot continue for much more than ~1 billion years, because by then the Sun’s luminosity will be high enough to boil away the Earth’s oceans). By then, human technology will be the only way to ensure that life continues.

Okay sure, a deep ecologist might want to maximize the total utility of the biosphere (and argue that human activity reduces it, even though the total impact of human activity will still only be a very small fraction of the total utility of the biosphere integrated over time, dating back to 4.6 billion years ago). After all, a total of U*(4*10^9 + 1000) is only marginally bigger than U*(4*10^9). So, conclusion: if you subscribe to all three theories, your total impact will be very small (unless you can find a way to migrate Earth’s biosphere to another stellar system before the Sun goes red giant). In any case, if the heat-death-of-the-universe hypothesis holds [and the evidence for that outcome seems rather strong], then the impact of any person will be very limited.

Of course, this judgment of total utility is subjective [it depends on how much you weigh factors such as biomass, the well-being of "sentient" creatures, and other factors].

Also, this in no way argues against environmentalism if you’re an environmentalist due to human concerns; there are many valid reasons for that. It also doesn’t argue against deep ecology *without* the b-theory of time. Most people do not view the present as less significant than the past – it’s antithetical to human survival, after all. The philosophically unsophisticated may, for example, weigh the well-being of charismatic creatures higher than that of uncharismatic creatures. They may also weigh certain intervals of time as more important than other intervals. Many implications of utilitarianism, in any case, go against people’s moral intuitions.


The b-theory of time also brings up interesting new ways to analyze utilitarianism, since, again, the present is no more “significant” than the past. Assuming the b-theory of time, you can analyze total utility by integrating the sum of “utility” (experienced by all sentient beings) over time (you can then divide by the total number of individuals [each weighted by their level of sentience] if you wish).
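
That integral is easy to sketch numerically. Here, with a toy constant utility function (all numbers invented), the total is computed by the midpoint rule; the b-theory assumption shows up as every instant being weighted equally:

```python
def total_utility(u, t0, t1, n=1000):
    """Integrate utility u(t) over [t0, t1] by the midpoint rule.
    Under the b-theory, no instant is weighted above any other."""
    dt = (t1 - t0) / n
    return sum(u(t0 + (i + 0.5) * dt) for i in range(n)) * dt

# Toy biosphere: a constant 1 "util" per year for 4 billion years.
# Adding 1000 more years changes the total only marginally.
base = total_utility(lambda t: 1.0, 0.0, 4e9)
extended = total_utility(lambda t: 1.0, 0.0, 4e9 + 1000)
print(extended / base)  # ≈ 1.00000025
```

Swapping in a non-constant u(t), or a per-instant sentience weighting, is a one-line change – the framework stays the same.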

There are two fundamental ways of learning

(a) general to specific
(b) specific to general (e.g. case studies)

Scientific hypotheses (general) are motivated by experiments (specific). With data, one can hypothesize a trend and arrive at general hypotheses.

One can also try this process mathematically, as specific results can motivate a hypothesis of the general structure, which can then be proved.

Which way is faster? It depends on the person. It might be plausible that learning styles are “bunk” and that smarter students are more efficient with type (a) learning, but it’s also quite plausible that this is not true (for one thing, learning depends on both intelligence and motivation/interest, and the motivation/interest component can make type (b) learning more efficient even for geniuses). I, for one, learn best through the “specific to general” method. As such, I believe I learn math best when it’s motivated by physical phenomena (in other words, learning math “along the way of doing science”) rather than when pursuing math first and then learning science (which is what I did, and which didn’t work as well as I hoped, especially since it killed my motivation). As I’m quite familiar with the climate trends of specific localities, I also learn the generalities of climate best through case studies.

And then, after learning the applications of a math field, one is more motivated to learn the specifics of the math behind the math, and one even gains more physical intuition through this learning route. The material actually means something when one learns through the second route.

It is also true, however, that route (b) can be taken too far, as is evident in the “discovery-based” math curricula, which generally produce poor results. When one is self-motivated, route (b) can be especially rewarding, but the selection of case studies is important, as an improper selection of case studies can result in a very minute exploration of the general structure (it is also true that very few textbooks are written in a way that makes route (b) exciting to learn from). Generally, textbooks present their material as ends, not as means to an end (except in the crappy discovery-based math textbooks). However, one can most certainly learn calculus through physics (especially div/grad/curl), and linear algebra through its applications, and a very smart (or lucky) person could design such a curriculum that would work for many people (it is much easier to design such curricula for oneself than for a wide variety of personalities).

Nonetheless, route (b) can be stultifying. In fact, I sometimes feel impatient and feel like I’d rather learn the math first. A person’s temperament may vary from time to time – finding type (a) rewarding at some times, and type (b) rewarding at others.


learning how it’s done vs why it’s done: what’s the optimal order of learning these things

in grade school, you’re taught how it’s done. why it’s done comes later

in college, the opposite happens. but sometimes it’s a lot more confusing that way, and requires absorption of more details and multistep processes

what is optimal? it obviously varies from person to person. it’s more “natural” to learn how it’s done first. but it’s only “natural” for phenomena that are discovered through hypothesis->observation or derivation. it’s not “natural” for phenomena that are discovered through serendipity, in which one learns the result (and how to get it) before one learns why the result is the way it is. and in some cases, like quantum mechanics, one may never learn why it is the way it is. of course, it feels more “natural” and “satisfying” to learn how it’s done first, and that promotes habits that are helpful to further discovery, but learning the process AFTER learning the result is ALSO curiosity-satisfying, and does not necessarily lead to the sense of “helplessness” that could allegedly come from learning the result first time after time. that “helplessness” could come, but if one has internalized both approaches, then it is far from inevitable, and then learning the result before the explanation can be faster and more efficient.

But in the end, it depends on person and context. Sometimes I feel more stifled when I learn the result before the explanation; sometimes I feel more stifled when I learn the explanation before the result. It is much easier to trick oneself into thinking that one has learned the material if one has only learned the result (without learning the explanation); it is also easier to forget the material if one has only learned the result (though learning the explanation along with the result shouldn’t take much more time); and learning only the result is also less challenging (so familiarity with the process carries better “signalling” value and makes one more absorbed in the process, so that one internalizes it better).

But again, once one has learned BOTH the process and the result, the signalling/internalization value is irrelevant. The only point of relevance is when one has learned one but not the other (which can happen, especially when people are lazy, slow, or time-constrained), or when one has partially learned one and learned the other more fully (which is very common). So perhaps in an environment where one has partially learned one and learned the other more, learning the process first may be more optimal, especially when people forget easily and quickly. But when one learns things completely, the order should not matter much (or the order should depend on how much more time one spends doing it one way vs the other, or on how rewarding the two orders are relative to each other).


so one thing I’ve always wondered: in pharmacology, is per-kilogram (mg/kg) dosing really valid? this assumes that the extra cells (fat or muscle cells) have the same uptake of the drug as all other cells do, AND that blood vessel growth is proportional to weight growth. but I don’t think this assumption is totally valid. more fat might spring up new blood vessels, but what is the extra volume of all these blood vessels? well, fat tends to grow in “layers”. fat tends to distribute itself around the periphery of the body, so the blood vessel networks on the outer periphery have a high surface-area-to-volume ratio. that would make the blood vessels from fat growth overrepresent the actual increase in blood volume. of course there is an opposite trend too – the blood vessels from new fat cells are just new blood vessels, not major new blood vessels – there’s a roughly fixed amount of major blood vessel mass in every person of a given height.

there’s another assumption too – that new blood vessel growth will trigger production of plasma + blood cells that increases in proportion to the increased blood vessel growth


normal science: adding entries to the database of scientific knowledge (entries are categorized and analyzed according to current rules)

revolutionary science: change the rules. or design a totally different new system of rules that makes the entries in the database more consistent with each other.

there are several levels of normal science. a researcher/principal investigator can allocate most of the “normal science” work to graduate students/others as he can usually expect them to follow the procedures of normal science. occasionally the researcher chooses to look at the data more closely/examine the data, as specific instances can provide “schema/prototypes” to help clarify theory (and make the theory more salient and comprehensible)

Also, the “4 paradigms” – theory, experiment, simulations, data mining/pattern recognition. technically the latter two can be considered subsets of experiments. But at the same time, they also share some characteristics with theory. there’s a new computer program that can derive physical laws from mass datasets. how does one categorize that? it’s clearly the fourth paradigm, but at the same time, it creates theory (and theory is inspired by the desire to find a model/equation to make the data consistent). simulations can be analyzed too (simulations are pretty much experiments – the only difference is that simulations do not have to conform to the real world). in fact, a lot of the “normal science work” consists of the analysis of simulation output, as many simulations are ad hoc and haven’t been independently investigated by numerous people.

Also, we now know that crowdsourcing can inspire research (usually through categorization/data mining/etc). Some sorts of crowdsourcing were implicitly used in empirical research (e.g. many fossils are not discovered by people who are actively looking for them – they are just accidentally dug up). now crowdsourcing is more active (galaxy identification, etc).

Technically, there are other important steps to further research too. developing the infrastructure of research (through engineers and programmers). a lot of the infrastructure is general and must be converted to scientific uses.

so there probably IS such a thing as social intelligence

intelligence consists of the ability to select, from a number of possible configurations of items, the one most appropriate for one’s given environment and task. Creativity consists of the ability to generate such possible configurations. Creativity demands intelligence, since the generation of such configurations is not random.

Social intelligence demands selecting the social responses most appropriate for a given social situation (assuming that one desires to be socially tactful of course).


Better def: the word “intelligence” is a pretty loaded term (although perhaps most people agree on their basic conception of intelligence – disagreements tend to arise over issues of “multiple intelligences” and such).

Here’s my definition:

Creativity seems to be based on the capacity to imagine unique possibilities based on the perceptual environment one lives in (a perceptual environment is inclusive of hallucinations), and intelligence is based on the capacity to judge the actions most conducive to a “desired outcome”. Creativity/intelligence only involve the possession of capacity, not the acting upon of such capacity. In biological organisms, of course, the possession of capacity is closely tied to the acting upon of such capacity, for the acting upon of the capacity helps develop it (from an evolutionary point of view).

Intelligence is highly conducive to creativity since it allows one to “select” the “most appropriate” possibilities that one imagines – and to “use” those possibilities to achieve a “desired outcome”. It is possible to be intelligent without being creative, as one can still be a perfect judge without being creative. It is also possible to be creative without being intelligent (in a sense, the “infinite monkey typewriter” is creative if not intelligent). It seems that traditional definitions of creativity do incorporate elements of intelligence – as creative individuals are able to produce possibilities that “fit” in the environment – rather than possibilities that involve random number generators. An intelligent animal is able to use its sensory information to produce actions most appropriate to its desires in its particular perceptual environment (for example, an intelligent orca is able to use its sensory information to imagine the actions most likely to capture and eat a seal – a seal that happens to be on an iceberg. It has to be able to judge which actions are more likely to produce results in its particular environment).

Of course, there is also “learning speed”. Is there a logical connection between learning speed and intelligence? Or do they happen to be highly correlated in humans but not necessarily in other organisms or machines? Bear in mind that intelligence is useless without accumulated knowledge and so it makes sense from an adaptive standpoint for intelligence and learning speed to have high correlation with each other.

So in this case, there is (a) learning speed and (b) creativity. It’s possible that one can learn fast without being creative. Or one can lose all ability to form new memories (but can still be creative or rely on one’s stockpile of memories). In this, intelligence is perhaps defined by the ability to judge from what one learns or creates.

An intelligent organism is intelligent irrespective of its environment, and it may act in ways that are unlikely to produce desired results in a completely different environment (especially if the new environment has no patterns whatsoever – or if the organism happens to be hallucinating). However, in MOST cases, we can effectively define intelligence as the capacity to envision cause-and-effect in a particular environment.

There is definitely space for “multiple intelligences” (in the human brain, it may happen that the “multiple intelligences” happen to be correlated with one factor g – but this may just be an accident of evolution if anything). One can be analytically intelligent but not socially intelligent, in that one is able to select the “most appropriate” actions for a desired analytical outcome but totally unable to select the “most appropriate” actions for a desired social outcome.