Tuesday, November 20, 2007

Zachriel's EA and Dembski's quote

Continuation from here

Hello again Zachriel,

Drs. Dembski and Marks:

“Such assumptions, however, are useless when searching to find a sequence of say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker, or choosing a sequence of commands from 26 available commands in an Avida type program to generate a logic operation such as XNOR [16]. With no metric to determine nearness, the search landscape for such searches is binary -- either success or failure. There are no sloped hills to climb. We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters. But, in the context of the NFLT, knowledge of the class in which an optimization problem lies provides active information about the problem.”

Zachriel:

“Dembski has defined the same problem I did above. He claims that without some information built into the algorithm about how words are distributed through sequence space, evolutionary algorithms will be no better than random search. Is this what Dembski is saying? And is he right?”

Dembski is saying precisely what he states: “Such assumptions are useless when searching to find a sequence of say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker ... with no metric to determine nearness, the search landscape for such searches is binary – either success or failure. There are no sloped hills to climb.”

First, what assumptions is he talking about? In the previous paragraph, he refers to the assumptions:

“Within Häggström's familiarity zone, search structures have 'links' in the optimization space and smoothness constraints allowing for use of 'hill-climbing' optimization. Making such assumptions about underlying search structures is not only common but also vital to the success of optimizing searchers (e.g., adaptive filters and the training of layered perceptron neural networks [22]).”

So, basically, Dembski and Marks refer to assumptions of links and hill climbing structures in the search space. The point is that merely assuming that these link structures exist within a non-uniform search space, without actual knowledge of those structures incorporated into the search algorithm, does nothing to affect the probability of a *random* search. What is needed is an algorithm that is matched to and programmed to take advantage of a *known* (so that it can provide accurate guidance) non-uniform search structure which actually contains hill climbing optimizations, etc. If the proper active information in the form of the proper type of adaptive search is not applied, there are no sloped hills to climb by *random* search, no matter the underlying search structure.
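For concreteness, here is a minimal sketch (in Python; the tiny word list is an illustrative stand-in of my own, not a real spell checker) of what purely *random* search on that binary landscape looks like: every query either passes or fails, and there is nothing to climb.

import random
import string

# Minimal sketch of blind random search on the binary ("success or failure")
# landscape described above. WORDS is a tiny illustrative stand-in for a
# spell checker, not a real dictionary.
WORDS = {"example", "weasels", "obvious"}

def random_guess(length=7):
    """Draw a uniformly random string of 7 lowercase letters."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

def blind_search(trials=100_000):
    """Each query returns only pass/fail; there is no 'nearness' metric,
    so no hill can be climbed and no feedback can be exploited."""
    for t in range(1, trials + 1):
        if random_guess() in WORDS:
            return t          # number of queries used before success
    return None               # no success within the allotted trials

# Per-query success probability is |T| / 26**7 (roughly 10**-10 per word in T),
# so this almost always prints None.
print(blind_search())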

We know that Dembski and Marks are referring to *random* search because in the first quoted paragraph above, he states that “We note that such search problems on a binary landscape are in a different class from those useful for, say, adaptive filters.”

Your algorithm makes use of at least one adaptive filter, a “stepping stone search,” which is defined as “building on more probable search results to achieve a more difficult search.” One instance of a stepping stone search is discussed in “Conservation of Information in Search: Measuring the Cost of Success,” where a formula for measuring the active information of the discussed stepping stone search is provided.
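For reference, that measure, as I read the paper, compares the success probability p of blind search with the success probability q of the assisted search. A quick sketch (the numbers in the example call are placeholders of mine, not measurements):

import math

# Active information as defined in "Conservation of Information in Search"
# (my paraphrase): p = success probability of blind search, q = success
# probability of the assisted (e.g., stepping stone) search.
def active_information(p, q):
    endogenous = -math.log2(p)     # I_Omega: difficulty of the blind search
    exogenous  = -math.log2(q)     # I_S: difficulty left to the assisted search
    return endogenous - exogenous  # I_+ = log2(q / p), in bits

# Illustrative placeholder numbers, not measurements:
print(active_information(p=1 / 26**7, q=1 / 100))   # about 26 bits of assistance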

Your algorithm is far from the random search for a 7 letter word that Dembski refers to. First, it builds upon the success of previous results (stepping stone search), sorts fit from unfit (also discussed, with a measurement of active information, in the aforementioned paper) before less probable words (the 7 letter words) are reached, and may actually include other instances of active information. Thus, since it builds off of previous filtered successes, it fits into the category of adaptive searches which Dembski and Marks note are in a separate class from “success or failure” random searches.

Thus, your EA provides an example of Dembski's point ... that *adaptive filters* (which are in a different category than purely random search) must be *properly* applied to take advantage of *known* (as opposed to merely assumed) search space structure -- the non-random dictionary -- in order to provide better than chance performance and arrive at any targeted 7 letter word that will pass the spell checker test. Try the EA on a dictionary that contains only the 7 letter words.
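To make the contrast concrete, here is a rough reconstruction (my own sketch in Python, not Zachriel's actual code; the tiny dictionary and mutation scheme are assumptions) of how a dictionary-filtered, stepping-stone style word EA differs from the blind search sketched above:

import random
import string

# My assumed reconstruction: start from a short seed word and mutate it,
# keeping only offspring that the dictionary accepts. The dictionary supplies
# the non-uniform structure the search exploits.
DICTIONARY = {"pin", "pint", "print", "prints", "sprints"}   # tiny illustrative set

def mutate(word):
    # Insert, delete, or substitute one random letter.
    i = random.randrange(len(word) + 1)
    letter = random.choice(string.ascii_lowercase)
    op = random.choice(("insert", "delete", "substitute"))
    if op == "insert":
        return word[:i] + letter + word[i:]
    if op == "delete":
        return word[:i] + word[i + 1:] if i < len(word) else word[:-1]
    # substitute (no-op when i falls past the last letter)
    return word[:i] + letter + word[i + 1:] if i < len(word) else word

def evolve(seed="pin", target_length=7, generations=10_000):
    # Stepping stone search: shorter dictionary words act as stepping stones
    # toward longer ones, because only words the dictionary accepts survive.
    current = seed
    for _ in range(generations):
        child = mutate(current)
        if child in DICTIONARY and len(child) >= len(current):
            current = child
        if len(current) >= target_length:
            return current
    return current

print(evolve())   # typically climbs pin -> pint -> print -> prints -> sprints

Strip the dictionary down to only the 7-letter word and the stepping stones disappear, leaving this loop no better than the blind search sketched earlier.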

Dembski also states: “Following Häggström [18], let the search space Ω be partitioned into a set of acceptable solutions T and a set of unacceptable solutions T̄. The search problem is to find a point in T as expediently as possible. With no active information, the distribution of the space is assumed uniform. This assumption dates to the 18th century and Bernoulli's principle of insufficient reason [2], which states, 'in the absence of any prior knowledge, we must assume that the events [in Ω] ... have equal probability.' This is equivalent to an assumption of maximum (information theoretic) entropy on the search space [21]. Many criticisms of the NFLT are aimed at the uniformity assumption [28]. Knowledge of a deviation from uniformity, however, is a type of active information that is consistent with the premise of the NFLT.”

So, you have separated acceptable solutions from unacceptable solutions by creating a dictionary, but as already mentioned, your search algorithm is designed to take advantage of the non-random aspects of that dictionary. IOW, the algorithm is matched to the solutions. Knowledge of the solutions and of the type of algorithm which will work is used to create the correct algorithm, which builds upon previous results and also sorts these results in order to achieve less probable results in the future at better than chance performance. All of this complementary knowledge is extraneous to the evolutionary search itself, is essential to the success of the search, and is thus active, problem specific, guiding information.

You do realize that the most your EA shows is that the targets for life, and the search algorithm needed to travel those pathways and reach those targets, are set before the search ever begins. Is that the hallmark of a system built upon a random set of variables?

Now, let’s discuss the NFL Theorem a bit:

The NFL Theorem states that no method of search (algorithm) will, on average, perform better than any other algorithm [i.e., produce consistently better than chance results] without problem specific information guiding it. The example of finding the Ace of Spades in "Active Information in Evolutionary Search" illustrates the point of the NFL Theorem excellently.

IOW, from what I understand, if you have no knowledge of the search space structure and you separate acceptable from unacceptable targets, any method (search algorithm) of finding an acceptable target will perform as well as any other method and will, on average, discover an acceptable target at no better than chance probability. That is, unless active information regarding the search space structure, and/or information helping to determine the location of any of the acceptable targets, is programmed into the search algorithm in the form of "warmer/colder hints," “stepping stone searches,” “partitioned searches,” “fitness selectors,” etc. (Many types of adaptive algorithms are discussed in "Conservation of Information in Search: Measuring the Cost of Success.")
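As one concrete illustration of such programmed-in assistance, here is a minimal sketch (in Python; the target word is an arbitrary assumption of mine) of what I understand a “partitioned search” of the kind analyzed in that paper to be: an oracle reports which positions already match, and those positions are frozen.

import random
import string

TARGET = "weasels"                      # illustrative 7-letter target
ALPHABET = string.ascii_lowercase

def partitioned_search(target=TARGET):
    # Each still-wrong position is re-drawn at random every query; positions
    # the oracle reports as correct are frozen. That per-position feedback is
    # active information unavailable to a blind, query-only search.
    guess = [random.choice(ALPHABET) for _ in target]
    queries = 0
    while any(g != t for g, t in zip(guess, target)):
        guess = [g if g == t else random.choice(ALPHABET)
                 for g, t in zip(guess, target)]
        queries += 1
    return queries

# Typically finishes in on the order of 10**2 queries, versus an expected
# ~26**7 (about 8 x 10**9) queries for blind search over 7-letter strings.
print(partitioned_search())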

So, what separates an EA from random search? According to the NFL Theorem, if the EA is to find a target better on average than random search, there must be some problem specific information (knowledge about the targeted problem) guiding it. As Dembski and Marks put it: “Concerning the NFLT, Ho and Pepyne write 'unless you can make prior assumptions about the ... [problems] you are working on, then no search strategy, no matter how sophisticated, can be expected to perform better than any other' [15]. According to Wolpert and Macready, search can be improved only by 'incorporating problem-specific knowledge into the behavior of the [optimization or search] algorithm' [30].”

They continue: “Anticipating the NFLT, Schaffer [24] notes 'a learner [without active information] ... that achieves at least mildly better-than-chance performance ... is like a perpetual motion machine.' The 'prior assumptions' and 'problem specific knowledge' required for 'better-than-chance performance' in evolutionary search derives from active information that, when properly fitted to the search algorithm, favorably guides the search.” Have you discovered, within your program, the information-theoretic equivalent of a perpetual motion machine? (Of course, understanding this point, you will see why I am so sceptical re: processes allegedly creating information at better than chance averages without guiding, problem specific, active information.)

Now, back to the “knowledge of a deviation from uniformity of the search space” that was briefly mentioned earlier.

The conclusion of “Active Information in Evolutionary Search:”

“Active information, when properly prescribed, successfully guides an evolutionary search to a solution by incorporating knowledge about the target and the underlying structure of the search space. That structure determines how the search deviates from the uniformity assumption of the NFLT. Häggström's 'geographical structure[s],' 'link structure[s],' search space 'clustering,' and smooth surfaces conducive to 'hill climbing' are examples that reinforce rather than refute the conclusion that intelligent design proponents draw from the NFLT, namely, that the success of evolutionary search depends on the front-loading of active information. Thus, in biology as in computing, there is no free lunch [8].”

Since the NFL Theorem assumes a uniform search space (maximum information-theoretic entropy), we can conclude that a random search space acted upon by any chosen method of search (search algorithm) will not, on average, yield targets at better than random chance, nor will any one pairing of a random search space and a search algorithm consistently perform better than chance over time. This is consistent with what I understand to be part of the basis for the Conservation of Information:

A "learner... that achieves at least mildly than better-than-chance performance, on average, ... is like a perpetual motion machine - conservation of generalization performance precludes it.”
--Cullen Schaffer on the Law of Conservation of Generalization Performance. Cullen Schaffer, "A conservation law for generalization performance," in Proc. Eleventh International Conference on Machine Learning, W. Cohen and H. Hirsh, Eds. San Francisco: Morgan Kaufmann, 1994, pp. 259-265.

That can be tested by generating a random set of variables (laws), which causes a random set of functional targets within a search space, and applying any method of search to discover those targets.

According to my understanding of the NFL Theorem and the Conservation of Generalization Performance, there are two types of results that will not occur (a rough simulation sketch follows the list):

1. The search algorithm performing consistently better than chance over a lengthy run time.

2. After many shorter runs of different random searches on random targets, the average rate of finding the targets being better than chance.
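Here is a rough sketch of how one might run that test (my own illustrative setup in Python, not a formal NFL demonstration): many random "needle in a haystack" target sets, with blind sampling compared against a naive bit-flip "climber" that has nothing real to climb.

import random

SPACE_BITS = 12                       # search space of 2**12 = 4096 points
SPACE_SIZE = 2 ** SPACE_BITS

def random_targets(k=4):
    # A random, unstructured set of k acceptable solutions.
    return set(random.sample(range(SPACE_SIZE), k))

def blind(targets, queries=200):
    # Uniform random sampling with replacement.
    return any(random.randrange(SPACE_SIZE) in targets for _ in range(queries))

def bit_flip_walk(targets, queries=200):
    # "Hill climber" that steps to a random one-bit neighbour each query.
    # With random targets there is no slope to exploit, so on average it
    # should do no better than blind sampling.
    x = random.randrange(SPACE_SIZE)
    for _ in range(queries):
        if x in targets:
            return True
        x ^= 1 << random.randrange(SPACE_BITS)
    return False

runs = 2000
print(sum(blind(random_targets()) for _ in range(runs)) / runs)
print(sum(bit_flip_walk(random_targets()) for _ in range(runs)) / runs)
# Both hit rates should land near 1 - (1 - 4/4096)**200, roughly 0.18.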

“Although commonly used evolutionary algorithms such as particle swarm optimization [9] and genetic algorithms [11] perform well on a wide spectrum of problems, there is no discrepancy between the successful experience of practitioners with such versatile algorithms and the NFLT imposed inability of the search algorithms themselves to create information [4, 10]. The additional information often lies in the experience of the programmer who prescribes how the external information is to be folded into the search algorithm. The NFLT takes issue with claims that one search procedure invariably performs better than another or that remarkable results are due to the search procedure alone [1, 3, 13, 14, 17, 19, 23, 25, 26].”

"The NFLT puts to rest the inflated claims for the information-generating power of evolutionary simulations such as Avida [16] and ev [25]. The NFLT gauges the amount of active information that needs to be introduced to render an evolutionary search successful [18]. Like an athlete on steroids, many such programs are doctored, intentionally or not, to succeed [17]."

"Christensen and Oppacher note the NFLT is \very useful, especially in light of some of the sometimes outrageous claims that had been made of specific optimization algorithms" [4].

"Search algorithms do not contribute information to the search, and the NFLT exposes the inappropriateness of such claims."

"The NFLT shows that claims about one algorithm outperforming another can only be made in regard to benchmarks set by particular targets and particular search structures. Performance attributes and empirical performance comparisons cannot be extrapolated beyond such particulars. There is no all-purpose \magic bullet" search algorithm for all problems [5, 27]."

IOW, from what I understand, a non-random, non-uniform search space structure must be coupled to the correct search algorithm, utilizing the proper filter in accordance with prior knowledge of the target, in order to produce better than chance performance when searching for a target.

Now, let’s apply this to real life. Within life, a target is an informational sequence which functions within the living organism. Similarly, if you were to put the dictionary of targets within your EA together using random variables, in order to produce anything worth comparing to the evolution of life, the targets would need to conform to a system of rules and interact with each other to create some type of function. In other words, the targets would need to be algorithmically complex and specified, yet generated by a random set of laws and attained through a random method of search at better than chance. That is what you’d have to demonstrate to even begin to show that life has evolved from non-planned, non-intelligently designed laws and information.

As Dembski and Marks write in “Active Information in Evolutionary Search,” “In the field of evolutionary computing, to which the weasel example belongs, targets are given extrinsically by programmers who attempt to solve problems of their choice and preference. But in biology, not only has life come about without our choice or preference, but there are only so many ways that matter can be configured to be alive and, once alive, only so many ways it can be configured to serve biologically significant functions. Most of the ways open to biological evolution, however, are dead ends. It follows that survival and reproduction sets the intrinsic targets for biological evolution.

“Evolutionary convergence, the independent evolution of similar features (such as the camera eye of humans and squids), provides a striking example of intrinsic biological targets. Cambridge paleobiologist Simon Conway Morris [20] finds that evolution converges to only a few endpoints. He therefore theorizes that if the evolutionary process were restarted from the beginning, the life forms of today, including humans, would re-evolve. From the perspective of the NFLT, these limited number of endpoints on which evolution converges constitute intrinsic targets, crafted in part by the environment and by laws of physics and chemistry.”

This provides evidence that our universe will necessarily arrive at pre-set living targets which are guided by problem specific, active information at the foundation of its natural laws. This is consistent with any natural teleological hypothesis referencing the origin of our universe and the subsequent evolution of life. Evolutionary algorithms, the NFL Theorem, and the Law of Conservation of Generalization Performance also provide evidence against any assertion that just any random set of information and laws will cause evolution (put very simply: the on-average, better than chance generation of algorithmically complex and specified information) to occur.

So, your example operates off of some active information, and it seems to show the limited capability of evolutionary algorithms. If your EA were to even simplistically model the evolution of life, you would need to show that the pathway from one usable protein target to the next, less probable usable proteins has a high enough probability to be traversed in the amount of time available, and this would have to be consistent with known mutation rates (‘x’ mutations per generation, ‘x’ generations per year, 4 x 10^9 years available, and largest protein = 26,926 amino acids). Of course, the probabilities between targets would also have to apply to the generation of instructions for assembling machines and systems from those proteins.
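For scale, here is a back-of-the-envelope sketch of that bookkeeping (in Python; every rate below is a placeholder I chose for illustration, not a measured value, and the point is only the form of the comparison, not the particular numbers):

import math

# Placeholder rates (my assumptions for illustration only).
years_available          = 4e9     # from the figure cited above
generations_per_year     = 100     # assumed
mutations_per_generation = 1e20    # assumed, summed over an entire population

trials = years_available * generations_per_year * mutations_per_generation

protein_length = 26_926            # largest protein cited above, in amino acids
log10_space    = protein_length * math.log10(20)   # 20 amino acids per site

print(f"trials available    ~ 10^{math.log10(trials):.0f}")
print(f"full sequence space ~ 10^{log10_space:.0f}")
# The gap between roughly 10^32 trials and a roughly 10^35,000-point space is
# why the step-by-step probabilities between targets, not the size of the
# whole space, are what any such model has to establish.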

Furthermore, your program would also need to model the further evolution of information processing systems, the evolution of the different functions of RNA, the evolution of logic gates, the evolution of complex machinery, the evolution of IC systems (attained through indirect pathways), redundant systems, repair systems, and convergent evolution (the independent evolution of similar features), to name a few systems and results which have evolved within life, not to mention intelligent beings and conscious systems. I wonder how much, and what type(s) of, fine tuned, problem specific, active information would be necessary in order to evolve all of those results. And "answers" about how our EAs won’t create such systems just because there isn’t enough time don't impress me. I’m perfectly fed up with anti-scientific, progression-crushing, “chance of the gaps” non-explanations. I want knowledge of how something is accomplished. I don’t want “pat” answers about how things “just happen” to self-assemble given enough random variation and time. I want to know *how* in a manner consistent with our knowledge of the flow of information. This truly sounds like some excellent ID research.

The conclusion: evolution does not create any new information; it only converts it from one program to another -- from the problem specific, active informational structure of our natural laws at the foundation of our universe to the information processing system of life. Enter the Law of Conservation of Information. As well, since evolution generates information at greater than random chance, it must make use of problem specific information to find the informational targets. Furthermore, evolution by natural selection provides evidence of teleological foresight and of the programming of the active, problem specific information necessary for the consistently better than chance performance of evolution.

"Search algorithms, including evolutionary searches, do not generate free information. Instead, they consume information, incurring it as a cost. Over 50 years ago, Leon Brillouin, a pioneer in information theory, made this very point: .The [computing] machine does not create any new information, but it performs a very valuable transformation of known information. [3] When Brillouin’s insight is applied to search algorithms that do not employ specific information about the problem being addressed, one finds that no search performs consistently better than any other. Accordingly, there is no magic-bullet search algorithm that successfully resolves all problems [7], [32]."

Next question: How does one information processing system create another information processing system within it (i.e., the universe creating life)? I predict: not by unguided, random accident. Sounds like some more ID research.