Comps-2011-Spring
Contents
|
Programming Languages
Question 1.1 Combinatorics
A k-combination of a set S is a subset of k distinct elements of S. Give the definition of a Haskell function combinations which, given an integer k and a set S represented as a sorted list without duplicates xs returns a list of all k-combinations of S (where each combination is itself represented as a sorted list without duplicates). For example,
*Main> combinations 3 [’a’..’d’]
combinations 3 [’a’..’d’]
["abc","abd","acd","bcd"]
*Main>
In your solution, specify the type of the function combinations precisely.
This solution is easier to think through with an example. Consider a list of [a,b,c,d] and we want every unique subset of size 3.
- First we need all subsets of size 2 which do not include an 'a', and we prepend an 'a' to these.
- This requires a recursive call to the function which will give us back all subsets of size 2 which do not include an 'a', i.e.
combinations (k-1) xs - These subsets are [b, c] [b, d] and [c, d]. Our simple recursive call will give us [b, c] and [b, d], but what about [c, d]?
- The second call on the stack will be
combinations 2 [b, c, d]with the base casecombinations 1 (x:xs)giving us [b, c] and [b, d]. - What code would produce the [c, d] which we need to prepend an 'a' to?
- combinations 2 [c,d] would, so I add the code
++ combinations (k) xs
- combinations 2 [c,d] would, so I add the code
- This requires a recursive call to the function which will give us back all subsets of size 2 which do not include an 'a', i.e.
- Now that we have all subsets of size 3 which include an 'a' we need all subsets of size 3 that include a 'b' but not an 'a', e.g. [b,c,d]
- This too is covered by the code
++ combinations (k) xs.
- This too is covered by the code
The complete function would be:
combinations :: Int -> [a] -> [[a]]
combinations 1 (x:xs) = map (:[]) (x:xs)
combinations _ [] = []
combinations k (x:xs) =
map ([x]++) (combinations (k-1) xs) ++ combinations (k) xs
Question 1.2 String Editing
A DNA molecule is a sequence made of four bases which can be represented using the characters ’A’, ’G’, ’C’, and ’T’. Genomes represented by DNA molecules are subject to four different types of point mutations:
- insertions - A base is inserted between two adjacent points in a genome.
- deletions - A point is deleted from a genome.
- substitutions - A base at a point is replaced with another base.
- transpositions - The bases at two adjacent points are exchanged.
Give definitions forHaskell functions insertions, deletions, substitutions, and transpositions which take a genome represented as a string and return a list of all genomes produced by single point mutations of the specifed kind. For example.
*Main> insertions "GC"
insertions "GC"
["AGC","GAC","GCA","GCA","GGC","GGC","GCG","GCG","CGC",
"GCC","GCC","GCC","TGC","GTC","GCT","GCT"]
*Main> deletions "AGCT"
deletions "AGCT"
["GCT","ACT","AGT","AGC"]
2
*Main> substitutions "ACT"
substitutions "ACT"
["ACT","AAT","ACA","GCT","AGT","ACG","CCT","ACT","ACC",
"TCT","ATT","ACT"]
*Main> transpositions "GATC"
transpositions "GATC"
["AGTC","GTAC","GACT"]
*Main>
In your solution, specify the type of each function precisely.
insertion :: [Char] -> [[Char]]
insertion [] = [['A']] ++ [['T']] ++ [['G']] ++ [['C']]
insertion (x:xs) = [['A']++(x:xs)] ++ [['T']++(x:xs)] ++
[['G']++(x:xs)] ++ [['C']++(x:xs)] ++ (map (x:) (insertion xs))
deletion :: [Char] -> [[Char]]
deletion [] = [[]]
deletion (x:[]) = [[]]
deletion (x:xs) =
[xs] ++ map (x:) (deletion xs)
substitution :: [Char] -> [[Char]]
substitution [] = [[]]
substitution (x:xs) = [['A']++(xs)]++[['T']++(xs)] ++
[['G']++(xs)] ++ [['C']++(xs)] ++ map (x:) (substitution xs)
transposition :: [Char] -> [[Char]]
transposition [] = [[]]
transposition (a:[]) = [[a]]
transposition (x:y:[]) = [(y:x:[])]
transposition (x:y:xs) = [(y:x:xs)] ++ map (x:) (transposition (y:xs))
Question 1.3 Lists
We need to write a function that takes in a list of Ints and returns the smallest Int not in the list. But first, we need to prove the the example solution given takes at worst quadratic time. Example Solution:
f :: [Int] -> Int
f xs = head ([0..] // xs)
- Given a list xs of size n
- For each element y in [0..],
- We must search at most n elements in the list xs.
- If the element is found we move to the next element in [0..].
- If the element is not found we have found the smallest int not in the list.
- This process is repeated at most n times, giving a worst case quadratic time.
A data-structure-centric way to solve this problem would be to create an array, zs, of size n and for each element less than (n-1) in xs, mark the index location in zs.
Then search through zs to find the first unmarked location, this would take linear time. In Haskell this is harder than it initially seems.
Indexing and functions like splitAt are pretty expensive for lists.
Correct Solution
f :: [Int] -> Int
f xs = f' 0 xs
f' n [] = n
f' n xs = if (length as) < (m-n) then f' n as else f' m bs
where as = filter (< m) xs
bs = filter (>= m) xs
m = div (length xs) 2 + n + 1
f :: [Int] -> Int
f xs = findLst (acc 0 xs) 0
acc ans [] = ans
acc ans (x:xs) = acc (ans+2^x) xs
findLst ans i = case mod ans 2 of
0 -> i
otherwise -> findLst (div ans 2) (i+1)
The 1st method above is a clever divide and conquer approach.
The 2nd method above accumulates an intermediate answer by adding 2 to the power of each number in the input list.
After generating the intermediate answer the number is divided by 2 until we find that it is even. This the current index i is returned as the answer.
Question 1.4 Lists, Infinite
Problem Description
This question asks for a Haskell function h :: [Integer] -> [Integer] such that the result of evaluating h as , where as is a list of positive numbers, is an infinite list of numbers with the following properties:
- it is in increasing order;
- it begins with 1;
- for every number x that is in the list and for every number a in as, the product ax is also in the list; and
- it is the smallest such list (i.e., it contains no other numbers than those required by (1)-(3)).
We are then to discuss the efficiency of the implementation in terms of the:
- the length of
as - time taken to produce the first n elements of
h as
Notes
- Consider the example where as = [1,2,3,4], then
- initially our solution is [1], then
- our solution is [1], [1,2,3,4], then
- our solution is [1], [1,2,3,4], [2,4,6,8] [3,6,9,12] and [4,8,12,16]
- our solution is everything before plus
- ...
- We need to guarantee that we return the smallest integers in the solution since the answer is required to be sorted.
- Further more, we cannot call sort on the list since it is infinite.
Solution
The naive approach is to generate the members of the output list as seen below. Unfortunately, some of the numbers can get out of order for some inputs, where the output list is not correct. Also, we cannot sort the list because it is infinite.
Incorrect Solution:
import Data.List
h :: [Integer] -> [Integer]
h as = helper [] (sort as)
helper [] as = helper [1] as
helper ans as =
ans ++ helper ( nub [ i*j | i <- ans, j <- as, i*j `notElem` ans]) as
The correct way to go about writing this function is to check (in order) which numbers are composed wholly of factors from the input list.
Correct Solution:
import Data.List
h :: [Integer] -> [Integer]
h as = helper as [1..20] where
helper as [] = []
helper as (t:ts) = case isDiv as t of
True -> [t] ++ helper as ts
False -> helper as ts
isDiv :: [Integer] -> Integer -> Bool
isDiv [] ts = False
isDiv (a:as) t = case div t a of
0 -> isDiv as t
x | x == t -> isDiv as t
otherwise -> case mod t a of
0 -> True
otherwise -> isDiv as (mod t a)
Question 2.1 Program Analysis
- Data Flow Analysis
- Reaching Definitions
- A good source from Scott [1]
- Define the Reaching Definitions problem as a problem in a data-flow analysis framework.
- State its value domain, the direction of the analysis, and the equations corresponding to the “transfer function” and the “meet” operator.
- Then use the given control flow graph as an example and show how the reaching definitions analysis proceeds, step by step.
- Finally, suggest how the results of a reaching definitions analysis may be used for program optimization.
- The reaching definitions problem is defined as trying to determine the "reach" of definitions at different points of the program. Where, reach is defined as the location for which a definition previously defined holds -- it is said the definition reaches the location.
- Transfer function <math>[x:=a]^l -> (x,l)</math>
- Meet
- First find the possible predecessors for each statement in the figure.
- {1}
- {2}
- {3, 7}
- {4}
- {5}
- {5, 6}
- Create a table that iterates through <math> RD_o(i) </math>
- All empty for first two columns
- {} | (i,1)
- (i,1) | {(i,1), (j,2)}
- {(i,1),(j,2)} | {(i,1), (j,2), (a,3)}
- (i,1), (j,2) (a,3) | (i,4), (j,2), (a,3)
Variant: Liveness
Definitions taken from: http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/
Consider question 2.1 above but instead of reaching definitions use liveness definitions.
Liveness rules:
- (liveness generation) a variable used by an instruction is live at the start of the instruction.
- (liveness killed) if a variable is assigned a value in an instruction, but not used by the instruction then that variable is dead at the start of the instruction.
- (liveness propagation) if a variable is live at the end of an instruction, and the variable is not assigned a value by the instruction, then the variable is live at the start of the instruction.
- (liveness propagation) a variable is live at the end of an instruction if it is live at the start of any immediately succeeding instructions.
Definitions:
- For every instruction i in the program we have a set of instructions that immediately follow generated by the function succ(i).
- For an instruction i the set gen(i) is the set of variables that may be read by the instruction and are therefore live at the start of i.
- The set of variables which may be killed at instruction i is returned by kill(i). These are the variables which are assigned a value by instruction i.
- in(i) contains the variables live at the start of instruction i.
- Formally, $in(i)\;=\;gen(i)\cup \;(out(i)\backslash kill(i))$
- out(i) contains the variables live at the end of instruction i.
- Formally, $out(i)\;=\;\cup_{j=succ(i)} in(j)$
The sets generated by in and out are recursive and we solve them by fixed point iterations - repeatedly applying
Question 2.2 Program Transformations
Problem Statement
- Describe the continuation-passing style (CPS) transformation. Explain its possible uses in programming language implementation.
- Then, completely specify an algorithm to transform direct-style functional programs to CPS.
In your description, for the direct-style language to be transformed into CPS use the core lambda language whose abstract syntax is:
t ::=
c
x
\lambda x.t
let x = t1 in t2
if t1 then t2 else t3
t1 t2
primop t1 . . . tn
- Finally, provide one or more examples of programs before and after CPS transformation, such that all the syntactic forms above are covered by the examples, and, using those examples, illustrate the steps or phases of the transformation algorithm.
Description
Instead of implicitly relying on a stack a continuation argument is associated with a function/term, which essentially is a return address.
- This allows a closer translation to assembly code than the lambda calculus, thereby providing and intermediate representation.
- Unusual control structures are better represented, e.g. catch, throw, etc.
Transformation Algorithm
A transformation is denoted by $ \llbracket\;\rrbracke $, which produces a continuation. The continuation is represented by a functional $k$ which given a value returns that value to the proceeding functions address.
- $t\; ::=$
- $\llbracket$ $c$ $\rrbracket \;\Rightarrow\; \; \lambda k . k\; c$
- $\llbracket$ $x$ $\rrbracket \;\Rightarrow\; \lambda k.k x$
- $\llbracket$ $\lambda\; x.t$ $\rrbracket \;\Rightarrow\; \lambda k.k\;(\lambda x.\llbracket t \rrbracket)$
- $\llbracket$ let $x\; =\; t_1$ in $t_2$ $\rrbracket \;\Rightarrow\;$ $\llbracket ( \lambda x . t_2 ) t_1 \rrbracket$
- $\llbracket$ if $t_1$ then $t_2$ else $t_3$ $\rrbracket \;\Rightarrow\;$ $\lambda k . \llbracket t_1 \rrbracket ( \lambda y .$ if $y$ then $\llbracket t_2 \rrbracket\; k$ else $\llbracket t_3 \rrbracket k $
- $\llbracket$ $t_1\; t_2$ $\rrbracket \;\Rightarrow\; \lambda k . \llbracket t_1 \rrbracket\; (\lambda f. \llbracket t_2 \rrbracket ( \lambda v. f v k ))$
- $\llbracket$ primop $t_1\; . . .\; t_n$ $\rrbracket \;\Rightarrow$
- $(^1 \lambda k . \llbracket t_1 \rrbracket (^2 \lambda v_1.\llbracket t_2 \rrbracket (^3 \lambda v_2.\llbracket t_3 \rrbracket ... (^4 \lambda v_{n-1}. \llbracket t_n \rrbracket (^5 \lambda v_n.k\;$ op'$(^6v_1,v_2,...v_n)^6)^5)^4)^3)^2)^1 $
Example
- $\llbracket 2 + 3 \rrbracket$
- $=\; \lambda k.\llbracket t_1 \rrbracket(\lambda v_1. \llbracket t_2 \rrbracket\ (\lambda v_2 . \;k\; $op'$(v_1, v_2)))$ by applying rule primop
- $=\; \lambda k.((\lambda k'.k'\;2)(\lambda v_1. (\lambda k.k\; 3) (\lambda v_2 . \;k\; $op'$(v_1, v_2))))$ by applying rule const
Now that we have this fully converted to CPS form we can get the final value by applying the identity function
- $=\; (\lambda k'.k'\;2)(\lambda v_1. (\lambda k.k\; 3) (\lambda v_2 . \;(\lambda x .x)\; $op'$(v_1, v_2)))\;$
- $=\; (\lambda v_1. (\lambda k.k\; 3) (\lambda v_2 . \;(\lambda x .x)\; $op'$(v_1, v_2)))\;2$
- $=\; (\lambda k.k\; 3) (\lambda v_2 . \;(\lambda x .x)\; $op'$(2, v_2))\;$
- $=\; (\lambda v_2 . \;(\lambda x .x)\; $op'$(2, v_2))\;3$
- $=\; \lambda x .x)\; $op'$(2, 3)\;$
- $=\;5$
Question 2.4 Code Generation
Generating code for arithmetic expressions.
- Go down and then generate code on the way back up
- Decide is the minReg on the left or right deeper?
- Take the deeper branch
Write minReg
minRegs'(Sum t1 t2) = minRegs' t1 + minRegs' t2
minRegs' Const _ = 1
compile' (Sum t1 t2 ) i = case (minRegs t1 < minRegs t2) of
True -> compile' t1 ++ compile' t2 ++ [Add i i (i+1)]
...
compile' (Const x) i = [LoadImm i]
- For the sum it's the number of registers that it takes to compute each term individually.
- Number of registers needed for constant invariant is 1 each.
Question 3.3 Loop Invariant
Consider the following program which is supposed to perform binary search in a sorted array. Check whether it indeed meets its specification (so first write this specification precisely as a pre and post condition) using axiomatic semantics. Find the bug and show all the steps used in the axiomatic semantics to identify the bug. Fix the bug, and then prove again using axiomatic semantics that the modified program meets its specification.
1. low = 0, high = L-1; {L>0}
2. mid = ceil((low+high)/2);
3. while (low <= high) do
{l_0}
3.1 if(A[mid] == query) then
3.2 return mid;
3.3 else if (A[mid] > query) then
3.4 high = mid - 1;
3.5 else
3.6 low = mid+1;
3.7 End if
3.8 mid = ceil((low + high)/2);
4. End while
- {l_0}
Theory
Question 1 Regularity of Switch
Suppose I have a language <math>L</math> over the alphabet <math>{a,b,c,#}</math>, where <math>#</math> is a special marker symbol, and where <math>L</math> only includes words where <math>#</math> appears exactly once. In other words, <math>L</math> only contains words of form <math>u#v</math> where <math>u,v \in \{a,b,c#\}*</math>. Define the function switch on such words as
and define
for instance, if
then
Prove that L is regular if and only if switch(<math>L</math>) is regular.
I will prove this by principle of composition.
- Suppose that, we can build a machine which accepts any word in the language L, and we divide this machine into three subparts, <math> X,\; Y\;,\;Z </math> -- where <math>X</math> is the portion of the machine that has a path from <math> s_0 </math> to the state which preceeds <math>#</math>; <math>Y</math> is the state which accepts <math>#</math>; <math>Z</math> is the path which goes from <math>\delta(Y,\{a|b|c\}*)</math> to the accepting state.
We can take <math>Z</math> and connect what was the accepting state to <math>Y</math> and then take <math>Y</math> and connect it to what was <math>s_0</math> in <math>X</math>.
- If L is not regular, then at least one subpart <math> X,\; Y,\; Z </math> is not regular. Switch does not change anything but the order of these subparts, therefore at least one subpart <math>X,\;Y,</math> or <math>Z</math> is still not regular and by composition, switch(<math>L</math>) is not regular.
Question 2 Proving Early Birds is NP-Complete
The problem considers a forest of birds which have friends (each connection represented <math>(u,v)</math>). A bird is considered an early bird if it wakes up before its friend. Given some value <math>k</math>, can we determine if there is an ordering that produces k or more early birds?
The most obvious solution is a reduction Early Birds < Independent Set, however Independent Set is not listed as an option. Fortunately Vertex Cover is.
In this case the transformation works like this:
- Each bird becomes a vertex <math> v\; \in\; V </math>, with friendships becoming edges <math> e\; \in\; E </math>.
- Does this new graph <math> G(V,E) </math> contain a vertex cover of size (V-k) or less?
- If it does, we can simply take the set of vertices not in the solution and we have the independent set/early birds.
Question 3 Donut Buying
This question is more challenging than it appears. The problem asks whether given three quantities <math>(x_1,x_2,x_3)</math> that donuts can be bought in, can you buy exactly <math>n</math> donuts? You are then asked to come up with an algorithm and respond whether it is polynomial in time. This is somewhat related to knapsack and subset sum problems which are NP-complete, however in subset sum each integer is available only once. Additionally, in the donut buying problem complexity is measured in terms of the amount you want to buy, <math>n</math>.
- Dynamic programming can provide a more efficient solution but is harder to analyze the complexity.
- A brute force approach would have complexity of <math>3^n</math> which is exponential.
Question 4 Yakutsk Wedding
Design an algorithm which given a graph <math>G(V,E)</math> and two vertices <math>s \in U,\;t \in V</math> -- determines whether two paths exist from <math> s </math> to <math> t </math>, such that there are no shared edges. The general form of this question is: does there exist 2 or more edge disjoint paths from vertex <math> s </math> to <math> t </math>. Formally, edge disjoint means that each edge in a given graph appears in at most a single path -- but, the same vertex may be shared by multiple paths.
- Scott and Phil suggested Max Flow.
- In a directed graph this is pretty simple: set the capacity of each edge to 1 and perform a max flow algorithm such as Ford Fulkerson. The flow value at the end will be equal to the number of unique paths. Overall running time is <math> O(V,E) </math>.
- When the starting graph is undirected as it is in the case of this problem, we transform each undirected edge into two opposing directed edges with capacity=1. Again Ford Fulkerson works as the max flow algorithm.
- If for any two opposing edges <math> \{v,u\},\{u,v\} </math> we can remove both edges from the flow without changing its value.
- Reference [2]
- Another approach (incorrect)
- Dijkstra's algorithm comes to mind: performing the algorithm twice, the 2nd time on an altered graph G' where G' is the graph G with edges followed in the first iteration removed.
- Is this the most efficient way?
- O(|E| + |V| log |V|) each iteration.
- Shortest path guarantees that we are not using excessive edges -- is there any odd case which selecting the shortest path would block a otherwise viable secondary path.
- The odd case is when there are two shortest paths of equivalent length.
- Dijkstra's algorithm comes to mind: performing the algorithm twice, the 2nd time on an altered graph G' where G' is the graph G with edges followed in the first iteration removed.
Question 5 Threshold Program
<math>\Pi</math> is a program that takes an integer <math>x</math> as input. <math>\Pi</math> is a threshold program if there is an integer <math>t</math> such that <math> \Pi(x)</math> returns "yes" for all <math> x\; \geq\; t</math>, and "no" for all <math> x\; \lt\; t</math>. THRESHOLD is the problem, given <math>\Pi</math>'s source code as input, of telling whether or not <math>\Pi</math> is a threshold program. Show THRESHOLD is undecidable.
Halting(Pi, x){
Pi'(x'){
Pi(x)
if (x >= t) then "yes"
else "no"
}
return Threshold (Pi')
}