The Impossibility Conjecture of Humanoid Artificial Intelligence and the Non-Benign Singularity

Abstract

[A Rough Draft of a Work-in-progress.]

The idea of machines which are almost identical to human beings has been so seductive that it has captured the imaginations of the best minds as well as laypeople for at least a century and a half, perhaps more. Right after Artificial Intelligence (AI) came into being, it was almost taken for granted that we would soon be able to build Humanoid Robots. This has also led to some serious speculation about ‘transhumanism’. So far, we do not seem to be anywhere near this goal. It may be time now to ask whether it is possible at all. We present a set of arguments to the effect that it is impossible to create or build Humanoid Robots or Humanoid Intelligence, where the said intelligence can substitute for human beings in any situation where human beings are required or exist.

1 Humanoid Intelligence, the Singularity and Transhumanism

Before we proceed to discuss the terms of the title of this section and the arguments in the following sections, we first define the foundational terms to some degree of conciseness and preciseness:

1. Human Life: Anything and everything that the full variety of human beings are capable of, both individually and collectively. This includes not just behaviour or problem solving, but the whole gamut of capabilities, emotions, desires, actions, thoughts, consciousness, conscience, empathy, creativity and so on within an individual, as well as the whole gamut of associations and relationships, and social, political and ecological structures, crafts, art and so on that can exist in a human society or societies. This is true not just at any given moment, but over the life of the planet. Perhaps it should include even spiritual experiences and ‘revelations’ or ‘delusions’, such as those hinted at in the Philip K. Dick story, Holy Quarrel [Dick et al., 1985].

2. Humanoid: A living and reproducing entity that is almost identical to humans, either with a human-like body or without it, on a different substrate (inside a computer).

3. Intelligence: Anything and everything that the full variety of human beings are capable of, both individually and collectively, as well as both synchronically and diachronically. This includes not just behaviour or problem solving, but the whole of life as defined.

4. The Singularity: The technological point at which it is possible to create (or have) intelligence that is Humanoid or better than Humanoid.

5. Transhumanism: The idea that, after the singularity, we can have a society that is far more advanced, for the better, than the current and past human societies.

From 1910 to 1927, in the three volumes of Principia Mathematica [Whitehead and Russell, 1925–1927], Whitehead and Russell set out to prove that mathematics is, in some significant sense, reducible to logic. This was shown to be impossible when Gödel published his incompleteness theorems in 1931 [Sheppard, 2014, Nagel et al., 2001]. In the early days of modern Computer Science, in and before the early 1930s, it would have been easy to assume that a computing machine would ultimately solve any problem at all. This also proved to be impossible with Turing’s undecidability theorem [Hopcroft et al., 2006] and the Church-Turing thesis of computability [Copeland and Shagrir, 2018]. Since then, other kinds of problems have been shown to be undecidable.

Now that we are supposed to be close enough to the Singularity [Kurzweil, 2006] that it may happen within the lifetime of a large number of human beings, perhaps it is time to ask ourselves whether real intelligence, in particular Humanoid Intelligence (as defined above), is possible at all. We suggest that there are enough arguments to ‘prove’ (in an informal sense) that it is impossible to build, to create or to have Humanoid Intelligence. We argue that even though the Singularity is indeed possible, perhaps even very likely (unless we stop it), it may not be what it is supposed to be. The conjecture presented here is that the Singularity is not even likely to be benign, however powerful or advanced it may be. This follows from the idea of the impossibility of Humanoid Intelligence.

2 Some Notes about the Conjecture

We have not used the term theorem for the Impossibility and the reasons for this should be evident from the arguments that we present. In particular, we do not, and perhaps cannot, use formal notation for this purpose. Even the term conjecture is used in an informal sense. The usage of terms here is closer to legal language than to mathematical language, because that is the best that can be done here. This may be clearer from the Definition and the Story arguments. It is due to similar reasoning that the term ‘incompleteness’ is not used and, instead, impossibility is used, which is more appropriate for our purposes here, although Gödel’s term ‘essentially incomplete’ is what we are informally arguing for about Humanoid AI, and perhaps AI in general. No claim is made as to whether or not a formal proof is possible in the future at all. What we present is an informal proof. This proof has to be centred around the distinction between Micro-AI (AI at the level of an intelligent autonomous individual entity) and Macro-AI (very large intelligent autonomous systems, possibly encompassing the whole of humanity or the world). To the best of our knowledge, such a distinction has not been proposed before. While there has been some work in this direction [Brooks, 1998, Signorelli, 2018, Yampolskiy, 2020], for lack of space we are unable to explain how this work differs from it, except to note that the argumentation and some of the terms are novel. The situation is a bit like that of the arguments for or against the existence of God, a question that the best of philosophers have debated again and again over millennia and that, as we will see at the end, is relevant to our discussion.

3 The Arguments for the Impossibility Conjecture for Micro-AI

The Definition Argument: Even Peano Arithmetic [Nagel et al., 2001] is based on three undefined terms (zero, number and is successor of), which are relatively trivial compared to the innumerable terms required for AI (core terms like intelligence and human, or terms like the categories of emotions, let alone terms like consciousness).

The Category Argument: A great deal of AI is about classifying things into categories, but most of these categories (e.g. anger, disgust, good or bad) have no scientifically defined boundaries. This is related to the following argument.

The Story Argument: It is almost established now that many of the essential concepts of our civilisation are convenient fictions or stories [Harari, 2015] and these often form categories and are used in definitions.

The Cultural Concept Argument: Many of the terms, concepts and stories are cultural constructs. They have a long history, most of which is unknown, without which they cannot be modelled.

The Individuality, or the Nature Argument: An individual intelligent autonomous entity has to be unique and distinct from all other such entities. It originates in nature and we have no conception of how it can originate in machines. We are not even sure what exactly this individuality is. However, all through history, we have assigned some degree of accountability to the human individual and we have strict provisions for the punishment of individuals based on this, which indicates that we believe in the concept of the ‘self’ or the ‘autonomous individual’, even when we deny its existence, as is becoming popular today.

The Genetic Determinism Argument: Individuality is not completely determined by nature (e.g. by our genes) at birth or creation once and for all. It also develops and changes constantly as it interacts with the environment, preserving its uniqueness.

The Self-organising System Argument: Human beings and human societies are most likely self-organising [Shiva and Shiva, 2020] and organic systems, or they are complex, non-equilibrium systems [Nicolis and Prigogine, 1977]. If so, they are unlikely to be modelled for exact replication or reproduction.

The Environment, or the Nurture Argument: Both intelligence and individuality depend on the environment (or on nature). Therefore, they cannot be modelled without completely modelling the environment, i.e., going for Macro-AI.

The Memory, or the Personality Argument: Both intelligence and individuality are aspects of personality, which is known to be dependent on the complete life-memory (conscious and unconscious) of an intelligent being. There is not enough evidence that it is possible to recover or model this complete temporal and environmental history of memory. A lot of our memory, and therefore our individuality and personality, is integrally connected with our bodily memories.

The Substrate Argument: It is often taken for granted that intelligence can be separated from the substrate and planted on a different substrate. This may be a wrong assumption. Perhaps our intelligence is integrally tied to the substrate and it is not possible to separate the body from the mind, following the previous argument.

The Causality Argument: There is little progress in modelling causality. Ultimately, the cause of an event or occurrence is not one but many, perhaps even the complete history of the universe.

The Consciousness Argument: Similarly, there is no good enough theory of consciousness even for human understanding. It is very unlikely that we can completely model human consciousness, nor is there a good reason to believe that it can emerge spontaneously under the right conditions (which conditions?).

The Incompleteness/Degeneracy of Learning Source and Representation Argument: No matter how much data or knowledge we have, it will always be both incomplete and degenerate, making it impossible to completely model intelligence.

The Explainability Argument: Deep neural networks, which are the state-of-the-art for AI, have serious problems with explainability even for specific isolated problems. Without it, we cannot be sure whether our models are developing in the right direction.

The Test Incompleteness Argument: Perfect measures of performance are not available even for problems like machine translation. We have no idea what will be the overall measure of Humanoid Intelligence. It may always be incomplete and imperfect, leading to uncertainty about intelligence.

The Parasitic Machine Argument: Machines completely depend for learning on humans and on data and knowledge provided by humans. But humans express or manifest only a small part of their intelligent capability. So machines cannot completely learn from humans without first being as intelligent as humans.

The Language Argument: Human(oid) Intelligence and its modelling depend essentially on human language(s). There is no universally accepted theory of how language works.

The Perception Interpretation Argument: Learning requires perception and perception depends on interpretation (and vice-versa), which is almost as hard a problem as modelling intelligence itself.

The Replication Argument: We are facing a scientific crisis of replication even for isolated problems. How could we be sure of replication of Humanoid Intelligence, preserving individual uniqueness?

The Human-Human Epistemic Asymmetry Argument: There is widespread inequality in human society not just in terms of money and wealth, but also in terms of knowledge and its benefits. This will not only be reflected in modelling, but will also make modelling harder.

The Diversity Representation Argument: Humanoid Intelligence that truly works will have to model the complete diversity of human existence in all its aspects, most of which are not even known or documented. It will have to at least preserve that diversity, which is a tall order.

The Data Colonialism Argument: Data is the new oil. Those with more power, money and influence (the Materialistic Holy Trinity) can mine more data from others, without sharing their own data. This is a classic colonial situation and it will hinder the development of Humanoid Intelligence.

The Ethical-Political Argument: Given some of the arguments above, and many others such as data bias, potential for weaponisation etc., there are plenty of ethical and political reasons that have to be taken into account while developing Humanoid Intelligence. We are not sure whether they can all be fully addressed.

The Prescriptivisation Argument: It is now recognised that ‘intelligent’ technology applied at large scale not only monitors behaviour, but changes it [Zuboff, 2018]. This means we are changing the very thing we are trying to model, and thus laying down new mechanical rules for what it means to be human.

The Wish Fulfilment (or Self-fulfilling Prophecy) Argument: Due to the prescriptivisation of life itself by imperfect and inadequately intelligent machines, the problem of modelling Humanoid Intelligence becomes a self-fulfilling prophecy, where we end up modelling not human life, but some corrupted and simplified form of life that we brought into being with ‘intelligent’ machines.

The Human Intervention Argument: There is no reason to believe that Humanoid Intelligence will develop freely on its own and will not be influenced by human intervention, quite likely to further vested interests. This will cripple the development of true Humanoid Intelligence. This intervention can take the form of secrecy, financial influence (such as research funding) and legal or structural coercion.

The Deepfake Argument: Although we do not yet have truly intelligent machines, we are able to generate data through deepfakes which are not recognisable as fakes by human beings. This deepfake data is going to proliferate and will become part of the data from which the machines learn, so that they effectively model not human life, but something else.

The Chain Reaction Argument (or the Law of Exponential Growth Argument): As machines become more ‘intelligent’ they affect more and more of life and change it, even before achieving true intelligence. The speed of this change will increase exponentially and it will cause a chain reaction, leading to unforeseeable consequences, necessarily affecting the modelling of Humanoid Intelligence.

4 The Implications of the Impossibility

It follows from the above arguments that the Singularity at the level of Micro-AI is impossible. In trying to achieve that, and to address the above arguments, the only possible outcome is some kind of Singularity at the Macro-AI level. Such a Singularity will not lead to replication of human intelligence or its enhancement, but to something totally different. It will, most probably, lead to the extinction (or at least subservience, servitude) of human intelligence. To achieve just Humanoid Intelligence (Human Individual Micro-AI), even if nothing more, the AI system required will have to be nothing short of the common notion of a Single Supreme God. Singularity at the macro level will actually make the AI system, or whoever is controlling it, individual or (most probably small) collective, a Single Supreme God for all practical purposes, as far as human beings are concerned. But this will not be an All Powerful God, and not a Kind God, for it will be Supreme within the limited scope of humanity and what humanity can have an effect on, and it will be kind only to itself, or perhaps not even that. It may be analogous to the God in the Philip K. Dick story Faith of Our Fathers [Dick and Lethem, 2013], or to the Big Brother of Orwell’s 1984 [Orwell, 1950]. We cannot be sure of the outcome, of course, but those are as likely outcomes as any others. That is reason enough to be very wary of developing Humanoid Intelligence and any variant thereof.

References

Philip K. Dick, Paul Williams, and Mark Hurst. I Hope I Shall Arrive Soon. Edited by Mark Hurst and Paul Williams. Doubleday, New York, 1st edition, 1985. ISBN 0385195672.

Alfred North Whitehead and Bertrand Russell. Principia Mathematica. Cambridge University Press, 1925–1927.

Barnaby Sheppard. Gödel’s Incompleteness Theorems, pages 419–428. Cambridge University Press, 2014. doi: 10.1017/CBO9781107415614.016.

E. Nagel, J.R. Newman, and D.R. Hofstadter. Godel’s Proof. NYU Press, 2001. ISBN 9780814758014. URL https://books.google.co.in/books?id=G29G3W_hNQkC.

John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation (3rd Edition). Addison-Wesley Longman Publishing Co., Inc., USA, 2006. ISBN 0321455363.

B. Jack Copeland and Oron Shagrir. The Church-Turing thesis: Logical limit or breachable barrier? Commun. ACM, 62(1):66–74, December 2018. ISSN 0001-0782. doi: 10.1145/3198448. URL https://doi.org/10.1145/3198448.

Ray Kurzweil. The Singularity Is Near: When Humans Transcend Biology. Penguin (Non-Classics), 2006. ISBN 0143037889.

Rodney Brooks. Prospects for human level intelligence for humanoid robots. 07 1998.

Camilo Miguel Signorelli. Can computers become conscious and overcome humans? Frontiers in Robotics and AI, 5:121, 2018. doi: 10.3389/frobt.2018.00121. URL https://www.frontiersin.org/article/10.3389/frobt.2018.00121.

Roman V. Yampolskiy. Unpredictability of AI: On the impossibility of accurately predicting all actions of a smarter agent. Journal of Artificial Intelligence and Consciousness, 07(01):109–118, 2020. doi: 10.1142/S2705078520500034.

Y.N. Harari. Sapiens: A Brief History of Humankind. Harper, 2015. ISBN 9780062316103. URL https://books.google.co.in/books?id=FmyBAwAAQBAJ.

V. Shiva and K. Shiva. Oneness Vs. the 1 Percent: Shattering Illusions, Seeding Freedom. Chelsea Green Pub, 2020. ISBN 9781645020394. URL https://books.google.co.in/books?id=4TmTzQEACAAJ.

G. Nicolis and I. Prigogine. Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order Through Fluctuations. A Wiley-Interscience publication. Wiley, 1977. ISBN 9780471024019. URL https://books.google.co.in/books?id=mZkQAQAAIAAJ.

Shoshana Zuboff. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. 1st edition, 2018. ISBN 1610395697.

P.K. Dick and J. Lethem. Selected Stories of Philip K. Dick. Houghton Mifflin Harcourt, 2013. ISBN 9780544040540. URL https://books.google.co.in/books?id=V1z9rzfTb2EC.

George Orwell. 1984. Tandem Library, centennial edition, 1950. ISBN 0881030368. URL http://www.amazon.de/1984-Signet-Classics-George-Orwell/dp/0881030368.

Shooting Oneself in the Foot

A few years ago I received some feedback from someone about a research paper that I was going to submit to a major conference. Paraphrasing the feedback (repeating the exact words, even with the reference, would be copying, wouldn’t it?), I was told that there was something that I had put in the paper, which, if I insisted on retaining it, might make the reviewer look at my paper in a negative light. So, if I didn’t remove that part, I would be shooting myself in the foot.

This is beside the point, but I thought what I had added was correct and so I retained it. The paper was rejected, but I would like to believe that the reason for rejection was not that I had shot myself in the foot.

Getting back to the point, this is an expression that I have come across innumerable times, mostly directed at others, but sometimes directed at me. As a person who claims to be a writer, translator as well as a researcher in a language related discipline (among other things), I can’t help obsessing about how such expressions are used and what they mean, what they show and what they hide.

But I am not interested in writing an academic paper about that. So I write something here. And you are not supposed to review this piece when I submit the next Computational Linguistics paper which might come to you for review. (See the comment functionality below?).

Recently, Chomsky used this expression in a speech, saying ‘those who are being harmed are shooting themselves in the foot’. Now, most of the time that I have come across this expression, I have thought it was being used cynically to show something which wasn’t there and to hide something that was there. Or for some other questionable purpose. However, the people using this expression were mostly respectable, well-meaning people. Most probably they hadn’t thought about this expression in the way that I had. Maybe because if they were to do so, they would be shooting themselves in the foot.

But when Chomsky uses this expression, I can’t but believe that he is using it to mean something sensible, not cynical (if this last part looks strange to you, look up the meanings and histories of these two words, especially the second one).

I do believe that what Chomsky said was basically correct. That is, there are some people who are being harmed and they are indeed shooting themselves in the foot (I am not sure whether I am one of them or not).

The reason I am writing this is that I also believe (based on evidence, not on faith) that such people are (relatively) so few that ridiculing them or offering them advice is hardly going to matter. I must add here that Chomsky did actually caution against ridiculing such people (who have realized that they are being systematically harmed). He only expressed his disappointment that instead of doing something to stop this systematic harming, they are shooting themselves in the foot.

You see, there are also people who are being harmed and are shooting themselves in the head (or ‘consuming pesticide’). You might say that they belong to the same category because the expression is metaphorically wide enough to cover them. That might be true. But then there are also a far larger number of people who are being harmed and they are doing something very different.

They are not shooting themselves in the foot (or in the head). They are shooting others (who are also being harmed) in the foot*. Often they are also shooting others (who are also being harmed) in the head. Sometimes they are doing it for a few extra peanuts, sometimes just for the fun of it and sometimes because they have been led to believe that these targets are their enemies (or the enemies of the nation, or the enemies of the society, or of the religion, or of the community etc.). And since doing it openly is a bit problematic (not cool anymore, baby!), they often have to make it appear as if their target shot himself in the foot (or in the head), whether deliberately or accidentally.

* Perhaps they are programmed in Concurrent Euclid.

So, my take on the matter is that we should be talking about people who are being harmed and who are (literally or metaphorically) shooting others who are also being harmed, whether in the foot or in the head. Because without them, the whole shooting machinery probably won’t be able to operate. In fact, to visualize a grisly scenario, if all such people stopped shooting others (who are being harmed) and started only to shoot themselves in the foot, even then the shooting machinery would probably become dysfunctional. Fortunately, most people will not be interested in shooting themselves in the foot (or in the head) if they are just able to find any feasible alternative. Unfortunately, no one from above can tell a person what such an alternative means in practical terms in that person’s circumstances and it’s very hard to find it out for oneself. It’s very hard to even be sure that such an alternative exists. If it does, it’s very hard to translate it into any meaningful action. Compared to a few decades earlier, it is infinitely harder now, given the extraordinary consolidation of the global power structure (going far beyond what Foucault had studied up to his time), to a great extent due to the techno-administrative ‘advances’ (mostly in the name of security).

There are, surely, people who are being harmed but are not shooting others (being harmed or not being harmed). I won’t say anything about them right now.

(To academic busybodies and surface-style junkies: don’t bother to count the number of times the said expression has been used in this short piece: it has been done very deliberately. Perhaps the author was trying to shoot …).

For having read the above, here is a bonus link: Fascism then. Fascism now?

Ptypho

I had then recently joined the center. As is quite fashionable (it wasn’t when I did my graduation at some other institution), the young members of the center decided to have T-shirts made with the center’s name. The student who took up the responsibility of preparing the design for the T-shirts was earlier associated with the center but had shifted to some other more respectable center.

The design was created, T-shirts were made and they were paid for and worn by almost all the members of the center. The text on them said ‘The Langauge Cookers’ or ‘The Lagnuage Cookers’ (more likely the latter), with the Language part in a very large size.

One day I was returning to the lab, along with a couple of other graduate students. An undergraduate student (most probably from a more respectable center) came from the opposite side and stopped. He stood in front of the one who was wearing that newly made T-shirt. He put his finger on the misspelled text on the T-shirt and said the following in a tone that is used to point out the incredible stupidity of someone:

– You know that this spelling is wrong?

He was from a center not dealing in mere language.

The T-shirt wearer couldn’t say anything because he hadn’t realized that there was a spelling error. I had noticed the error and had thought that the designer of the T-shirt had chosen a smart and humorous way to say something positive about the mission of the center and the discipline. I was too shocked to reply immediately, but I found the words in time:

– It’s deliberate.

Now it was his turn to be dumbstruck.

– It’s deliberate?

– Yeah, of course it’s deliberate.

I couldn’t resist being scornful. He was still dumbstruck.

– But why?

I didn’t have time to formulate a reply because he left soon after that.

I narrated the incident once or twice to others and they seemed to share my feelings.

Well, time passed (as they say), and I came to know that there were many others in the center who had not noticed the spelling error recreated in such a large size. Or they hadn’t thought about it.

Then I found out that the general consensus outside the center was that the designer of the T-shirt (along with others) had great fun at the expense of the whole center and that the typo was indeed deliberate (what else could it be?), but the designer had wanted to say something very different from what I had imagined.

He was a well-liked member of the center and later moved to an Ivy League U.S. university. He remained a well-liked (albeit former) member.

My head still hurts from thinking about it. But I can’t escape it because every day something reminds me of this, especially in academics.

Do I hear someone saying that there really are some typos in many posts on this blog?

Milk as Karma

Someone called someone milk
Milk as noun or milk as verb?
Milk as the subject or milk as the object?
Milk as the karta or milk as the karma?

The answer appears as a vision
Of huge torrents of something
(It could very well be milk
Of, you know, something)
Flowing from one end
Of the Zipf’s Law curve
To the other end

How Many Grams?

There is an automatically (intelligently) generated blog which I have read recently.

It appears to be (let’s give ‘seems’ some rest) quite a popular one in a certain section.

I know the corpus on which it was trained.

And the corpus on which it was retrained.

(Including most of the quotes and the comments, especially the long ones).

But I wonder whether the order of n-grams was five or six.

It is definitely better than four grams.

It could even be Se7en.

This brings up a new idea.

What about writing a paper on automatically guessing the order of n-grams, given some generated text?

It may be difficult in the general case, but in our case we know the corpus on which it was trained.

Any takers?
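
For anyone tempted, here is a minimal sketch of the idea in Python. It is not a tested method, only an illustration under loud assumptions: that the generator is a plain, unsmoothed n-gram model, which can only emit n-grams it has seen in its training corpus, and that the text is tokenised by whitespace. The names `ngrams` and `guess_order` are made up for this sketch. The guess is the largest n for which every n-gram of the generated text also occurs in the corpus.

```python
def ngrams(tokens, n):
    """Return the set of all n-grams (as tuples) in a list of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def guess_order(corpus_tokens, generated_tokens, max_n=8):
    """Guess the n-gram order of the (hypothetical, unsmoothed) generator.

    Returns the largest n up to max_n such that every n-gram of the
    generated text also occurs in the training corpus, or 0 if even
    some single token of the generated text is missing from the corpus.
    """
    best = 0
    for n in range(1, max_n + 1):
        seen_in_output = ngrams(generated_tokens, n)
        if not seen_in_output:
            # The generated text is shorter than n tokens: nothing to check.
            break
        if seen_in_output <= ngrams(corpus_tokens, n):
            best = n
        else:
            break
    return best


# Toy example: every bigram of the output occurs in the corpus,
# but the trigram ("on", "the", "cat") does not.
corpus = "the cat sat on the mat".split()
generated = "on the cat sat on the mat".split()
print(guess_order(corpus, generated))  # prints 2
```

On a short sample this can overestimate the order, because longer n-grams may match the corpus by chance, and a smoothed model would break the assumption altogether.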

Accepted, but not Published

Academicians or researchers list their publications prominently on their home pages. After all, these are supposed to represent the best of their work. They also quite often (especially those who have a large number of publications) categorize them according to some criteria like the venue (workshop, conference, journal or book: in the reverse order of prominence) or peer review (unrefereed and refereed).

In this post we propose that there should be a new category of publications. This category is needed because a lot of researchers (for good or for bad) now come from underprivileged countries. For most of these researchers, traveling abroad to attend a conference, even if their paper has been accepted, is something very hard to do. In some sense it is even harder than getting a paper accepted, which is itself relatively harder for them, given the lack of certain privileges (whether you like the word or not) such as generous research grants, infrastructure and language resources, combined with the prejudice (it is there: I am not inventing it, whoever might be blamed for it). To these problems can be added the problem of compulsory attendance at a conference or a workshop. It is partly these conditions which have prompted suggestions from certain quarters that researchers from these countries should concentrate on journal papers (never mind the delay and difficulties involved or the unfairness of the proposition, even though it has some practical justification).

But you can never be sure while submitting that you certainly won’t be able to attend. Also, hope is said to be a good thing. Therefore, the event of a researcher submitting a paper and hoping to attend but not being able to attend cannot be ruled out.

This brings us to the proposal mentioned earlier. One solution to this problem is that there should be another category of papers: accepted but not published, because the author couldn’t afford to attend the conference or the workshop. (By the way, workshops are the most happening places nowadays: more on that later).

The author of this post must know because he has authored more than one such publication.

Of course, the condition will be that if and when such a paper is resubmitted (with or without modifications, but without any substantial new work), accepted again and finally published, the entry marked as ‘accepted’ should be removed and replaced by an entry marked as ‘published’.

After all, if we are serious about research, then the work (which has been peer reviewed and accepted) should be given somewhat more importance than some pages printed in some proceedings (or attendance in a conference for that matter).

This, of course, doesn’t mean that you can get basically the same thing published (or accepted) in more than one place.

(Sorry for the Gory Details)

P.S.: Maybe there is no need for the above apology as the depiction of the Gory Details of the Indian Reality is now getting multiple Oscars (The Academy Awards: the keyword is Academy). But maybe there is because some researchers have a more (metaphorically) delicate constitution which can be hurt by the Gory Details.

Queen’s P.S.: Off with his head!

Hundred (Fictitious) Dollar Oscars

This has been said by someone else, but I will repeat it anyway: if the new ‘Indian’ craze in the West, Slumdog Millionaire, wins one (or possibly more) Oscars, it will be due, to a large extent, to one particular scene in the movie. After the protagonist plays foul with an American (actually the US, lest we forget altogether) tourist couple and is being beaten brutally by an Indian, the American couple rescues him, only to get the retort that ‘you wanted to see a bit of real India’. And the lady’s answer is to get a hundred dollar note (we don’t call it a ‘bill’, nor a ‘bank note’) from her husband and hand it to the offending boy with what we call a dialogue in Hindi, ‘well, here is a bit of real America, son’. As the person who mentioned this scene earlier (although I had thought of the scene in more or less the same way) also pointed out, this American lady (that’s what we now call a woman in Hindi) is shown to be the only really good person in the whole movie.

But the movie is supposed to be all about Indians, so there are no real people other than Indians except this lady. The only Western (White and presumably Christian) person in the whole movie can hardly not be a representative of the average Westerner (let alone the US Americans) as opposed to the wretched, written-to-be-wretched, Indians, especially when she makes such a grand gesture accompanied by a solid dialogue.

Since there still are people out there who are going to criticize (or already have criticized) this movie for some crappy reason like selling India’s poverty to the West etc., one has to give out the mandatory disclaimer that one is most certainly not against this movie for any such reason. In fact, one is not really against this movie at all.

I most probably wouldn’t have commented on this movie had it not become such a sensation, especially given that a lot of insightful commentators have already written about it. But now it looks very likely that the movie is going to get that most-prestigious-in-the-globe-but-actually-the-US-American movie award named Oscar, and probably more than one. This means that the movie will be taken seriously by a lot of non-Indians and perhaps even by some Indians. And, as I indicated earlier, it is not really such a bad movie. The problem is that it is not a great movie at all, which is what it is being made out to be outside India.

And like one other commentator (pardon me for not giving references, but I am tired right now, though I can provide them if needed), I find it hard to believe that it is directed by the same person who directed that movie which is in my list of Very Good Movies (in the company of movies by Bergman, Fellini, De Sica, Kubrick and the like), namely Trainspotting. Whereas that movie was exactly what it wanted to be, this movie almost fails completely, although it is still entertaining.

There are so many things which are fundamentally and very clearly wrong with this movie. Accent is, of course, one of them. I wonder whether Danny Boyle knows that the knowledge of English (and even more so its use with a particular accent) is the single most reliable indicator of one’s socio-economic status in the Indian subcontinent. And the movie shows the ‘slumdog’ using the highest caste accent while the elite TV show host uses a pretty low caste accent (yes, Anil Kapur’s accent is not very ‘good’ and he would usually be looked down upon among a circle of people speaking in an almost British accent, as does the protagonist).

I would urge Danny and his crew to go and see Tashan, which has some similarities with this movie and also stars Anil Kapur.

The movie could have been so much better if it had been made in Hindi, had better casting, and had hired some accent tutors, as they do in Hollywood even for the all-(US)-American movies.

The second big problem is that the novel on which it is based doesn’t talk Karma-Varma at all. And the movie resolves everything at the end by saying ‘because it is written’. And Danny Boyle himself in an interview (roughly) said that you simply can’t resolve the complexities of India: they are just there. Then he said ‘they even have a philosophy for this’, which says to me that he seems to know very little about India. Yes, there is a philosophy of that kind, but there are innumerable other philosophies too.

Come on, Danny, no one in India actually says ‘I don’t know, I have got a sort of Karmic feeling about this’ or something like that (as the TV show host does). This Karmic terminology is used more in the West than in the ‘real India’. No one really talks about ‘Karma’ here. (Even when they do, they don’t do it in this way). Though they do talk of Bhaagya and Taqdeer and Maathe Ki Lakeer etc. Which is not the same thing. And which is the reason this movie can be accused of indulging in post-modern Orientalism (someone else said that too).

In many parts of India, if you spoke out the word ‘Karma’ in the way Danny Boyle (or any Westerner talking about India) does, people would think you were talking about a patriotic movie starring an old Dilip Kumar pairing with one of my favorite (favourite for the less dominant party to which Danny belongs) female actors, Nutan. This ‘Karma’ is, of course, not the same word. In fact, it’s not a word at all: it’s a name.

It’s an ambiguous Named Entity that I would classify as either a Person or as an Object-Title, depending on the context.

In the same interview, Danny Boyle says about Mumbai (which we still quite often call Bambai – बंबई in Hindi and Bombay in English) that ‘they call it the Maximum City’. Well, it’s actually Suketu Mehta who calls Mumbai that. A lot can be said about that book too, but I won’t say it now.

Now the music. Well, the simple and solid fact is that A. R. Rehman has given much better music before, right from his very first hit, Roja. If some Indians start respecting him now because he wins an Oscar or two, I can only pity them. And I pity the non-Indians too: for being completely unaware of such great music even in this .mp3 era. Music which has been heard and liked by hundreds of millions of people for more than one and a half decades now.

But let me reiterate. This is not such a bad movie. Your money won’t be wasted if you go and see it. But it is definitely not ‘a gritty and realistic’ movie about India, except in some ways which are of no use to an Indian and could be misleading for a non-Indian.

Let me reiterate something else. The Indian ‘reality’ is much worse than what is depicted in the movie, which is basically a lived-happily-ever-after fantasy.

And featuring the US American lady in the movie with her fictitious hundred dollars is a cheap (pun intended) trick to win over the Western (especially American) audiences whose senses will be offended by what is shown in the movie (for the dummies: this is a deliberate but slight exaggeration). Because if the truth were told, a big share (not all, of course) of the responsibility of this worse reality of India (as of other colonized or near-colonized countries) rests with the West.

Overall, Slumdog Millionaire is in the same league as Baz Luhrmann’s Moulin Rouge. Both movies are inspired by the ‘Bollywood’ style of film making and both have directors who seem to know precious little about India but who wanted to pay some tribute to the country and its films, just as the earlier Orientalist artists paid their own tributes to the seductive, exotic East as imagined by them with their artistic temperament. But as an Indian I feel that the latter movie has a definite edge. That could be partly because it doesn’t pretend to know (and, therefore, tell) much about India.

Slumdog Millionaire’s only connection to Trainspotting, ironically, happens to be a scene that was hard to watch even for the hardened Indians: the jump in and out of the shitpot. And even this scene was done much better in Trainspotting.

There is also a serious matter that concerns both the style and the content. It’s a very tricky matter to mix realism with fantasy, which is what Slumdog Millionaire tries to do. And it does quite a bad job of it.

As it happens, Danny Boyle came and lived in India for some time to make this movie. One gets the impression that he was overwhelmed by what he saw and didn’t quite know what to make of it. And in such cases the easiest resort is to the Karmic poppycock that the movie ends on. Small mercy that it is done with the tongue at least lightly in the cheek.

P.S.: Also for the dummies, the word ‘caste’ above has been used metaphorically, not literally. Knowledge of English and the accent is a big (perhaps the biggest) determinant of the metaphorical caste in India. Even in the India of Call Centres. Or should it be ‘especially in that India’?

Computational Linguistics

As I wrote in the previous entry (can the Hindi word ‘pravishti’ be used for ‘post’?), I am going to write about Sanchay over the next few weeks.

But since Sanchay has been built especially (apart from ordinary users) for researchers in Computational Linguistics or Linguistics, it is better to make clear what Computational Linguistics or Linguistics means, or, if you already know what they mean, what I mean by them. The second point matters because not only do ordinary people have all kinds of misconceptions about the meaning of these subjects (Computational Linguistics or Linguistics), but even the researchers in these subjects do not agree on their definition.

The truth is that in the Hindi world most people still take Linguistics to mean the kind of study it was taken to mean at the beginning of the last century. But I would not like to go in that direction of the debate right now, because there is so much to say about it that the present purpose would be left behind.

There is, of course, a great deal to say about the definition of Computational Linguistics or Linguistics and about their boundaries too, but for now a little will have to do.

So, put briefly, Linguistics is that subject of research or study in which not just the grammar of one particular language is studied, but natural or human (that is, not artificial) language is studied scientifically. The idea is now widely accepted that the structure of the human brain is directly related to the structure of language, and since the structure of the brain is fundamentally the same in all humans, all natural or human languages are also the same in everything except their surface features. That is why, as is well known in the modern literature of these subjects, if the infant of an American is adopted right after birth by a Chinese family and the child grows up in China, the child will learn to speak Chinese as easily as the child of any Chinese family. There are plenty of other such things, but the main point is that Linguistics is the scientific study of natural or human language.

At least the attempt is that the study should remain scientific, but whether it actually manages to is a matter of debate.

Coming now to Computational Linguistics, in this subject our attention shifts from humans to the computer, but the previous condition still applies: the scientific study of natural or human language. The difference is that our aim now becomes to make the computer capable of understanding natural or human language and of using it. Obviously this is still very far off, and that should not be surprising, because in Linguistics itself (even after the extraordinary achievements of the last century) scientists are stuck in a whole lot of obstacles.

Even so, quite a lot has already become possible in Computational Linguistics and quite a lot more may become possible ahead (in the near future). But this does not include the computer speaking and understanding language like a human. What it does include is technologies that can find documents better, summarize them, translate them to some extent, and so on.

But in the Indian context the trouble is that we have not yet reached even the stage where we can easily use the computer as just a better typewriter. There have been some achievements in this direction, but compared to English or the major European languages we are nowhere. As most of you already know, this is a long story which is best left alone for now.

But Sanchay has been developed in this very context, which we will talk about later.

An Introduction to Sanchay

In the previous post (I have to say with some shame that I am unable to find a suitable Hindi word for ‘post’) I wrote (in English) about the new version of Sanchay. The funny thing is that I have hardly written anything about Sanchay in Hindi so far. In an attempt to correct this lapse, I have decided to write something about Sanchay over the next few weeks.

So who is Sanchay? Or what is Sanchay?

The answer to the first question (in American terminology) is that Sanchay is a single parent child who is not getting the benefit of any welfare but who has a lot of responsibilities.

The answer to the second question is that Sanchay is a free (as in freedom, though you can also say free of cost) and open source collection of computational tools useful for researchers working in the area of Computational Linguistics or Linguistics. But in particular it can be of use to anyone who uses Indian languages on the computer. One of its features is that new languages and encodings can be added to it easily. Almost all the major Indian languages are already included, and for using them in Sanchay you are not dependent on the operating system, although if the operating system does include any such language, you can use that facility in Sanchay too. Not only that, the same version of Sanchay works on both Windows and Linux/Unix, provided you have the JDK (Java Development Kit) installed. Even the font for your language does not need to be installed in the operating system.

The current version of Sanchay is 0.3.0. The biggest difference between this version and the previous one is that all the tools of Sanchay can now be used from one place; there is no need to remember the names of separate scripts. In all, twelve tools (applications) have been included, which are:

  1. Sanchay Text Editor
  2. Table Editor
  3. Find-Replace-Extract Tool
  4. Word List Builder
  5. Word List Analyzer and Visualizer
  6. Language and Encoding Identification Tool
  7. Syntactic Annotation Interface
  8. Parallel Corpus Annotation Interface
  9. N-gram Language Modeling Tool
  10. Discourse Annotation Interface
  11. File Splitter
  12. Automatic Annotation Tool

If you can’t make head or tail of most of these, wait a little. I will try to give more information about them later.

Perhaps there is no harm in adding that Sanchay is the result of the stubborn resolve of yours truly over the last few years, with the cooperation of some other people too, even if only a little from each. The names of all those people will soon be available on the Sanchay website. Almost all of them are (or were) students who worked, or are working, on some project under my ‘guidance’.

Hopefully the next version of Sanchay after this one will be out in a few months and will have even more tools and features.

Good News and Bad News on the CL Front

First, as the saying goes, the bad news. We had submitted a proposal for the Second Workshop on NLP for Less Privileged Languages for the ACL-affiliated conferences. That proposal has not been accepted. A total of 41 proposals were submitted and 34 of them were accepted. Ours was among the not-accepted seven (euphemisms can be consoling).

Was it that bad? I hope not.

Don’t those capital letters look silly in the name of a rejected proposal?

Now the good news. The long awaited new version of Sanchay has been released on SourceForge. (Well, at least I was awaiting it). This version has been named (or numbered?) 0.3.0.

The new Sanchay is a significant improvement over the last public version (0.2). It now has one main GUI from which all the applications can be controlled. There are twelve (GUI based) applications which have been included in this version. These are:

  • Sanchay Text Editor that is connected to some other NLP/CL components of Sanchay.
  • Table Editor with all the usual facilities.
  • A more intelligent Find-Replace-Extract Tool (can search over annotated data and allows you to see the matching files in the annotation interface).
  • Word List Builder.
  • Word List FST (Finite State Transducer) Visualizer that can be useful for anyone working with morphological analysis etc.
  • One of the most accurate Language and Encoding Identifier that is currently trained for 54 language-encoding pairs, including most of the major Indian languages. (Yes, I know there is a number agreement problem in the previous sentence).
  • A user friendly Syntactic Annotation Interface that is perhaps the most heavily used part of Sanchay till now. Hopefully there will be an even more user friendly version soon.
  • A Parallel Corpus Annotation Interface, which is another heavily used component. (Don’t take that ‘heavily’ too seriously).
  • An N-gram Language Modeling Tool that allows you to compile models in terms of bytes, letters and words.
  • A Discourse Annotation Interface that is yet to be actually used.
  • A more intelligent File Splitter.
  • An Automatic Annotation tool for POS (Part Of Speech) tagging, chunking and Named Entity Recognition. The first two should work reasonably well, but the last one may not be that useful for practical purposes. This is a CRF (Conditional Random Fields) based tool and it has been trained for Hindi for these three purposes. If you have annotated data, you can use it to train your own taggers and chunkers.
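
To make the ‘bytes, letters and words’ part of the N-gram tool a little more concrete, here is a minimal Python sketch of counting n-grams over each of these three units. It is not Sanchay’s code (Sanchay itself runs on Java); the function names `ngrams` and `ngram_counts` are made up for illustration, ‘letters’ are approximated by Unicode code points, ‘words’ by whitespace-separated tokens, and the bytes are assumed to be UTF-8.

```python
from collections import Counter


def ngrams(units, n):
    """Return every run of n consecutive units as a tuple."""
    return zip(*(units[i:] for i in range(n)))


def ngram_counts(text, n, unit="word"):
    """Count the n-grams of `text` over bytes, letters or words.

    Assumptions for this sketch: bytes are the UTF-8 encoding of the
    text, letters are Unicode code points (not grapheme clusters), and
    words are whitespace-separated tokens.
    """
    if unit == "byte":
        units = list(text.encode("utf-8"))  # integers in 0..255
    elif unit == "letter":
        units = list(text)                  # one item per code point
    elif unit == "word":
        units = text.split()                # naive tokenization
    else:
        raise ValueError(f"unknown unit: {unit}")
    return Counter(ngrams(units, n))


if __name__ == "__main__":
    sample = "संचय क्या है"
    for unit in ("byte", "letter", "word"):
        counts = ngram_counts(sample, 2, unit)
        print(unit, sum(counts.values()), "bigrams")
```

The same text gives very different models depending on the unit: a single Devanagari code point is three bytes in UTF-8, so byte n-grams see the encoding while letter and word n-grams see only the language.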

All these components use the customizable language-encoding support, especially useful for South Asian languages, that doesn’t need any support from the operating system or even the installation of any fonts, although these can still be used inside Sanchay if they are there.
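
For readers wondering what identifying a ‘language-encoding pair’ can look like, one common textbook approach is to compare the byte n-gram profile of an unknown file against profiles built from labelled samples. The Python sketch below only illustrates that general idea and does not describe how Sanchay’s identifier works; the labels, the tiny training strings and the function names (`byte_ngram_profile`, `cosine`, `identify`) are all hypothetical, and a real identifier would be trained on far more data.

```python
import math
from collections import Counter


def byte_ngram_profile(data: bytes, n: int = 3) -> Counter:
    """Count every run of n consecutive bytes in `data`."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(data: bytes, profiles: dict) -> str:
    """Return the label whose profile is most similar to `data`."""
    query = byte_ngram_profile(data)
    return max(profiles, key=lambda label: cosine(query, profiles[label]))


# Toy training data: one short sample per (language, encoding) pair.
SAMPLES = {
    "english-ascii": "this is a sentence".encode("ascii"),
    "hindi-utf8": "यह एक वाक्य है".encode("utf-8"),
    "hindi-utf16le": "यह एक वाक्य है".encode("utf-16-le"),
}
PROFILES = {label: byte_ngram_profile(data) for label, data in SAMPLES.items()}

if __name__ == "__main__":
    print(identify("यह वाक्य".encode("utf-8"), PROFILES))
    print(identify("a sentence".encode("ascii"), PROFILES))
```

Working on raw bytes is what lets one model separate the same language in two encodings: the Hindi sample shares no byte trigrams between its UTF-8 and UTF-16 forms.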

More information is available at the Sanchay Home.

The capitals don’t look so bad for a released version.

The downside of even this good news is that my other urgent (to me) work has got delayed as I was working almost exclusively on bringing out this version for the last two weeks or so.

But then you need a reason to wake up and Sanchay is one of my reasons. And I can proudly say that a half-hearted attempt to generate funding for this project by posting it on Micropledge has generated $0.

Sanchay is still alive as a single parent child without any welfare but with a lot of responsibilities.

Now I can have nightmares about the bugs.