Data Stat: 2008

Monday, November 24, 2008

Statistic Assumptions

• Normal distribution of data (which can be tested by using a normality test, such as the Shapiro-Wilk and Kolmogorov-Smirnov tests).

• Equality of variances (which can be tested by using the F test, the more robust Levene's test, Bartlett's test, or the Brown-Forsythe test); a sketch of this check and of the normality check above follows this list.

• Samples may be independent or dependent, depending on the hypothesis and the type of samples:

o Independent samples are usually two randomly selected groups

o Dependent samples are either two groups matched on some variable (for example, age) or are the same people being tested twice (called repeated measures)
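To make these checks concrete, here is a minimal sketch using SciPy; the two samples, their sizes, and the conventional 0.05 cutoff are illustrative assumptions, not part of the text above.

# Minimal sketch of the normality and equal-variance checks listed above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)   # hypothetical sample 1
group_b = rng.normal(loc=10.5, scale=2.0, size=30)   # hypothetical sample 2

# Normality: Shapiro-Wilk on each group (H0: the data are normally distributed)
for name, sample in [("A", group_a), ("B", group_b)]:
    w, p = stats.shapiro(sample)
    print(f"Shapiro-Wilk, group {name}: W = {w:.3f}, p = {p:.3f}")

# Equality of variances: Levene's test (H0: the variances are equal)
stat, p = stats.levene(group_a, group_b)
print(f"Levene: statistic = {stat:.3f}, p = {p:.3f}")
# p-values below the chosen level (e.g. 0.05) cast doubt on the assumption.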

Since all calculations are done subject to the null hypothesis, it may be very difficult to come up with a reasonable null hypothesis that accounts for equal means in the presence of unequal variances. In the usual case, the null hypothesis is that the different treatments have no effect — this makes unequal variances untenable. In this case, one should forgo the ease of using this variant afforded by the statistical packages. See also Behrens–Fisher problem.
One scenario in which it would be plausible to have equal means but unequal variances is when the 'samples' represent repeated measurements of a single quantity, taken using two different methods. If systematic error is negligible (e.g. due to appropriate calibration) the effective population means for the two measurement methods are equal, but they may still have different levels of precision and hence different variances.

Determining type

For novices, the most difficult issue is often whether the samples are independent or dependent. Independent samples typically consist of two groups with no relationship. Dependent samples typically consist of a matched sample (or a "paired" sample) or one group that has been tested twice (repeated measures).
Dependent t-tests are also used for matched-paired samples, where two groups are matched on a particular variable. For example, if we examined the heights of men and women in a relationship, the two groups are matched on relationship status. This would call for a dependent t-test because it is a paired sample (one man paired with one woman). Alternatively, we might recruit 100 men and 100 women, with no relationship between any particular man and any particular woman; in this case we would use an independent samples test.
Another example of a matched sample would be to take two groups of students, match each student in one group with a student in the other group based on an achievement test result, and then examine how much each student reads. An example pair might be two students who scored 90 and 91, or two students who scored 45 and 40, on the same test. The hypothesis would be that how much students read differs according to how well they did on the test. Alternatively, we might recruit students with low scores and students with high scores into two groups and assess their reading amounts independently.
An example of a repeated measures t-test would be if one group were pre- and post-tested. (This example occurs in education quite frequently.) If a teacher wanted to examine the effect of a new set of textbooks on student achievement, (s)he could test the class at the beginning of the year (pretest) and at the end of the year (posttest). A dependent t-test would be used, treating the pretest and posttest as matched variables (matched by student).
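As a rough sketch of the two designs just described, the following snippet runs a dependent (paired) t-test on hypothetical pretest/posttest scores and an independent-samples t-test on two unrelated groups; all numbers are invented for illustration.

# Paired (dependent) versus independent two-sample t-tests with SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Repeated measures: the same 25 students tested before and after the new textbooks.
pretest = rng.normal(70, 8, size=25)
posttest = pretest + rng.normal(3, 4, size=25)       # same students, so the samples are paired
t_paired, p_paired = stats.ttest_rel(pretest, posttest)

# Independent samples: 100 unrelated men and 100 unrelated women.
men = rng.normal(175, 7, size=100)
women = rng.normal(162, 6, size=100)
t_ind, p_ind = stats.ttest_ind(men, women)

print(f"paired (dependent) t = {t_paired:.2f}, p = {p_paired:.4f}")
print(f"independent t = {t_ind:.2f}, p = {p_ind:.4f}")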

Statistic Uses

Among the most frequently used t tests are:

* A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
* A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard deviation and number of data points, we can use some kind of t test to determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. All such tests are usually called Student's t tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t test. There are different versions of the t test depending on whether the two samples are
o unpaired, independent of each other (e.g., individuals randomly assigned into two groups, measured after an intervention and compared with the other group[4]), or
o paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention[4]).

If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, 0.05, or 0.01 level), then the null hypothesis, which usually states that the two groups do not differ, is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.

* A test of whether the slope of a regression line differs significantly from 0.

Once a t value is determined, a p-value can be found using a table of values from Student's t-distribution.
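The sketch below illustrates, on assumed example data, the Student's and Welch's forms of the two-sample test mentioned above, and shows how a p-value can be obtained from a t value with Student's t-distribution instead of a printed table.

# Student's versus Welch's two-sample t-test, and a manual t-to-p conversion.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(50, 5, size=20)
b = rng.normal(53, 9, size=20)     # deliberately different spread

t_student, p_student = stats.ttest_ind(a, b, equal_var=True)   # classic Student's t test
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)      # Welch's t test

# Two-sided p-value for a given t value and degrees of freedom:
df = len(a) + len(b) - 2
p_manual = 2 * stats.t.sf(abs(t_student), df)

print(p_student, p_welch, p_manual)   # p_manual matches p_student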

Sunday, November 23, 2008

Correlation



[Figure: several sets of (x, y) points, with the correlation coefficient of x and y for each set. The correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). N.B.: the figure in the centre has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.]

In probability theory and statistics, correlation (often measured as a correlation coefficient) indicates the strength and direction of a linear relationship between two random variables. That is in contrast with the usage of the term in colloquial speech, denoting any relationship, not necessarily linear. In general statistical usage, correlation or co-relation refers to the departure of two random variables from independence. In this broad sense there are several coefficients, measuring the degree of correlation, adapted to the nature of the data.
A number of different coefficients are used for different situations. The best known is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Despite its name, it was first introduced by Francis Galton.[1]
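As a small illustration of that definition, the following sketch computes the Pearson coefficient directly as the covariance divided by the product of the standard deviations and compares it with SciPy's built-in function; the data are simulated for the example.

# Pearson product-moment correlation: covariance / (std(x) * std(y)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)   # a noisy linear relationship

r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, p_value = stats.pearsonr(x, y)

print(r_manual, r_scipy)   # the two values agree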

Statistical assumptions

When the number of measurements, N, is larger than the number of unknown parameters, k, and the measurement errors εi (see below) are normally distributed, the excess of information contained in the (N − k) extra measurements can be used to make statistical predictions about the unknown parameters, such as:
• confidence intervals for the unknown parameters.

Independent measurements

Quantitatively, this is explained by the following example: Consider a regression model with, say, three unknown parameters β0, β1 and β2. An experimenter performed 10 repeated measurements at exactly the same value of independent variables X. In this case regression analysis fails to give a unique value for the three unknown parameters: the experimenter did not provide enough information. The best one can do is to calculate the average value of the dependent variable Y and its standard deviation.
If the experimenter had instead performed five measurements at X1, four at X2 and one at X3, where X1, X2 and X3 are different values of the independent variable X, then regression analysis would provide a unique solution for the unknown parameters β.
In the case of general linear regression (see below) the above statement is equivalent to the requirement that the matrix XᵀX is regular (that is, it has an inverse).
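A short numerical sketch of this identifiability point, with made-up X values: when all measurements sit at a single point, XᵀX is singular and the three parameters cannot be separated; spreading the measurements over distinct values restores full rank.

# Checking whether X^T X is regular (invertible) for two measurement plans.
import numpy as np

def design(x1, x2):
    """Design matrix with a constant column and two regressors (beta0, beta1, beta2)."""
    return np.column_stack([np.ones_like(x1), x1, x2])

# Ten repeated measurements at exactly the same point:
X_same = design(np.full(10, 2.0), np.full(10, 5.0))
print(np.linalg.matrix_rank(X_same.T @ X_same))    # rank 1 < 3: singular, no unique solution

# Measurements spread over distinct values of the independent variables:
X_varied = design(np.array([1., 1., 2., 2., 3., 3., 4., 4., 5., 5.]),
                  np.array([2., 3., 2., 3., 2., 3., 2., 3., 2., 3.]))
print(np.linalg.matrix_rank(X_varied.T @ X_varied))  # rank 3: X^T X is invertible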

Regression equation

It is convenient to assume an environment in which an experiment is performed: the dependent variable is then the outcome of a measurement.

The regression equation deals with the following variables:
• The unknown parameters denoted as β. This may be a scalar or a vector of length k.
• The independent variables, X.
• The dependent variable, Y.

The regression equation is a function of the independent variables X and the unknown parameters β.

The user of regression analysis must make an intelligent guess about this function. Sometimes the form of this function is known; sometimes it must be found by trial and error.
Assume now that the vector of unknown parameters β is of length k. In order to perform a regression analysis the user must provide information about the dependent variable Y:

• If the user performs the measurement N times, where N < k, regression analysis cannot be performed: not enough information has been provided to do so.

• If the user performs N independent measurements, where N = k, then the problem reduces to solving a set of N equations in the N unknowns β.

• If, on the other hand, the user provides results of N independent measurements, where N > k, regression analysis can be performed. Such a system is called an overdetermined system.

In the last case the regression analysis provides the tools for:

1. finding a solution for the unknown parameters β that will, for example, minimize the distance between the measured and predicted values of the dependent variable Y (the method of least squares);

2. using the surplus of information, under certain statistical assumptions, to provide statistical information about the unknown parameters β and the predicted values of the dependent variable Y.
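As an illustration of the overdetermined case (N > k), the sketch below fits three parameters to thirty simulated measurements by ordinary least squares with NumPy; the true coefficients and noise level are assumptions made only for the example.

# Overdetermined system (N > k) solved by ordinary least squares.
import numpy as np

rng = np.random.default_rng(4)
N, k = 30, 3                                        # 30 measurements, 3 unknown parameters
x = rng.uniform(0, 10, size=N)
X = np.column_stack([np.ones(N), x, x**2])          # design matrix for beta0, beta1, beta2
beta_true = np.array([1.0, 2.0, -0.3])
y = X @ beta_true + rng.normal(scale=1.0, size=N)   # measurements with an error term

beta_hat, residual_ss, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)    # estimates close to beta_true
print(rank)        # rank k = 3: the parameters are identifiable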

Regression diagnostics

Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analyses of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.

Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, in small samples the estimated parameters will not follow normal distributions, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations.
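A minimal sketch of these diagnostics, assuming simulated data and the statsmodels OLS interface: it reports R-squared, the F-test of the overall fit, and the t-tests of the individual parameters, and exposes the residuals for inspection.

# Common goodness-of-fit and significance checks after fitting a regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 0.8 * x + rng.normal(scale=1.5, size=50)

X = sm.add_constant(x)                   # adds the intercept column
results = sm.OLS(y, X).fit()

print(results.rsquared)                  # goodness of fit
print(results.fvalue, results.f_pvalue)  # F-test of the overall fit
print(results.tvalues, results.pvalues)  # t-tests of the individual parameters
residuals = results.resid                # examine for patterns (e.g. plot against fitted values)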

Regression analysis


In statistics, regression analysis is a collective name for techniques for the modeling and analysis of numerical data consisting of values of a dependent variable (also called response variable or measurement) and of one or more independent variables (also known as explanatory variables or predictors). The dependent variable in the regression equation is modeled as a function of the independent variables, corresponding parameters ("constants"), and an error term. The error term is treated as a random variable. It represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria have also been used.

Regression can be used for prediction (including forecasting of time-series data), inference, hypothesis testing, and modeling of causal relationships. These uses of regression rely heavily on the underlying assumptions being satisfied. Regression analysis has been criticized as being misused for these purposes in many cases where the appropriate assumptions cannot be verified to hold.[1][2] One factor contributing to the misuse of regression is that it can take considerably more skill to critique a model than to fit one.

Friday, August 29, 2008

The Content Side

Blog info

Logic With Markov
" Technology markov for images...."
Other Markov Data
" At the intersection of statistical physics and probability theory .... "
All About Hardware
"..... the technologi hardware is always ....."
MathType
"MathTypeTM is an intelligent mathematical equation editor designed for personal computers running Microsoft Windows ...." -
E-Commerce
Have you ever thought about running an eCommerce business?
Dreamweaver
"Macromedia Dreamweaver MX 2004 is a professional HTML editor for designing,..... "
Notebook Buying Tips?

Sunday, August 3, 2008

Chapter II Theoretical Framework

CHAPTER II
LITERATURE REVIEW


2.1 Basic Communication Theory
2.1.1 Basic Definition of Communication
2.1.2 Review of Communication Theory
2.2 The Concept of Group Communication
2.2.1 Characteristics of Group Communication
2.2.2 Definition of a Group
2.2.3 The Emergence of Groups
2.2.4 Classification of Groups
2.2.5 Group Goals
2.2.6 Group Characteristics
2.2.7 Group Discussion
2.2.8 Group Composition
2.3 Group Cohesiveness
2.3.1 Definition of Cohesiveness
2.3.2 Aspects of Cohesiveness
2.3.3 Definition of Group Cohesiveness
2.3.4 Strength of Group Cohesiveness
2.4 Theory (Discussion Topic)
2.4.1 Basis (Discussion Topic)
2.4.2 Definition (Discussion Topic)
2.4.3 Components (Discussion Topic)
2.4.4 Formation and Change (Discussion Topic)

Ordinal Validity Theory

VALIDITY THEORY FOR ORDINAL DATA

This is information about validity theory for ordinal data that you can use as an additional reference; it is intended for my clients in Indonesia, so I originally wrote it in Bahasa Indonesia. Here is the material:

Validity indicates the degree to which an instrument actually measures what it is intended to measure. It can therefore be said that the higher the validity of a test instrument, the more accurately it hits its target, that is, the better it shows what it is supposed to measure. A test can be said to have high validity when it performs its measurement function, that is, when it gives results that are consistent with the meaning and purpose for which the test was designed. If a researcher uses a questionnaire to collect research data, the items in that questionnaire are the test instrument and must measure what the research aims to measure.
One way to assess the validity of a test instrument is to look at item discriminality (item discrimination). Item discrimination is the most appropriate method for any type of test. In this study, item discrimination is assessed by means of the item-total correlation.
The item-total correlation is the consistency between an item's score and the overall score, as seen in the size of the correlation coefficient between each item and the total score. In this study the Spearman rank correlation coefficient is used, computed in the following steps:

Spearman Rank Correlation Coefficient
When the items are measured on an ordinal scale (an attitude scale), the Spearman rank correlation for item i is:

rs = 1 − (6 Σ dj²) / (n (n² − 1)),   where dj = R(Xj) − R(Yj) and n is the number of respondents.
The formula above is used when there are no tied values, or when ties are few. When there are many ties, the following formula is used instead:

rs = Σ (R(Xj) − R̄X)(R(Yj) − R̄Y) / √[ Σ (R(Xj) − R̄X)² · Σ (R(Yj) − R̄Y)² ]

where: R(X) = the rank of the X value (the item score)
R(Y) = the rank of the Y value (the total score)
R̄X and R̄Y = the mean ranks of X and Y

Once the correlation coefficients for all items have been computed, the smallest value that can still be considered sufficiently "high" as an indicator of consistency between the item score and the total score must be determined. There is no strict cutoff for this. The main principle in selecting items on the basis of their correlation coefficients is to look for coefficients that are as high as possible and to discard any item with a negative correlation or a coefficient close to zero (0.00).
According to Friedenberg (1995), in the development and construction of psychological scales a minimum correlation coefficient of 0.30 is usually used. Accordingly, all items with a correlation below 0.30 can be set aside, and the items included in the test instrument are those with correlations above 0.30; the closer a correlation is to one (1.00), the better the item's consistency (validity).
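A minimal sketch of this selection procedure, assuming a hypothetical matrix of ordinal item responses: each item's item-total Spearman correlation is computed with SciPy and compared against the 0.30 threshold.

# Item-total Spearman rank correlations with a 0.30 cutoff (Friedenberg, 1995).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
responses = rng.integers(1, 6, size=(40, 5))       # hypothetical 5-point ordinal items
total = responses.sum(axis=1)                      # total score per respondent

for i in range(responses.shape[1]):
    rho, p = stats.spearmanr(responses[:, i], total)
    decision = "keep" if rho >= 0.30 else "discard"
    print(f"item {i + 1}: item-total r_s = {rho:.2f} -> {decision}")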


Ordinal Reliability Theory (Cronbach's Alpha Method)

RELIABILITY FOR ORDINAL DATA

Reliability is the degree to which the results of a measurement can be trusted. A measurement with high reliability is one that is able to give trustworthy (reliable) results. Reliability is one of the main characteristics of a good measurement instrument. Reliability is sometimes also called trustworthiness, dependability, steadiness, consistency, or stability, but the core idea of the concept is the extent to which the results of a measurement can be trusted, that is, the extent to which the measured scores are free from measurement error.
Empirically, the level of reliability is indicated by a number called the reliability coefficient. Although in theory the reliability coefficient ranges between 0.00 and 1.00, in practice a coefficient of 1.00 is never reached, because human beings, as the subjects of psychological measurement, are a potential source of error. In addition, although a correlation coefficient can be positive (+) or negative (−), a reliability coefficient smaller than zero (0.00) is meaningless, because the interpretation of reliability always refers to a positive reliability coefficient.
The reliability coefficient used here is Cronbach's Alpha, computed with the following formula:

α = (k / (k − 1)) · (1 − (Σ Si²) / S²total)

where:
k is the number of items (splits) in the instrument
Si² is the variance of item i
S²total is the variance of the total score over all items
Once the reliability coefficient has been computed, the strength of the relationship can be assessed using the Guilford (1956) criteria:
1. less than 0.20 : a very small, negligible relationship
2. 0.20 to < 0.40 : a small relationship (not close)
3. 0.40 to < 0.70 : a fairly close relationship
4. 0.70 to < 0.90 : a close relationship (reliable)
5. 0.90 to < 1.00 : a very close relationship (very reliable)
6. 1.00 : a perfect relationship
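The following sketch implements the alpha formula above for a hypothetical matrix of item scores and prints the coefficient, which can then be read against the Guilford criteria; the data are simulated only for illustration.

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).
import numpy as np

def cronbach_alpha(scores):
    """scores: respondents in rows, items in columns."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(7)
base = rng.integers(1, 6, size=(40, 1))
items = np.clip(base + rng.integers(-1, 2, size=(40, 6)), 1, 5)   # 6 correlated 1-5 items

print(f"alpha = {cronbach_alpha(items):.2f}")   # e.g. 0.70 to 0.90 reads as "close (reliable)"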


SOURCES:
Guilford, J.P., Psychometric Methods, Tata McGraw-Hill Publishing Company Limited, 1979.
Friedenberg, Lisa, Psychological Testing: Design, Analysis, and Use, Allyn and Bacon, 1995.

Nominal Reliability Theory (Kuder-Richardson Method / KR-20)

RELIABILITY FOR NOMINAL DATA

Reliability is the degree to which the results of a measurement can be trusted. A measurement with high reliability is one that is able to give trustworthy (reliable) results. Reliability is one of the main characteristics of a good measurement instrument. Reliability is sometimes also called trustworthiness, dependability, steadiness, consistency, or stability, but the core idea of the concept is the extent to which the results of a measurement can be trusted, that is, the extent to which the measured scores are free from measurement error.
Empirically, the level of reliability is indicated by a number called the reliability coefficient. Although in theory the reliability coefficient ranges between 0.00 and 1.00, in practice a coefficient of 1.00 is never reached, because human beings, as the subjects of psychological measurement, are a potential source of error. In addition, although a correlation coefficient can be positive (+) or negative (−), a reliability coefficient smaller than zero (0.00) is meaningless, because the interpretation of reliability always refers to a positive reliability coefficient.
The reliability coefficient used here is the Kuder-Richardson coefficient (KR-20); this coefficient describes the variation of items scored right/wrong as 0 or 1 (Guilford and Benjamin, 1978).

The Kuder-Richardson (KR-20) reliability coefficient can be computed with the following formula:

r(KR-20) = (n / (n − 1)) · ((S² − Σ p·q) / S²)

where: n = the number of items
S² = the total variance (the variance of the total scores)
p = the proportion of people who answered item i correctly
1 − p = the proportion of people who answered item i incorrectly = q

Once the reliability coefficient has been computed, the strength of the relationship can be assessed using the Guilford (1956) criteria:
1. less than 0.20 : a very small, negligible relationship
2. 0.20 to < 0.40 : a small relationship (not close)
3. 0.40 to < 0.70 : a fairly close relationship
4. 0.70 to < 0.90 : a close relationship (reliable)
5. 0.90 to < 1.00 : a very close relationship (very reliable)
6. 1.00 : a perfect relationship
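Here is a minimal sketch of the KR-20 formula above for a hypothetical matrix of right/wrong (0/1) item scores; the simulated abilities and difficulties are assumptions made only to generate example data.

# KR-20: n/(n-1) * (S^2 - sum(p*q)) / S^2, for items scored 0 or 1.
import numpy as np

def kr20(scores):
    """scores: 0/1 matrix with respondents in rows, items in columns."""
    n = scores.shape[1]                        # number of items
    p = scores.mean(axis=0)                    # proportion correct per item
    q = 1 - p                                  # proportion incorrect per item
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (n / (n - 1)) * (total_variance - (p * q).sum()) / total_variance

rng = np.random.default_rng(8)
ability = rng.normal(size=40)
difficulty = rng.normal(size=10)
scores = (ability[:, None] + rng.normal(size=(40, 10)) > difficulty).astype(int)

print(f"KR-20 = {kr20(scores):.2f}")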




SOURCES:
Guilford, J.P., Psychometric Methods, Tata McGraw-Hill Publishing Company Limited, 1979.
Friedenberg, Lisa, Psychological Testing: Design, Analysis, and Use, Allyn and Bacon, 1995.

Nominal Validity Theory

VALIDITY THEORY FOR NOMINAL DATA

A. VALIDITY
Validity indicates the degree to which an instrument measures what it is intended to measure. It can therefore be said that the higher the validity of a test instrument, the more accurately it hits its target, that is, the better it shows what it is supposed to measure. A test can be said to have high validity when it performs its measurement function, that is, when it gives results that are consistent with the meaning and purpose for which the test was designed. If a researcher uses a questionnaire to collect research data, the items in that questionnaire are the test instrument and must measure what the research aims to measure.
One way to assess the validity of a test instrument is to look at item discriminality (item discrimination). Item discrimination is the most appropriate method for any type of test. In this study, item discrimination is assessed by means of the item-total correlation. The item-total correlation is the consistency between an item's score and the overall score, as seen in the size of the correlation coefficient between each item and the total score. In this study the point-biserial correlation coefficient is used, computed in the following steps:

Point-Biserial Correlation Coefficient
When the items are dichotomous (correct/incorrect, true/false), the point-biserial correlation for item i is:

rpbi = ((Xi − X) / SDx) · √( p / (1 − p) )

where: X = the mean test score for all persons
Xi = the mean test score for only those persons who answered item i correctly
p = the proportion of persons who answered item i correctly
1 − p = the proportion of persons who answered item i incorrectly
SDx = the standard deviation of the test scores for all persons

Once the correlation coefficients for all items have been computed, the smallest value that can still be considered sufficiently "high" as an indicator of consistency between the item score and the total score must be determined. There is no strict cutoff for this. The main principle in selecting items on the basis of their correlation coefficients is to look for coefficients that are as high as possible and to discard any item with a negative correlation or a coefficient close to zero (0.00).
According to Friedenberg (1995), in the development and construction of psychological scales a minimum correlation coefficient of 0.30 is usually used. Accordingly, all items with a correlation below 0.30 can be set aside, and the items included in the test instrument are those with correlations above 0.30; the closer a correlation is to one (1.00), the better the item's consistency (validity).
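As a sketch of this computation, assuming a hypothetical 0/1 item matrix, the point-biserial correlation of one item with the total score is computed both from the formula above and with SciPy's pointbiserialr; the two agree when the population standard deviation is used.

# Point-biserial item-total correlation, from the formula and from SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
ability = rng.normal(size=60)
items = (ability[:, None] + rng.normal(size=(60, 8)) > 0).astype(int)   # 0/1 item scores
total = items.sum(axis=1)                      # total test score per person

i = 0                                          # look at the first item
p = items[:, i].mean()                         # proportion who answered item i correctly
x_bar = total.mean()                           # mean score, all persons
x_bar_i = total[items[:, i] == 1].mean()       # mean score, correct answerers only
sd_x = total.std(ddof=0)                       # population standard deviation of test scores

r_formula = (x_bar_i - x_bar) / sd_x * np.sqrt(p / (1 - p))
r_scipy, p_value = stats.pointbiserialr(items[:, i], total)
print(r_formula, r_scipy)                      # the two values agree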

Wednesday, June 11, 2008

Markov

Markov

Markov Random Fields and Images
by
Patrick Pérez

At the intersection of statistical physics and probability theory, Markov random fields and Gibbs distributions have emerged in the early eighties as powerful tools for modeling images and coping with high-dimensional inverse problems from low-level vision. Since then, they have been used in many studies from the image processing and computer vision community. A brief and simple introduction to the basics of the domain is proposed.

1. Introduction and general framework
With a seminal paper by Geman and Geman in 1984 [18], powerful tools long known by physicists [2] and statisticians [3] were brought in a comprehensive and stimulating way to the knowledge of the image processing and computer vision community. Since then, their theoretical richness, their practical versatility, and a number of fruitful connections with other domains have resulted in a profusion of studies. These studies deal either with the modeling of images (for synthesis, recognition or compression purposes) or with the resolution of various high-dimensional inverse problems from early vision (e.g., restoration, deblurring, classification, segmentation, data fusion, surface reconstruction, optical flow estimation, stereo matching, etc. See collections of examples in [11, 30, 40]).
The implicit assumption behind probabilistic approaches to image analysis is that, for a given problem, there exists a probability distribution that can capture to some extent the variability and the interactions of the different sets of relevant image attributes. Consequently, one considers the variables of the problem as random variables forming a set (or random vector) X = (X1, ..., Xn) with joint probability distribution PX. (PX is actually a probability mass in the case of discrete variables, and a probability density function when the Xi's are continuously valued; in the latter case, all summations over states or configurations should be replaced by integrals.)

Tuesday, June 10, 2008

Logic Of Triplet Markov Fields

Unsupervised image segmentation using triplet Markov fields
by
Dalila Benboudjema, Wojciech Pieczynski

Abstract
Hidden Markov fields (HMF) models are widely applied to various problems arising in
image processing. In these models, the hidden process of interest X is a Markov field and must be estimated from its observable noisy version Y. The success of HMF is mainly due to the fact that the conditional probability distribution of the hidden process with respect to the observed one remains Markovian, which facilitates different processing strategies such as Bayesian restoration. HMF have been recently generalized to ‘‘pairwise’’ Markov fields (PMF), which offer similar processing advantages and superior modeling capabilities. In PMF one directly assumes the Markovianity of the pair (X,Y). Afterwards, ‘‘triplet’’ Markov fields (TMF), in which the distribution of the pair (X,Y) is the marginal distribution of a Markov field (X,U,Y), where U is an auxiliary process, have been proposed and still allow restoration processing. The aim of this paper is to propose a new parameter estimation method adapted to TMF, and to study the corresponding unsupervised image segmentation methods. The latter are validated via experiments and real image processing.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Hidden Markov fields; Pairwise Markov fields; Triplet Markov fields; Bayesian classification; Mixture estimation; Iterative conditional estimation; Stochastic gradient; Unsupervised image segmentation

1. Introduction
Hidden Markov fields (HMF) are widely used in solving various problems, comprising two stochastic processes X = (Xs), s ∈ S, and Y = (Ys), s ∈ S, in which X = x is unobservable and must be estimated from the observed Y = y. This wide use is due to the fact that standard Bayesian restoration methods can be used in spite of the large size of S: see [3,12,19] for seminal papers and [14,33], among others, for general books. The qualifier ‘‘hidden Markov’’ means that the hidden process X has a Markov law. When the distributions p(y|x) of Y conditional on X = x are simple enough, the pair (X,Y) then retains the Markovian structure, and likewise for the distribution p(x|y) of X conditional on Y = y. The Markovianity of p(x|y) is crucial because it allows one to estimate the unobservable X = x from the observed Y = y, even in the case of very rich sets S. However, the simplicity of p(y|x) required in standard HMF to ensure the Markovianity of p(x|y) can pose problems; in particular, such situations occur in textured image segmentation [21]. To remedy this, the use of pairwise Markov fields (PMF), in which one directly assumes the Markovianity of (X,Y), has been discussed in [26]. Both p(y|x) and p(x|y) are then Markovian, the former ensuring possibilities of modeling textures without approximations, and the latter allowing Bayesian processing, similar to those provided by HMF. PMF have then been generalized to ‘‘triplet’’ Markov fields (TMF), in which the distribution of the pair Z = (X,Y) is the marginal distribution of a Markov field T = (X,U,Y), where U = (Us), s ∈ S, is an auxiliary random field [27]. Once the space K of possible values of each Us is simple enough, TMF still allow one to estimate the unobservable X = x from the observed Y = y. Given that in TMF T = (X,U,Y) the distribution of Z = (X,Y) is its marginal distribution, the Markovianity of T does not necessarily imply the Markovianity of Z; and thus a TMF model is not necessarily a PMF one. Therefore, TMF are more general than PMF and thus are likely to be able to model more complex situations. Conversely, a PMF model can be seen as a particular TMF model in which X = U. There are some studies concerning triplet Markov chains [18,28], where general ideas somewhat similar to those discussed in the present paper have been investigated. However, as Markov fields based processing is quite different from the Markov chains based one, we will concentrate here on Markov fields with no further reference to Markov chains.