From be1747da49c6645f0894813623ee35470d937769 Mon Sep 17 00:00:00 2001
From: xiangyubo
Date: Tue, 14 Jul 2020 10:43:15 +0800
Subject: [PATCH 1/3] add Chinese character handwriting dataset

---
 doc/doc_ch/datasets.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/doc_ch/datasets.md b/doc/doc_ch/datasets.md
index 81314a8e..a9779bdd 100644
--- a/doc/doc_ch/datasets.md
+++ b/doc/doc_ch/datasets.md
@@ -5,6 +5,7 @@
 - [中文街景文字识别](#中文街景文字识别)
 - [中文文档文字识别](#中文文档文字识别)
 - [ICDAR2019-ArT](#ICDAR2019-ArT)
+- [中科院自动化研究所-手写中文数据集](#中科院自动化研究所-手写中文数据集)

 除了开源数据,用户还可使用合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)、[SynthText](https://github.com/ankush-me/SynthText)、[SynthText_Chinese_version](https://github.com/JarveeLee/SynthText_Chinese_version)、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。

@@ -57,6 +58,12 @@
 https://aistudio.baidu.com/aistudio/datasetdetail/8429
 ![](../datasets/ArT.jpg)
 - **下载地址**:https://ai.baidu.com/broad/download?dataset=art
+
+#### 6、中科院自动化研究所-手写中文数据集
+- **数据来源**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
+- **数据简介**:包含在线和离线两类手写单字数据,包含GB2312-80中的3755个一级汉字,共由720人手写完成。在线部分(OLHWDB)总共包含约210万个训练样本,53万个测试样本;离线部分(HWDB)总共包含约210万个训练样本,53万个测试样本。
+- **下载地址**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
+
 ## 参考文献
 **ICDAR 2019-LSVT Challenge**
 ```

From b4c2abc3bce20cdf92c6879274ec2519c112283d Mon Sep 17 00:00:00 2001
From: xiangyubo
Date: Tue, 14 Jul 2020 17:07:48 +0800
Subject: [PATCH 2/3] add Chinese character handwriting dataset

---
 doc/datasets/CASIA_0.jpg           | Bin 0 -> 14296 bytes
 doc/doc_ch/handwritten_datasets.md |  13 +++++++++++++
 2 files changed, 13 insertions(+)
 create mode 100644 doc/datasets/CASIA_0.jpg
 create mode 100644 doc/doc_ch/handwritten_datasets.md

diff --git a/doc/datasets/CASIA_0.jpg b/doc/datasets/CASIA_0.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d65924b2e08504bf56cc20d53ee81dd0dd064322
GIT binary patch
literal 14296
[base85-encoded image data omitted]

literal 0
HcmV?d00001

diff --git a/doc/doc_ch/handwritten_datasets.md b/doc/doc_ch/handwritten_datasets.md
new file mode 100644
index 00000000..abf7a48f
--- /dev/null
+++ b/doc/doc_ch/handwritten_datasets.md
@@ -0,0 +1,13 @@
+## 手写中文OCR数据集
+这里整理了常用手写中文数据集,持续更新中,欢迎各位小伙伴贡献数据集~
+- [中科院自动化研究所-手写中文数据集](#中科院自动化研究所-手写中文数据集)
+
+
+#### 6、中科院自动化研究所-手写中文数据集
+- **数据来源**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
+- **数据简介**:包含在线和离线两类手写单字数据,包含GB2312-80中的3755个一级汉字,共由720人手写完成。在线部分(OLHWDB)总共包含约210万个训练样本,53万个测试样本;离线部分(HWDB)总共包含约210万个训练样本,53万个测试样本。
+ ![](../datasets/CASIA_0.jpg)
+ (a) 五张单字图片样例
+- **下载地址**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
+- **使用建议**:数据为单字,白色背景,可以大量合成文字行进行训练。白色背景可以处理成透明状态,方便添加各种背景。对于需要语义的情况,建议从真实语料出发,抽取单字组成文字行
+

From adb79f9c9fab9da3fc1d51640d5ca837a113766e Mon Sep 17 00:00:00 2001
From: xiangyubo
Date: Tue, 14 Jul 2020 17:10:41 +0800
Subject: [PATCH 3/3] add Chinese character handwriting dataset

---
 doc/doc_ch/datasets.md             | 7 -------
 doc/doc_ch/handwritten_datasets.md | 4 ++--
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/doc/doc_ch/datasets.md b/doc/doc_ch/datasets.md
index a9779bdd..81314a8e 100644
--- a/doc/doc_ch/datasets.md
+++ b/doc/doc_ch/datasets.md
@@ -5,7 +5,6 @@
 - [中文街景文字识别](#中文街景文字识别)
 - [中文文档文字识别](#中文文档文字识别)
 - [ICDAR2019-ArT](#ICDAR2019-ArT)
-- [中科院自动化研究所-手写中文数据集](#中科院自动化研究所-手写中文数据集)

 除了开源数据,用户还可使用合成工具自行合成,可参考的合成工具包括[text_renderer](https://github.com/Sanster/text_renderer)、[SynthText](https://github.com/ankush-me/SynthText)、[SynthText_Chinese_version](https://github.com/JarveeLee/SynthText_Chinese_version)、[TextRecognitionDataGenerator](https://github.com/Belval/TextRecognitionDataGenerator)等。

@@ -58,12 +57,6 @@
 https://aistudio.baidu.com/aistudio/datasetdetail/8429
 ![](../datasets/ArT.jpg)
 - **下载地址**:https://ai.baidu.com/broad/download?dataset=art
-
-#### 6、中科院自动化研究所-手写中文数据集
-- **数据来源**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
-- **数据简介**:包含在线和离线两类手写单字数据,包含GB2312-80中的3755个一级汉字,共由720人手写完成。在线部分(OLHWDB)总共包含约210万个训练样本,53万个测试样本;离线部分(HWDB)总共包含约210万个训练样本,53万个测试样本。
-- **下载地址**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
-
 ## 参考文献
 **ICDAR 2019-LSVT Challenge**
 ```
diff --git a/doc/doc_ch/handwritten_datasets.md b/doc/doc_ch/handwritten_datasets.md
index abf7a48f..7b4be96c 100644
--- a/doc/doc_ch/handwritten_datasets.md
+++ b/doc/doc_ch/handwritten_datasets.md
@@ -6,8 +6,8 @@
 #### 6、中科院自动化研究所-手写中文数据集
 - **数据来源**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
 - **数据简介**:包含在线和离线两类手写单字数据,包含GB2312-80中的3755个一级汉字,共由720人手写完成。在线部分(OLHWDB)总共包含约210万个训练样本,53万个测试样本;离线部分(HWDB)总共包含约210万个训练样本,53万个测试样本。
- ![](../datasets/CASIA_0.jpg)
- (a) 五张单字图片样例
+ ![](../datasets/CASIA_0.jpg)
+ (a) 五张单字图片样例
 - **下载地址**:http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html
 - **使用建议**:数据为单字,白色背景,可以大量合成文字行进行训练。白色背景可以处理成透明状态,方便添加各种背景。对于需要语义的情况,建议从真实语料出发,抽取单字组成文字行
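
The usage suggestion (使用建议) in `handwritten_datasets.md` describes a three-step recipe: make the white background transparent, take the character sequence from a real corpus, and paste single-character samples side by side into a text line. Below is a rough, dependency-free sketch of that idea and not part of the patch itself. Everything in it is an assumption made for illustration: the function names (`to_rgba`, `compose_line`, `render_text`), the threshold of 240 for "white", the `samples` mapping from character to decoded grayscale glyphs, and the choice to represent images as nested lists. A real pipeline would more likely decode the dataset files with an image library and work on arrays.

```python
import random

# Assumption: gray levels at or above this value are treated as white background.
WHITE_THRESHOLD = 240
TRANSPARENT = (0, 0, 0, 0)


def to_rgba(gray, threshold=WHITE_THRESHOLD):
    """Turn a grayscale glyph (rows of 0-255 ints) into RGBA pixels,
    making near-white background pixels fully transparent."""
    return [
        [TRANSPARENT if p >= threshold else (p, p, p, 255) for p in row]
        for row in gray
    ]


def compose_line(glyphs, height, gap=2):
    """Paste RGBA glyphs left to right onto a transparent canvas.

    Assumes every glyph is rectangular and no taller than `height`.
    """
    width = max(0, sum(len(g[0]) for g in glyphs) + gap * (len(glyphs) - 1))
    canvas = [[TRANSPARENT] * width for _ in range(height)]
    x = 0
    for g in glyphs:
        top = (height - len(g)) // 2  # center each glyph vertically
        for dy, row in enumerate(g):
            for dx, px in enumerate(row):
                if px[3]:  # copy ink pixels only, keep the canvas transparent elsewhere
                    canvas[top + dy][x + dx] = px
        x += len(g[0]) + gap
    return canvas


def render_text(text, samples, height, rng=random):
    """Build one text-line image for `text`, which should come from a real corpus.

    `samples` is a hypothetical mapping: character -> list of grayscale glyphs.
    One handwritten sample is drawn at random for each character.
    """
    glyphs = [to_rgba(rng.choice(samples[ch])) for ch in text]
    return compose_line(glyphs, height)
```

Because the resulting canvas stays transparent wherever there is no ink, it can be alpha-composited over arbitrary background images afterwards, which is the motivation the usage suggestion gives for removing the white background.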